Linter Architecture Design

Status: Active | Owner: flowscope-core | Last updated: 2026-02-21

Context

FlowScope ships 72 lint rules across 9 families (AL, AM, CP, CV, JJ, LT, RF, ST, TQ), each implemented in a dedicated one-rule-per-file module under linter/rules/.

The linter started with a handful of core AST rules plus a monolithic parity.rs that used regex/heuristic matching for SQLFluff compatibility. That monolith has been fully decommissioned — all rules are now in dedicated modules using AST-driven or token-stream-driven implementations.

Goals

  • Robust correctness across dialects and real-world SQL.
  • Sound architecture with explicit semantics and low false positives.
  • Maintainable implementation with clear rule ownership and minimal coupling.
  • Scalable rule engine that can grow without a monolith.
  • Deterministic outputs and stable spans suitable for editor and CI usage.

Non-Goals

  • One-shot rewrite of all existing rules.
  • "AST-only" implementation of purely lexical formatting rules.
  • Perfect SQLFluff behavior clone across every dialect from day one.

Architecture Principles

  1. AST-first semantics
  • Semantic rules must be driven by parsed AST plus scope/resolution context, not regex.
  • Examples: aliasing semantics, reference qualification, join logic, set operation checks.
  2. Token-aware style
  • Formatting and trivia rules must use token stream data, not AST-only approximations.
  • Examples: whitespace, newlines, comments, casing style, quoting style, Jinja padding.
  3. Parse once, tokenize once
  • Build a single lint document model per SQL input and reuse it across all rules.
  • Avoid repeated parsing and repeated ad hoc string scans in each rule.
  4. Stable rule contract
  • Each rule gets structured input from the engine, not direct access to ad hoc helpers.
  • Rule output must include deterministic code, message, severity, statement index, and span.
  5. Dialect-explicit behavior
  • Rule decisions must be dialect-aware and must not silently assume generic SQL semantics.
  • When parser fallback is used, confidence should degrade explicitly.
  6. Deterministic and testable
  • Same input and config must always produce the same ordered issue set.
  • Rules must be independently testable with focused fixtures.
  7. Regex is migration glue, not architecture
  • Existing regex heuristics can remain temporarily for parity continuity.
  • New semantic rules must not be implemented with regex.

Key Design Decisions

Decision 1: Introduce a LintDocument model

The linter engine should construct a normalized input model once:

  • sql (full source text)
  • dialect and parser/fallback metadata
  • parsed statements with statement ranges
  • token stream with token spans and token kinds
  • optional scope/resolution metadata for semantic rules

This becomes the only rule input surface.
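
As a rough illustration of the model listed above, the sketch below shows one possible shape for it. Every type, field, and method name here (Dialect, TokenKind, StatementRange, ScopeInfo, tokens_in_statement, and so on) is a hypothetical stand-in, not FlowScope's actual definition, and the parsed AST is reduced to statement byte ranges to keep the example self-contained.

```rust
use std::ops::Range;

// All names below are hypothetical illustrations, not FlowScope's real types.

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Dialect {
    Generic,
    Postgres,
    Mssql,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TokenKind {
    Word,
    Number,
    Punctuation,
    Whitespace,
    Comment,
}

/// One lexical token with its byte span into `LintDocument::sql`.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Token {
    pub kind: TokenKind,
    pub span: Range<usize>,
}

/// Byte range of one parsed statement (the AST node itself is omitted).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct StatementRange {
    pub index: usize,
    pub range: Range<usize>,
}

/// Placeholder for optional scope/resolution metadata.
#[derive(Debug, Clone, Default, PartialEq, Eq)]
pub struct ScopeInfo {
    pub visible_aliases: Vec<String>,
}

/// Built once per SQL input and shared, read-only, by every rule.
#[derive(Debug, Clone)]
pub struct LintDocument {
    pub sql: String,
    pub dialect: Dialect,
    pub parser_fallback_used: bool,
    pub statements: Vec<StatementRange>,
    pub tokens: Vec<Token>,
    pub scope: Option<ScopeInfo>,
}

impl LintDocument {
    /// Source text for a span, or None if the span is out of bounds.
    pub fn text(&self, span: &Range<usize>) -> Option<&str> {
        self.sql.get(span.clone())
    }

    /// Tokens fully contained in the statement with the given index.
    pub fn tokens_in_statement(&self, index: usize) -> Vec<&Token> {
        match self.statements.iter().find(|s| s.index == index) {
            Some(stmt) => self
                .tokens
                .iter()
                .filter(|t| t.span.start >= stmt.range.start && t.span.end <= stmt.range.end)
                .collect(),
            None => Vec::new(),
        }
    }
}
```

Because rules only borrow the document, a lexical rule can ask for the tokens of one statement without re-tokenizing, which is the point of the "parse once, tokenize once" principle.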

Decision 2: Split rules into 3 engines

  1. Semantic engine
  • Input: AST + scope/resolution context.
  • Handles semantic correctness and structural SQL logic.
  2. Lexical engine
  • Input: token stream + token spans.
  • Handles formatting/casing/quoting/comment-aware style rules.
  3. Document engine
  • Input: whole file/document metadata.
  • Handles file-level checks (EOF newline, leading blank lines, batch separators).
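
One way to picture the three-engine split is a rule trait that declares which engine a rule belongs to, with the orchestrator running engines in a fixed order. The sketch below is an assumption-laden toy: the Rule trait, the Doc struct, run_engines, and the DEMO_* rule codes are all invented for illustration, and the two sample rules scan raw text rather than a real token stream.

```rust
// Hypothetical names throughout; this is not FlowScope's actual rule API.

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EngineKind {
    Semantic,
    Lexical,
    Document,
}

/// Stand-in for the shared lint document (source text only).
pub struct Doc {
    pub sql: String,
}

#[derive(Debug, PartialEq, Eq)]
pub struct Issue {
    pub code: &'static str,
    pub engine: EngineKind,
    pub offset: usize,
}

pub trait Rule {
    fn code(&self) -> &'static str;
    fn engine(&self) -> EngineKind;
    /// Returns the byte offsets of violations.
    fn check(&self, doc: &Doc) -> Vec<usize>;
}

/// Document-engine toy rule: non-empty input must end with a newline.
pub struct EofNewline;

impl Rule for EofNewline {
    fn code(&self) -> &'static str {
        "DEMO_EOF_NEWLINE"
    }
    fn engine(&self) -> EngineKind {
        EngineKind::Document
    }
    fn check(&self, doc: &Doc) -> Vec<usize> {
        if doc.sql.is_empty() || doc.sql.ends_with('\n') {
            Vec::new()
        } else {
            vec![doc.sql.len()]
        }
    }
}

/// Lexical-engine toy rule: no spaces or tabs at the end of a line.
pub struct TrailingSpace;

impl Rule for TrailingSpace {
    fn code(&self) -> &'static str {
        "DEMO_TRAILING_SPACE"
    }
    fn engine(&self) -> EngineKind {
        EngineKind::Lexical
    }
    fn check(&self, doc: &Doc) -> Vec<usize> {
        let mut hits = Vec::new();
        let mut offset = 0;
        for line in doc.sql.split('\n') {
            let trimmed = line.trim_end_matches(|c| c == ' ' || c == '\t');
            if trimmed.len() < line.len() {
                hits.push(offset + trimmed.len());
            }
            offset += line.len() + 1; // +1 for the '\n' separator
        }
        hits
    }
}

/// Runs engines in a fixed order so output does not depend on registration order.
pub fn run_engines(rules: &[Box<dyn Rule>], doc: &Doc) -> Vec<Issue> {
    let mut out = Vec::new();
    for kind in [EngineKind::Semantic, EngineKind::Lexical, EngineKind::Document] {
        for rule in rules.iter().filter(|r| r.engine() == kind) {
            for offset in rule.check(doc) {
                out.push(Issue { code: rule.code(), engine: kind, offset });
            }
        }
    }
    out
}
```

Keeping the engine as a property of the rule, rather than of the call site, means a new rule is engine-scoped by default and cannot silently reach for inputs that belong to another engine.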

Decision 3: Replace parity monolith with one-rule-per-file modules

  • Move from parity.rs monolith to rules/<code>.rs modules.
  • Keep shared traversal and token utilities in common helpers.
  • Preserve existing lint codes for API stability.

Decision 4: Standardize span generation

  • Primary span source: parser or tokenizer spans.
  • Secondary span source: scoped fallback search only when necessary.
  • No free-form "best guess" spans without explicit fallback path.
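
The primary/secondary policy above could be expressed roughly as follows. This is a minimal sketch under stated assumptions: resolve_span and SpanSource are hypothetical names, and the "scoped fallback search" is modeled as a plain substring search restricted to the statement's byte range.

```rust
use std::ops::Range;

// Hypothetical helper; names and signature are assumptions for illustration.

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SpanSource {
    /// Span came directly from the parser or tokenizer.
    Primary,
    /// Span was found by searching inside the owning statement only.
    ScopedFallback,
}

/// Prefers the parser/tokenizer span. Otherwise searches for `needle` inside
/// `statement` only. Returns None rather than guessing when both paths fail.
pub fn resolve_span(
    sql: &str,
    primary: Option<Range<usize>>,
    statement: Range<usize>,
    needle: &str,
) -> Option<(Range<usize>, SpanSource)> {
    if let Some(span) = primary {
        return Some((span, SpanSource::Primary));
    }
    if needle.is_empty() {
        return None;
    }
    // `get` returns None for out-of-bounds or non-char-boundary ranges.
    let scope = sql.get(statement.clone())?;
    let relative = scope.find(needle)?;
    let start = statement.start + relative;
    Some((start..start + needle.len(), SpanSource::ScopedFallback))
}
```

Returning the source alongside the span keeps the fallback path explicit, so a caller can downgrade confidence whenever SpanSource::ScopedFallback is reported.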

Decision 5: Add rule metadata and confidence

Each issue should carry internal provenance metadata:

  • engine type (semantic, lexical, document)
  • confidence (high, medium, low)
  • fallback source (if parser fallback or heuristic logic was used)

This supports telemetry, triage, and quality gates.
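
A sketch of how such provenance might be attached to an issue is shown below. The field names lint_engine, lint_confidence, and lint_fallback_source follow the names used in the Progress Snapshot; the enum variants, the Provenance struct, and in particular the rule that parser fallback caps confidence at medium and heuristic logic caps it at low are assumptions made for this example.

```rust
// Field names mirror the document; everything else is a hypothetical sketch.

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LintEngine {
    Semantic,
    Lexical,
    Document,
}

// Declared low-to-high so the derived ordering can be used for capping.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum LintConfidence {
    Low,
    Medium,
    High,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FallbackSource {
    ParserFallback,
    Heuristic,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Provenance {
    pub lint_engine: LintEngine,
    pub lint_confidence: LintConfidence,
    pub lint_fallback_source: Option<FallbackSource>,
}

impl Provenance {
    /// Issues start at high confidence with no fallback recorded.
    pub fn new(engine: LintEngine) -> Self {
        Provenance {
            lint_engine: engine,
            lint_confidence: LintConfidence::High,
            lint_fallback_source: None,
        }
    }

    /// Records a fallback and caps confidence (assumed policy: parser
    /// fallback caps at Medium, heuristic logic caps at Low).
    pub fn with_fallback(mut self, source: FallbackSource) -> Self {
        let cap = match source {
            FallbackSource::ParserFallback => LintConfidence::Medium,
            FallbackSource::Heuristic => LintConfidence::Low,
        };
        self.lint_confidence = self.lint_confidence.min(cap);
        self.lint_fallback_source = Some(source);
        self
    }
}

/// Example quality gate: keep only issues at or above a minimum confidence.
pub fn passes_gate(p: &Provenance, minimum: LintConfidence) -> bool {
    p.lint_confidence >= minimum
}
```

Capping with min means confidence can only go down once a fallback is recorded, which matches the principle that confidence should degrade explicitly.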

Decision 6: Define fixability as a rule capability

Rule metadata should include whether a deterministic fix is supported.

  • No inferred fix logic from message text.
  • Fix support should be explicit and tested per rule.
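
To make "fixability as a capability" concrete, the sketch below gates fix application on a declared flag in rule metadata instead of inspecting message text. Fixability, RuleMeta, Edit, and apply_fix are hypothetical names, and the single-edit model is a simplification chosen for the example.

```rust
// Hypothetical types; a sketch of capability-gated fixes, not FlowScope's API.

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Fixability {
    None,
    Deterministic,
}

pub struct RuleMeta {
    pub code: &'static str,
    pub fixability: Fixability,
}

/// Replace the bytes in `start..end` with `replacement`.
pub struct Edit {
    pub start: usize,
    pub end: usize,
    pub replacement: String,
}

/// Applies the edit only when the rule explicitly declares a deterministic
/// fix. Returns None for non-fixable rules or invalid edit ranges.
pub fn apply_fix(meta: &RuleMeta, sql: &str, edit: &Edit) -> Option<String> {
    if meta.fixability != Fixability::Deterministic {
        return None;
    }
    // Validates bounds, ordering, and UTF-8 char boundaries in one step.
    sql.get(edit.start..edit.end)?;
    let mut out = String::with_capacity(sql.len() + edit.replacement.len());
    out.push_str(&sql[..edit.start]);
    out.push_str(&edit.replacement);
    out.push_str(&sql[edit.end..]);
    Some(out)
}
```

For instance, a rule declared as Deterministic could turn `a = NULL` into `a IS NULL` by replacing the one-byte `=` span, while the same edit offered under a rule declared as Fixability::None is refused.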

Execution Pipeline

  1. Parse SQL into statements with selected dialect.
  2. Tokenize full source with token spans.
  3. Build LintDocument with statement ranges and shared metadata.
  4. Optionally build scope/resolution context once for semantic rules.
  5. Execute semantic, lexical, and document engines.
  6. Normalize, sort, and deduplicate issues.
  7. Emit final issues with deterministic ordering and stable spans.
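
Steps 6 and 7 can be sketched as a sort followed by a dedupe over a totally ordered issue type. The Issue fields and the ordering key (statement index, then span, then code, then message) are assumptions for this example; the document only requires that ordering be deterministic.

```rust
// Hypothetical issue shape; the sort key is an assumed, illustrative choice.

// Derived Ord compares fields in declaration order:
// statement_index, start, end, code, message.
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
pub struct Issue {
    pub statement_index: usize,
    pub start: usize,
    pub end: usize,
    pub code: String,
    pub message: String,
}

/// Small constructor used to keep examples short.
pub fn issue(statement_index: usize, start: usize, end: usize, code: &str) -> Issue {
    Issue {
        statement_index,
        start,
        end,
        code: code.to_string(),
        message: format!("{} violation", code),
    }
}

/// Sorts by the derived total order, then drops exact duplicates.
/// `dedup` only removes adjacent equals, which is sufficient after sorting.
pub fn finalize(mut issues: Vec<Issue>) -> Vec<Issue> {
    issues.sort();
    issues.dedup();
    issues
}
```

Because the result depends only on the set of issues and not on the order in which engines produced them, two runs over the same input and config yield the same output.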

Migration Plan

Phase 0: Foundation [COMPLETE]

  • LintDocument model, tokenization pass, and document-level lint execution path are live.
  • Token stream provider propagated through rule context (parse once, tokenize once).

Phase 1: High-risk semantic migrations [COMPLETE]

All semantic-heavy rules migrated to AST-driven implementations:

  • references (RF_001-RF_006)
  • structure checks (ST_001-ST_012)
  • ambiguous join/reference rules (AM_001-AM_009)
  • convention rules (CV_001-CV_012)
  • aliasing rules (AL_001-AL_009)

Phase 2: Lexical/style migrations [COMPLETE]

All style-oriented checks migrated to dedicated modules:

  • capitalization (CP_001-CP_005): tokenizer-driven
  • layout (LT_001-LT_015): tokenizer/line-aware checks
  • jinja padding (JJ_001): delimiter scanning
  • TSQL checks (TQ_001-TQ_003): AST/token-driven

Remaining work: SQLFluff configuration-depth parity gaps for some CP/LT/JJ rules.

Phase 3: Decommission parity monolith [COMPLETE]

  • parity.rs retired and deleted.
  • All 72 rules live in one-rule-per-file modules under linter/rules/.

Progress Snapshot

  • Phase 0 foundation shipped: LintDocument model, tokenization pass, and document-level lint execution path are live.
  • Engine split is active in linter orchestration: semantic + lexical + document passes run with deterministic sort/dedupe.
  • Issue provenance metadata is implemented (lint_engine, lint_confidence, lint_fallback_source).
  • Phase 1 AST migrations landed for: AM_001-AM_009, CV_001-CV_012, RF_001-RF_006, ST_001-ST_012, AL_001-AL_009.
  • LINT_AM_009 now follows SQLFluff AM09 semantics via AST query-clause analysis, flagging LIMIT/OFFSET usage without ORDER BY across top-level and nested SELECTs.
  • LINT_AM_004 now follows SQLFluff AM04 semantics via AST output-width analysis, flagging queries whose result column count is unknown due to unresolved wildcard expansion (*/alias.*) across CTE/subquery/set-operation scopes, and now resolves wildcard width through declared CTE column lists, table-factor alias column lists (AS alias(col1, ...)), and aliased nested-join factors (including USING(...) width deduction plus NATURAL JOIN overlap deduction when both sides expose deterministic output column names).
  • LINT_AM_002 now follows SQLFluff AM02 core semantics by flagging bare UNION (without explicit ALL/DISTINCT), with CLI fixer behavior inserting explicit DISTINCT through AST set-operation quantifier rewrites (text-regex path removed), and dialect-scoped execution aligned to SQLFluff-supported dialects available in FlowScope.
  • LINT_CV_002 now follows SQLFluff CV02 semantics and fixer behavior by flagging IFNULL/NVL function usage and rewriting to COALESCE.
  • LINT_CV_005 now follows SQLFluff CV05 semantics and fixer behavior by flagging = NULL/<> NULL comparisons and rewriting to IS [NOT] NULL.
  • LINT_CV_008 fixer parity is now AST-driven across both simple and chained/nested RIGHT JOIN patterns, rewriting them to LEFT JOIN form by swapping join operands and normalizing join operators.
  • LINT_ST_004 now follows SQLFluff ST04 semantics via AST CASE analysis, flagging flattenable nested CASE expressions in ELSE clauses (instead of depth-based heuristics); fixer parity now flattens eligible nested ELSE CASE branches into a single CASE.
  • LINT_ST_007 now includes SQLFluff ST07 fixer parity via AST join-constraint rewrites, converting JOIN ... USING (...) to explicit ON predicates (including multi-column USING lists).
  • LINT_ST_009 now includes SQLFluff ST09 fixer parity via AST expression rewrites, swapping reversed qualified equality sides in JOIN ... ON predicates.
  • LINT_ST_006 now follows SQLFluff ST06 detection semantics via AST SELECT projection analysis (simple targets after leading complex expressions) and includes fixer parity via AST reordering.
  • LINT_ST_002 now follows SQLFluff ST02 detection semantics via AST CASE analysis (repeated equality checks on a common operand) and includes fixer parity via AST CASE rewrites.
  • LINT_ST_008 now follows SQLFluff ST08 detection semantics via AST SELECT analysis for DISTINCT(<expr>) and includes fixer parity via AST SELECT rewrite to SELECT DISTINCT <expr>.
  • LINT_ST_010 now aligns more closely with SQLFluff ST10. It covers equivalent-expression predicate comparisons across =/!=/</>/<=/>= (e.g. x = x, x < x) with operator-side guardrails, including equivalent concat/arithmetic expression detection such as 'A'||'B' = 'A'||'B' while deferring nested comparison-expression operands. It preserves SQLFluff-style literal handling (1=1/1=0 allowed and non-equality literal-vs-literal comparisons deferred), traverses SELECT/UPDATE/DELETE/MERGE predicate contexts, and reports per-occurrence violations rather than collapsing to one statement-level hit.
  • LINT_ST_011 now aligns more closely with SQLFluff ST11. It scopes candidate checks to explicit OUTER joins, tracks only joined relations (not the base FROM source), and defers on unqualified references (RF02-style). It accounts for references in other JOIN ON clauses, DISTINCT ON (...), query-level ORDER BY, CLUSTER BY/DISTRIBUTE BY, LATERAL VIEW, CONNECT BY, named WINDOW clause expressions, and later JOIN relation expressions (e.g. UNNEST(g.nested_array)). It also evaluates multi-root FROM clauses, treats both qualified wildcards (alias.*) and unqualified wildcard projections (*) as table references (including Snowflake qualified wildcard EXCLUDE forms), and normalizes quoted joined-source names across MySQL backticks and MSSQL brackets.
  • LINT_AL_009 now follows SQLFluff AL09 core detection semantics via AST projection analysis for identifier/qualified-identifier self-alias patterns (col AS col), with quote-aware case matching and alias_case_check configuration support including mode-accurate quoted_cs_naked_upper / quoted_cs_naked_lower behavior.
  • LINT_AL_001 now uses AST-driven table-factor alias traversal with token-aware AS detection, replacing regex-based matching, and now includes SQLFluff AL01 parity for MERGE target/source aliases.
  • LINT_AL_002 now uses AST-driven SELECT projection alias traversal with token-aware AS detection, replacing regex-based clause extraction, and now excludes TSQL assignment-style projection aliases (SELECT alias = expr) from AL02 violations.
  • LINT_AL_004 now also checks implicit table-name aliases (no explicit AS) plus parent-scope collisions across both nested FROM/JOIN subqueries (excluding wrapper aliases) and expression subqueries (WHERE/IN/EXISTS), and supports quote-aware alias_case_check configuration with mode-accurate quoted_cs_naked_upper / quoted_cs_naked_lower behavior.
  • LINT_AL_008 now checks duplicate projected output names from both explicit aliases and unaliased column references (e.g., foo, schema.foo) in SELECT clauses, with quote-aware alias_case_check configuration support and mode-accurate quoted_cs_naked_upper / quoted_cs_naked_lower behavior.
  • lint.ruleConfigs now supports per-rule configuration objects keyed by canonical/shorthand/dotted rule references; LINT_AL_001 and LINT_AL_002 use this for SQLFluff-style aliasing=explicit|implicit.
  • LINT_AL_006 now runs as a dedicated AST rule via table-factor alias traversal and supports min_alias_length / max_alias_length via lint.ruleConfigs (default behavior now aligns with SQLFluff by leaving max_alias_length unset unless configured).
  • LINT_AL_003 now supports allow_scalar via lint.ruleConfigs with SQLFluff-aligned default behavior (allow_scalar=true).
  • LINT_AL_007 now runs as a dedicated AST rule over base-table factors in FROM/JOIN, flagging unnecessary aliases in both single-source and multi-source scopes while allowing aliases for repeated self-join table references.
  • LINT_RF_004/LINT_RF_005/LINT_RF_006 are now split out of parity.rs into dedicated core modules (rf_004.rs-rf_006.rs); all three now use AST-driven traversal (RF04 identifier/alias analysis with identifier-policy/ignore config support, RF05 identifier/special-char analysis with identifier-policy and allowed-character config support, RF06 identifier quoting analysis with identifier-policy/keyword-preference/ignore config support).
  • LINT_ST_012 and LINT_TQ_001-LINT_TQ_003 are now split out of parity.rs into dedicated core modules (st_012.rs, tq_001.rs-tq_003.rs); LINT_TQ_001/LINT_TQ_002 are AST-driven (CreateProcedure name/body analysis), and LINT_ST_012/LINT_TQ_003 now use token-driven sequencing checks.
  • LINT_CV_001, LINT_CV_007, and LINT_CV_009-LINT_CV_011 are now split out of parity.rs into dedicated core modules (cv_001.rs, cv_007.rs, cv_009.rs-cv_011.rs); LINT_CV_007, LINT_CV_009, LINT_CV_010, and LINT_CV_011 are now AST-driven, and LINT_CV_001 now uses token-aware operator scanning (plus preferred_not_equal_style config support) instead of regex.
  • LINT_CV_004 now supports SQLFluff-style COUNT preference knobs (prefer_count_1 / prefer_count_0) via lint.ruleConfigs while keeping AST expression traversal for detection; default fixer behavior now rewrites both COUNT(1) and COUNT(0) to COUNT(*).
  • LINT_CV_006 now supports multiline_newline / require_final_semicolon via lint.ruleConfigs while keeping statement-boundary aware terminator checks.
  • Lint execution now propagates the single document token stream through rule context (parse once, tokenize once), and LINT_CV_006 consumes that shared stream before fallback tokenization.
  • MSSQL statement parsing now splits batches on GO separators before best-effort parsing, enabling LINT_CV_006 final-semicolon checks to run on post-GO statements without parser dropouts.
  • LINT_CV_009 now supports configurable blocked_words / blocked_regex via lint.ruleConfigs (AST traversal scope unchanged).
  • LINT_CV_010 now supports preferred_quoted_literal_style via lint.ruleConfigs and uses mixed-style (single + double) detection for consistent mode (current behavior remains narrower than full SQLFluff literal semantics).
  • LINT_CV_011 now supports preferred_type_casting_style via lint.ruleConfigs (including consistent/shorthand/cast/convert preferences).
  • LINT_LT_005 now supports max_line_length, ignore_comment_lines, and ignore_comment_clauses via lint.ruleConfigs.
  • LINT_LT_009 now supports wildcard_policy (single/multiple) via lint.ruleConfigs.
  • LINT_LT_011 now supports line_position (alone:strict/leading/trailing) via lint.ruleConfigs.
  • LINT_LT_015 now supports maximum_empty_lines_inside_statements / maximum_empty_lines_between_statements via lint.ruleConfigs.
  • LINT_LT_003 now supports operator line-placement configuration via lint.ruleConfigs (line_position=leading|trailing, plus legacy SQLFluff operator_new_lines=after|before mapping).
  • LINT_LT_004 now supports comma line-placement configuration via lint.ruleConfigs (line_position=trailing|leading, plus legacy SQLFluff comma_style mapping).
  • LINT_ST_005 now supports forbid_subquery_in (both/join/from) via lint.ruleConfigs, with SQLFluff-aligned default behavior set to join, and now exempts correlated JOIN-derived subqueries that reference outer query sources (SQLFluff ST05 parity for correlated cases).
  • CLI lint mode now supports SQLFluff-style config/template parity plumbing: --rule-configs JSON for per-rule options, explicit --template passthrough in lint mode, and Jinja fallback retry for parse-erroring templated SQL (enabling config-aware fixture replay parity checks for AL05/ST05/ST11).
  • LINT_ST_009 now supports preferred_first_table_in_join_clause (earlier/later) via lint.ruleConfigs.
  • LINT_RF_001 now supports force_enable via lint.ruleConfigs.
  • LINT_RF_002 now supports force_enable via lint.ruleConfigs, aligns more closely with SQLFluff projection-alias semantics (self-alias projections are flagged while later references to earlier aliases remain allowed), and avoids false positives on common datepart function-argument keywords (e.g., timestamp_trunc(..., month), datediff(year, ...)).
  • LINT_RF_003 now supports single_table_references (consistent/qualified/unqualified) and force_enable via lint.ruleConfigs, and treats qualified wildcards (alias.*) as qualified references for mixed-style detection.
  • LINT_RF_006 now supports prefer_quoted_identifiers / case_sensitive via lint.ruleConfigs.
  • LINT_AL_007 now supports force_enable via lint.ruleConfigs and is disabled by default to match SQLFluff behavior (rule logic runs when force_enable=true).
  • LINT_AL_005 now supports alias_case_check (including SQLFluff-style casefolding modes, with mode-accurate quoted_cs_naked_upper/quoted_cs_naked_lower normalization) via lint.ruleConfigs. It includes dialect-aware quoted/unquoted alias normalization in default (dialect) mode (e.g., Postgres/Redshift lower-folding vs Snowflake upper-folding, plus case-insensitive quoted identifiers for dialects such as DuckDB/Hive/SQLite). It tracks alias usage across additional AST clauses (QUALIFY, named WINDOW, DISTINCT ON, PREWHERE, CLUSTER BY/DISTRIBUTE BY/SORT BY, LATERAL VIEW, CONNECT BY) plus join relation table-factor expressions (LATERAL subqueries, UNNEST(...), Snowflake LATERAL FLATTEN(...), etc.). It now ignores derived-subquery wrapper aliases and value-table-function aliases per SQLFluff AL05 parity, recursively checks nested derived-query scopes for inner alias usage/violations, and applies to single-table scopes as well as multi-source joins. It also now includes dialect-aware parity for: BigQuery TO_JSON_STRING(<table_alias>); SQLFluff Redshift QUALIFY ordering behavior (QUALIFY alias references count only when QUALIFY follows FROM/JOIN directly, including unqualified alias-prefixed identifiers such as ss_sold_date); BigQuery/Redshift implicit array-relation alias usage (FROM t, t.arr / FROM t, t.super_array AS x); repeated-self-join alias handling where sibling aliases on the same base relation are exempt when one is referenced; and Snowflake DELETE ... USING subquery parity for inner CTE alias usage.
  • Parser fallback now normalizes escaped quoted-identifier edge cases for BigQuery/ClickHouse (plus ClickHouse trailing-comma-before-FROM fallback normalization), removing supported-dialect fixture parse blockers that previously masked LINT_AL_005 parity checks.
  • LINT_AM_005 now supports fully_qualify_join_types (inner/outer/both) via lint.ruleConfigs, with AST-driven outer-mode detection for unqualified LEFT/RIGHT joins and token fallback kept only for FULL JOIN keyword-form disambiguation.
  • LINT_AM_006 now supports group_by_and_order_by_style (consistent/explicit/implicit) via lint.ruleConfigs.
  • LINT_CP_001 now supports capitalisation_policy, ignore_words, and ignore_words_regex via lint.ruleConfigs.
  • LINT_CP_002-LINT_CP_005 now support extended_capitalisation_policy, ignore_words, and ignore_words_regex via lint.ruleConfigs; LINT_CP_002 additionally supports SQLFluff-style unquoted_identifiers_policy.
  • LINT_CV_003 now uses token/depth-aware SELECT-clause analysis for trailing-comma detection, replacing regex scanning, and supports SQLFluff-style select_clause_trailing_comma (forbid/require) via lint.ruleConfigs.
  • LINT_JJ_001 and LINT_LT_010/LINT_LT_011/LINT_LT_012/LINT_LT_013/LINT_LT_015 are now split out of parity.rs into dedicated core modules (jj_001.rs, lt_010.rs, lt_011.rs, lt_012.rs, lt_013.rs, lt_015.rs); LINT_JJ_001 now uses delimiter scanning (including %}/#} close checks and trim-marker-safe tags), LINT_LT_010/LINT_LT_011 now use tokenizer line-aware checks, LINT_LT_012 now enforces a single trailing newline at EOF, and LINT_LT_013/LINT_LT_015 now use direct newline-run scanning instead of regex matching.
  • LINT_LT_002/LINT_LT_003/LINT_LT_004/LINT_LT_007 are now split out of parity.rs into dedicated core modules (lt_002.rs, lt_003.rs, lt_004.rs, lt_007.rs); LINT_LT_002 now performs configurable indent-width checks (indent_unit / tab_space_size) with mixed tab/space detection, LINT_LT_003/LINT_LT_004 now use tokenizer-based operator/comma layout checks, and LINT_LT_007 now uses deterministic CTE sequence scanning instead of regex matching.
  • LINT_LT_001/LINT_LT_005/LINT_LT_006/LINT_LT_008/LINT_LT_009/LINT_LT_014 are now split out of parity.rs into dedicated core modules (lt_001.rs, lt_005.rs, lt_006.rs, lt_008.rs, lt_009.rs, lt_014.rs); LINT_LT_001 now uses deterministic layout-pattern scanners, LINT_LT_006 uses token-stream spacing detection for function-like calls, LINT_LT_009 uses tokenizer-located SELECT-line target counting, and LINT_LT_014 uses token/line-aware major-clause placement checks instead of regex masking.
  • LINT_CP_001-LINT_CP_005 are now split out of parity.rs into dedicated core modules (cp_001.rs-cp_005.rs); LINT_CP_004 was migrated to tokenizer-driven literal detection, LINT_CP_001/LINT_CP_003/LINT_CP_005 are tokenizer-driven (keyword/function/type token analysis), and LINT_CP_002 now uses shared AST identifier-candidate traversal (with SQLFluff-style identifier-policy filtering), replacing regex + manual masking paths.
  • LINT_AM_003 now follows SQLFluff AM03 semantics via AST ORDER BY analysis, flagging mixed implicit/explicit sort direction (including NULLS ordering cases) across nested query scopes; fixer parity now adds explicit ASC to implicit items in mixed clauses.
  • LINT_AM_005 fixer now follows SQLFluff AM05 config-aware behavior: default/inner rewrites bare JOIN to INNER JOIN, and outer/both modes also qualify LEFT/RIGHT joins plus rewrite bare FULL JOIN keywords to FULL OUTER JOIN (outside string literals) after AST rewrites.
  • LINT_AM_006 now follows SQLFluff AM06 default (consistent) semantics via AST traversal of GROUP BY / ORDER BY clauses, including nested-query precedence and rollup-style references.
  • LINT_AM_008 now follows SQLFluff AM08 semantics via AST join-operator analysis (implicit cross join detection, with WHERE deferral to CV12 and UNNEST/CROSS/NATURAL/USING exclusions); fixer parity now rewrites eligible implicit joins to explicit CROSS JOIN.
  • LINT_CV_012 now broadens AST join-operator handling to include INNER JOIN forms represented as JoinOperator::Inner without ON/USING, and now aligns more closely with SQLFluff CV12 chain semantics by flagging only when all naked joins in a join chain are represented via WHERE join predicates.
  • LINT_AM_007 now performs AST set-expression branch-width checks with deterministic wildcard resolution for CTE/derived sources (including declared CTE column lists and table-factor alias column lists) and aliased nested-join factors (including USING(...) width deduction plus NATURAL JOIN overlap deduction when both sides expose deterministic output column names), while unresolved wildcard expansions remain non-violating (SQLFluff-aligned behavior).
  • Parity monolith decommission is complete: migrated rule registrations and parity tests are removed, and crates/flowscope-core/src/linter/rules/parity.rs has been retired.
  • [~] SQLFluff fixture adoption is in progress; AM, CV, ST fixture cases adopted for most semantic rules. Additional rule-level coverage is still being expanded.
  • [~] SQLFluff parity quality gaps remain for a subset of rules. See docs/sqlfluff-gap-matrix.md for the current status of per-rule parity deltas.
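
Several entries above describe replacing regex matching with direct scanning, for example the newline-run scanning mentioned for LINT_LT_013/LINT_LT_015. The sketch below shows what such a scan might look like in isolation. The function name blank_line_runs, its return shape, and the choice to ignore blank runs at end of input are assumptions made for illustration, not the actual rule implementation.

```rust
// Hypothetical sketch of a regex-free newline-run scan.

/// Returns (byte offset of the first blank line, number of blank lines) for
/// every run of consecutive blank lines longer than `max_empty` that is
/// followed by a non-blank line. Blank runs at end of input are ignored here,
/// on the assumption that a separate EOF rule covers them.
pub fn blank_line_runs(sql: &str, max_empty: usize) -> Vec<(usize, usize)> {
    let mut out = Vec::new();
    let mut offset = 0;
    // Current run as (start offset, blank line count), if inside one.
    let mut run: Option<(usize, usize)> = None;
    for line in sql.split('\n') {
        if line.trim().is_empty() {
            run = Some(match run {
                Some((start, count)) => (start, count + 1),
                None => (offset, 1),
            });
        } else if let Some((start, count)) = run.take() {
            if count > max_empty {
                out.push((start, count));
            }
        }
        offset += line.len() + 1; // +1 for the '\n' separator
    }
    out
}
```

A single forward pass like this yields stable byte offsets for spans directly, which is harder to guarantee when a regex is applied to masked or pre-processed text.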

Quality Gates

Each migrated rule must pass:

  • correctness: fixture and regression coverage for trigger/non-trigger cases
  • span quality: stable and accurate primary highlight span
  • precision guardrails: false positive threshold on curated corpus
  • performance: no meaningful regression on representative workloads
  • parity continuity: no unintentional code/message regressions unless documented
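
The correctness gate (trigger and non-trigger fixtures per rule) might be exercised with a small harness like the one below. Fixture, failing_fixtures, and the toy rule violates_single_trailing_newline are hypothetical; the toy rule is a simplified stand-in for an EOF-newline check and ignores details such as CRLF line endings.

```rust
// Hypothetical fixture harness; names and the toy rule are assumptions.

pub struct Fixture {
    pub name: &'static str,
    pub sql: &'static str,
    pub expect_violation: bool,
}

/// Toy rule: a violation unless the text ends with exactly one '\n'.
pub fn violates_single_trailing_newline(sql: &str) -> bool {
    !sql.ends_with('\n') || sql.ends_with("\n\n")
}

/// Returns the names of fixtures whose outcome differs from the expectation.
/// An empty result means the rule passes the gate for this fixture set.
pub fn failing_fixtures(rule: fn(&str) -> bool, fixtures: &[Fixture]) -> Vec<&'static str> {
    fixtures
        .iter()
        .filter(|f| rule(f.sql) != f.expect_violation)
        .map(|f| f.name)
        .collect()
}
```

Listing non-trigger cases next to trigger cases keeps false positives visible in the same place as missed detections.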

Risks and Mitigations

  1. Parser limitations and missing AST locations
  • Mitigation: token spans become first-class; fallback span logic remains explicit.
  2. Dialect edge cases not fully supported upstream
  • Mitigation: dialect-specific behavior tables and confidence downgrade on fallback paths.
  3. Migration churn and temporary duplicate logic
  • Resolved: phased rule-by-rule migration completed; parity monolith retired.

Success Criteria

  • All semantic rules run through AST/scope engine.
  • All style/layout rules run through lexical/document engines.
  • parity.rs no longer acts as a rule home.
  • Rule additions are modular, testable, and engine-scoped by default.
  • [~] Lint output quality and determinism improve while preserving stable public rule codes.
  • Close remaining SQLFluff parity gaps (see docs/sqlfluff-gap-matrix.md).