Linter Architecture Design

Status: Active
Owner: flowscope-core
Last updated: 2026-02-21

Context

FlowScope ships 72 lint rules across 9 families (AL, AM, CP, CV, JJ, LT, RF, ST, TQ), each implemented in its own module under linter/rules/.

The linter started with a handful of core AST rules plus a monolithic parity.rs that used regex/heuristic matching for SQLFluff compatibility. That monolith has been fully decommissioned — all rules are now in dedicated modules using AST-driven or token-stream-driven implementations.

Goals

Robust correctness across dialects and real-world SQL.
Sound architecture with explicit semantics and low false positives.
Maintainable implementation with clear rule ownership and minimal coupling.
Scalable rule engine that can grow without a monolith.
Deterministic outputs and stable spans suitable for editor and CI usage.

Non-Goals

One-shot rewrite of all existing rules.
"AST-only" implementation of purely lexical formatting rules.
Perfect SQLFluff behavior clone across every dialect from day one.

Architecture Principles

AST-first semantics

Semantic rules must be driven by parsed AST plus scope/resolution context, not regex.
Examples: aliasing semantics, reference qualification, join logic, set operation checks.

Token-aware style

Formatting and trivia rules must use token stream data, not AST-only approximations.
Examples: whitespace, newlines, comments, casing style, quoting style, Jinja padding.
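To make the token-stream requirement concrete, here is a minimal Rust sketch of a keyword-casing consistency check. The `Token`, `TokenKind`, and `check_keyword_casing` names and the rule's exact behavior are hypothetical illustrations, not FlowScope's API. Because the rule inspects only tokens classified as keywords, the word `select` inside a comment cannot trigger it, which is hard to guarantee with AST-only or regex approaches.

```rust
// Hypothetical token model: a kind plus a byte span into the source text.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum TokenKind {
    Keyword,
    Identifier,
    Whitespace,
    Comment,
}

#[derive(Debug, Clone, Copy)]
struct Token {
    kind: TokenKind,
    start: usize, // inclusive byte offset
    end: usize,   // exclusive byte offset
}

fn tok(kind: TokenKind, start: usize, end: usize) -> Token {
    Token { kind, start, end }
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Case {
    Upper,
    Lower,
    Mixed,
}

fn case_of(text: &str) -> Case {
    if text == text.to_uppercase() {
        Case::Upper
    } else if text == text.to_lowercase() {
        Case::Lower
    } else {
        Case::Mixed
    }
}

#[derive(Debug, PartialEq, Eq)]
struct Finding {
    start: usize,
    end: usize,
    message: String,
}

/// Flags every keyword whose casing differs from the first keyword's casing.
/// Only `Keyword` tokens are inspected; comments and identifiers are skipped.
fn check_keyword_casing(sql: &str, tokens: &[Token]) -> Vec<Finding> {
    let mut expected: Option<Case> = None;
    let mut findings = Vec::new();
    for t in tokens.iter().filter(|t| t.kind == TokenKind::Keyword) {
        let text = &sql[t.start..t.end];
        let case = case_of(text);
        match expected {
            None => expected = Some(case),
            Some(want) if want != case => findings.push(Finding {
                start: t.start,
                end: t.end,
                message: format!("keyword `{text}` is {case:?}, expected {want:?}"),
            }),
            Some(_) => {}
        }
    }
    findings
}

fn main() {
    let sql = "SELECT a -- select\nfrom t";
    let tokens = [
        tok(TokenKind::Keyword, 0, 6), // SELECT
        tok(TokenKind::Whitespace, 6, 7),
        tok(TokenKind::Identifier, 7, 8), // a
        tok(TokenKind::Whitespace, 8, 9),
        tok(TokenKind::Comment, 9, 18), // -- select
        tok(TokenKind::Whitespace, 18, 19),
        tok(TokenKind::Keyword, 19, 23), // from
        tok(TokenKind::Whitespace, 23, 24),
        tok(TokenKind::Identifier, 24, 25), // t
    ];
    for f in check_keyword_casing(sql, &tokens) {
        println!("{}..{}: {}", f.start, f.end, f.message);
    }
}
```

In this input only the lowercase `from` at bytes 19..23 is reported; the `select` inside the comment never reaches the rule.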

Parse once, tokenize once

Build a single lint document model per SQL input and reuse it across all rules.
Avoid repeated parsing and repeated ad hoc string scans in each rule.

Stable rule contract

Each rule gets structured input from the engine, not direct access to ad hoc helpers.
Rule output must include deterministic code, message, severity, statement index, and span.

Dialect-explicit behavior

Rule decisions must be dialect-aware and must not silently assume generic SQL semantics.
When parser fallback is used, confidence should degrade explicitly.

Deterministic and testable

Same input and config must always produce the same ordered issue set.
Rules must be independently testable with focused fixtures.

Regex is migration glue, not architecture

Regex heuristics were permitted only temporarily, for parity continuity during the migration.
New semantic rules must not be implemented with regex.

Key Design Decisions

Decision 1: Introduce a `LintDocument` model

The linter engine should construct a normalized input model once:

sql (full source text)
dialect and parser/fallback metadata
parsed statements with statement ranges
token stream with token spans and token kinds
optional scope/resolution metadata for semantic rules

This becomes the only rule input surface.
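A minimal sketch of how the `LintDocument` fields above might look in Rust, assuming byte-offset ranges for statements and tokens. The field names, the placeholder `ScopeInfo`, and the helper methods are hypothetical; the sketch shows how rules can answer "which statement is this?" and "which tokens belong to it?" by binary search over data computed once, instead of re-parsing or re-scanning.

```rust
use std::ops::Range;

// Hypothetical supporting types; FlowScope's real definitions may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Dialect {
    Generic,
    Postgres,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum TokenKind {
    Keyword,
    Whitespace,
    Other,
}

#[derive(Debug, Clone)]
struct LintToken {
    kind: TokenKind,
    span: Range<usize>,
}

#[derive(Debug, Clone)]
struct LintStatement {
    range: Range<usize>, // byte range; a real model would also hold the parsed AST
}

#[derive(Debug)]
struct ScopeInfo; // placeholder for scope/resolution metadata

#[derive(Debug)]
struct LintDocument<'a> {
    sql: &'a str,
    dialect: Dialect,
    used_parser_fallback: bool,
    statements: Vec<LintStatement>, // sorted, non-overlapping
    tokens: Vec<LintToken>,         // sorted by span.start
    scope: Option<ScopeInfo>,       // built once, only if semantic rules need it
}

impl<'a> LintDocument<'a> {
    /// Index of the statement containing `offset`, via binary search.
    fn statement_index_at(&self, offset: usize) -> Option<usize> {
        let i = self.statements.partition_point(|s| s.range.end <= offset);
        self.statements.get(i).filter(|s| s.range.contains(&offset)).map(|_| i)
    }

    /// Tokens that start inside statement `idx`, as a slice of the shared stream.
    fn tokens_in_statement(&self, idx: usize) -> &[LintToken] {
        let r = &self.statements[idx].range;
        let lo = self.tokens.partition_point(|t| t.span.start < r.start);
        let hi = self.tokens.partition_point(|t| t.span.start < r.end);
        &self.tokens[lo..hi]
    }
}

/// Hand-built fixture for "SELECT 1;\nSELECT 2;" (tokenizer output written out by hand).
fn example_document() -> LintDocument<'static> {
    use TokenKind::*;
    let spans = [
        (Keyword, 0..6), (Whitespace, 6..7), (Other, 7..8), (Other, 8..9), (Whitespace, 9..10),
        (Keyword, 10..16), (Whitespace, 16..17), (Other, 17..18), (Other, 18..19),
    ];
    LintDocument {
        sql: "SELECT 1;\nSELECT 2;",
        dialect: Dialect::Generic,
        used_parser_fallback: false,
        statements: vec![LintStatement { range: 0..9 }, LintStatement { range: 10..19 }],
        tokens: spans.into_iter().map(|(kind, span)| LintToken { kind, span }).collect(),
        scope: None,
    }
}

fn main() {
    let doc = example_document();
    let idx = doc.statement_index_at(12).expect("offset 12 is inside statement 1");
    let kinds: Vec<TokenKind> = doc.tokens_in_statement(idx).iter().map(|t| t.kind).collect();
    println!(
        "{:?} dialect={:?} fallback={} scope={} -> statement {idx}: {kinds:?}",
        doc.sql, doc.dialect, doc.used_parser_fallback, doc.scope.is_some()
    );
}
```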

Decision 2: Split rules into 3 engines

Semantic engine

Input: AST + scope/resolution context.
Handles semantic correctness and structural SQL logic.

Lexical engine

Input: token stream + token spans.
Handles formatting/casing/quoting/comment-aware style rules.

Document engine

Input: whole file/document metadata.
Handles file-level checks (EOF newline, leading blank lines, batch separators).
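One way to express the three-engine split in Rust, assuming each rule is bound to exactly one engine and the engine decides which slice of the document the rule sees. `RuleImpl`, the simplified `Statement`/`Token` inputs, and the three `DEMO_*` rules are hypothetical stand-ins; in particular, the semantic placeholder does not use a real AST.

```rust
use std::ops::Range;

// Hypothetical, simplified inputs; a real semantic engine would pass AST + scope context.
struct Statement {
    range: Range<usize>,
}

struct Token {
    text: &'static str,
    range: Range<usize>,
}

struct Input {
    sql: &'static str,
    statements: Vec<Statement>,
    tokens: Vec<Token>,
}

#[derive(Debug, PartialEq, Eq)]
struct Issue {
    code: &'static str,
    range: Range<usize>,
}

/// Each rule is bound to exactly one engine, which decides what input it receives.
enum RuleImpl {
    Semantic(fn(&[Statement]) -> Vec<Issue>),
    Lexical(fn(&[Token]) -> Vec<Issue>),
    Document(fn(&str) -> Vec<Issue>),
}

fn run_rules(rules: &[RuleImpl], input: &Input) -> Vec<Issue> {
    rules
        .iter()
        .flat_map(|rule| match rule {
            RuleImpl::Semantic(check) => check(input.statements.as_slice()),
            RuleImpl::Lexical(check) => check(input.tokens.as_slice()),
            RuleImpl::Document(check) => check(input.sql),
        })
        .collect()
}

// Placeholder semantic rule: flags empty statements (a stand-in for AST checks).
fn empty_statement(stmts: &[Statement]) -> Vec<Issue> {
    stmts
        .iter()
        .filter(|s| s.range.is_empty())
        .map(|s| Issue { code: "DEMO_SEMANTIC", range: s.range.clone() })
        .collect()
}

// Lexical rule: flags lowercase `select` using only the token stream.
fn lowercase_select(tokens: &[Token]) -> Vec<Issue> {
    tokens
        .iter()
        .filter(|t| t.text == "select")
        .map(|t| Issue { code: "DEMO_LEXICAL", range: t.range.clone() })
        .collect()
}

// Document rule: flags a non-empty file that does not end with a newline.
fn missing_eof_newline(sql: &str) -> Vec<Issue> {
    if sql.is_empty() || sql.ends_with('\n') {
        Vec::new()
    } else {
        vec![Issue { code: "DEMO_DOCUMENT", range: sql.len()..sql.len() }]
    }
}

fn example_input() -> Input {
    Input {
        sql: "select 1",
        statements: vec![Statement { range: 0..8 }],
        tokens: vec![Token { text: "select", range: 0..6 }, Token { text: "1", range: 7..8 }],
    }
}

fn main() {
    let rules = [
        RuleImpl::Semantic(empty_statement),
        RuleImpl::Lexical(lowercase_select),
        RuleImpl::Document(missing_eof_newline),
    ];
    for issue in run_rules(&rules, &example_input()) {
        println!("{issue:?}");
    }
}
```

Because a lexical rule receives only tokens and a document rule receives only the source text, a rule cannot quietly reach for an input its engine does not provide.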

Decision 3: Replace parity monolith with one-rule-per-file modules

Move from parity.rs monolith to rules/<code>.rs modules.
Keep shared traversal and token utilities in common helpers.
Preserve existing lint codes for API stability.

Decision 4: Standardize span generation

Primary span source: parser or tokenizer spans.
Secondary span source: scoped fallback search only when necessary.
No free-form "best guess" spans without explicit fallback path.
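The span policy can be sketched as an ordered lookup that records which source produced the span. The `resolve_span` function and `SpanSource` enum are hypothetical; the fallback here is a plain substring search restricted to the owning statement's byte range, and the function returns `None` rather than inventing a span.

```rust
use std::ops::Range;

/// Where a span came from, so fallbacks are explicit rather than silent.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SpanSource {
    Parser,
    Tokenizer,
    ScopedFallback,
}

#[derive(Debug, Clone, PartialEq, Eq)]
struct ResolvedSpan {
    range: Range<usize>,
    source: SpanSource,
}

/// Prefers parser spans, then tokenizer spans, and only then searches for
/// `needle` inside the owning statement's byte range. Returns `None` instead of guessing.
fn resolve_span(
    sql: &str,
    parser_span: Option<Range<usize>>,
    token_span: Option<Range<usize>>,
    statement: Range<usize>,
    needle: &str,
) -> Option<ResolvedSpan> {
    if let Some(range) = parser_span {
        return Some(ResolvedSpan { range, source: SpanSource::Parser });
    }
    if let Some(range) = token_span {
        return Some(ResolvedSpan { range, source: SpanSource::Tokenizer });
    }
    if needle.is_empty() {
        return None;
    }
    // Scoped fallback: search only within the statement, never the whole file.
    let haystack = sql.get(statement.clone())?;
    let offset = haystack.find(needle)?;
    let start = statement.start + offset;
    Some(ResolvedSpan { range: start..start + needle.len(), source: SpanSource::ScopedFallback })
}

fn main() {
    let sql = "SELECT a FROM t;\nSELECT a FROM u;";
    // No parser or token span is available for the second statement's `a`.
    let span = resolve_span(sql, None, None, 17..33, "a");
    println!("{span:?}");
}
```

In the example, a whole-file search for `a` would land in the first statement; restricting it to the second statement's range (17..33) yields 24..25 instead.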

Decision 5: Add rule metadata and confidence

Each issue should carry internal provenance metadata:

engine type (semantic, lexical, document)
confidence (high, medium, low)
fallback source (if parser fallback or heuristic logic was used)

This supports telemetry, triage, and quality gates.
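A possible shape for the provenance metadata, assuming confidence starts at `High` and drops one level for each fallback path used. `Provenance`, `FallbackSource`, `passes_gate`, and the one-step downgrade policy are assumptions made for illustration; the design above only specifies the three fields and their values.

```rust
/// Declaration order gives Low < Medium < High via the derived `Ord`.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Confidence {
    Low,
    Medium,
    High,
}

impl Confidence {
    /// One step down, saturating at Low.
    fn downgrade(self) -> Confidence {
        match self {
            Confidence::High => Confidence::Medium,
            Confidence::Medium | Confidence::Low => Confidence::Low,
        }
    }
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum EngineKind {
    Semantic,
    Lexical,
    Document,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum FallbackSource {
    ParserFallback,
    HeuristicLogic,
}

#[derive(Debug, Clone, PartialEq, Eq)]
struct Provenance {
    engine: EngineKind,
    confidence: Confidence,
    fallbacks: Vec<FallbackSource>,
}

/// Starts at High and downgrades once per fallback path that was used.
fn provenance(engine: EngineKind, parser_fallback: bool, heuristic: bool) -> Provenance {
    let mut confidence = Confidence::High;
    let mut fallbacks = Vec::new();
    if parser_fallback {
        confidence = confidence.downgrade();
        fallbacks.push(FallbackSource::ParserFallback);
    }
    if heuristic {
        confidence = confidence.downgrade();
        fallbacks.push(FallbackSource::HeuristicLogic);
    }
    Provenance { engine, confidence, fallbacks }
}

/// Example quality gate: only surface issues at or above a minimum confidence.
fn passes_gate(p: &Provenance, min: Confidence) -> bool {
    p.confidence >= min
}

fn main() {
    let clean = provenance(EngineKind::Semantic, false, false);
    let degraded = provenance(EngineKind::Lexical, true, true);
    println!("{clean:?} gate={}", passes_gate(&clean, Confidence::Medium));
    println!("{degraded:?} gate={}", passes_gate(&degraded, Confidence::Medium));
}
```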

Decision 6: Define fixability as a rule capability

Rule metadata should include whether a deterministic fix is supported.

No inferred fix logic from message text.
Fix support should be explicit and tested per rule.
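Fixability can be modeled as explicit metadata that gates whether a rule's edits are ever applied. In this hypothetical sketch, `FixCapability`, `RuleMeta`, and `Edit` are illustrative names; edits are byte-range replacements applied right to left, and overlapping or out-of-bounds edits are rejected rather than resolved heuristically.

```rust
use std::ops::Range;

/// Fix support is declared in rule metadata, never inferred from messages.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum FixCapability {
    Unsupported,
    Deterministic,
}

#[derive(Debug, Clone, Copy)]
struct RuleMeta {
    code: &'static str,
    fix: FixCapability,
}

#[derive(Debug, Clone)]
struct Edit {
    range: Range<usize>,
    replacement: &'static str,
}

/// Applies non-overlapping edits; returns `None` on overlap or invalid ranges.
fn apply_edits(sql: &str, edits: &[Edit]) -> Option<String> {
    let mut sorted: Vec<&Edit> = edits.iter().collect();
    sorted.sort_by_key(|e| (e.range.start, e.range.end));
    for e in &sorted {
        let valid = e.range.start <= e.range.end
            && sql.is_char_boundary(e.range.start)
            && sql.is_char_boundary(e.range.end);
        if !valid {
            return None;
        }
    }
    if sorted.windows(2).any(|w| w[0].range.end > w[1].range.start) {
        return None;
    }
    // Apply right to left so earlier byte offsets stay valid.
    let mut out = sql.to_string();
    for e in sorted.iter().rev() {
        out.replace_range(e.range.clone(), e.replacement);
    }
    Some(out)
}

/// Only rules that declare `Deterministic` may produce a fixed document.
fn apply_fix(meta: &RuleMeta, sql: &str, edits: &[Edit]) -> Option<String> {
    match meta.fix {
        FixCapability::Deterministic => apply_edits(sql, edits),
        FixCapability::Unsupported => None,
    }
}

fn main() {
    let meta = RuleMeta { code: "DEMO_EOF", fix: FixCapability::Deterministic };
    let edits = [Edit { range: 8..8, replacement: "\n" }];
    println!("{}: {:?}", meta.code, apply_fix(&meta, "select 1", &edits));
}
```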

Execution Pipeline

Parse SQL into statements with selected dialect.
Tokenize full source with token spans.
Build LintDocument with statement ranges and shared metadata.
Optionally build scope/resolution context once for semantic rules.
Execute semantic, lexical, and document engines.
Normalize, sort, and deduplicate issues.
Emit final issues with deterministic ordering and stable spans.
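The final normalize/sort/deduplicate step can be sketched as below, assuming issues carry the fields listed under "Stable rule contract". The sort key, the choice to keep the most severe duplicate, and the trim-only normalization are assumptions, not documented FlowScope behavior; the point is that the emitted order depends only on issue contents, not on engine or rule execution order.

```rust
use std::cmp::Reverse;

#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Severity {
    Info,
    Warning,
    Error,
}

#[derive(Debug, Clone, PartialEq, Eq)]
struct Issue {
    code: String,
    message: String,
    severity: Severity,
    statement_index: usize,
    start: usize,
    end: usize,
}

fn issue(code: &str, severity: Severity, statement_index: usize, start: usize, end: usize, message: &str) -> Issue {
    Issue { code: code.to_string(), message: message.to_string(), severity, statement_index, start, end }
}

/// Normalizes, sorts, and deduplicates so identical input always yields identical output.
fn finalize(mut issues: Vec<Issue>) -> Vec<Issue> {
    for i in &mut issues {
        i.message = i.message.trim().to_string(); // normalize incidental whitespace
    }
    // Total order over all fields; higher severity first for the same location and code.
    issues.sort_by(|a, b| {
        (a.statement_index, a.start, a.end, &a.code, Reverse(a.severity), &a.message)
            .cmp(&(b.statement_index, b.start, b.end, &b.code, Reverse(b.severity), &b.message))
    });
    // Same code at the same span is a duplicate; the first (most severe) one survives.
    issues.dedup_by(|later, earlier| {
        later.code == earlier.code
            && later.statement_index == earlier.statement_index
            && later.start == earlier.start
            && later.end == earlier.end
    });
    issues
}

fn main() {
    // Codes and messages here are arbitrary; only the ordering logic matters.
    let out = finalize(vec![
        issue("LT_001", Severity::Warning, 1, 20, 21, "b"),
        issue("AL_001", Severity::Warning, 0, 7, 8, "a dup"),
        issue("LT_001", Severity::Info, 0, 7, 8, "x"),
        issue("AL_001", Severity::Error, 0, 7, 8, " a "),
    ]);
    for i in &out {
        println!("{i:?}");
    }
}
```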

Migration Plan

Phase 0: Foundation [COMPLETE]

LintDocument model, tokenization pass, and document-level lint execution path are live.
Token stream provider propagated through rule context (parse once, tokenize once).

Phase 1: High-risk semantic migrations [COMPLETE]

All semantic-heavy rules migrated to AST-driven implementations:

references (RF_001–RF_006)
structure checks (ST_001–ST_012)
ambiguous join/reference rules (AM_001–AM_009)
convention rules (CV_001–CV_012)
aliasing rules (AL_001–AL_009)

Phase 2: Lexical/style migrations [COMPLETE]

All style-oriented checks migrated to dedicated modules:

capitalization (CP_001–CP_005) — tokenizer-driven
layout (LT_001–LT_015) — tokenizer/line-aware checks
jinja padding (JJ_001) — delimiter scanning
TSQL checks (TQ_001–TQ_003) — AST/token-driven

Remaining work: closing SQLFluff configuration-depth parity gaps for some CP, LT, and JJ rules.

Phase 3: Decommission parity monolith [COMPLETE]

parity.rs retired and deleted.
All 72 rules live in one-rule-per-file modules under linter/rules/.

Progress Snapshot

Quality Gates

Each migrated rule must pass:

correctness: fixture and regression coverage for trigger/non-trigger cases
span quality: stable and accurate primary highlight span
precision guardrails: false-positive rate at or below an agreed threshold on a curated corpus
performance: no meaningful regression on representative workloads
parity continuity: no changes to lint codes or messages unless they are intentional and documented
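The correctness and span-quality gates lend themselves to table-driven fixtures that pin exact spans for trigger and non-trigger inputs. The rule here (`trailing_whitespace`) and the `Fixture`/`run_fixtures` harness are hypothetical examples, not FlowScope's test suite; spans are byte ranges, and a `\r` before `\n` is treated as part of the line ending.

```rust
use std::ops::Range;

/// Hypothetical line-aware rule: flags spaces/tabs at the end of each line.
fn trailing_whitespace(sql: &str) -> Vec<Range<usize>> {
    let mut spans = Vec::new();
    let mut line_start = 0;
    for line in sql.split_inclusive('\n') {
        let body = line.strip_suffix('\n').unwrap_or(line);
        let body = body.strip_suffix('\r').unwrap_or(body);
        let kept = body.trim_end_matches(|c: char| c == ' ' || c == '\t');
        if kept.len() < body.len() {
            spans.push(line_start + kept.len()..line_start + body.len());
        }
        line_start += line.len();
    }
    spans
}

/// A fixture pairs an input with the exact spans the rule must report.
struct Fixture {
    name: &'static str,
    sql: &'static str,
    expected: Vec<Range<usize>>,
}

fn fixtures() -> Vec<Fixture> {
    vec![
        Fixture { name: "clean", sql: "SELECT 1\n", expected: vec![] },
        Fixture { name: "crlf is a line ending", sql: "SELECT 1\r\n", expected: vec![] },
        Fixture { name: "trailing spaces", sql: "SELECT 1  \nFROM t\n", expected: vec![8..10] },
        Fixture { name: "tab on second line", sql: "SELECT 1\nFROM t\t\n", expected: vec![15..16] },
        Fixture { name: "no final newline", sql: "SELECT 1 ", expected: vec![8..9] },
    ]
}

/// Runs every fixture and returns one message per mismatch (empty means all pass).
fn run_fixtures(fixtures: &[Fixture], rule: fn(&str) -> Vec<Range<usize>>) -> Vec<String> {
    fixtures
        .iter()
        .filter_map(|f| {
            let got = rule(f.sql);
            (got != f.expected).then(|| format!("{}: expected {:?}, got {:?}", f.name, f.expected, got))
        })
        .collect()
}

fn main() {
    let all = fixtures();
    let failures = run_fixtures(&all, trailing_whitespace);
    println!("{} fixtures, {} failures", all.len(), failures.len());
    for f in &failures {
        println!("{f}");
    }
}
```

Pinning exact spans, not just issue counts, is what lets a fixture catch span-quality regressions as well as trigger/non-trigger mistakes.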

Risks and Mitigations

Parser limitations and missing AST locations

Mitigation: token spans become first-class; fallback span logic remains explicit.

Dialect edge cases not fully supported upstream

Mitigation: dialect-specific behavior tables and confidence downgrade on fallback paths.

Migration churn and temporary duplicate logic

Resolved: phased rule-by-rule migration completed; parity monolith retired.

Success Criteria

All semantic rules run through AST/scope engine.
All style/layout rules run through lexical/document engines.
parity.rs no longer acts as a rule home.
Rule additions are modular, testable, and engine-scoped by default.
[~] Lint output quality and determinism improve while preserving stable public rule codes.
Close remaining SQLFluff parity gaps (see docs/sqlfluff-gap-matrix.md).
