Add CoreMatch dialect: surface pattern matching for Strata Core#1161
Open
julesmt wants to merge 8 commits into
Open
Add CoreMatch dialect: surface pattern matching for Strata Core#1161julesmt wants to merge 8 commits into
julesmt wants to merge 8 commits into
Conversation
added 8 commits
May 8, 2026 14:10
translateSyntaxDef collapsed any single-ident syntax to .passthrough,
but the parser path (Parser.opSyntaxParser) hardcodes the passthrough's
target as argDecls[0]. For rules like
fn matchExprArms_one (tp : Type, a : MatchExprArm tp) : MatchExprArms tp =>
a;
the actual target is argDecls[1] (because tp is an implicit type
parameter at index 0). Passthrough sent the parser to argDecls[0],
which is the implicit Type slot, producing nonsensical syntax trees
and triggering parsing-ambiguity panics in elabExpr.
Fix: only enable passthrough when the single ident references
argDecls[0]. Otherwise emit .std atoms prec — the parser will route
through liftToKind with the correct ident index.
translateSyntaxDef now also folds bound type variables in the result
type into the implicit-arg map, so e.g.
fn match_expr (dt : Type, tp : Type, scrut : dt, ..., body : tp) : tp =>
"match " scrut " { ... " body " ... }";
is accepted: `tp` is reachable through both `body : tp` (already worked)
and the result `: tp` (new). Matches the existing convention that
implicit type params must appear *somewhere* in the rule's signature.
Lays the programmatic foundation for the dialect. Two modules: * `Strata/Languages/CoreMatch/CoreMatch.lean` — the `MExpr` / `MArm` / `MStmt` / `MStmtArm` / `MFunction` / `MProcedure` / `MDecl` / `MProgram` types. Constructor arms carry a *pre-curried case function* `caseFn = λb₁:T₁. … λbₖ:Tₖ. body` in de Bruijn form, and statement-level arms carry pre-lowered bodies that already include the `var b : T := D..f(scrut);` binder inits. Both choices let the lowering pass collapse to a single straightforward translation. * `Strata/Languages/CoreMatch/ToCore.lean` — lowering an `MProgram` to a `Core.Program`. Expression-level matches lower to `D$Elim scrut case₁ … caseₙ` with cases reordered into the datatype's declaration order; constructors with no explicit arm fall through to the wildcard arm (padded with `λ_:T.` lambdas to match arity), or to a `__coreMatch_missing_<ctor>` sentinel fvar when no wildcard is present (unreachable when the redundancy checker has run). Statement matches lower to a nested-if chain over the auto-generated testers `D..isCᵢ`; uncovered constructors fall through to the wildcard arm or to `assert false` with a `coreMatch_nonexhaustive` label. This commit is self-contained: no DDM grammar, no source parser, no verifier integration. The next commits add the redundancy checker (ArmCheck) and the source pipeline.
`Strata/Languages/CoreMatch/ArmCheck.lean` — a small pure module that inspects a list of arm "keys" against a datatype's constructor list and reports any of: * `unknownConstructor` — constructor name not in the datatype * `duplicateConstructor` — same constructor named twice * `multipleWildcards` — more than one `_ => …` arm * `redundantWildcard` — wildcard appears though every constructor is already covered explicitly * `nonExhaustive` — at least one constructor has no arm and there's no wildcard `ArmKey = Option String` is a deliberately abstract view of an arm (`some ctor` or `none` for a wildcard) so the checker is shape- independent: both the expression-level `MArm` and statement-level `MStmtArm` project to it via `MArm.key` / `MStmtArm.key` and run through the same `checkAgainst`. The checker doesn't depend on the source pipeline; it can be called on hand-built `MProgram`s as well, which is why it lands separately from the DDM-side translator.
The source-side of the dialect, on top of the AST, lowering, and
ArmCheck modules from the previous two commits.
* `DDMTransform/Grammar.lean` declares the `CoreMatch` dialect (which
imports `Core`) and the surface syntax for `match`:
- statement form `match scrut { Ctor(v:T,…) => { … } … }` for
procedure bodies
- expression form `match scrut { arm Ctor(v:T,…) => body; … }` for
function bodies, with a head-arm body surfaced as a direct
`tp`-typed argument so DDM's type inference fixes the result
type from the user's first body, plus a tail-arm-list category
whose bodies are typed `Expr` (re-checked by the translator)
- the `arm` keyword anchors each tail arm against parser ambiguity
with Core operators
- both forms share the `match` keyword; the parser dispatches by
category (Statement vs Expr) so the two never compete
* `DDMTransform/StrataGen.lean` runs `#strata_gen CoreMatch` to
produce the typed `CoreMatchDDM.*` AST.
* `DDMTransform/Translate.lean` walks the typed `CoreMatchDDM.Program`
and emits an `MProgram`. Tracks `bvars`, `tyBVars`, datatype
ctor info, label counters, and a `currentRecFns` / `recRewrite`
state used by the rec-call rewrite for natural-style structural
recursion. Detects two arm shapes (`natural`, `cata`, `unknown`)
via `classifyArm`, rewrites self-calls `f(field)` into sentinel
fvars during arm body translation, then resolves the sentinels to
the eliminator's rec-result bvars at the absMulti wrapping step.
Non-structural self-references (bare references, calls on
non-recursive arguments) are rejected with a clear diagnostic.
* `Verify.lean` exposes `toCore` (programmatic) and `parseToCore`
(source) entry points for the dialect.
`Strata.lean` registers the three new modules.
* `Examples/pattern_matching/`: nine `.coreMatch.st` programs covering non-recursive enums with wildcards, structural recursion (natural and cata-style), nested matches, mixed-type constructor fields, expression-tree evaluation, and constructor calls in arm bodies. README documents the conventions. * `StrataTest/Languages/CoreMatch/CompileTest.lean`: programmatic invariants on `MProgram → Core.Program` lowering plus ArmCheck diagnostics. * `StrataTest/Languages/CoreMatch/SourceTest.lean`: end-to-end source pipeline tests asserting `D$Elim` head, no self-references after natural-style rewrite, ite-shaped procedure bodies, and the redundancy / non-structural diagnostic surface.
* `editors/GenSyntax.lean` extended to emit a per-dialect highlighter: add `LangCfg`/`EmacsCfg` config records and a `DialectGenSpec` describing each language to write. The generator now produces highlighters for both `Core` (`.core.st`) and `CoreMatch` (`.coreMatch.st`). * Auto-generated mode files: `editors/emacs/coreMatch-st-mode.el`, `editors/vscode/syntaxes/coreMatch-st.tmLanguage.json`. * VSCode `package.json` and Emacs / VSCode README updates wire the new language entries.
Quick CLI that loads the CoreMatch DDM dialect, parses a single `.coreMatch.st` file, runs the redundancy / exhaustiveness checker, and lowers it to a `Core.Program`. Prints `OK: <N> decls` (with optional `--show-decls` summary) or `ERR: <message>`. Useful for iterating on individual programs in `Examples/pattern_matching/` without rebuilding the full test suite.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Adds the CoreMatch dialect: a Core extension giving surface-level
pattern matching at both expression and statement positions, with
two recursion styles that lower to identical Core.
matchlowers to an application of theauto-generated structural eliminator
D$Elim. Recursive functionswritten this way carry no termination obligation — the eliminator
is total by construction.
matchlowers to a nestedifchain over theauto-generated testers
D..isCᵢ, with pattern binders prependedas
var b : T := D..fᵢ(scrut);init statements.armkeyword anchors expression-level arms against ambiguitywith Core operators.
Two recursion surfaces, one lowering
Both forms compile to the same
List$Elimapplication. The naturalstyle runs through a small two-pass transformation:
tryRecRewriteswaps each
f(field)self-call for a sentinel fvar during bodytranslation, and
desugarNaturalToCatainjects rec-result binders,lifts existing bvars past them, and substitutes the sentinels for
the matching
bvars.What CoreMatch can do today
matchwith constructor and wildcard arms,and any number of pattern binders per constructor.
matchwith the head arm anchoring theresult type and the
armkeyword on tail arms.zero or more typed fields; constructors can mix recursive and
non-recursive fields freely.
_ =>) covering one or more uncoveredconstructors; redundancy checker rejects a wildcard that's already
redundant given the other arms.
f(field)self-calls on recursive constructor fields,rewritten internally to rec-result references.
binders, typed as the function's return type.
DFS with both
leftandrightrecursive fields, each producingits own auto-injected rec-result binder.
same
rec function … and …block is a valid self-call target.match.datatype value from a recursive case (
map_inc,filter_pos,etc.).
requires/ensuresclauses, withfreeattributes and user-supplied labels, both preserved throughlowering.
assert,assume,if/else, non-deterministicif *, labelledblocks,var n : T;(no rhs),var n : T := e;, assignments.int,bool,string,real, everybv{1,8,16,32,64}width,Map K V,Sequence T,arrowtypes,and arbitrarily nested type constructors.
duplicate or unknown constructor in an arm, redundant or multiple
wildcards, non-structural recursive calls (
f(otherList),f(reverse(tl)),fas a higher-order value), and wrong armbinder counts.
What CoreMatch cannot do today
translator's
toCoreDatatypeaccepts an emptytypeParamslistonly.
rewrite only fires for
f(field)wherefield's type matches thescrutinee datatype directly. Children stored in
Sequence TorMap K Taren't lifted across the type constructor, sosumonan n-ary
Tree { Node(value : int, children : Sequence Tree) }cannot be written structurally.
merge,Ackermann,
gcd, etc. all require either an explicitdecreasesmeasure or a different fixed-point principle. CoreMatch has no
surface form for
decreasesclauses yet; the escape hatch is tofall back to Core's procedure-level termination machinery.
constructor at depth 1. Nested decomposition needs a nested
matchinside the arm body.when …orif …guard clause; pushthe predicate into the body.
never literal values. Match
Succ(Succ(_))is not parseable.scrutinee datatype must be cached during the same
Translatewalk. Datatypes from imports aren't visible to the redundancy
check or natural-style rewrite.
statements — the only way to lift an expression-level match
into statement position is via
var x := match …;.