Struct/record type declarations with named field access in Boole#1128
Struct/record type declarations with named field access in Boole#1128kondylidou wants to merge 2 commits into
Conversation
tautschnig
left a comment
There was a problem hiding this comment.
Scope observation. The PR bundles:
- Record/struct type declarations (
@[declareRecord]+RecordBindingSpec+struct_decl). - Dot-notation desugaring (
val.field→T..field(val)). - Unsigned bitvector comparisons (
<u,<=u,>u,>=u) across all widths (1/8/16/32/64). - Generalization of
toCoreTypedBin/toCoreTypedUnfromint-only toint + bvN. - Integer left-shift with constant amount (
x << n→x * 2^n).
Each is independently reviewable. Splitting into three PRs (records + dot notation together; unsigned comparisons + generalization together; integer shift on its own) would have made the diff much easier to grade. The PR body lists them cleanly, which helps — but the actual feature surface is larger than the title implies. Not blocking, just a note for future PRs.
Substantive observations on the design.
-
Dot-notation desugaring is string-split-based and brittle. At
Elab/Core.lean:1696–1715, the "identifier not found" path attempts to spliceval.field→T..field(val)by doingname.splitOn "."andname.splitOn ".."and checking that there are exactly two parts and no double-dot. This works for the current benchmark but:- Chained field access
a.b.cis unsupported."a.b.c".splitOn "."gives three parts and theparts.length == 2check fails. Sofe.inner.limb0just errors out "Unknown expr identifier". Worth either supporting (recursive desugar) or explicitly documenting as a limitation. - Parser-level solution would be cleaner. The desugaring lives at elaboration time because Boole's grammar currently parses
val.fieldas a single identifier token. A grammar-levelfieldAccessproduction (the way Core does it) would remove the string-splitting entirely and naturally handle chained access. Out of scope for this PR; worth tracking as a follow-up. - No direct unit tests. The only coverage is the benchmark. A couple of
#guard_msgscases targeting: (a) validval.field, (b)a.b.c(failure case — pins the limitation so a future change is visible), (c)val.fieldwherevalisn't a record (should keep the original "Unknown expr identifier" error), and (d) accidentalT..fieldliteral in source (should NOT trigger the rewrite) — would lock the fragile logic down. See inline.
- Chained field access
-
Hardcoded
_mkconstructor suffix.addRecordBindingsandtoCoreDeclsboth assume the constructor is namedrecordName ++ "_mk". Fine as a convention; worth a module-level constant so the string appears once (private def recordConstructorSuffix : String := "_mk"). Maintenance risk is low, but if someone ever wantsMyRecord.newor similar they'll need to grep for four places. Non-blocking. -
Integer left-shift constant-folding. The encoding
x << n → x * (1 <<< n)atVerify.lean:378–386is sound overInt, but it silently allows the shift amount to be arbitrarily large (via.natToInt), producing a potentially hugeInt.ofNat (1 <<< 200)literal. SMT solvers handle arbitrary-precision integers so this is probably fine in practice, but an upper-bound sanity check (e.g. rejectn > 256or similar) would prevent accidental DoS-style inputs. Low priority. The pattern match.natToInt _ ⟨_, n⟩is narrow — a user writingx << (5 + 0)would fail with "requires a constant (literal) shift amount", which is correct but the error could mention "literal" means strictly a numeral, not a constant expression. See inline. -
Dispatch table explosion for unsigned comparisons.
Core/DDMTransform/Translate.lean:512–652adds 20 new lines — 4 operators × 5 widths. Consistent with the existing signed-comparison pattern, which is also 20 lines, and probably fine to keep parallel for readability. A meta-level factoring (for bvN in [1,8,16,32,64]: for op in [ULt, ULe, UGt, UGe]: ...) would reduce 40 total lines to ~8 but at the cost of a more indirect dispatch. Out of scope here; worth a follow-up if the table ever grows further. -
toCoreTypedUn/toCoreTypedBinwildcard fallthrough. The old code rejected non-inttypes up-front with a nice error message ("Unsupported typed operator type: {repr ty}"). The new code dispatches totoCoreBvBin/toCoreBvUnon the wildcard, so non-int non-bv types (e.g..bool,.string, some future.array) now fail insidebvWidthwith"Unsupported bit-vector type", which is less informative. One-line fix: explicit| .bv{1,8,16,32,64} _ => toCoreBv…cases and a final_ => throwAt m s!"Unsupported typed operator type: {repr ty}"fallthrough.
Test coverage. Only the benchmark exercises the new features. That's a lot of surface area for one test program:
- No unit test for
@[declareRecord]processing in isolation (e.g., a malformeddeclareRecordattribute producing a sensible error rather than the panics currently flagged by the lint). - No unit test for dot-notation desugaring edge cases (above).
- No unit test for unsigned-vs-signed comparison distinction (e.g., a
bv64value with MSB set that flips between<sand<u). - No unit test for
int << nwherenis not a literal, wherenis 0, wherenis large. - No unit test for single-constructor-datatype equivalence (a record with named fields should produce a type isomorphic to what the user would have written as a one-constructor datatype manually).
A small StrataTest/Languages/Boole/Records.lean with a handful of #guard_msgs-level tests would lock each of these down independently of the FE51 benchmark.
Proof-coverage suggestions.
Three invariants are worth stating, either as example := on small inputs, as plausible properties, or (for the first two) as theorems in a proof module:
- Record ↔ single-constructor-datatype equivalence. For every
type T := {f1 : A, f2 : B}programP, the result oftoCoreDeclsis α-equivalent to the result of the hand-writtentype T := { data T_mk(f1: A, f2: B) }program. Pinning this as a theorem (or at least a programmatic fixture check) prevents the_mk/T..field(val)synthesis from drifting away from "the program you would have written manually" over time. - Dot-notation preservation.
elabExprapplied to aval.fieldidentifier whereval : TandT..fieldis in scope produces the sameExprtree aselabExprapplied toT..field(val)as two separate tokens. This is exactly what the desugaring is supposed to guarantee; stating it as a theorem (even informally in a comment) prevents regressions in the string-split logic. - Integer shift equivalence.
∀ x : Int, ∀ n : Nat, Int.shiftLeft x n = x * Int.ofNat (1 <<< n). This is mathematically trivial but closing the semantic loop between the Boole-level<<and the Core-level*encoding is worthwhile because it documents the axiom the encoding relies on.
Inline comments on the two highest-impact items (the lint-blocking panics and the dot-notation desugaring) below.
| | { dialect := _, name := "declareRecord" } => do | ||
| let args := attr.args | ||
| if args.size < 2 then | ||
| newBindingErr "declareRecord expects at least 2 arguments (name, fields)." | ||
| return none | ||
| let .catbvar nameIndex := args[0]! | ||
| | newBindingErr "declareRecord: invalid name index"; return none | ||
| let .catbvar fieldsIndex := args[1]! | ||
| | newBindingErr "declareRecord: invalid fields index"; return none | ||
| let .isTrue nameP := decideProp (nameIndex < argDecls.size) | ||
| | return panic! "Invalid name index" | ||
| let .isTrue fieldsP := decideProp (fieldsIndex < argDecls.size) | ||
| | return panic! "Invalid fields index" | ||
| let functionTemplates ← parseFunctionTemplates (args.extract 2 args.size) | ||
| some <$> .record <$> pure { | ||
| nameIndex := ⟨nameIndex, nameP⟩, | ||
| fieldsIndex := ⟨fieldsIndex, fieldsP⟩, | ||
| functionTemplates |
There was a problem hiding this comment.
CI blocker: Run lint checks (stable) fails on the three new panic! uses here and at line 2194. The Strata lint only flags newly added panics without -- nopanic:ok annotations (pre-existing ones in the same file, e.g. addDatatypeBindings at 1134–1138, are grandfathered), so this is strictly a new-code policy issue.
Two ways to fix:
- Quick fix: append
-- nopanic:okafter eachpanic!line, matching the grandfathered sibling sites. Minimal change; keeps behaviour identical. Suggested if you want to keep the parallel structure with the existing datatype code. - Better fix: replace
panic!with error returns through theExcept StringchannelparseNewBindingsalready uses. Line 1148–1155 already handles similar malformed-metadata cases vianewBindingErr "..."; return none; thedecideProp … | return panic!branches at lines 1156 and 1158 could do the same.addDatatypeBindings's existing panics would still be there but are out of scope.
Same choice applies to line 2194:
| a => panic! s!"Expected ident for record name {repr a}"which could become an Except String error like the rest of addRecordBindings, or gain a -- nopanic:ok. Note that the symmetric site in addDatatypeBindings (line 2134 area for the datatype name) also panics without annotation; if you take the quick-fix route here, consider tagging those too as a small consistency pass.
Either way, CI stays red until this is resolved.
| | -- Desugar dot notation: "base.field" → T..field(base) | ||
| -- A single dot (not "..") separates the variable from the field name. | ||
| let dotParts := name.splitOn ".." | ||
| let parts := name.splitOn "." | ||
| if parts.length == 2 && dotParts.length == 1 then | ||
| let baseName := parts[0]! | ||
| let fieldName := parts[1]! | ||
| if let some baseBinding := tctx.lookupVar baseName then | ||
| let typeExprOpt : Option TypeExpr := match baseBinding with | ||
| | .bvar _ (.expr tp) => some tp | ||
| | .fvar _ (.expr tp) => some tp | ||
| | _ => none | ||
| if let some (.fvar _ typeIdx _) := typeExprOpt then | ||
| if let some typeName := tctx.globalContext.nameOf? typeIdx then | ||
| let selectorName := typeName ++ ".." ++ fieldName | ||
| if let some selectorIdx := tctx.globalContext.findIndex? selectorName then | ||
| let baseExpr : Expr := match baseBinding with | ||
| | .bvar idx _ => .bvar loc idx | ||
| | .fvar idx _ => .fvar loc idx | ||
| let appExpr : Expr := .app loc (.fvar loc selectorIdx) (.expr baseExpr) |
There was a problem hiding this comment.
The dot-notation desugaring here is the most delicate part of the PR. Three observations:
(a) Chained access is silently unsupported. "a.b.c".splitOn "." returns three parts, so the parts.length == 2 check fails, and a.b.c falls through to logError loc s!"Unknown expr identifier {name}". That's a correctness non-issue (the identifier genuinely wasn't found, and we emit a sane error) but the user doesn't learn why — they see "Unknown expr identifier a.b.c" rather than "chained field access not yet supported". Either:
- Recursively desugar:
a.b.c→T..c(T..b(a)). A few extra lines, much nicer UX for struct-of-struct code. - Explicit check-and-report: if
parts.length > 2and none of the multi-dot tokens form a valid identifier, emit a targeted "chained field access not yet supported" diagnostic.
(b) String-split identifier parsing is fragile. Today it relies on Boole's grammar parsing val.field as a single atom with an embedded ., and on the convention that multi-dot tokens like T..field don't match the parts.length == 2 && dotParts.length == 1 guard. If the DDM elaborator ever starts producing val.field as two tokens, or the grammar adds another identifier category that uses . for a different purpose, this path silently stops applying. A grammar-level fieldAccess production (the way Core does it) would remove the string-splitting entirely. Follow-up-able, not blocking.
(c) No direct unit tests. The only coverage is struct_field_access.lean end-to-end. Four targeted #guard_msgs cases would lock the behaviour down:
- Valid single-level
val.fieldon a var-local record, on a parameter, and on a nested var: currently all rely on the benchmark. a.b.c: pins the limitation (expected either to resolve via nested desugar, or to produce a specific error message — pick one and test it).val.fieldwherevalis not a record: must keep the pre-existing"Unknown expr identifier"diagnostic intact.T..fieldtyped literally in source: must not trigger the rewrite (already handled by thedotParts.length == 1guard; a test pins this down so a future simplification doesn't regress).
See also the main review body's suggested proof: a theorem (or informal comment) that elabExpr (val.field) produces the same tree as elabExpr (T..field(val)) would state the desugaring's invariant explicitly.
| | .bvshl m ty a b => | ||
| match ty with | ||
| | .int _ => do | ||
| -- Integer << n encoded as multiplication by 2^n; n must be a literal. | ||
| let n ← match b with | ||
| | .natToInt _ ⟨_, n⟩ => pure n | ||
| | _ => throwAt m "Integer << requires a constant (literal) shift amount" | ||
| toCoreTypedBin m ty "Mul" (← toCoreExpr a) (.intConst () (Int.ofNat (1 <<< n))) | ||
| | _ => toCoreBvBin m ty "Shl" (← toCoreExpr a) (← toCoreExpr b) |
There was a problem hiding this comment.
Two small things on the integer-left-shift encoding:
(a) Error message could be more specific. "Integer << requires a constant (literal) shift amount" is accurate but a user who writes x << (5 + 0) may reasonably expect the evaluator to fold the RHS — the error doesn't communicate that only a numeric literal is accepted, not a constant-folded expression. Consider "Integer << requires a numeric literal as the shift amount; arbitrary expressions (even constant ones) are not yet supported."
(b) Unbounded shift amount. Int.ofNat (1 <<< n) where n is an arbitrary Nat from source produces arbitrary-precision Int literals. SMT solvers handle arbitrary-precision integers fine, but a pathological input like x << 10000 would generate a ~3000-digit literal in the SMT file. Cheap guard: reject n > SHIFT_MAX (e.g. 256, the widest BV we speak) with a targeted message.
(c) Test coverage is benchmark-only. intShlSeed in struct_field_access.lean covers one valid use (x < (1 << 51)) but nothing exercises the error paths. Two #guard_msgs tests on the "Integer << requires…" branch would pin the error message and catch future regressions silently flipping the fallback path.
|
I'm converting this to draft for now because there is quite some work to be done |
Summary
Implements struct/record type declarations with named field access in Boole (FR#13),
along with the bitvector operator support needed to make them useful with Rust/Verus
fixed-width integer field types (
bv8,bv16,bv32,bv64).Struct/record types (
type T := { f1: A, f2: B })Adds
struct_declto the Boole grammar with a@[declareRecord]DDM annotation.The declaration desugars to a single-constructor datatype
T_mk(f1: A, f2: B)withauto-generated field selectors
T..f1(val),T..f2(val)etc. A newRecordBindingSpecvariant handles elaboration-time registration so that constructor and selector names are
in scope for type-checking before function bodies are elaborated.
Dot notation (
val.field) for locals and parametersSingle-token
val.fieldidentifiers are desugared inelabExprtoT..field(val)selector applications. This path handles both
var-declared record locals andfunction/procedure parameters uniformly — both are
.bvar _ (.expr (.fvar _ typeIdx _))bindings in the typing context, so no separate parameter tracking is needed.
Unsigned bitvector comparisons (
<u,<=u,>u,>=u)Adds
bvult/bvule/bvugt/bvugeto the Core DDM grammar and Translate dispatchtable (all widths:
bv1,bv8,bv16,bv32,bv64), and routes them toBv{N}.ULt/ULe/UGt/UGeCore ops in Boole'sVerify.lean. Distinguishes unsigned comparisonfrom the existing signed
<s/<=sand generic</<=(integer), matching Rust/Verus unsignedinteger comparison semantics (
u8,u16,u32,u64).Bitvector arithmetic (
+,-,*,neg) onbvNtypesExtends
toCoreTypedBin/toCoreTypedUnto dispatch bitvector types totoCoreBvBin/toCoreBvUn. Previously these helpers were int-only;bv64fieldaddition (e.g.
a.limb0 + b.limb0) now routes toBv64.Add.Integer left-shift with constant amount (
int << literal)x << nwherenis an integer literal is encoded asx * 2^n(linear integerarithmetic). Core has no native integer shift; constant folding at translation time
avoids SMT non-linearity.
Benchmark
struct_field_access.leanmodels dalek's
FieldElement51 { limbs: [u64; 5] }with unrolledbv64limbs andverifies a limb-bounded addition procedure end to end.
Remaining gaps
[u64; 5]) — currently unrolled as separate scalar fields.By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.