Skip to content

Struct/record type declarations with named field access in Boole#1128

Draft
kondylidou wants to merge 2 commits into
strata-org:mainfrom
kondylidou:pr/structs
Draft

Struct/record type declarations with named field access in Boole#1128
kondylidou wants to merge 2 commits into
strata-org:mainfrom
kondylidou:pr/structs

Conversation

@kondylidou
Copy link
Copy Markdown
Contributor

Summary

Implements struct/record type declarations with named field access in Boole (FR#13),
along with the bitvector operator support needed to make them useful with Rust/Verus
fixed-width integer field types (bv8, bv16, bv32, bv64).

Struct/record types (type T := { f1: A, f2: B })

Adds struct_decl to the Boole grammar with a @[declareRecord] DDM annotation.
The declaration desugars to a single-constructor datatype T_mk(f1: A, f2: B) with
auto-generated field selectors T..f1(val), T..f2(val) etc. A new RecordBindingSpec
variant handles elaboration-time registration so that constructor and selector names are
in scope for type-checking before function bodies are elaborated.

Dot notation (val.field) for locals and parameters

Single-token val.field identifiers are desugared in elabExpr to T..field(val)
selector applications. This path handles both var-declared record locals and
function/procedure parameters uniformly — both are .bvar _ (.expr (.fvar _ typeIdx _))
bindings in the typing context, so no separate parameter tracking is needed.

Unsigned bitvector comparisons (<u, <=u, >u, >=u)

Adds bvult/bvule/bvugt/bvuge to the Core DDM grammar and Translate dispatch
table (all widths: bv1, bv8, bv16, bv32, bv64), and routes them to
Bv{N}.ULt/ULe/UGt/UGe Core ops in Boole's Verify.lean. Distinguishes unsigned comparison
from the existing signed <s/<=s and generic </<= (integer), matching Rust/Verus unsigned
integer comparison semantics (u8, u16, u32, u64).

Bitvector arithmetic (+, -, *, neg) on bvN types

Extends toCoreTypedBin/toCoreTypedUn to dispatch bitvector types to
toCoreBvBin/toCoreBvUn. Previously these helpers were int-only; bv64 field
addition (e.g. a.limb0 + b.limb0) now routes to Bv64.Add.

Integer left-shift with constant amount (int << literal)

x << n where n is an integer literal is encoded as x * 2^n (linear integer
arithmetic). Core has no native integer shift; constant folding at translation time
avoids SMT non-linearity.

Benchmark

struct_field_access.lean
models dalek's FieldElement51 { limbs: [u64; 5] } with unrolled bv64 limbs and
verifies a limb-bounded addition procedure end to end.

Remaining gaps

  • Fixed-size array fields ([u64; 5]) — currently unrolled as separate scalar fields.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Copy link
Copy Markdown
Contributor

@tautschnig tautschnig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Scope observation. The PR bundles:

  1. Record/struct type declarations (@[declareRecord] + RecordBindingSpec + struct_decl).
  2. Dot-notation desugaring (val.fieldT..field(val)).
  3. Unsigned bitvector comparisons (<u, <=u, >u, >=u) across all widths (1/8/16/32/64).
  4. Generalization of toCoreTypedBin / toCoreTypedUn from int-only to int + bvN.
  5. Integer left-shift with constant amount (x << nx * 2^n).

Each is independently reviewable. Splitting into three PRs (records + dot notation together; unsigned comparisons + generalization together; integer shift on its own) would have made the diff much easier to grade. The PR body lists them cleanly, which helps — but the actual feature surface is larger than the title implies. Not blocking, just a note for future PRs.

Substantive observations on the design.

  • Dot-notation desugaring is string-split-based and brittle. At Elab/Core.lean:1696–1715, the "identifier not found" path attempts to splice val.fieldT..field(val) by doing name.splitOn "." and name.splitOn ".." and checking that there are exactly two parts and no double-dot. This works for the current benchmark but:

    • Chained field access a.b.c is unsupported. "a.b.c".splitOn "." gives three parts and the parts.length == 2 check fails. So fe.inner.limb0 just errors out "Unknown expr identifier". Worth either supporting (recursive desugar) or explicitly documenting as a limitation.
    • Parser-level solution would be cleaner. The desugaring lives at elaboration time because Boole's grammar currently parses val.field as a single identifier token. A grammar-level fieldAccess production (the way Core does it) would remove the string-splitting entirely and naturally handle chained access. Out of scope for this PR; worth tracking as a follow-up.
    • No direct unit tests. The only coverage is the benchmark. A couple of #guard_msgs cases targeting: (a) valid val.field, (b) a.b.c (failure case — pins the limitation so a future change is visible), (c) val.field where val isn't a record (should keep the original "Unknown expr identifier" error), and (d) accidental T..field literal in source (should NOT trigger the rewrite) — would lock the fragile logic down. See inline.
  • Hardcoded _mk constructor suffix. addRecordBindings and toCoreDecls both assume the constructor is named recordName ++ "_mk". Fine as a convention; worth a module-level constant so the string appears once (private def recordConstructorSuffix : String := "_mk"). Maintenance risk is low, but if someone ever wants MyRecord.new or similar they'll need to grep for four places. Non-blocking.

  • Integer left-shift constant-folding. The encoding x << n → x * (1 <<< n) at Verify.lean:378–386 is sound over Int, but it silently allows the shift amount to be arbitrarily large (via .natToInt), producing a potentially huge Int.ofNat (1 <<< 200) literal. SMT solvers handle arbitrary-precision integers so this is probably fine in practice, but an upper-bound sanity check (e.g. reject n > 256 or similar) would prevent accidental DoS-style inputs. Low priority. The pattern match .natToInt _ ⟨_, n⟩ is narrow — a user writing x << (5 + 0) would fail with "requires a constant (literal) shift amount", which is correct but the error could mention "literal" means strictly a numeral, not a constant expression. See inline.

  • Dispatch table explosion for unsigned comparisons. Core/DDMTransform/Translate.lean:512–652 adds 20 new lines — 4 operators × 5 widths. Consistent with the existing signed-comparison pattern, which is also 20 lines, and probably fine to keep parallel for readability. A meta-level factoring (for bvN in [1,8,16,32,64]: for op in [ULt, ULe, UGt, UGe]: ...) would reduce 40 total lines to ~8 but at the cost of a more indirect dispatch. Out of scope here; worth a follow-up if the table ever grows further.

  • toCoreTypedUn / toCoreTypedBin wildcard fallthrough. The old code rejected non-int types up-front with a nice error message ("Unsupported typed operator type: {repr ty}"). The new code dispatches to toCoreBvBin / toCoreBvUn on the wildcard, so non-int non-bv types (e.g. .bool, .string, some future .array) now fail inside bvWidth with "Unsupported bit-vector type", which is less informative. One-line fix: explicit | .bv{1,8,16,32,64} _ => toCoreBv… cases and a final _ => throwAt m s!"Unsupported typed operator type: {repr ty}" fallthrough.

Test coverage. Only the benchmark exercises the new features. That's a lot of surface area for one test program:

  • No unit test for @[declareRecord] processing in isolation (e.g., a malformed declareRecord attribute producing a sensible error rather than the panics currently flagged by the lint).
  • No unit test for dot-notation desugaring edge cases (above).
  • No unit test for unsigned-vs-signed comparison distinction (e.g., a bv64 value with MSB set that flips between <s and <u).
  • No unit test for int << n where n is not a literal, where n is 0, where n is large.
  • No unit test for single-constructor-datatype equivalence (a record with named fields should produce a type isomorphic to what the user would have written as a one-constructor datatype manually).

A small StrataTest/Languages/Boole/Records.lean with a handful of #guard_msgs-level tests would lock each of these down independently of the FE51 benchmark.

Proof-coverage suggestions.

Three invariants are worth stating, either as example := on small inputs, as plausible properties, or (for the first two) as theorems in a proof module:

  • Record ↔ single-constructor-datatype equivalence. For every type T := {f1 : A, f2 : B} program P, the result of toCoreDecls is α-equivalent to the result of the hand-written type T := { data T_mk(f1: A, f2: B) } program. Pinning this as a theorem (or at least a programmatic fixture check) prevents the _mk / T..field(val) synthesis from drifting away from "the program you would have written manually" over time.
  • Dot-notation preservation. elabExpr applied to a val.field identifier where val : T and T..field is in scope produces the same Expr tree as elabExpr applied to T..field(val) as two separate tokens. This is exactly what the desugaring is supposed to guarantee; stating it as a theorem (even informally in a comment) prevents regressions in the string-split logic.
  • Integer shift equivalence. ∀ x : Int, ∀ n : Nat, Int.shiftLeft x n = x * Int.ofNat (1 <<< n). This is mathematically trivial but closing the semantic loop between the Boole-level << and the Core-level * encoding is worthwhile because it documents the axiom the encoding relies on.

Inline comments on the two highest-impact items (the lint-blocking panics and the dot-notation desugaring) below.

Comment thread Strata/DDM/AST.lean
Comment on lines +1147 to +1164
| { dialect := _, name := "declareRecord" } => do
let args := attr.args
if args.size < 2 then
newBindingErr "declareRecord expects at least 2 arguments (name, fields)."
return none
let .catbvar nameIndex := args[0]!
| newBindingErr "declareRecord: invalid name index"; return none
let .catbvar fieldsIndex := args[1]!
| newBindingErr "declareRecord: invalid fields index"; return none
let .isTrue nameP := decideProp (nameIndex < argDecls.size)
| return panic! "Invalid name index"
let .isTrue fieldsP := decideProp (fieldsIndex < argDecls.size)
| return panic! "Invalid fields index"
let functionTemplates ← parseFunctionTemplates (args.extract 2 args.size)
some <$> .record <$> pure {
nameIndex := ⟨nameIndex, nameP⟩,
fieldsIndex := ⟨fieldsIndex, fieldsP⟩,
functionTemplates
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CI blocker: Run lint checks (stable) fails on the three new panic! uses here and at line 2194. The Strata lint only flags newly added panics without -- nopanic:ok annotations (pre-existing ones in the same file, e.g. addDatatypeBindings at 1134–1138, are grandfathered), so this is strictly a new-code policy issue.

Two ways to fix:

  • Quick fix: append -- nopanic:ok after each panic! line, matching the grandfathered sibling sites. Minimal change; keeps behaviour identical. Suggested if you want to keep the parallel structure with the existing datatype code.
  • Better fix: replace panic! with error returns through the Except String channel parseNewBindings already uses. Line 1148–1155 already handles similar malformed-metadata cases via newBindingErr "..."; return none; the decideProp … | return panic! branches at lines 1156 and 1158 could do the same. addDatatypeBindings's existing panics would still be there but are out of scope.

Same choice applies to line 2194:

| a => panic! s!"Expected ident for record name {repr a}"

which could become an Except String error like the rest of addRecordBindings, or gain a -- nopanic:ok. Note that the symmetric site in addDatatypeBindings (line 2134 area for the datatype name) also panics without annotation; if you take the quick-fix route here, consider tagging those too as a small consistency pass.

Either way, CI stays red until this is resolved.

Comment thread Strata/DDM/Elab/Core.lean
Comment on lines +1696 to +1715
| -- Desugar dot notation: "base.field" → T..field(base)
-- A single dot (not "..") separates the variable from the field name.
let dotParts := name.splitOn ".."
let parts := name.splitOn "."
if parts.length == 2 && dotParts.length == 1 then
let baseName := parts[0]!
let fieldName := parts[1]!
if let some baseBinding := tctx.lookupVar baseName then
let typeExprOpt : Option TypeExpr := match baseBinding with
| .bvar _ (.expr tp) => some tp
| .fvar _ (.expr tp) => some tp
| _ => none
if let some (.fvar _ typeIdx _) := typeExprOpt then
if let some typeName := tctx.globalContext.nameOf? typeIdx then
let selectorName := typeName ++ ".." ++ fieldName
if let some selectorIdx := tctx.globalContext.findIndex? selectorName then
let baseExpr : Expr := match baseBinding with
| .bvar idx _ => .bvar loc idx
| .fvar idx _ => .fvar loc idx
let appExpr : Expr := .app loc (.fvar loc selectorIdx) (.expr baseExpr)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The dot-notation desugaring here is the most delicate part of the PR. Three observations:

(a) Chained access is silently unsupported. "a.b.c".splitOn "." returns three parts, so the parts.length == 2 check fails, and a.b.c falls through to logError loc s!"Unknown expr identifier {name}". That's a correctness non-issue (the identifier genuinely wasn't found, and we emit a sane error) but the user doesn't learn why — they see "Unknown expr identifier a.b.c" rather than "chained field access not yet supported". Either:

  • Recursively desugar: a.b.cT..c(T..b(a)). A few extra lines, much nicer UX for struct-of-struct code.
  • Explicit check-and-report: if parts.length > 2 and none of the multi-dot tokens form a valid identifier, emit a targeted "chained field access not yet supported" diagnostic.

(b) String-split identifier parsing is fragile. Today it relies on Boole's grammar parsing val.field as a single atom with an embedded ., and on the convention that multi-dot tokens like T..field don't match the parts.length == 2 && dotParts.length == 1 guard. If the DDM elaborator ever starts producing val.field as two tokens, or the grammar adds another identifier category that uses . for a different purpose, this path silently stops applying. A grammar-level fieldAccess production (the way Core does it) would remove the string-splitting entirely. Follow-up-able, not blocking.

(c) No direct unit tests. The only coverage is struct_field_access.lean end-to-end. Four targeted #guard_msgs cases would lock the behaviour down:

  • Valid single-level val.field on a var-local record, on a parameter, and on a nested var: currently all rely on the benchmark.
  • a.b.c: pins the limitation (expected either to resolve via nested desugar, or to produce a specific error message — pick one and test it).
  • val.field where val is not a record: must keep the pre-existing "Unknown expr identifier" diagnostic intact.
  • T..field typed literally in source: must not trigger the rewrite (already handled by the dotParts.length == 1 guard; a test pins this down so a future simplification doesn't regress).

See also the main review body's suggested proof: a theorem (or informal comment) that elabExpr (val.field) produces the same tree as elabExpr (T..field(val)) would state the desugaring's invariant explicitly.

Comment on lines +378 to +386
| .bvshl m ty a b =>
match ty with
| .int _ => do
-- Integer << n encoded as multiplication by 2^n; n must be a literal.
let n ← match b with
| .natToInt _ ⟨_, n⟩ => pure n
| _ => throwAt m "Integer << requires a constant (literal) shift amount"
toCoreTypedBin m ty "Mul" (← toCoreExpr a) (.intConst () (Int.ofNat (1 <<< n)))
| _ => toCoreBvBin m ty "Shl" (← toCoreExpr a) (← toCoreExpr b)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two small things on the integer-left-shift encoding:

(a) Error message could be more specific. "Integer << requires a constant (literal) shift amount" is accurate but a user who writes x << (5 + 0) may reasonably expect the evaluator to fold the RHS — the error doesn't communicate that only a numeric literal is accepted, not a constant-folded expression. Consider "Integer << requires a numeric literal as the shift amount; arbitrary expressions (even constant ones) are not yet supported."

(b) Unbounded shift amount. Int.ofNat (1 <<< n) where n is an arbitrary Nat from source produces arbitrary-precision Int literals. SMT solvers handle arbitrary-precision integers fine, but a pathological input like x << 10000 would generate a ~3000-digit literal in the SMT file. Cheap guard: reject n > SHIFT_MAX (e.g. 256, the widest BV we speak) with a targeted message.

(c) Test coverage is benchmark-only. intShlSeed in struct_field_access.lean covers one valid use (x < (1 << 51)) but nothing exercises the error paths. Two #guard_msgs tests on the "Integer << requires…" branch would pin the error message and catch future regressions silently flipping the fallback path.

@kondylidou kondylidou marked this pull request as draft May 6, 2026 13:30
@kondylidou
Copy link
Copy Markdown
Contributor Author

I'm converting this to draft for now because there is quite some work to be done

@shigoel shigoel added the CSLib PRs and issues marked with this label indicate contributions from/for the CSLib community. label May 11, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Core CSLib PRs and issues marked with this label indicate contributions from/for the CSLib community. Git conflicts

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants