Skip to content

lemmas: booster-only gap fixes — widthOpCode, int/bytes simplifications, preserves-definedness#2849

Draft
ehildenb wants to merge 17 commits into
masterfrom
kevm-lemmas-booster
Draft

lemmas: booster-only gap fixes — widthOpCode, int/bytes simplifications, preserves-definedness#2849
ehildenb wants to merge 17 commits into
masterfrom
kevm-lemmas-booster

Conversation

@ehildenb
Copy link
Copy Markdown
Member

@ehildenb ehildenb commented May 25, 2026

The booster backend (booster-only-simplify mode) fails to discharge several side conditions and lemmas that kore handles correctly, because booster's equation engine has stricter matching requirements. This PR closes the gaps found while running the functional test-suite in booster-only mode. Note that I collected the information needed (or Claude did, more precisely) using functionality that's not merged yet (runtimeverification/k#4919), but that the improvements here are independent of that.

Changes:

  • Updates CLAUDE.md with some basic information about how to debug stuck proofs, mostly focused on getting logs out of the haskell backend for analysis.
  • evm.md: add total to #widthOpCode declaration; add preserves-definedness to all four #computeValidJumpDests rules so booster's definedness checker passes
  • lemmas.k: add smtlib(widthOpCode) annotation + smt-lemma bounds for #widthOpCode (0 ≤ x ≤ 33); add widthOpCode-concat-{left,right} rules for booster's single-step equation engine; add Bool tautology rules; add KSet membership/dedup rules with comments explaining which are permanently indeterminate in booster (backend limitation, not fixable from K source)
  • int-simplification.k: add minInt/maxInt comparison rules, commutativity normalisation (==Int concrete-on-left), arithmetic and Bool simplifications
  • evm-int-simplification.k: add bit-mask-asWord-m and bit-notmask-asWord-nm
  • bytes-simplification.k: add preserves-definedness to lookup-as-asWord; add concrete b"" concat identity rules
  • compute-valid-jump-dests-spec.k: add a fast N=5 functional claim alongside the existing N=1000 performance claim for quicker CI feedback
  • set-dedup-spec.k: new functional spec exercising the KSet membership and dedup rules under kore. This one is still not passing with --booster-only-simplify.

Validation:

  • Booster-only functional suite: 8/10 specs pass after these fixes (avg ~2× wall-clock speedup with --booster-only-simplify flag vs kore baseline). The two remaining failures (lemmas-spec with many claims, merkle-spec) are engine-level limitations unrelated to these lemmas.
  • compute-valid-jump-dests specifically: 1.23× wall-clock speedup in booster-only mode on the N=1000 symbolic claim; with --use-booster-dev (pure booster, no kore proxy) the same proof drops from ~200 s to 0.68s. The kore fallback is the bottleneck, not the semantics.
  • int-simplifications-spec improves 3.48× in booster-only mode after the minInt/maxInt and commutativity normalisation rules.
  • widthOpCode smt-lemma bounds let Z3 discharge #widthOpCode-containing side conditions without concrete evaluation, which was previously causing fallbacks.
  • set-dedup-spec is intentionally kore-only: it documents the booster matchSets() limitation (booster cannot match non-empty KSet patterns — Match.hs:673-676) so the permanently-dead rules in lemmas.k have a test-level explanation.

ehildenb and others added 15 commits May 25, 2026 18:28
…alidJumpDests rules

`total` on `#widthOpCode` tells the Haskell backend the function is defined
for all inputs, suppressing spurious #Ceil obligations.

`preserves-definedness` on all four `#computeValidJumpDests` rules unblocks
the booster's equation evaluator: without it the backend aborted with
`[failure][break] Uncertain about definedness` on every attempt because
`PGM[I]` (byte indexing) is non-total in general.  With the annotation the
booster correctly iterates the 2001-byte concrete prefix via LLVM and halts
at the symbolic `#buf` boundary.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…-asWord

Without this attribute Booster refused to apply the rule because the LHS
contains the non-total hook Lbl_[_]_BYTES-HOOKE.  After the fix Booster
correctly applies it (8× succeeded in the execute-node simplify request).

Analysis: scripts/bench-prove.py --analyse-fallbacks shows the "Uncertain
about definedness of rule due to: non-total symbol Lbl_[_]_BYTES-HOOKE"
error gone, replaced by successful Booster applications.  The rule-index
gap and indeterminate-match cases for other equations remain and will be
addressed in follow-up commits.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
bytes-simplification.k: add preserves-definedness to lengthBytes-buf
and lengthBytes-range.

evm-int-simplification.k: add asWord-buf-inversion-rangeUInt256.
lemmas.k: add widthOpCode-concat-left/right composition rules.
evm-int-simplification.k: add asWord-range-buf to match the actual
#asWord(#range(#buf(N,X), S, W)) term structure after concat-lookup
and lookup-as-asWord reductions.
…Code, smt-lemma bounds

Add smtlib(widthOpCode) to the #widthOpCode syntax declaration so it is
registered as a declared SMT function.  This allows smt-lemma rules to encode
bounds on #widthOpCode as universally-quantified axioms in the solver prelude,
letting Z3 discharge side conditions that contain #widthOpCode without needing
to evaluate it concretely.

Add widthOpCode-lb and widthOpCode-ub with [smt-lemma] so the solver knows
0 ≤ #widthOpCode(X) ≤ 33 for all X.

Also document the "save output to file before analysing" workflow in CLAUDE.md
to avoid re-running slow commands with different pipes.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…1000 claim

compute-valid-jump-dests-fast (N=5): quick smoke test for iterating on
simplification lemmas — completes in under a second.

compute-valid-jump-dests-perf (N=1000): performance regression benchmark;
run without --haskell-logging for accurate baseline timing.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ion rules

Fixes `int-simplifications-spec` regression under `--booster-only-simplify`.

The four new `minint-lt-maxint-*` / `minint-leq-maxint-*` rules at priority 40
make Booster produce `true` directly for claims like:

  minInt(A, _B) <Int maxInt(C, _D)           requires A <Int C
  minInt(_A, B) <=Int maxInt(_C, D)           requires B <=Int D

and their shifted variants (E+A, E+B vs F+C, F+D).

Without these rules, the existing priority-50 expansion rules
(minint-lt, maxint-gt) expand to a disjunction that Booster cannot
collapse to `true` in booster-only mode because the path condition is
unavailable at the implies check.

The non-linear shift rules (E appears twice in the minInt args) correctly
fire at priority 40 but their compound `andBool` side conditions fail to
discharge when the path condition variables are in non-linear position.
Two additional "factored-form" rules handle the result after
minInt-factor-left / maxInt-factor-left (priority 50) have applied: the
factored LHS is linear, so the compound condition discharges normally.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…k-asWord-nm

Parametric masking of #asWord(BA:Bytes) with byte-aligned maxUIntN /
notMaxUIntN masks; fires at priority 40 (before multimask-b-and at 50).
…n-left

Kore's execute-fallback applies INT-KORE.eq-int-true-rigth which can swap
==Int argument order in path conditions (e.g. lengthBytes(BA) ==Int 32 →
32 ==Int lengthBytes(BA)).  This makes Booster's implies check fail due to a
syntactic mismatch even when the logical content is identical.

The new rule [int-eq-comm-concrete] at priority 30 fires before the +Int
cancellation rules and normalises concrete-on-left to concrete-on-right,
restoring the form the target spec expects.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ation lemmas

Adds to INT-SIMPLIFICATION-COMMON:
- A <=Int A => true  (reflexivity; fixes chop-overflow-02 under --booster-only-simplify)
- A <=Int A +Int B => 0 <=Int B
- A +Int ((B -Int A) +Int C) => B +Int C  (4-term cancellation; fixes cancellation-02)
- A <Int B andBool B <Int A => false  (strict order contradiction; fixes b2w-03)

Adds to INT-SIMPLIFICATION-HASKELL:
- A <Int B andBool C <=Int A => false requires B <=Int C [concrete(B,C)]  (range contradiction)
- C <=Int A andBool A <Int B => false requires B <=Int C [concrete(B,C)]  (range contradiction)
  (requires is purely concrete, evaluated by LLVM hook — no circular self-application risk)

Adds to LEMMAS-HASKELL:
- B orBool notBool B => true  (and commuted variant; fixes b2w-05, chop-additional-knowledge)
- B andBool notBool B => false  (and commuted variant)

Validation: full functional test suite passes in normal mode (slot-updates-spec.k failure
is pre-existing, unrelated to these changes). Under --booster-only-simplify, lemmas-spec.k
drops from 22 to 18 failing claims. No loops detected (checked via kevm-analyse hot-rules).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…iling claims

Add a higher-priority (40) variant of `A <=Int A+B => 0 <=Int B` that
short-circuits directly to `true` when `0 <=Int B` is already in the path
condition, fixing `chop-overflow-01` in --booster-only-simplify mode where the
intermediate `0 <=Int Y` form cannot be discharged at the implies check.

Add `map-update-comm` to `LEMMAS-HASKELL`: normalises `M[I1<-V1][I2<-V2]` to
`M[I2<-V2][I1<-V1]` when `I2 <Int I1`, fixing the map-update-commutativity
claim (`012353799830...`).  Loop-safe: after the swap the outer key is larger
than the inner key, so the `I2 <Int I1` condition is false on the result.

Validated: two booster-only claims fixed, zero new failures (306/322 now pass
vs 304/322 before).  No regressions in normal mode.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
In K, .Bytes (the constructor) and b"" (LLVM-evaluated concrete empty bytes,
a Kore domain value \dv{SortBytes}("")) are different Kore terms.  The existing
bytes-concat-empty-right/left rules use .Bytes and do not fire when LLVM hooks
(e.g. #encodeArgs base case) return b"".

New rules:
  [bytes-concat-empty-right-concrete]: B +Bytes b"" => B
  [bytes-concat-empty-left-concrete]:  b"" +Bytes B => B

Fixes booster-only failure for encodepacked-keccak01: the contract evaluates
keccak(1 : #encodeArgs(#uint256(A0))) where the #encodeArgs base case produces
b"", leaving keccak(X +Bytes b"") vs keccak(X) as the implies-check mismatch.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… functional spec

Adds a runLemma/doneLemma spec that tests Set deduplication lemmas (SetItem
membership and |Set deduplication) in isolation. Registers the spec in the
KOMPILE_MAIN_FILE table so the pytest harness picks it up automatically.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Comment on lines +274 to +275
rule [lengthBytes-buf]: lengthBytes(#buf(S, _)) => S requires 0 <=Int S [simplification, preserves-definedness]
rule [lengthBytes-range]: lengthBytes(#range(_, S:Int, W:Int)) => maxInt(0, W) requires 0 <=Int S [simplification, preserves-definedness]
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if these rules actually do preserve definedness.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#buf and #range are total so I don't see a problem with these.

Comment on lines +112 to +121
// General reduction: #asWord of a range extracted from a buf.
// #buf(N, X) is the big-endian N-byte encoding of X. Taking W bytes
// starting at byte S gives (X >> (8*(N-S-W))) & (2^(8*W)-1).
// Activated with concrete N, S, W so Booster can evaluate the arithmetic.
rule [asWord-range-buf]:
#asWord ( #range ( #buf ( N:Int, X:Int ), S:Int, W:Int ) ) =>
X /Int ( 2 ^Int ( 8 *Int ( N -Int S -Int W ) ) ) modInt ( 2 ^Int ( 8 *Int W ) )
requires 0 <=Int S andBool 0 <Int W andBool S +Int W <=Int N
andBool 0 <=Int X andBool X <Int 2 ^Int ( 8 *Int N )
[simplification, concrete(N, S, W), preserves-definedness]
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This lemma seems unnecessarily complicated, we should track down which test triggered adding it, and see if we can find a smaller/simpler version that works.

…refs

Remove '## Running long commands' (belongs in the backend-docs PR). Fix
two bullet points in '## Diagnosing slow Haskell backend' that referenced
the now-deleted bench-prove.py script. Keep '## Testing a local Haskell
backend build' and its kdist cache rebuild instruction.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Comment on lines +245 to +273
// minInt <Int maxInt / minInt <=Int maxInt: priority 40 fires before the default-priority (50) expansion rules,
// so Booster sees `true` directly rather than a disjunction requiring path-condition reasoning.
rule [minint-lt-maxint-a]:
minInt(A:Int, _B:Int) <Int maxInt(C:Int, _D:Int) => true
requires A <Int C
[simplification(40), preserves-definedness]
rule [minint-lt-maxint-shift-a]:
minInt(E:Int +Int A:Int, E:Int +Int _B:Int) <Int maxInt(F:Int +Int C:Int, F:Int +Int _D:Int) => true
requires A <Int C andBool E <=Int F
[simplification(40), preserves-definedness]
rule [minint-leq-maxint-b]:
minInt(_A:Int, B:Int) <=Int maxInt(_C:Int, D:Int) => true
requires B <=Int D
[simplification(40), preserves-definedness]
rule [minint-leq-maxint-shift-b]:
minInt(_A:Int +Int E:Int, B:Int +Int E:Int) <=Int maxInt(_C:Int +Int F:Int, D:Int +Int F:Int) => true
requires B <=Int D andBool E <=Int F
[simplification(40), preserves-definedness]
// Factored forms: after minInt-factor-{left,right} + maxInt-factor-{left,right} (priority 50) have fired,
// the non-linear shift rules above no longer match; these linear rules catch the result.
rule [minint-lt-maxint-shift-a-factored]:
E:Int +Int minInt(A:Int, _B:Int) <Int F:Int +Int maxInt(C:Int, _D:Int) => true
requires A <Int C andBool E <=Int F
[simplification(40), preserves-definedness]
rule [minint-leq-maxint-shift-b-factored]:
minInt(_A:Int, B:Int) +Int E:Int <=Int maxInt(_C:Int, D:Int) +Int F:Int => true
requires B <=Int D andBool E <=Int F
[simplification(40), preserves-definedness]

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some of these seem like they can be simplified with some other reasoning steps in the way. For eaxmple some of the rules could be combined with a disjunction in the requires clause, and others can first do arithmetic simplification on the expression then rely on the other rules here to fire afterwards.

Comment on lines +43 to +61
// NOTE: the two non-linear rules below fire in kore but are INDETERMINATE in booster.
// Booster internalises SetItem(X) as KSet{def,[X],None}; matchSets() returns addIndeterminate
// for any non-empty KSet (Match.hs:673-676). Backend change required to fix.
rule ( S:Set |Set SetItem ( X ) ) |Set SetItem( X ) => ( S:Set |Set SetItem ( X ) ) [simplification]
rule ( ( S:Set |Set SetItem ( X ) ) |Set SetItem ( Y ) ) |Set SetItem( X ) => ( ( S:Set |Set SetItem ( X ) ) |Set SetItem ( Y ) ) [simplification]

// Membership in the |Set (union) form — fires in kore only (same matchSets limitation).
// Kore uses these to discharge the requires-clause of the dedup rule below.
rule X:Int in (REST:Set |Set SetItem(Y:Int)) => true requires X ==Int Y [simplification]
rule X:Int in (REST:Set |Set SetItem(Y:Int)) => X in REST requires notBool ( X ==Int Y ) [simplification]

// Deduplication of repeated concrete Int elements in a symbolic |Set expression.
// Required in normal mode (where kore's simplify step fires it) but INDETERMINATE in booster.
// Booster cannot match SetItem(X) in the pattern: matchSets() is unimplemented for non-empty
// KSets (booster/Match.hs:673-676). Cannot be fixed from K source; requires backend change.
// Context: #touchAccounts in evm.md adds addresses via |Set; when the same address is added
// twice to a symbolic touchedAccounts, this rule is needed to normalize back to a 2-element form.
rule S:Set |Set SetItem(X:Int) => S requires X in S [simplification]

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jberthold how much work would it be to support these types of simplification rules in the booster backend directly? We don't need to support sets in general, but just these expressions.

The attached set-dedup-spec.k file should capture claims that need to pass with the new rules.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Related code is here: https://github.com/runtimeverification/haskell-backend/blob/00479fbdb86b70fc602aefca9fabab5afc8bad5f/booster/library/Booster/Pattern/Match.hs#L659-L676
At the moment, only empty sets are matched. It looks like we could consider matching singleton sets like the ones above, but there is the danger of discarding a match prematurely. For the above rules, it should be OK because there is always an explicit |Set operation separating the sets and they are singletons where it matters... 🤔

Comment on lines +305 to +308
rule [bool-or-not-self]: B:Bool orBool notBool B:Bool => true [simplification]
rule [bool-not-or-self]: notBool B:Bool orBool B:Bool => true [simplification]
rule [bool-and-not-self]: B:Bool andBool notBool B:Bool => false [simplification]
rule [bool-not-and-self]: notBool B:Bool andBool B:Bool => false [simplification]
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Formatting needs fixed.

rule #computeValidJumpDests( _, I, RESULT, LEN) => RESULT requires I >=Int LEN
rule #computeValidJumpDests(PGM, I, RESULT, LEN) => #computeValidJumpDests(PGM, I +Int 1, RESULT[I <- 1], LEN) requires I <Int LEN andBool PGM [ I ] ==Int 91
rule #computeValidJumpDests(PGM, I, RESULT, LEN) => #computeValidJumpDests(PGM, I +Int #widthOpCode(PGM [ I ]), RESULT, LEN) requires I <Int LEN andBool notBool PGM [ I ] ==Int 91
rule #computeValidJumpDests(PGM) => #computeValidJumpDests(PGM, 0, padRightBytes(.Bytes, lengthBytes(PGM), 0), lengthBytes(PGM)) [preserves-definedness]
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function is already marked as total.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this rule should still be marked preserves-definedness ... the RHS #computeValidJumpDests(_, _, _, _) is not total but should be defined for these arguments.

Comment on lines +1754 to +1756
rule #computeValidJumpDests( _, I, RESULT, LEN) => RESULT requires I >=Int LEN [preserves-definedness]
rule #computeValidJumpDests(PGM, I, RESULT, LEN) => #computeValidJumpDests(PGM, I +Int 1, RESULT[I <- 1], LEN) requires I <Int LEN andBool PGM [ I ] ==Int 91 [preserves-definedness]
rule #computeValidJumpDests(PGM, I, RESULT, LEN) => #computeValidJumpDests(PGM, I +Int #widthOpCode(PGM [ I ]), RESULT, LEN) requires I <Int LEN andBool notBool PGM [ I ] ==Int 91 [preserves-definedness]
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it true that these are preserving definedness?

Copy link
Copy Markdown
Collaborator

@jberthold jberthold May 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess not, but it is pragmatically the only choice to avoid having lengthBytes(PGM) again here.
This is only OK if the length of PGM and RESULT is > I but at the call sites they are the same length LEN.

Comment on lines +1815 to 1816
syntax Int ::= #widthOpCode(Int) [symbol(#widthOpCode), function, total, smtlib(widthOpCode)]
// -----------------------------------------------------------------
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

formatting

…flow

In '## Testing a local Haskell backend build': add the pyproject.toml
snippet for pointing at a local pyk checkout, and the uv sync / revert
steps that go with it.

Add '## Updating local pyk vs local K': explains that a pyk-only change
only needs uv sync --reinstall-package kframework (no kdist rebuild),
while a K/Haskell backend change requires the full kup + kevm-kdist flow.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants