lemmas: booster-only gap fixes — widthOpCode, int/bytes simplifications, preserves-definedness#2849
lemmas: booster-only gap fixes — widthOpCode, int/bytes simplifications, preserves-definedness#2849ehildenb wants to merge 17 commits into
Conversation
…alidJumpDests rules `total` on `#widthOpCode` tells the Haskell backend the function is defined for all inputs, suppressing spurious #Ceil obligations. `preserves-definedness` on all four `#computeValidJumpDests` rules unblocks the booster's equation evaluator: without it the backend aborted with `[failure][break] Uncertain about definedness` on every attempt because `PGM[I]` (byte indexing) is non-total in general. With the annotation the booster correctly iterates the 2001-byte concrete prefix via LLVM and halts at the symbolic `#buf` boundary. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…-asWord Without this attribute Booster refused to apply the rule because the LHS contains the non-total hook Lbl_[_]_BYTES-HOOKE. After the fix Booster correctly applies it (8× succeeded in the execute-node simplify request). Analysis: scripts/bench-prove.py --analyse-fallbacks shows the "Uncertain about definedness of rule due to: non-total symbol Lbl_[_]_BYTES-HOOKE" error gone, replaced by successful Booster applications. The rule-index gap and indeterminate-match cases for other equations remain and will be addressed in follow-up commits. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
bytes-simplification.k: add preserves-definedness to lengthBytes-buf and lengthBytes-range. evm-int-simplification.k: add asWord-buf-inversion-rangeUInt256.
lemmas.k: add widthOpCode-concat-left/right composition rules. evm-int-simplification.k: add asWord-range-buf to match the actual #asWord(#range(#buf(N,X), S, W)) term structure after concat-lookup and lookup-as-asWord reductions.
…Code, smt-lemma bounds Add smtlib(widthOpCode) to the #widthOpCode syntax declaration so it is registered as a declared SMT function. This allows smt-lemma rules to encode bounds on #widthOpCode as universally-quantified axioms in the solver prelude, letting Z3 discharge side conditions that contain #widthOpCode without needing to evaluate it concretely. Add widthOpCode-lb and widthOpCode-ub with [smt-lemma] so the solver knows 0 ≤ #widthOpCode(X) ≤ 33 for all X. Also document the "save output to file before analysing" workflow in CLAUDE.md to avoid re-running slow commands with different pipes. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…1000 claim compute-valid-jump-dests-fast (N=5): quick smoke test for iterating on simplification lemmas — completes in under a second. compute-valid-jump-dests-perf (N=1000): performance regression benchmark; run without --haskell-logging for accurate baseline timing. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ion rules Fixes `int-simplifications-spec` regression under `--booster-only-simplify`. The four new `minint-lt-maxint-*` / `minint-leq-maxint-*` rules at priority 40 make Booster produce `true` directly for claims like: minInt(A, _B) <Int maxInt(C, _D) requires A <Int C minInt(_A, B) <=Int maxInt(_C, D) requires B <=Int D and their shifted variants (E+A, E+B vs F+C, F+D). Without these rules, the existing priority-50 expansion rules (minint-lt, maxint-gt) expand to a disjunction that Booster cannot collapse to `true` in booster-only mode because the path condition is unavailable at the implies check. The non-linear shift rules (E appears twice in the minInt args) correctly fire at priority 40 but their compound `andBool` side conditions fail to discharge when the path condition variables are in non-linear position. Two additional "factored-form" rules handle the result after minInt-factor-left / maxInt-factor-left (priority 50) have applied: the factored LHS is linear, so the compound condition discharges normally. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…k-asWord-nm Parametric masking of #asWord(BA:Bytes) with byte-aligned maxUIntN / notMaxUIntN masks; fires at priority 40 (before multimask-b-and at 50).
…n-left Kore's execute-fallback applies INT-KORE.eq-int-true-rigth which can swap ==Int argument order in path conditions (e.g. lengthBytes(BA) ==Int 32 → 32 ==Int lengthBytes(BA)). This makes Booster's implies check fail due to a syntactic mismatch even when the logical content is identical. The new rule [int-eq-comm-concrete] at priority 30 fires before the +Int cancellation rules and normalises concrete-on-left to concrete-on-right, restoring the form the target spec expects. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ation lemmas Adds to INT-SIMPLIFICATION-COMMON: - A <=Int A => true (reflexivity; fixes chop-overflow-02 under --booster-only-simplify) - A <=Int A +Int B => 0 <=Int B - A +Int ((B -Int A) +Int C) => B +Int C (4-term cancellation; fixes cancellation-02) - A <Int B andBool B <Int A => false (strict order contradiction; fixes b2w-03) Adds to INT-SIMPLIFICATION-HASKELL: - A <Int B andBool C <=Int A => false requires B <=Int C [concrete(B,C)] (range contradiction) - C <=Int A andBool A <Int B => false requires B <=Int C [concrete(B,C)] (range contradiction) (requires is purely concrete, evaluated by LLVM hook — no circular self-application risk) Adds to LEMMAS-HASKELL: - B orBool notBool B => true (and commuted variant; fixes b2w-05, chop-additional-knowledge) - B andBool notBool B => false (and commuted variant) Validation: full functional test suite passes in normal mode (slot-updates-spec.k failure is pre-existing, unrelated to these changes). Under --booster-only-simplify, lemmas-spec.k drops from 22 to 18 failing claims. No loops detected (checked via kevm-analyse hot-rules). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…iling claims Add a higher-priority (40) variant of `A <=Int A+B => 0 <=Int B` that short-circuits directly to `true` when `0 <=Int B` is already in the path condition, fixing `chop-overflow-01` in --booster-only-simplify mode where the intermediate `0 <=Int Y` form cannot be discharged at the implies check. Add `map-update-comm` to `LEMMAS-HASKELL`: normalises `M[I1<-V1][I2<-V2]` to `M[I2<-V2][I1<-V1]` when `I2 <Int I1`, fixing the map-update-commutativity claim (`012353799830...`). Loop-safe: after the swap the outer key is larger than the inner key, so the `I2 <Int I1` condition is false on the result. Validated: two booster-only claims fixed, zero new failures (306/322 now pass vs 304/322 before). No regressions in normal mode. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
In K, .Bytes (the constructor) and b"" (LLVM-evaluated concrete empty bytes,
a Kore domain value \dv{SortBytes}("")) are different Kore terms. The existing
bytes-concat-empty-right/left rules use .Bytes and do not fire when LLVM hooks
(e.g. #encodeArgs base case) return b"".
New rules:
[bytes-concat-empty-right-concrete]: B +Bytes b"" => B
[bytes-concat-empty-left-concrete]: b"" +Bytes B => B
Fixes booster-only failure for encodepacked-keccak01: the contract evaluates
keccak(1 : #encodeArgs(#uint256(A0))) where the #encodeArgs base case produces
b"", leaving keccak(X +Bytes b"") vs keccak(X) as the implies-check mismatch.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… functional spec Adds a runLemma/doneLemma spec that tests Set deduplication lemmas (SetItem membership and |Set deduplication) in isolation. Registers the spec in the KOMPILE_MAIN_FILE table so the pytest harness picks it up automatically. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ead booster rules
| rule [lengthBytes-buf]: lengthBytes(#buf(S, _)) => S requires 0 <=Int S [simplification, preserves-definedness] | ||
| rule [lengthBytes-range]: lengthBytes(#range(_, S:Int, W:Int)) => maxInt(0, W) requires 0 <=Int S [simplification, preserves-definedness] |
There was a problem hiding this comment.
I'm not sure if these rules actually do preserve definedness.
There was a problem hiding this comment.
#buf and #range are total so I don't see a problem with these.
| // General reduction: #asWord of a range extracted from a buf. | ||
| // #buf(N, X) is the big-endian N-byte encoding of X. Taking W bytes | ||
| // starting at byte S gives (X >> (8*(N-S-W))) & (2^(8*W)-1). | ||
| // Activated with concrete N, S, W so Booster can evaluate the arithmetic. | ||
| rule [asWord-range-buf]: | ||
| #asWord ( #range ( #buf ( N:Int, X:Int ), S:Int, W:Int ) ) => | ||
| X /Int ( 2 ^Int ( 8 *Int ( N -Int S -Int W ) ) ) modInt ( 2 ^Int ( 8 *Int W ) ) | ||
| requires 0 <=Int S andBool 0 <Int W andBool S +Int W <=Int N | ||
| andBool 0 <=Int X andBool X <Int 2 ^Int ( 8 *Int N ) | ||
| [simplification, concrete(N, S, W), preserves-definedness] |
There was a problem hiding this comment.
This lemma seems unnecessarily complicated, we should track down which test triggered adding it, and see if we can find a smaller/simpler version that works.
…refs Remove '## Running long commands' (belongs in the backend-docs PR). Fix two bullet points in '## Diagnosing slow Haskell backend' that referenced the now-deleted bench-prove.py script. Keep '## Testing a local Haskell backend build' and its kdist cache rebuild instruction. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
| // minInt <Int maxInt / minInt <=Int maxInt: priority 40 fires before the default-priority (50) expansion rules, | ||
| // so Booster sees `true` directly rather than a disjunction requiring path-condition reasoning. | ||
| rule [minint-lt-maxint-a]: | ||
| minInt(A:Int, _B:Int) <Int maxInt(C:Int, _D:Int) => true | ||
| requires A <Int C | ||
| [simplification(40), preserves-definedness] | ||
| rule [minint-lt-maxint-shift-a]: | ||
| minInt(E:Int +Int A:Int, E:Int +Int _B:Int) <Int maxInt(F:Int +Int C:Int, F:Int +Int _D:Int) => true | ||
| requires A <Int C andBool E <=Int F | ||
| [simplification(40), preserves-definedness] | ||
| rule [minint-leq-maxint-b]: | ||
| minInt(_A:Int, B:Int) <=Int maxInt(_C:Int, D:Int) => true | ||
| requires B <=Int D | ||
| [simplification(40), preserves-definedness] | ||
| rule [minint-leq-maxint-shift-b]: | ||
| minInt(_A:Int +Int E:Int, B:Int +Int E:Int) <=Int maxInt(_C:Int +Int F:Int, D:Int +Int F:Int) => true | ||
| requires B <=Int D andBool E <=Int F | ||
| [simplification(40), preserves-definedness] | ||
| // Factored forms: after minInt-factor-{left,right} + maxInt-factor-{left,right} (priority 50) have fired, | ||
| // the non-linear shift rules above no longer match; these linear rules catch the result. | ||
| rule [minint-lt-maxint-shift-a-factored]: | ||
| E:Int +Int minInt(A:Int, _B:Int) <Int F:Int +Int maxInt(C:Int, _D:Int) => true | ||
| requires A <Int C andBool E <=Int F | ||
| [simplification(40), preserves-definedness] | ||
| rule [minint-leq-maxint-shift-b-factored]: | ||
| minInt(_A:Int, B:Int) +Int E:Int <=Int maxInt(_C:Int, D:Int) +Int F:Int => true | ||
| requires B <=Int D andBool E <=Int F | ||
| [simplification(40), preserves-definedness] | ||
|
|
There was a problem hiding this comment.
Some of these seem like they can be simplified with some other reasoning steps in the way. For eaxmple some of the rules could be combined with a disjunction in the requires clause, and others can first do arithmetic simplification on the expression then rely on the other rules here to fire afterwards.
| // NOTE: the two non-linear rules below fire in kore but are INDETERMINATE in booster. | ||
| // Booster internalises SetItem(X) as KSet{def,[X],None}; matchSets() returns addIndeterminate | ||
| // for any non-empty KSet (Match.hs:673-676). Backend change required to fix. | ||
| rule ( S:Set |Set SetItem ( X ) ) |Set SetItem( X ) => ( S:Set |Set SetItem ( X ) ) [simplification] | ||
| rule ( ( S:Set |Set SetItem ( X ) ) |Set SetItem ( Y ) ) |Set SetItem( X ) => ( ( S:Set |Set SetItem ( X ) ) |Set SetItem ( Y ) ) [simplification] | ||
|
|
||
| // Membership in the |Set (union) form — fires in kore only (same matchSets limitation). | ||
| // Kore uses these to discharge the requires-clause of the dedup rule below. | ||
| rule X:Int in (REST:Set |Set SetItem(Y:Int)) => true requires X ==Int Y [simplification] | ||
| rule X:Int in (REST:Set |Set SetItem(Y:Int)) => X in REST requires notBool ( X ==Int Y ) [simplification] | ||
|
|
||
| // Deduplication of repeated concrete Int elements in a symbolic |Set expression. | ||
| // Required in normal mode (where kore's simplify step fires it) but INDETERMINATE in booster. | ||
| // Booster cannot match SetItem(X) in the pattern: matchSets() is unimplemented for non-empty | ||
| // KSets (booster/Match.hs:673-676). Cannot be fixed from K source; requires backend change. | ||
| // Context: #touchAccounts in evm.md adds addresses via |Set; when the same address is added | ||
| // twice to a symbolic touchedAccounts, this rule is needed to normalize back to a 2-element form. | ||
| rule S:Set |Set SetItem(X:Int) => S requires X in S [simplification] | ||
|
|
There was a problem hiding this comment.
@jberthold how much work would it be to support these types of simplification rules in the booster backend directly? We don't need to support sets in general, but just these expressions.
The attached set-dedup-spec.k file should capture claims that need to pass with the new rules.
There was a problem hiding this comment.
Related code is here: https://github.com/runtimeverification/haskell-backend/blob/00479fbdb86b70fc602aefca9fabab5afc8bad5f/booster/library/Booster/Pattern/Match.hs#L659-L676
At the moment, only empty sets are matched. It looks like we could consider matching singleton sets like the ones above, but there is the danger of discarding a match prematurely. For the above rules, it should be OK because there is always an explicit |Set operation separating the sets and they are singletons where it matters... 🤔
| rule [bool-or-not-self]: B:Bool orBool notBool B:Bool => true [simplification] | ||
| rule [bool-not-or-self]: notBool B:Bool orBool B:Bool => true [simplification] | ||
| rule [bool-and-not-self]: B:Bool andBool notBool B:Bool => false [simplification] | ||
| rule [bool-not-and-self]: notBool B:Bool andBool B:Bool => false [simplification] |
There was a problem hiding this comment.
Formatting needs fixed.
| rule #computeValidJumpDests( _, I, RESULT, LEN) => RESULT requires I >=Int LEN | ||
| rule #computeValidJumpDests(PGM, I, RESULT, LEN) => #computeValidJumpDests(PGM, I +Int 1, RESULT[I <- 1], LEN) requires I <Int LEN andBool PGM [ I ] ==Int 91 | ||
| rule #computeValidJumpDests(PGM, I, RESULT, LEN) => #computeValidJumpDests(PGM, I +Int #widthOpCode(PGM [ I ]), RESULT, LEN) requires I <Int LEN andBool notBool PGM [ I ] ==Int 91 | ||
| rule #computeValidJumpDests(PGM) => #computeValidJumpDests(PGM, 0, padRightBytes(.Bytes, lengthBytes(PGM), 0), lengthBytes(PGM)) [preserves-definedness] |
There was a problem hiding this comment.
This function is already marked as total.
There was a problem hiding this comment.
I think this rule should still be marked preserves-definedness ... the RHS #computeValidJumpDests(_, _, _, _) is not total but should be defined for these arguments.
| rule #computeValidJumpDests( _, I, RESULT, LEN) => RESULT requires I >=Int LEN [preserves-definedness] | ||
| rule #computeValidJumpDests(PGM, I, RESULT, LEN) => #computeValidJumpDests(PGM, I +Int 1, RESULT[I <- 1], LEN) requires I <Int LEN andBool PGM [ I ] ==Int 91 [preserves-definedness] | ||
| rule #computeValidJumpDests(PGM, I, RESULT, LEN) => #computeValidJumpDests(PGM, I +Int #widthOpCode(PGM [ I ]), RESULT, LEN) requires I <Int LEN andBool notBool PGM [ I ] ==Int 91 [preserves-definedness] |
There was a problem hiding this comment.
Is it true that these are preserving definedness?
There was a problem hiding this comment.
I guess not, but it is pragmatically the only choice to avoid having lengthBytes(PGM) again here.
This is only OK if the length of PGM and RESULT is > I but at the call sites they are the same length LEN.
| syntax Int ::= #widthOpCode(Int) [symbol(#widthOpCode), function, total, smtlib(widthOpCode)] | ||
| // ----------------------------------------------------------------- |
…flow In '## Testing a local Haskell backend build': add the pyproject.toml snippet for pointing at a local pyk checkout, and the uv sync / revert steps that go with it. Add '## Updating local pyk vs local K': explains that a pyk-only change only needs uv sync --reinstall-package kframework (no kdist rebuild), while a K/Haskell backend change requires the full kup + kevm-kdist flow. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The booster backend (booster-only-simplify mode) fails to discharge several side conditions and lemmas that kore handles correctly, because booster's equation engine has stricter matching requirements. This PR closes the gaps found while running the functional test-suite in booster-only mode. Note that I collected the information needed (or Claude did, more precisely) using functionality that's not merged yet (runtimeverification/k#4919), but that the improvements here are independent of that.
Changes:
CLAUDE.mdwith some basic information about how to debug stuck proofs, mostly focused on getting logs out of the haskell backend for analysis.evm.md: addtotalto#widthOpCodedeclaration; addpreserves-definednessto all four#computeValidJumpDestsrules so booster's definedness checker passeslemmas.k: addsmtlib(widthOpCode)annotation +smt-lemmabounds for#widthOpCode(0 ≤ x ≤ 33); addwidthOpCode-concat-{left,right}rules for booster's single-step equation engine; addBooltautology rules; addKSetmembership/dedup rules with comments explaining which are permanently indeterminate in booster (backend limitation, not fixable from K source)int-simplification.k: addminInt/maxIntcomparison rules, commutativity normalisation (==Intconcrete-on-left), arithmetic andBoolsimplificationsevm-int-simplification.k: addbit-mask-asWord-mandbit-notmask-asWord-nmbytes-simplification.k: addpreserves-definednesstolookup-as-asWord; add concreteb""concat identity rulescompute-valid-jump-dests-spec.k: add a fastN=5functional claim alongside the existingN=1000performance claim for quicker CI feedbackset-dedup-spec.k: new functional spec exercising theKSetmembership and dedup rules under kore. This one is still not passing with--booster-only-simplify.Validation:
--booster-only-simplifyflag vs kore baseline). The two remaining failures (lemmas-specwith many claims,merkle-spec) are engine-level limitations unrelated to these lemmas.--use-booster-dev(pure booster, no kore proxy) the same proof drops from ~200 s to 0.68s. The kore fallback is the bottleneck, not the semantics.minInt/maxIntand commutativity normalisation rules.widthOpCodesmt-lemma bounds let Z3 discharge #widthOpCode-containing side conditions without concrete evaluation, which was previously causing fallbacks.