perf(compiler): statically resolve constant-amount EVM shift guards #535
Pull request overview
Note: Copilot was unable to run its full agentic suite in this review.
Optimizes EVM shift lowering in the multipass JIT by statically eliminating redundant `>= 256` guards for constant shift amounts and pruning dead source-limb contributions based on the value's range, and adds differential fixtures to check that the interpreter and multipass agree.
Changes:
- Statically fold SHL/SHR_U with constant shift amounts `>= 256` to zero, and omit the `IsLargeShift` per-limb `Select` chain when the constant is `< 256`.
- Add range-aware pruning (U64/U128) for const-amount SHL/SHR_U to skip dead shifted/carry terms from provably-zero source limbs.
- Add differential EVM asm fixtures and a new gtest suite covering the optimized paths and edge cases.
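A minimal C++ sketch of range-aware pruning for constant-amount SHL/SHR_U, assuming a plain four-limb `std::array<uint64_t, 4>` model (limb 0 least significant) instead of the compiler's MIR values. `U256`, `RangeTag`, `liveLimbsFor`, `shlConstPruned`, and `shrConstPruned` are hypothetical names chosen for illustration. The sketch only shows why skipping every term that reads a source limb at index `>= LiveLimbs` cannot change the result when those limbs are zero; it does not reproduce the real guard or `Select` emission.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical model (not the compiler's MIR types): a 256-bit EVM word as
// four 64-bit limbs, limb 0 least significant.
using U256 = std::array<uint64_t, 4>;
enum class RangeTag { U64, U128, Wide };

// Range tier -> number of low limbs that may be nonzero.
unsigned liveLimbsFor(RangeTag Tag) {
  switch (Tag) {
  case RangeTag::U64:
    return 1;
  case RangeTag::U128:
    return 2;
  default:
    return 4;
  }
}

// Const-amount SHL with Amount < 256 already proven, so no >= 256 guard.
// Source limbs at index >= LiveLimbs are assumed zero and never read.
U256 shlConstPruned(const U256 &Value, unsigned Amount, unsigned LiveLimbs) {
  assert(Amount < 256 && LiveLimbs >= 1 && LiveLimbs <= 4);
  const unsigned CompShift = Amount / 64; // whole-limb displacement
  const unsigned ShiftMod = Amount % 64;  // bit shift inside a limb
  U256 R{};                               // limbs below CompShift stay zero
  for (unsigned I = CompShift; I < 4; ++I) {
    const unsigned SrcIdx = I - CompShift;
    if (SrcIdx < LiveLimbs) // shifted term
      R[I] |= Value[SrcIdx] << ShiftMod;
    if (ShiftMod != 0 && SrcIdx >= 1 && SrcIdx - 1 < LiveLimbs) // carry term
      R[I] |= Value[SrcIdx - 1] >> (64 - ShiftMod);
  }
  return R;
}

// Const-amount SHR_U with the same pruning rule.
U256 shrConstPruned(const U256 &Value, unsigned Amount, unsigned LiveLimbs) {
  assert(Amount < 256 && LiveLimbs >= 1 && LiveLimbs <= 4);
  const unsigned CompShift = Amount / 64;
  const unsigned ShiftMod = Amount % 64;
  U256 R{}; // the top CompShift limbs stay zero
  for (unsigned I = 0; I + CompShift < 4; ++I) {
    const unsigned SrcIdx = I + CompShift;
    if (SrcIdx < LiveLimbs) // shifted term
      R[I] |= Value[SrcIdx] >> ShiftMod;
    if (ShiftMod != 0 && SrcIdx + 1 < LiveLimbs) // carry from the limb above
      R[I] |= Value[SrcIdx + 1] << (64 - ShiftMod);
  }
  return R;
}
```

Because the amount is a compile-time constant, `CompShift` and `ShiftMod` are fixed per site, so each skipped term corresponds to one `shl`/`ushr`/`or` the lowering no longer needs to emit.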
Reviewed changes
Copilot reviewed 28 out of 28 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| tests/evm_asm/shr_const8_u64val.expected | Adds expected output for SHR const path with U64-masked input. |
| tests/evm_asm/shr_const8_u64val.easm | Adds fixture exercising SHR const path limb pruning from U64 range. |
| tests/evm_asm/shr_const72_dyn.expected | Adds expected output for cross-limb SHR by 72. |
| tests/evm_asm/shr_const72_dyn.easm | Adds fixture for SHR const cross-limb behavior (CompShift=1, ShiftMod=8). |
| tests/evm_asm/shr_const4_dyn.expected | Adds expected output for SHR by 4. |
| tests/evm_asm/shr_const4_dyn.easm | Adds SHR-by-4 fixture for const-amount path. |
| tests/evm_asm/shr_const256_dyn.expected | Adds expected output for SHR by 256 folding to zero. |
| tests/evm_asm/shr_const256_dyn.easm | Adds fixture for large constant SHR amount (>=256). |
| tests/evm_asm/shl_dyn_amount.expected | Adds expected output for dynamic shift amount path regression coverage. |
| tests/evm_asm/shl_dyn_amount.easm | Adds fixture to force dynamic shift amount lowering (memory-laundered amount). |
| tests/evm_asm/shl_const_highlimb_dyn.expected | Adds expected output for “high limb set” shift amount folding to zero. |
| tests/evm_asm/shl_const_highlimb_dyn.easm | Adds fixture for 2^64 “trap” constant shift amount (upper limb set). |
| tests/evm_asm/shl_const96_dyn.expected | Adds expected output for SHL cross-limb carry behavior. |
| tests/evm_asm/shl_const96_dyn.easm | Adds fixture exercising SHL const cross-limb carry terms (<<96). |
| tests/evm_asm/shl_const4_dyn.expected | Adds expected output for SHL by 4. |
| tests/evm_asm/shl_const4_dyn.easm | Adds SHL-by-4 fixture for const-amount path. |
| tests/evm_asm/shl_const256_dyn.expected | Adds expected output for SHL by 256 folding to zero. |
| tests/evm_asm/shl_const256_dyn.easm | Adds fixture for large constant SHL amount (>=256). |
| tests/evm_asm/shl_const200_u64val.expected | Adds expected output for U64-masked value shifted left by 200. |
| tests/evm_asm/shl_const200_u64val.easm | Adds fixture to verify SHL const path pruning dead source limbs (U64). |
| tests/evm_asm/sar_const8_neg.expected | Adds expected output for SAR negative value sign-fill preservation. |
| tests/evm_asm/sar_const8_neg.easm | Adds fixture covering SAR const path sign-fill behavior. |
| tests/evm_asm/sar_const64_pos.expected | Adds expected output for SAR positive value (CompShift=1, ShiftMod=0). |
| tests/evm_asm/sar_const64_pos.easm | Adds fixture for SAR const shift amount correctness on positive values. |
| src/tests/evm_interp_tests.cpp | Adds a parameterized differential test suite comparing interpreter and multipass results and checking that multipass JIT-compiles. |
| src/compiler/evm_frontend/evm_mir_compiler.h | Implements static large-shift folding, nullable guard plumbing, and LiveLimbs propagation. |
| src/compiler/evm_frontend/evm_mir_compiler.cpp | Updates shift helpers to omit guard Selects when statically false and prune dead limbs. |
| docs/changes/2026-06-10-evm-const-shift-pruning/README.md | Documents motivation, soundness argument, tests, and measurements for the optimization. |
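
The differential idea behind the new suite (run the same input through two independent implementations and require byte-for-byte identical 32-byte outputs) can be sketched in C++ as below. This is a stand-in under stated assumptions: a bit-at-a-time shift over a big-endian byte array plays the role of the interpreter, and a limb-wise lowering plays the role of multipass. `refShr`, `limbShr`, `agree`, `toLimbs`, and `toBytes` are hypothetical helpers, not code from `evm_interp_tests.cpp`.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

using Bytes32 = std::array<uint8_t, 32>; // big-endian EVM word
using U256 = std::array<uint64_t, 4>;    // limb 0 least significant

U256 toLimbs(const Bytes32 &B) {
  U256 L{};
  for (unsigned I = 0; I < 32; ++I) // byte I sits (31 - I) bytes above the LSB
    L[(31 - I) / 8] |= uint64_t(B[I]) << (8 * ((31 - I) % 8));
  return L;
}

Bytes32 toBytes(const U256 &L) {
  Bytes32 B{};
  for (unsigned I = 0; I < 32; ++I)
    B[I] = uint8_t(L[(31 - I) / 8] >> (8 * ((31 - I) % 8)));
  return B;
}

// "Interpreter" stand-in: shift right one bit at a time over the bytes.
Bytes32 refShr(Bytes32 B, unsigned Amount) {
  if (Amount >= 256)
    return Bytes32{};
  for (unsigned N = 0; N < Amount; ++N) {
    uint8_t Carry = 0; // low bit of the previous (more significant) byte
    for (unsigned I = 0; I < 32; ++I) {
      const uint8_t Next = B[I] & 1;
      B[I] = uint8_t((B[I] >> 1) | (Carry << 7));
      Carry = Next;
    }
  }
  return B;
}

// "Multipass" stand-in: limb-wise SHR_U; >= 256 folds to zero up front.
Bytes32 limbShr(const Bytes32 &In, unsigned Amount) {
  if (Amount >= 256)
    return Bytes32{};
  const U256 V = toLimbs(In);
  const unsigned CompShift = Amount / 64, ShiftMod = Amount % 64;
  U256 R{};
  for (unsigned I = 0; I + CompShift < 4; ++I) {
    R[I] = V[I + CompShift] >> ShiftMod;
    if (ShiftMod != 0 && I + CompShift + 1 < 4)
      R[I] |= V[I + CompShift + 1] << (64 - ShiftMod);
  }
  return toBytes(R);
}

// Differential check: both paths must agree byte-for-byte.
bool agree(const Bytes32 &In, unsigned Amount) {
  const Bytes32 A = refShr(In, Amount);
  const Bytes32 B = limbShr(In, Amount);
  return std::memcmp(A.data(), B.data(), A.size()) == 0;
}
```

Comparing serialized bytes rather than internal limbs keeps the check independent of how either side represents the word.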
⚡ Performance Regression Check Results
✅ Performance Check Passed (interpreter)
Performance Benchmark Results (threshold: 25%)
Summary: 194 benchmarks, 0 regressions
✅ Performance Check Passed (multipass)
Performance Benchmark Results (threshold: 25%)
Summary: 194 benchmarks, 0 regressions
The const-amount fast paths in SHL/SHR/SAR lowering kept a runtime >= 256 guard (one Select per result limb) plus an isU256GreaterOrEqual comparison chain even when the full 256-bit shift constant makes the guard statically decidable, and emitted shifted/carry terms for source limbs the value's range proves zero.

- handleShift: a constant amount >= 256 folds SHL/SHR_U to a constant zero (EVM spec; SAR keeps the generic sign-dependent flow); a constant amount < 256 skips building IsLargeShift and passes nullptr.
- Helpers skip the per-limb guard Select when IsLargeShift is nullptr; dynamic paths assert non-null. SAR sign-fill is untouched.
- SHL/SHR_U const paths drop shifted/carry terms whose source limb index is >= the value operand's range tier (U64 -> 1 live limb, U128 -> 2); SAR is excluded so no new range claim is introduced.

The limb0-only getConstShiftAmount trap (a constant like 2^64 whose upper limb is set while limb0 is small) is resolved statically by checking the full constant.

Add 12 adversarial differential fixtures (cross-limb carries, source pruning, >= 256 folds, the 2^64 trap, SAR sign-fill for both signs, and a dynamic-amount control) plus an EVMConstShiftDifferentialTest suite.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
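A small C++ model of EVM SAR on four 64-bit limbs shows why SAR is excluded from the `>= 256` fold: for large amounts the result is all zeros or all ones depending on the value's sign bit, which a constant amount alone does not determine. `U256` and `sarConst` are hypothetical names. The sketch pre-fills every limb with the sign fill, loosely mirroring the description of SAR's out-of-bounds fill coming from a default initializer rather than from a guard `Select`.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical model: limb 0 least significant.
using U256 = std::array<uint64_t, 4>;

U256 sarConst(const U256 &Value, uint64_t Amount) {
  const bool Negative = (Value[3] >> 63) != 0;
  const uint64_t Fill = Negative ? ~uint64_t{0} : 0;
  U256 R;
  R.fill(Fill); // limbs shifted in from beyond the top default to the sign fill
  if (Amount >= 256)
    return R; // sign-dependent, so not a compile-time constant
  const unsigned CompShift = unsigned(Amount / 64);
  const unsigned ShiftMod = unsigned(Amount % 64);
  for (unsigned I = 0; I + CompShift < 4; ++I) {
    const unsigned Src = I + CompShift;
    // Bits entering from above: the next limb, or the sign fill past limb 3.
    const uint64_t Hi = (Src + 1 < 4) ? Value[Src + 1] : Fill;
    R[I] = ShiftMod == 0 ? Value[Src]
                         : ((Value[Src] >> ShiftMod) | (Hi << (64 - ShiftMod)));
  }
  return R;
}
```

For a positive value the fill is zero, so SAR matches SHR_U; the two only diverge through `Fill`, which is exactly the runtime dependency that keeps SAR on the generic path.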
Compute the full 256-bit constant amount a single time and carry the below-256 verdict in a flag, instead of re-deriving it for the IsLargeShift gating (review feedback).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
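The compute-once decision can be sketched in C++ as below, assuming the full 256-bit constant is available as four 64-bit limbs. `classify`, `isBelow256`, `ShiftOp`, `Lowering`, and `limb0Only` are hypothetical names; `limb0Only` stands in for a limb0-only read like the one `getConstShiftAmount` performs, to show how `2^64` would look like a shift by 0 if the upper limbs were ignored.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical model: the constant amount as four limbs, limb 0 least significant.
using U256 = std::array<uint64_t, 4>;

enum class ShiftOp { Shl, ShrU, Sar };

enum class Lowering {
  FoldToZero,    // SHL/SHR_U, amount >= 256: result is the constant zero
  ConstNoGuard,  // amount < 256: IsLargeShift is never built
  SarGenericFlow // SAR, amount >= 256: fill depends on the runtime sign bit
};

// Checks every limb, not just limb 0, so 2^64 is correctly "not below 256".
bool isBelow256(const U256 &Amount) {
  return Amount[3] == 0 && Amount[2] == 0 && Amount[1] == 0 && Amount[0] < 256;
}

Lowering classify(ShiftOp Op, const U256 &Amount) {
  const bool Below256 = isBelow256(Amount); // computed once, reused as a flag
  if (Below256)
    return Lowering::ConstNoGuard;
  return Op == ShiftOp::Sar ? Lowering::SarGenericFlow : Lowering::FoldToZero;
}

// Stand-in for a limb0-only view of the amount (the "trap").
uint64_t limb0Only(const U256 &Amount) { return Amount[0]; }
```

Deriving the verdict once and branching on the flag means the "no guard" path can only be reached when every upper limb of the constant is zero.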
The evmone-statetests job died ~4.5 minutes in, and its Build-and-Test step log was never uploaded: the same runner-level termination signature seen earlier tonight on another PR's run (which passed on retrigger). The suite never reached test execution; this job passed on this PR's previous round.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…abels

The shifted-dead/carry-live boundary example used << 200, where the top result limb actually keeps the shifted term and no carry term exists; << 136 is the case where only the carry term survives. Also replace internal reviewer codenames with neutral wording and set Status to Implemented.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Covers the carry-only emission branch in the const-shift handler: for a u64-tagged value (LiveLimbs=1), SHL by 136 (CompShift=2, ShiftMod=8) makes result limb 3 read SrcIdx=1 as a dead shifted term while its carry term reads the live limb Value[0] (Value[0] >> 56). That limb is therefore computed from a single carry shift with no OR and no >=256 guard Select. The existing 12 fixtures exercised shifted-term-live cases but never this shifted-dead/carry-live branch.

Adds shl_const136_u64val (.easm + .expected) modeled on shl_const200_u64val, registers it in EVMConstShiftDifferentialTest, and bumps the change-doc verification count to 13/13.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
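To make the shifted-dead/carry-live boundary concrete, the hypothetical helper below reports which terms a constant-amount SHL keeps for one result limb, using the same `CompShift = Amount / 64`, `ShiftMod = Amount % 64` split and a `LiveLimbs` cut-off. It is an illustrative C++ sketch, not the handler's code: for `<< 136` with `LiveLimbs = 1`, limb 3 keeps only the carry term `Value[0] >> 56`, while for `<< 200` limb 3 keeps only the shifted term `Value[0] << 8`.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: which terms survive for result limb `Limb` of a
// constant SHL by `Amount` when only the low `LiveLimbs` source limbs are live.
std::string shlLimbTerms(unsigned Amount, unsigned LiveLimbs, unsigned Limb) {
  const unsigned CompShift = Amount / 64, ShiftMod = Amount % 64;
  if (Limb < CompShift)
    return "zero"; // below the whole-limb displacement: constant zero
  const unsigned SrcIdx = Limb - CompShift;
  // Shifted term: Value[SrcIdx] << ShiftMod
  const bool Shifted = SrcIdx < LiveLimbs;
  // Carry term: Value[SrcIdx - 1] >> (64 - ShiftMod)
  const bool Carry = ShiftMod != 0 && SrcIdx >= 1 && SrcIdx - 1 < LiveLimbs;
  if (Shifted && Carry)
    return "shifted|carry";
  if (Shifted)
    return "shifted";
  if (Carry)
    return "carry";
  return "zero";
}
```

A "carry"-only limb is the case the new fixture targets: one shift, no `or`, and no guard `Select`.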
Branch updated from 8bc0816 to 44d6223.
The WASM multipass job normally completes in 3-4 minutes; the current run's instance has been stuck in progress for over two hours with no log output (runner-level hang). The EVM workflow on this same head passed completely.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

The EVMConstShiftDifferentialTest suite and its 13 evm_asm fixtures relocate to the dedicated differential-suite change so optimization PRs stay code-only and the shared evm_interp_tests.cpp stops accumulating per-PR copies.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The differential test suite and its fixtures have moved out of this PR into #539, which consolidates the interp-vs-multipass differential coverage from all three optimization PRs into a dedicated test target. This PR is now code + docs only; #539 carries the tests and can merge independently in any order.
Align the doc with the technical-writing rule: describe internal labels by behavior, remove who-reviewed and process narrative, and keep every count, flag, code anchor, and measurement. No code or runtime change.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Cen5bPpPEgkSkcxWWTSY7d
Resolves constant-amount SHL/SHR/SAR shift guards at compile time, removing four `Select`s plus a 4-limb compare chain per constant-shift site. Correctness-neutral (12/12 differential fixtures, 223/223 unittests, 2723/2723 statetest); benchmark-neutral (median -0.08% over the 27-bench sweep).

What
The const-amount fast paths in EVM SHL/SHR/SAR lowering still emitted a runtime `>= 256` guard (an `isU256GreaterOrEqual` comparison chain plus one `Select` per result limb) even when the shift amount is a compile-time constant that makes the guard statically decidable, and still computed shifted/carry terms for source limbs that the value operand's range proves zero. This PR resolves both at compile time:

- `handleShift`: when the shift amount is constant, the full 256-bit value is checked via intx. An amount `>= 256` folds SHL/SHR to a constant zero (per the EVM spec the result is identically zero for any value, mirroring the existing both-constant fold); SAR keeps the generic flow since its fill depends on the value's sign bit. An amount `< 256` skips building `IsLargeShift` entirely and passes `nullptr` to the limb helpers.
- The limb helpers skip `Select(IsLargeShift, fill, R)` when the guard is `nullptr`. SAR's out-of-bounds sign-fill comes from the limb's default initializer, not the removed select, and is untouched. Dynamic-amount paths gained a defensive `ZEN_ASSERT(IsLargeShift != nullptr)`.
- A `LiveLimbs` parameter (value range U64 → 1, U128 → 2, else 4) drops shifted/carry terms whose source limb index is `>= LiveLimbs`; those limbs are semantically zero under the existing range contract. SAR is deliberately excluded so the change introduces no new range claim.

Net effect per constant-shift site: four `Select`s plus a 4-limb comparison chain removed; for shifts of proven-narrow values, additional dead `shl`/`ushr`/`or` terms removed.

Correctness notes
- `getConstShiftAmount` inspects only limb0, so a constant amount like `2^64` (limb0 == 0, upper limb set) historically relied on the runtime guard. The static resolution checks the full 256-bit constant, so such amounts either fold to zero (SHL/SHR) or keep a real guard (SAR); `nullptr` is only ever passed for full constants `< 256`. Covered by a dedicated fixture.
- The fold for `>= 256` constants fires after both operands are popped; EVM stack operands are pure values, so dropping the unmaterialized value expression is safe.
- The `>= 256` fold now yields a constant zero whose auto-derived tag is more precise than the old dynamic zero's; this is a strict strengthening of the prior tag and changes no observable behavior.

Verification
- 12 adversarial differential fixtures (`tests/evm_asm/`) plus an `EVMConstShiftDifferentialTest` suite asserting that interpreter and multipass outputs match byte-for-byte and that the multipass path actually JIT-compiled. Coverage: cross-limb carries (`<< 96`), source pruning (`u64 << 200`, `u64 >> 8`), `>= 256` folds, the `2^64` trap, SAR sign-fill for both signs, and a dynamic-amount control. 12/12 pass.
- Statetest with `-k fork_Cancun`: 2723/2723; golden `.easm` suite: no regressions; `tools/format.sh check` clean; no new warnings.

Performance
evmone-bench 27-bench sweep (multipass, vs. upstream/main baseline, median of 5 reps): median delta -0.08%. Shift-focused benchmarks and all first-pass outliers, re-measured with 15 repetitions, fall within their run-to-run variance bands (blake2b_shifts +1.3% at a CV of 2.4-3.4%, sha1_shifts +0.2%, signextend -0.1%). End-to-end the change is neutral with no regressions; the benefit is generated-code reduction at constant-shift sites, which this suite's hot paths do not isolate.
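For reading numbers like these, a minimal C++ sketch of the quantities involved: the median over repetitions, the relative delta between two medians, and the coefficient of variation (sample standard deviation over mean). The `withinNoise` rule, which treats a delta as noise when its magnitude does not exceed the CV, is an assumed rule of thumb for illustration, not necessarily the criterion this sweep used; all function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Median of repeated measurements (copy is sorted locally).
double median(std::vector<double> Xs) {
  std::sort(Xs.begin(), Xs.end());
  const std::size_t N = Xs.size();
  return N % 2 ? Xs[N / 2] : (Xs[N / 2 - 1] + Xs[N / 2]) / 2.0;
}

// Relative change of the new median vs. the baseline median, in percent.
double deltaPercent(const std::vector<double> &Base,
                    const std::vector<double> &New) {
  return (median(New) - median(Base)) / median(Base) * 100.0;
}

// Coefficient of variation in percent, using the sample standard deviation.
double cvPercent(const std::vector<double> &Xs) {
  assert(Xs.size() >= 2);
  double Mean = 0;
  for (double X : Xs)
    Mean += X;
  Mean /= Xs.size();
  double Var = 0;
  for (double X : Xs)
    Var += (X - Mean) * (X - Mean);
  Var /= (Xs.size() - 1);
  return std::sqrt(Var) / Mean * 100.0;
}

// Assumed rule of thumb: a delta no larger than the run-to-run CV is noise.
bool withinNoise(double DeltaPct, double CvPct) {
  return std::fabs(DeltaPct) <= CvPct;
}
```

Under this rule a +1.3% delta measured against a 2.4% CV would be classified as noise.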
Known limitation
Source-limb pruning trusts the range contract. Within-block narrow producers (AND-mask, constants) physically zero the upper limbs; cross-block narrow tags imported via `EntryStackRanges` additionally rely on the analyzer being a sound over-approximation (verified for current transfer rules). That import path is gated by `ZEN_ENABLE_EVM_STACK_SSA_LIFT`, which defaults to OFF and is OFF in CI; if the flag is enabled by default in the future, the analyzer transfer rules should be re-audited and a lift-ON differential fixture added. Noted in the change document.

Change document: `docs/changes/2026-06-10-evm-const-shift-pruning/README.md`.

🤖 Generated with Claude Code