perf(compiler): narrow EVM SUB lowering for u64-proven operand pairs #536
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Adds a new range-based u64 fast path lowering for EVM SUB in the MIR compiler and introduces differential tests/fixtures to validate interpreter vs multipass/JIT equivalence on wrap/borrow edge cases.
Changes:
- Implement range-proven u64 `SUB` lowering using `diff` + unsigned-borrow compare + broadcast fill for upper limbs.
- Add EVM assembly fixtures (goldens) covering no-underflow, underflow, boundary wrap, dynamic-zero RHS, and a wide control case.
- Extend arithmetic compile stats and add a new gtest parameterized differential suite for the new fixtures.
Reviewed changes
Copilot reviewed 16 out of 16 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| tests/evm_asm/sub_wide_u64_control.expected | Golden output for wide-u256 minus u64 control case. |
| tests/evm_asm/sub_wide_u64_control.easm | Fixture ensuring the u64 range fast path does not fire for wide minuend. |
| tests/evm_asm/sub_u64_pair_zero_rhs_dyn.expected | Golden output for dynamic-zero RHS case. |
| tests/evm_asm/sub_u64_pair_zero_rhs_dyn.easm | Fixture exercising new range-based u64 fast path (RHS is dynamic 0). |
| tests/evm_asm/sub_u64_pair_wrap_boundary.expected | Golden output for wrap boundary case with upper-limb all-ones fill. |
| tests/evm_asm/sub_u64_pair_wrap_boundary.easm | Fixture validating borrow broadcast behavior at 0 - (2^64-1). |
| tests/evm_asm/sub_u64_pair_underflow.expected | Golden output for adversarial underflow (5-7). |
| tests/evm_asm/sub_u64_pair_underflow.easm | Fixture validating underflow behavior and all-ones upper limbs. |
| tests/evm_asm/sub_u64_pair_nounderflow.expected | Golden output for typical non-underflow subtraction. |
| tests/evm_asm/sub_u64_pair_nounderflow.easm | Fixture for non-underflow in the range-u64 path. |
| tests/evm_asm/sub_u64_pair_equal.expected | Golden output for equal operands case. |
| tests/evm_asm/sub_u64_pair_equal.easm | Fixture validating equal operands produce zero. |
| src/tests/evm_interp_tests.cpp | Adds parameterized differential test (interp vs multipass/JIT) for the new stems. |
| src/compiler/evm_frontend/evm_mir_compiler.h | Implements new range-based u64 SUB lowering and adds a compile-stat counter. |
| src/compiler/evm_frontend/evm_mir_compiler.cpp | Wires new counter into hasArithCompileStats() and summary logging. |
| docs/changes/2026-06-10-evm-sub-u64-wrap-lowering/README.md | Design/change note describing the rationale, lowering, and validation. |
⚡ Performance Regression Check Results
✅ Performance Check Passed (interpreter)
Performance Benchmark Results (threshold: 25%)
Summary: 194 benchmarks, 0 regressions
✅ Performance Check Passed (multipass)
Performance Benchmark Results (threshold: 25%)
Summary: 194 benchmarks, 0 regressions
A SUB result wraps to full width on underflow, so the RESULT cannot be
narrowed -- but when both operands are range-proven u64 the COMPUTATION
can: (a - b) mod 2^256 is exactly {wrapping_sub(a0, b0), fill, fill,
fill} where fill = 0 - borrow and borrow = (a0 <u b0). On underflow the
i64 negation yields all-ones, reproducing the wrapped upper 192 bits
bit-exactly for every input -- no no-underflow proof is needed.
Replace the generic path's eight protectUnsafeValue spills plus SUB/SBB
chain with one sub, one compare, and one negation (no flag-protection
barrier needed without an SBB chain). The result keeps the default U256
range tag, symmetric with the analyzer's SUB transfer rule.
Wire the new SubFastRangeU64Count into the [EVM-ARITH-SUMMARY] predicate
and log line.
Add 6 adversarial differential fixtures (underflow all-ones fill, the
0 - (2^64-1) wrap boundary, dynamic zero RHS, one-sided-wide control)
plus an EVMSubWrapDifferentialTest suite.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The evmone-statetests job on the previous run died ~5 minutes in with its Build-and-Test step log never uploaded (runner-level termination); the suite never reached test execution. All 16 other checks passed.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The implementation ships in this PR; the Status field was left at the Proposed value from the drafting stage.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
094eee3 to 48d32d5
Follow-up measurement: the stacking estimate in the PR body is now confirmed by a paired capture on a local merge of current main (which includes #524/#532) plus #534/#535/#536. On the EEST Cancun suite (27,742 shared compiled sites, site-weighted, zero reverse transitions):
The remaining SUB FULL mass on that baseline is dominated by statically unproven U256|U256 pairs (1,262 of 1,584), i.e. analysis-side rather than lowering-side.
The EVMSubWrapDifferentialTest suite and its 6 SUB-path fixtures relocate to the dedicated EVM differential-suite change so optimization PRs stay code-only and the shared evm_interp_tests.cpp file stops accumulating per-PR copies.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The differential test suite and its fixtures have moved out of this PR into #539, which consolidates the interp-vs-multipass differential coverage from all three optimization PRs into a dedicated test target. This PR is now code + docs only; #539 carries the tests and can merge independently in any order.
Align the doc with the technical-writing rule: describe internal labels by behavior, remove who-reviewed and process narrative, and keep every count, flag, code anchor, and measurement. No code or runtime change.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Cen5bPpPEgkSkcxWWTSY7d
Narrow EVM SUB lowering when both operands are range-proven u64, replacing the eight-limb generic borrow chain with a three-instruction fast path.
What
A SUB result wraps to full 256-bit width on underflow, so the result cannot carry a narrow range tag. When both operands are range-proven u64, however, the computation can be narrowed: for a, b ∈ [0, 2^64),
(a - b) mod 2^256 = {wrapping_sub(a0, b0), fill, fill, fill}, where fill = 0 - borrow and borrow = (a0 <u b0).
On underflow the i64 negation yields all-ones, reproducing the wrapped upper 192 bits bit-exactly for every input. No no-underflow proof is required; this sidesteps the relational-fact prerequisite that correctly deferred result-narrowing for SUB.
The new fast path (gated on `bothFitU64` with both operands non-constant; all existing constant paths keep priority) emits one `sub`, one unsigned compare, and one negation. The generic path it replaces pre-materializes all eight operand limbs through `protectUnsafeValue` to shield its SUB/SBB borrow chain from flag clobbering; the new path has no SBB chain, so no barrier is needed. The difference has a single consumer and no flag chain, so it needs no `protectUnsafeValue` either. The result keeps the default U256 range tag, symmetric with the analyzer's SUB transfer rule.
Also wires the new `SubFastRangeU64Count` into the arithmetic-summary predicate and log line.
Soundness
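The identity behind the fast path can be checked in isolation. The following is a minimal standalone sketch, not the compiler's MIR lowering: `U256`, `subGeneric`, `subFastU64`, and `checkEdgeCases` are hypothetical names introduced here, and the four-limb borrow loop only stands in for the generic SUB/SBB chain. The operand values 9 - 4 and 42 - 0 are illustrative picks for the no-underflow and zero-RHS cases.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Little-endian 256-bit value: limb 0 holds the least significant 64 bits.
using U256 = std::array<uint64_t, 4>;

// Reference: full-width subtraction with a borrow carried across all four
// limbs (a stand-in for the generic SUB/SBB chain, not the real lowering).
U256 subGeneric(const U256 &a, const U256 &b) {
  U256 r{};
  uint64_t borrow = 0;
  for (int i = 0; i < 4; ++i) {
    uint64_t d = a[i] - b[i];                           // wraps mod 2^64
    bool borrowOut = (a[i] < b[i]) || (d < borrow);     // borrow out of limb i
    r[i] = d - borrow;
    borrow = borrowOut ? 1 : 0;
  }
  return r;
}

// Fast path for operands whose upper three limbs are known to be zero.
U256 subFastU64(uint64_t a0, uint64_t b0) {
  uint64_t diff = a0 - b0;               // one wrapping sub
  uint64_t borrow = (a0 < b0) ? 1 : 0;   // one unsigned compare
  uint64_t fill = uint64_t{0} - borrow;  // one negation: 0 or all-ones
  return U256{diff, fill, fill, fill};
}

// Compares both paths on the edge cases named in this PR.
void checkEdgeCases() {
  const uint64_t pairs[][2] = {
      {9, 4},           // no underflow
      {5, 7},           // underflow
      {3, 3},           // equal operands
      {0, UINT64_MAX},  // 0 - (2^64-1) wrap boundary
      {42, 0},          // zero RHS
  };
  for (const auto &p : pairs) {
    U256 a{p[0], 0, 0, 0};
    U256 b{p[1], 0, 0, 0};
    assert(subFastU64(p[0], p[1]) == subGeneric(a, b));
  }
}
```

Calling `checkEdgeCases()` exercises both paths on the same inputs. For 5 - 7 the fast path yields limb 0 equal to 2^64 - 2 with the three upper limbs all-ones, which is 2^256 - 2. The same case shows why the result cannot carry a u64 range tag even though the computation is narrow.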
Verification
- `EVMSubWrapDifferentialTest` (interpreter vs multipass byte-equality + JIT-compiled assertions): no-underflow, underflow all-ones fill (5 - 7 → 2^256 - 2), equal operands, the 0 - (2^64-1) wrap boundary (limb0 = 1, upper limbs all-ones), dynamic-zero RHS, and a one-sided-wide control that must not take the new path. 6/6 pass; golden suite no regressions.
- `-k fork_Cancun` 2723/2723; format check clean; no new warnings.
Measurements
Paired site-weighted measurement (per-site instrumentation tap, on a separate branch not part of this PR) on the EEST Cancun suite (28,109 shared compiled sites):
SUB has the largest full-width-path site population on this suite (3,893 sites). Measured against plain main. Stacked with the range-tag consumption PR (#534), whose ENV/compare tags create additional u64 pairs, covered sites grow from 925 to ~1,594 (directional estimate from #534's own measurement run, not re-measured here).
Performance (evmone-bench 27-bench sweep vs upstream/main, median of 5 + 15-rep outlier reruns): median +0.62%, with every outlier — including benchmarks the diff cannot affect — resolving into its run-variance band on rerun; the largest and most stable benchmark (snailtracer, cv ~1%) is +0.4%. End-to-end neutral, no regressions; the win is the per-site removal of seven spills plus the SBB chain, which this suite's hot paths do not isolate.
Known limitation
On real-mainnet workloads, cross-block range widening currently leaves almost no dynamically-proven u64 SUB pairs at full-path sites (execution-weighted ≈ 0); the EEST gains come from in-block pairs (loop counters, gas math). Real-load benefit depends on planned cross-block range-precision work that propagates u64 ranges across basic blocks. Noted in the change document.
Change document: `docs/changes/2026-06-10-evm-sub-u64-wrap-lowering/README.md`.
🤖 Generated with Claude Code