feat(invariant): gas-envelope fuzzing via invariant.gas_fuzz #14902
grandizzy wants to merge 26 commits into
Adds a single `invariant.gas_fuzz` config flag (default `false`) that, when enabled, turns on per-call gas envelope randomization, per-selector max-gas tracking, and a colored "Max Gas" column in the metrics table.

Config
- New `gas_fuzz: bool` field on `InvariantConfig` (default false; no behavior change for existing tests).

Sampler (`crates/evm/fuzz/src/strategies/gas_sampler.rs`)
- `GasObservations`: campaign-scoped `(target, selector) -> max gas_used` tracker, `Arc<RwLock<HashMap>>` so it's cheap to share.
- `sample_gas_limit`: razor-power-law (k=4) biased toward the OOG edge (15% natural / 80% [50%-100%) of max / 5% [10%-30%) of max).
- `sample_gas_price`: calibration-free 40% zero / 40% [1, 1e10) wei / 20% [1e10, 1e12) wei.
- Floors at 50_000 gas; skips selectors whose observed max is under 100k.

Tx plumbing (`fuzz/src/lib.rs`, `invariant/mod.rs`, etc.)
- `CallDetails.gas_limit: Option<u64>` added so a sampled per-call limit rides with the tx all the way to `execute_tx` and into the corpus.
- `PartialEq, Eq` on `BasicTxDetails` / `CallDetails` so corpus replay can deduplicate.

Runner (`invariant/mod.rs`)
- Per call when `gas_fuzz` is on: sample `tx.gas_limit` and stamp it on `call_details`; sample `tx.gasprice` and store it in `cheatcodes.gas_price` (one-shot slot consumed in `initialize_interp`).
- `execute_tx`: save/restore the executor's natural gas limit around `call_raw` so invariant assertion calls aren't starved.
- Only natural-gas runs (no override) feed the observation tracker, so truncated-OOG `gas_used` doesn't poison the running max.

Metrics & summary (`InvariantMetrics`, `cmd/test/summary.rs`)
- New `max_gas: u64` and `max_gas_sequence: Vec<BasicTxDetails>` fields, both `skip_serializing_if` so they're absent in JSON when gas_fuzz=false.
- Sequence snapshot is the run prefix up to the strict-new-max call; it acts as a reproducer for power-users who pipe `--json` to dashboards.
- Table grows a "Max Gas" column only when at least one selector has `max_gas > 0`; the legacy 5-column layout is unchanged otherwise.

Tests (`crates/forge/tests/cli/test_cmd/invariant/gas.rs`)
- Dedicated `gas` submodule of the invariant CLI test suite.
- Column presence: snapshot tests for the Max Gas column being shown when `gas_fuzz = true` and absent (legacy 5-column layout) when off.
- GovernMental 2016 gas-DoS repro: asserts `bulkPayout` Max Gas > 500k (parsed imperatively; snapbox can't express a numeric lower bound).
- Proof-of-value for the active OOG-driving strategy: a swallowed inner-OOG state-corruption bug where the inner SSTORE loop OOGs and the outer call keeps its retained 1/64 gas and silently advances `debited` without the matching `credited` write. Paired tests show the invariant fires under `gas_fuzz = true` and passes (16x200 calls) under `gas_fuzz = false` on the same harness.

Comparison with Echidna / Medusa
- New mode: **driving execution into OOG windows as a fuzz strategy**. Echidna's `estimateGas` and Medusa's gas reporting are passive observers: they report peak gas. We additionally use gas as an active fuzz axis: the razor-power-law sampler deliberately starves calls just below their observed natural cost, exposing the OOG-induced state-corruption / swallowed-error bug class that neither tool reaches today (the buggy sub-step OOGs while the outer call proceeds, so any user invariant about post-conditions catches the divergence). Foundry's existing `int256`-return optimization mode (`is_optimization_invariant`) covers return-value maximization the same way Medusa does, but neither it nor Medusa actively probes the OOG envelope.
- Single `gas_fuzz` flag turns on what Echidna splits across `estimateGas` and `maxGasprice` plus the gas-DoS reporting workflow.
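The 15% / 80% / 5% mixture described for `sample_gas_limit` can be sketched roughly as below. This is an illustration under stated assumptions, not the code in `gas_sampler.rs`: the `SplitMix64` generator and the function signature are made up for the sketch, and "razor-power-law (k=4)" is read here as `u^(1/4)`, which skews a uniform draw toward the top of the band. Only the split, the band edges, the 50_000 floor and the 100k skip threshold come from the description.

```rust
/// Tiny deterministic PRNG (SplitMix64), used only to keep the sketch dependency-free.
struct SplitMix64(u64);

impl SplitMix64 {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }

    /// Uniform f64 in [0, 1).
    fn next_f64(&mut self) -> f64 {
        (self.next_u64() >> 11) as f64 / (1u64 << 53) as f64
    }
}

const GAS_FLOOR: u64 = 50_000;
const MIN_OBSERVED_MAX: u64 = 100_000;

/// Returns `None` for "run at the natural gas limit" and `Some(limit)` for an override.
fn sample_gas_limit(rng: &mut SplitMix64, observed_max: u64) -> Option<u64> {
    // Selectors that never got expensive are not worth starving.
    if observed_max < MIN_OBSERVED_MAX {
        return None;
    }
    let roll = rng.next_f64();
    // 15%: natural gas. These are the runs that may update the observed max.
    if roll < 0.15 {
        return None;
    }
    let max = observed_max as f64;
    let limit = if roll < 0.95 {
        // 80%: upper half of the observed cost, skewed toward `max`
        // (assumed shape: u^(1/4) has density 4x^3 on the unit interval).
        let skew = rng.next_f64().powf(0.25);
        0.5 * max + 0.5 * max * skew
    } else {
        // 5%: deep starvation band, uniform between 10% and 30% of `max`.
        0.1 * max + 0.2 * max * rng.next_f64()
    };
    Some((limit as u64).clamp(GAS_FLOOR, observed_max))
}
```

Calling `sample_gas_limit(&mut SplitMix64(seed), observed_max)` in a loop with a fixed seed yields the same sequence of overrides on every run, which is the property a seeded campaign relies on.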
060e47d to 3dfb548
The per-call `gas_price` sampled under `invariant.gas_fuzz = true` was written to `self.executor`'s cheatcodes inspector, but each run executes against `current_run.executor`, a deep clone made before the inner loop. Because the `Executor::clone` impl in `executors/mod.rs` clones the inspector stack (and therefore the `Cheatcodes` inspector with it), the sampled price never reached the transaction under test: `tx.gasprice` inside the handler stayed at the executor's default for the whole campaign and the gas-price axis was silently inert.

Fix
- Stamp the sampled price onto `current_run.inputs.last_mut().call_details.gas_price`, mirroring how `gas_limit` is already plumbed.
- `execute_tx` writes `tx.call_details.gas_price` to the executor it is actually running against, just before `call_raw`. The cheatcodes inspector consumes the slot one-shot via `.take()` in `initialize_interp`, so the override applies to exactly the call being fuzzed and does not leak into follow-up invariant assertion calls.

Replay
- `CallDetails` gains `gas_price: Option<u128>` next to `gas_limit`, both `#[serde(default, skip_serializing_if = "Option::is_none")]` so existing corpus files stay compatible and only gas-fuzzed sequences carry the field. Failing sequences now replay with the exact gas envelope (limit + price) that triggered them.

Test (`gas::should_apply_sampled_gas_price_to_call`)
- Handler stamps each observed `tx.gasprice` into a `mapping(uint256 => bool)` and bumps `uniqueCount`. The invariant asserts `uniqueCount <= 1`, so it must fire once the sampler delivers a second distinct price. This is a direct check that the override reaches the EVM, not a write-only destination.
- Verified by temporarily disabling the apply-side of the fix: the test flipped from `assert_failure` back to passing (invariant never fired, proving no diversity reached the handler), and back to failing once the apply was restored.
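The one-shot behavior of the override slot is easy to see in isolation. A minimal sketch, assuming a hypothetical `PendingOverrides` struct standing in for the relevant part of the cheatcodes inspector (the names below are illustrative, not Foundry's API):

```rust
/// Hypothetical stand-in for the inspector state that holds a pending override.
#[derive(Default)]
struct PendingOverrides {
    gas_price: Option<u128>,
}

impl PendingOverrides {
    /// Arms the override for the next call only.
    fn arm(&mut self, price: u128) {
        self.gas_price = Some(price);
    }

    /// Called once at the start of a call; `take()` leaves `None` behind.
    fn consume(&mut self) -> Option<u128> {
        self.gas_price.take()
    }
}

/// Returns the gas price a call actually sees: the armed override if present,
/// otherwise the executor default.
fn effective_gas_price(pending: &mut PendingOverrides, default_price: u128) -> u128 {
    pending.consume().unwrap_or(default_price)
}
```

After `arm(7)`, the first `effective_gas_price` call observes `7` and the next one falls back to the default, which is why a follow-up invariant assertion call does not inherit the fuzzed price.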
The per-call gas-limit / gas-price sampler used `rand::rng()`, a fresh thread-local entropy source. That made `--fuzz-seed` no longer deterministically reproduce a gas-fuzzed campaign: failure replay still worked via the values persisted on `CallDetails`, but rerunning the same seed could explore a different sequence of gas envelopes and surface (or hide) a different bug. CI triage on a failing seed silently lost its anchor.

Route both samplers through `invariant_test.test_data.branch_runner.rng()` instead, the same seeded `TestRng` the rest of the campaign already uses for input generation. Drops the now-unused `rand` workspace dependency from `foundry-evm`.
Can you post results of a bench run?
…fuzz # Conflicts: # crates/evm/evm/src/executors/invariant/mod.rs
Benchmark: 1h Aave v4 SCFuzzBench, PR #14902
| Run | seeds | total txs | tx/s agg | edges (avg/seed) | peak RSS |
|---|---|---|---|---|---|
| #14853 hash+depth (old, ref) | — | 32.16M | 8938 per-stream | 371–409 | 4.19 GB (single proc) |
| master 0659e1dbc | 42–51 | 30.92M | 8594 | ~375 | 1.35 GB/seed |
| master 0659e1dbc | 52–61 | 30.56M | 8494 | ~370 | 1.35 GB/seed |
| PR #14902 gas_fuzz=OFF | 42–51 | 29.45M | 8188 | ~375 | 1.33 GB/seed |
| PR #14902 gas_fuzz=OFF | 52–61 | 28.47M | 7915 | ~370 | 1.33 GB/seed |
| PR #14902 gas_fuzz=ON | 42–51 | 30.45M | 8464 | ~442 | 1.35 GB/seed |
Paired Δ vs master on identical seeds: −4.7% (42–51), −6.8% (52–61). Average ~−5.8% throughput regression on the OFF default path.
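For anyone re-deriving the paired deltas from the tx/s column, the arithmetic is just a relative change per seed batch followed by a mean. A small sketch (the helper names are hypothetical; the inputs are the master and gas_fuzz=OFF rows of the table):

```rust
/// Relative throughput change of the PR against master, in percent.
fn paired_delta_pct(master_tps: f64, pr_tps: f64) -> f64 {
    (pr_tps / master_tps - 1.0) * 100.0
}

/// Arithmetic mean of the per-batch deltas.
fn mean(values: &[f64]) -> f64 {
    values.iter().sum::<f64>() / values.len() as f64
}

/// Deltas for the two seed batches (42–51, 52–61) and their average.
fn off_mode_deltas() -> (f64, f64, f64) {
    let batch_a = paired_delta_pct(8594.0, 8188.0);
    let batch_b = paired_delta_pct(8494.0, 7915.0);
    (batch_a, batch_b, mean(&[batch_a, batch_b]))
}
```

Rounded to one decimal place, `off_mode_deltas()` gives the −4.7%, −6.8% and ~−5.8% figures quoted above.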
Bugs found (union of [FAIL] lines across seeds)
| Run | seeds | invariants | handlers | total |
|---|---|---|---|---|
| #14853 hash+depth | — | 5 | 3 | 8 |
| master | 42–51 | 4 | 4 | 8 |
| master | 52–61 | 3 | 3 | 6 |
| PR gas_fuzz=OFF | 42–51 | 5 | 4 | 9 |
| PR gas_fuzz=OFF | 52–61 | 4 | 5 | 9 |
| PR gas_fuzz=ON | 42–51 | 3 | 3 | 6 |
Cross-batch variance is substantial on master (8 → 6), while OFF is stable across batches (9 → 9). The bug deltas should be read as "the PR reliably finds more bugs across both batches in OFF mode", not as a precise count.
Max Gas distribution (gas_fuzz=ON, seed42 representative)
Top-10 selectors by observed Max Gas:
| Selector | Calls | Reverts | Max Gas |
|---|---|---|---|
| iSpoke_borrow | 153,762 | 150,269 (97.7%) | 581,631 |
| iAaveOracle_setPrice | 213,164 | 67,067 (31.5%) | 548,424 |
| iSpoke_withdraw_ASSERTION_WITHDRAW_DOS | 146,769 | 141,050 | 497,974 |
| iSpoke_updateUserDynamicConfig | 132,692 | 10,019 | 418,545 |
| iSpoke_updateUserRiskPremium | 122,980 | 9,995 | 412,155 |
| iSpoke_setUsingAsCollateral | 170,449 | 28,049 | 410,520 |
| iSpoke_supply_ASSERTION_SUPPLY_DOS | 140,840 | 78,945 | 404,025 |
| iHub_mintFeeShares_ASSERTION_MINT_FEE_SHARES_PPS_CHANGE | 135,250 | 123,286 | 363,248 |
| iSpoke_repay_ASSERTION_REPAY_DOS | 112,087 | 111,571 | 343,610 |
| iHub_setInterestRateData | 131,475 | 120,318 | 313,611 |
Two selectors registered Max Gas = 0 (no successful executions):
| Selector | Calls | Reverts |
|---|---|---|
| iHub_updateAssetConfig | 144,742 | 144,742 (100%) |
| iSpoke_liquidationCall_ASSERTION_LIQUIDATION_CALL_DOS | 158,244 | 158,244 (100%) |
Both are scfuzzbench harness artifacts, not Foundry issues:
- `iSpoke_liquidationCall_*_DOS` uses a `try { ... } catch { require(false) }` pattern by design: every call enters the catch and discards.
- `iHub_updateAssetConfig` is an `asAdmin` config-mutation handler whose `InterestRateData` struct input isn't being generated in a valid shape.
The Max Gas column surfacing these is a useful side benefit of the feature.
Findings
- `gas_fuzz=ON` delivers a large coverage uplift. Edges jump to ~442 avg (vs ~375 OFF, +18%), and aggregate throughput rises +3.4% over OFF. Per-process peak RSS is unchanged.
- OFF-mode regression is real and reproduces on two seed sets. Average ~−5.8% throughput vs master on the default path. An earlier revision (`9301b445b`) widened `CallDetails` by two `Option<u64>` (+32 bytes), copied into every `BasicTxDetails`. Commit `4661bf390` boxes the overrides behind `Option<Box<GasOverrides>>` (8 bytes inline, niche-optimized, allocation-free when OFF, JSON shape preserved via `serde(flatten)`). That recovered the per-tx-layout portion of the regression; a residual ~5% gap remains and is mode-agnostic (ON mode shows the same magnitude drop vs the pre-fix ON measurement), suggesting a campaign-scoped cost rather than a per-tx one. Initial bisection toward `record_metrics` was inconclusive; the residual source has not been localized.
- Bug-finding is consistently better on the PR. OFF mode hit 9 unique bugs across both seed batches (42–51 and 52–61) vs master's 8 / 6. Both `v1` and `v2` variants of `totalBorrowedLessThanSupplied` plus all 4 handlers landed in both PR batches. Caveat: the PR changes RNG consumption order, so "seed 42" explores a different campaign in PR vs master, and part of the bug delta may be sampling artifact rather than search-quality improvement. The trend is consistent across two seed sets, which is suggestive but not statistically tight.
- `gas_fuzz=ON` finds 6 unique bugs, fewer than OFF's 9. ON misses `invariant_totalBorrowedLessThanSupplied_v1`, `_v2`, and `iSpoke_repay_ASSERTION_REPAY_DOS`, all "deep state" bugs requiring multi-call setup chains. The wider gas envelope trades depth for breadth: more code paths attempted (+18% edges), fewer deep sequences completed before OOG. This reproduces identically on the boxed-fix binary, so it's a behavior of the gas-envelope sampling itself, not the plumbing. Worth a follow-up to understand whether the new distribution is hiding these bugs or just pushing their discovery threshold past 1h.
- The Max Gas column delivers value beyond bug-finding. It surfaces two handlers producing zero successful executions, a harness-quality signal invisible from `Calls/Reverts/Discards` alone.
Conclusion
The feature delivers a substantial coverage uplift (+18% edges) and consistent bug-finding gains on the suite tested, at the cost of a ~5% throughput regression on the default OFF path. The OFF-mode per-tx layout cost has been diagnosed and fixed in 4661bf390; a residual campaign-scoped ~5% gap remains unattributed and affects both modes equally. Recommended path: merge with the opt-in flag as drafted (no behavioral surprise for existing users), and file the residual gap and the ON-mode v1/v2 starvation as follow-ups.
The two `Option<u64>` fields added directly on `CallDetails` widened every `BasicTxDetails` by 32 bytes, paid by every invariant tx regardless of whether gas_fuzz was enabled. On aave-v4-scfuzzbench this cost ~7% of throughput with gas_fuzz=false vs master.

Move the overrides behind `Option<Box<GasOverrides>>` so the inline cost is one niche-optimized pointer (8 bytes) and is allocation-free when gas_fuzz is off. Accessors `gas_limit()` / `gas_price()` / `set_gas_envelope()` keep call sites unchanged; `serde(flatten)` preserves the existing top-level JSON keys so corpus files written by earlier revisions of this PR still round-trip.

1h/10-seed aave-v4-scfuzzbench: 8046 → 8188 tx/s (vs master 8594), 9 unique bugs found (vs 7 OFF / 8 master), full envelope replay preserved.

Co-authored-by: grandizzy <38490174+grandizzy@users.noreply.github.com>
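The size difference between the two layouts can be checked directly with `std::mem::size_of`. The structs below are simplified, hypothetical mirrors of the two revisions (field names assumed from the commit message), and the byte counts apply to typical 64-bit targets:

```rust
use std::mem::size_of;

/// Inline layout of the earlier revision: two optional overrides on every tx.
#[allow(dead_code)]
struct InlineOverrides {
    gas_limit: Option<u64>,
    gas_price: Option<u64>,
}

/// Hypothetical mirror of `GasOverrides`: the same data, stored behind a pointer.
#[allow(dead_code)]
struct GasOverrides {
    gas_limit: Option<u64>,
    gas_price: Option<u64>,
}

/// Boxed layout: `None` is encoded as the null pointer, so no separate tag is stored
/// and nothing is allocated until an override is actually set.
#[allow(dead_code)]
struct BoxedOverrides {
    overrides: Option<Box<GasOverrides>>,
}

/// Returns (inline size, boxed size) in bytes.
fn layout_sizes() -> (usize, usize) {
    (size_of::<InlineOverrides>(), size_of::<BoxedOverrides>())
}
```

On x86_64 and aarch64 this reports 32 bytes for the inline form and 8 for the boxed form: each `Option<u64>` needs a discriminant plus padding, while `Option<Box<T>>` is guaranteed to be pointer-sized.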
Would be nice to know specifically the area of code that is covered, using lcov-diff. My suggestion would be to do a 2nd experiment, setting gaslimit to 2^24 (https://eips.ethereum.org/EIPS/eip-7825) by default, and adding a mutator that does
…t re-arm it

`execute_tx` sets `cheats.gas_price = Some(sampled)` for the per-call `tx.gasprice` override. The cheatcodes inspector consumes it one-shot during `initialize_interp`, but the surrounding `Executor::commit(&mut result)` calls `set_gas_price(result.tx_env.gas_price())` and re-arms the override from the sampled value still sitting on `result.tx_env`. The main loop worked around this with a local `clear_pending_gas_price` helper after commit, but replay, shrink, corpus replay and showmap all go through `execute_tx` + `commit` without that cleanup and leak the sampled price into the next call / invariant assertion.

Snapshot the executor's natural gas price up front, scrub it back onto `result.tx_env` before returning from `execute_tx`, and defensively clear the inspector field. Remove the now-redundant main-loop helper.

Add a CLI regression test that exercises the replay path: a forced failure triggers shrink + `replay_run` (`execute_tx` -> `commit` -> `call_invariant_function`), and the surfaced revert must be the forced one, not the leak.
…2^24, 2^25)

Drop the per-(target, selector) `gas_used` observation tracker, the gas_price sampling envelope, the calibrated-limit machinery, and the boxed override struct in favor of a single `Option<u64>` `gas_limit` field on `CallDetails` that is stamped per call with a flat draw over [2^24, 2^25). This keeps the EIP-7825 cap as the natural floor and exercises gas-conditional EVM dispatch (refund accounting, EIP-150 1/64 retention, OOG dispatch at the cap) without the per-handler state, calibration warm-up, replay gas_price scrubbing, or heap allocation per call that the richer envelope required.

Behavior under gas_fuzz=false is unchanged.
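A flat draw over [2^24, 2^25) is cheap to implement because the span is itself a power of two. A minimal sketch, assuming the caller supplies 64 bits from the campaign's seeded RNG (the function and constant names are hypothetical, not the PR's):

```rust
/// EIP-7825 per-transaction gas cap (2^24), used as the inclusive lower bound.
const GAS_LIMIT_LO: u64 = 1 << 24;
/// Exclusive upper bound of the sampled range (2^25).
const GAS_LIMIT_HI: u64 = 1 << 25;

/// Maps 64 bits of seeded randomness to a gas limit in [2^24, 2^25).
/// The span is exactly 2^24, so masking the low 24 bits gives a uniform draw
/// with no modulo bias.
fn gas_limit_from_entropy(raw: u64) -> u64 {
    const SPAN: u64 = GAS_LIMIT_HI - GAS_LIMIT_LO;
    GAS_LIMIT_LO + (raw & (SPAN - 1))
}
```

Because the result depends only on `raw`, feeding it from the same seeded generator as input generation keeps `--fuzz-seed` runs reproducible.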
@0xalpharush pls check 1e08588 and comparison below, looks much cleaner

Benchmarks: this PR vs master
Workload: scfuzzbench, 10 parallel 1h
Findings
Conclusion
A single-knob design: when
…fresh gas_fuzz doc

Mirror the timeout exit path so the `MaxAssumeRejects` break surfaces the max-gas prefix collected so far instead of emitting an empty `max_gas_sequence`. Also update the gas_fuzz config doc to match the post-simplification design (uniform `tx.gas_limit` draw, no gasprice sampling).
…fuzz is off

Plumb the campaign-level gas_fuzz flag into `WorkerCorpus` and drop any persisted `call_details.gas_limit` on corpus load (both master-worker initial replay and inter-worker sync) when the flag is false. Stops a prior gas_fuzz=true corpus from continuing to stamp the executor after the user turns gas_fuzz off. Failure-cache replay is unaffected so saved sequences still reproduce at their original gas envelope.
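The corpus-load rule amounts to clearing the persisted field whenever the flag is off. A rough sketch with a trimmed-down, hypothetical `CallDetails` and a made-up `sanitize_on_load` helper (the real `WorkerCorpus` code is not shown in this thread):

```rust
/// Hypothetical, trimmed-down stand-in for a corpus entry's call details.
#[derive(Debug, Clone, PartialEq, Eq)]
struct CallDetails {
    calldata: Vec<u8>,
    gas_limit: Option<u64>,
}

/// Drops persisted gas limits unless the campaign has `gas_fuzz` enabled.
/// Everything else in the sequence is left untouched.
fn sanitize_on_load(mut sequence: Vec<CallDetails>, gas_fuzz: bool) -> Vec<CallDetails> {
    if !gas_fuzz {
        for call in &mut sequence {
            call.gas_limit = None;
        }
    }
    sequence
}
```

A failure-cache replay path would simply skip this step, so saved failing sequences keep the gas limit they were recorded with.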
      (calldata_strategy, value_strategy).prop_map(move |(calldata, value)| {
          trace!(input=?calldata, ?value);
    -     CallDetails { target, calldata, value }
    +     CallDetails { target, calldata, value, gas_limit: None }
`gas_fuzz` never stamps generated calls with a sampled gas limit. `invariant_strat` receives `config`, but here we always return `CallDetails { ..., gas_limit: None }`, and `sample_gas_limit` is only referenced by its unit test. As written, enabling `invariant.gas_fuzz` adds reporting but does not randomize `tx.gaslimit()`. Let's thread `config.gas_fuzz` into the call-details strategy and set `Some(sample_gas_limit(...))` only when enabled?
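One possible shape for that change, sketched without proptest so it stays self-contained. `GeneratedCall` and `build_call` are hypothetical stand-ins for `CallDetails` and the strategy closure, and `draw_gas_limit` represents whatever sampler the strategy ends up calling:

```rust
/// Hypothetical stand-in for the generated call; the real struct also carries target and value.
#[derive(Debug, PartialEq, Eq)]
struct GeneratedCall {
    calldata: Vec<u8>,
    gas_limit: Option<u64>,
}

/// Builds a call, drawing a gas limit only when `gas_fuzz` is enabled.
/// `draw_gas_limit` is never invoked when the flag is off, so the default
/// path consumes no extra randomness.
fn build_call(
    calldata: Vec<u8>,
    gas_fuzz: bool,
    draw_gas_limit: impl FnOnce() -> u64,
) -> GeneratedCall {
    let gas_limit = gas_fuzz.then(draw_gas_limit);
    GeneratedCall { calldata, gas_limit }
}
```

Using `bool::then` keeps the sampler lazy, which matters here because an eager draw would shift RNG consumption even with the flag off.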
…fuzz # Conflicts: # crates/evm/evm/src/executors/corpus.rs # crates/evm/evm/src/executors/invariant/mod.rs
OSS-185
Motivation
Foundry's invariant runner currently treats `tx.gas_limit` and `tx.gasprice` as fixed environmental constants. Bugs that only surface when an inner call runs out of gas (swallowed-OOG state corruption, single-selector gas DoS, half-applied multi-step writes) are unreachable from the invariant harness today.

This PR adds a single opt-in flag, `invariant.gas_fuzz`, that turns on:
- `tx.gas_limit` randomization via a razor-power-law sampler centered on the observed natural cost of each `(target, selector)`.
- `tx.gasprice` randomization (calibration-free).
- A colored `Max Gas` column in the metrics table, with thresholds tied to the configured block gas limit.

The strategy deliberately starves calls just below their observed natural cost so user invariants about post-conditions catch the divergence when the sub-step OOGs but the outer call proceeds.
UX changes
New config (default: `false`, no behavior change for existing tests)

New metrics column (only when `gas_fuzz = true`)
`Max Gas` is appended to the existing `Calls | Reverts | Discards` table. Cell color is driven by the share of the block gas limit consumed by a single call:
New JSON fields (omitted when zero)
`InvariantMetrics` gains `max_gas: u64` and `max_gas_sequence: Vec<BasicTxDetails>` so `--json` consumers can pipe peak gas + a reproducer sequence into dashboards.
How the sampler works
`crates/evm/fuzz/src/strategies/gas_sampler.rs`:
- `GasObservations`: campaign-scoped `(target, selector) -> max gas_used` tracker, `Arc<RwLock<HashMap>>`.
- `sample_gas_limit`: razor-power-law (k=4):
  - `[50% · max, max)`, biased toward `max`.
  - `[10% · max, 30% · max)` for griefing / DoS paths.
- `sample_gas_price`: 40% zero / 40% `[1, 1e10)` / 20% `[1e10, 1e12)` wei.

Only natural-gas runs (no override) feed the observation tracker, so truncated-OOG `gas_used` doesn't poison the running max.

Comparison with Echidna and Medusa
`tx.gas_limit` randomization
`blockGasLimit` / `transactionGasLimit`
`tx.gasprice` randomization
`CallDetails` for replay
`maxGasprice` cap; not adaptive, not persisted per-call for replay
`estimateGas` was experimental and removed in 2.2.7