Skip to content

Upgrade reth 1.9.3 → 1.11.3#927

Open
ZanCorDX wants to merge 7 commits into
developfrom
dx/reth-1.11.3
Open

Upgrade reth 1.9.3 → 1.11.3#927
ZanCorDX wants to merge 7 commits into
developfrom
dx/reth-1.11.3

Conversation

@ZanCorDX

@ZanCorDX ZanCorDX commented Jun 19, 2026

Copy link
Copy Markdown
Collaborator

📝 Summary

Bumps reth 27a8c0fd6324d6 (1.11.3) and the coupled major bumps: revm 31→34,
alloy-evm 0.23→0.27, alloy 1.x→1.6, alloy-trie 0.8→0.9, revm-inspectors→0.34.
Adapts rbuilder to the new reth/revm APIs and removes the unsafe that reth's
trait-object Sync change would otherwise force.

Dependencies

  • reth-* → rev d6324d6; revm 34, revm-inspectors 0.34.2, alloy-evm 0.27.2,
    alloy 1.6.3, alloy-primitives 1.5.6, alloy-trie 0.9.4.
  • Pin time = 0.3.47 (0.3.48 has an E0119 coherence bug); MSRV 1.85 → 1.88.
  • reth-tasks in rbuilder-utils repointed to the same rev (was stale).

Key adaptations

Provider Send/Sync . reth 1.11 dropped Sync from StateProviderBox
(Box<dyn StateProvider + Send>). The unit shared across
building threads is now StateProviderSource (factory + parent hash, Send+Sync+Clone);
each consumer opens its own provider on demand. CachedDB holds an already-open provider;
the block-building helper clones itself via an explicit, fallible try_clone that reopens
the provider. This is the reth way.

revm 34. Database::Error is now EvmDatabaseError<ProviderError>;
CfgEnv::with_specwith_spec_and_mainnet_gas_params (sets Osaka's per-tx gas cap,
EIP-7825). order_commit maps the new error shape.

Root hash. OverlayStateProviderFactory::new(factory, ChangesetCache),
ParallelStateRoot::new(.., Runtime); provider bounds gain
ChangeSetReader + StorageChangeSetReader + DBProvider + StorageSettingsCache.
Transparently supports reth storage v1 and v2 (reth reads the mode from the DB;
rbuilder only forwards the bound).

Provider construction. ProviderFactory::new now takes a RocksDBProvider + Runtime
and returns Result; it reads storage settings on open. Adds a process-wide task runtime.

eth-sparse-mpt. Unified onto nybbles 0.4 (matching alloy-trie 0.9 / reth): the crate's
Nibbles is now the same packed type reth's trie nodes use, so the conversion shims between
the two representations are gone — encode/decode pass Nibbles straight into alloy-trie node
types, removing per-node conversions in the root-hash hot path. Single nybbles version
across the workspace.

🤖 Generated with Claude Code with me kicking his ass to get it right....

💡 Motivation and Context

I like prime numbers so I switched 9 for 11.

✅ I have completed the following steps:

  • Run make lint
  • Run make test
  • Added tests (if applicable)

@ZanCorDX ZanCorDX requested a review from dvush as a code owner June 19, 2026 14:18
@claude

claude Bot commented Jun 19, 2026

Copy link
Copy Markdown
Contributor

Claude finished @ZanCorDX's task in 4m 14s —— View job


Review: PR #927 — Upgrade reth 1.9.3 → 1.11.3

  • Read context & gather diff
  • Review provider Send/Sync refactor
  • Review try_clone / block-building helper
  • Review revm 34 error handling
  • Review root hash + runtime changes
  • Review eth-sparse-mpt nibbles changes
  • Post review feedback

The migration is well-structured and the unsafe-removal via StateProviderSource is a clean approach. The error-mapping for revm 34 (EvmDatabaseError wrapping, provider_error helper) is correct and the try_clone reconstruction of BlockState is equivalent to the old derived Clone. Comments below are concerns worth addressing/verifying — I've left inline annotations on the specific lines.

Main concern — provider opens moved into hot loops (perf)

The shift from a single Arc<dyn StateProvider> to a Send-only StateProviderBox opened on demand means several hot paths now call history_by_block_hash repeatedly where they used to Arc::clone:

  • sim_worker.rs (line 65) — was once per sim context, now once per order.
  • parallel_builder/conflict_resolvers.rs (initialize_block_state, line 293) — now once per sequence attempt (sequence_to_try × tasks/thread); previously once per worker thread in conflict_resolving_pool::start. The pre-existing // @todo actually reuse it for the duration of the block is now considerably more expensive.
  • block_building_helper::try_clone — every best-block selection / multi-bid copy reopens a provider (best_block_from_algorithms, unfinished_block_processing).

Opening a history provider is meaningfully heavier than an Arc::clone. Correctness is preserved (the shared SharedCachedReads still de-dups reads), but this is the change most likely to affect block-building latency/throughput. Recommend benchmarking sim + parallel-builder throughput against develop, or reworking so the provider is opened once per block and borrowed per task.

Other items

  • utils/mod.rs::reth_task_runtime — the OnceLock permanently captures the tokio Handle from the first caller. If that first call comes from a short-lived runtime (e.g. a #[tokio::test]), later use of the cached runtime operates on a dropped handle — a plausible source of order-dependent test flakiness (test_reth_db is a plain #[test]). Fine in the production single-runtime path. Consider always building standalone, or documenting the first-call requirement.
  • v2/trie/mod.rs::unpack_nibbles (and the identical v_experimental copy) — the [0u8; 64] buffer panics (OOB) if a key exceeds 64 nibbles. Invariant holds for trie keys, but a debug_assert!(len <= 64) would make it explicit.
  • roothash/mod.rs:144ChangesetCache::new() is allocated per root-hash computation; confirm reth doesn't intend this cache to be shared/reused across calls.

Minor / notes

  • time = 0.3.47 pin and MSRV bump to 1.88 are documented in the PR body — fine, but the pin is worth a tracking issue so it isn't forgotten once the upstream E0119 fix lands.
  • IpcStateProvider now shares per-block caches across clones via Arc<DashMap> — correct, and storage_by_hashed_key returning UnsupportedProvider is the right call for IPC.
  • ProviderFactoryReopener: replacing testing_mode with Option<ReopenPaths> cleanly disables both the consistency check and reopen for the test path — good.

No blocking correctness or security issues found in the reviewed paths. The perf concern in the hot loops is the one I'd most want validated before merge.

let order_id = task.order.id();
let start_time = Instant::now();
let state_provider = match provider
.history_by_block_hash(current_sim_context.block_ctx.attributes.parent)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perf regression in the sim hot loop. Previously the state provider was opened once per sim context (outside the inner while let loop) and cheaply Arc-cloned per order. Now history_by_block_hash is called for every order task. Opening a history provider walks reth's state/overlay setup and is materially more expensive than the old Arc::clone, so this adds per-order latency and DB pressure on the simulation path.

Since CachedDB now owns a non-cloneable StateProviderBox, the provider can't simply be hoisted — but it's worth confirming the per-order open cost is acceptable (benchmark mev_boost/sim throughput), or reworking so the provider is opened once per block and the per-order CachedDB borrows it.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's ~0.4 µs but I can fix it here

Comment on lines 288 to +295
fn initialize_block_state(
&mut self,
state_provider: Arc<dyn StateProvider>,
) -> BlockState<CachedDB> {
let cached = CachedDB::new(state_provider, self.ctx.shared_cached_reads.clone());
BlockState::new(cached)
source: StateProviderSource,
) -> ProviderResult<BlockState<CachedDB>> {
let cached = CachedDB::new(
source.state_provider()?,
self.ctx.shared_cached_reads.clone(),
);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perf regression in the conflict-resolution hot loop. initialize_block_state now opens a fresh provider via source.state_provider()? on every sequence attempt: resolve_conflict_task loops over sequence_to_try (multiple per task) × many tasks per thread. Before this change the provider was opened once per worker thread in conflict_resolving_pool::start and Arc-cloned per task. The existing // @todo actually reuse it for the duration of the block (line 130) is now much more costly than before. Worth measuring the parallel-builder throughput impact, or reusing one provider per thread/block.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

parallel builder is not working anyway

Comment on lines +43 to +52
.get_or_init(|| {
let config = match tokio::runtime::Handle::try_current() {
Ok(handle) => RuntimeConfig::with_existing_handle(handle),
Err(_) => RuntimeConfig::default(),
};
// Invariant: building the process-wide task runtime is a startup operation. If it
// fails the builder cannot compute state roots or open the provider factory, so there
// is no meaningful way to continue.
RuntimeBuilder::new(config)
.build()

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The process-wide OnceLock captures the tokio Handle from whatever context makes the first call, and that decision is permanent. Two consequences worth checking:

  1. If the first caller runs inside a short-lived runtime (e.g. a #[tokio::test]), the cached Runtime holds a handle to a tokio runtime that is dropped at the end of that test. Any later use of the global runtime (another test, or test_reth_db which is a plain #[test]) then operates on a dead handle — a plausible source of order-dependent test flakiness, since test execution order within a binary isn't deterministic.
  2. First-call-wins also means the standalone-vs-attached choice can be "wrong" for later callers (attached when the first caller happened to be outside a runtime, or vice versa).

In the production single-runtime path this is fine; the risk is concentrated in the test suite. Consider not attaching to the ambient handle (always build standalone), or documenting that the first call must come from the long-lived runtime.

Comment on lines +19 to +25
fn unpack_nibbles<'a>(nibbles: &Nibbles, buf: &'a mut [u8; 64]) -> &'a [u8] {
let len = nibbles.len();
for (i, nibble) in nibbles.iter().enumerate() {
buf[i] = nibble;
}
&buf[..len]
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fixed [0u8; 64] buffer indexes buf[i] without a bound check, so a Nibbles longer than 64 panics with an index-out-of-bounds. 64 is the correct max for account/storage trie keys (32-byte keccak = 64 nibbles), so the invariant holds today — but a debug_assert!(nibbles.len() <= 64) would make the invariant explicit and fail loudly in tests rather than relying on the implicit panic. Same applies to the identical helper in v_experimental/trie/mod.rs.

+ 'static,
{
let overlay = OverlayStateProviderFactory::new(provider);
let overlay = OverlayStateProviderFactory::new(provider, ChangesetCache::new());

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A fresh ChangesetCache::new() is allocated on every root-hash computation. If reth intends this cache to be reused across blocks/computations (to amortize changeset reads), a per-call cache provides no benefit. If a per-computation cache is the intended semantics here, ignore — but worth confirming against how reth's node wires OverlayStateProviderFactory.

@claude

claude Bot commented Jun 19, 2026

Copy link
Copy Markdown
Contributor

Claude finished @ZanCorDX's task in 1m 15s —— View job


Re-review: PR #927 — Upgrade reth 1.9.3 → 1.11.3

  • Identify changes since last review
  • Re-check sim_worker / provider hot-loop fix
  • Verify into_db() cache-reclaim correctness
  • Status of remaining open items

New since last review — recycle cache (sim_worker.rs)

The fix is correct. The provider is now opened once per sim context (line 55) outside the while let loop, and the CachedDB (live provider + SharedCachedReads) is reclaimed across orders via cached = block_state.into_db() (line 88). This restores the old "open once per context" behavior and keeps the per-order cost at an Arc-clone of the shared reads rather than a fresh history_by_block_hash open.

Two minor notes (non-blocking):

  • A fresh BlockState::new(cached) is created per order with an empty BundleState, and into_db() discards that bundle while keeping the CachedDB — correct for independent per-order simulation against the parent state.
  • Error path on provider open changed from break (inner loop) to continue (outer loop). Equivalent net effect; if the open were to fail persistently the outer loop would re-grab the same context and retry — acceptable since a failed history open here is rare/fatal-ish.

Remaining items from prior review (still open — all non-blocking)

  • conflict_resolvers.rs — acknowledged: "parallel builder is not working anyway." Fine to defer.
  • utils/mod.rs OnceLock runtime handle — first-caller-wins capture of the tokio Handle; still a plausible source of order-dependent test flakiness if first call comes from a short-lived #[tokio::test]. Worth a debug_assert/doc note, not a blocker.
  • v2/trie/mod.rs & v_experimental [0u8; 64] — implicit OOB panic if a key exceeds 64 nibbles; invariant holds, a debug_assert!(len <= 64) would make it explicit.
  • roothash/mod.rs:144 ChangesetCache::new() per call — confirm reth doesn't intend cross-call reuse.

No new correctness or security issues in the changed code. The primary perf concern from the first pass (sim hot loop) is resolved.

@ZanCorDX ZanCorDX requested a review from metachris as a code owner June 19, 2026 16:05
@claude

claude Bot commented Jun 19, 2026

Copy link
Copy Markdown
Contributor

Claude finished @ZanCorDX's task in 1m 20s —— View job


Re-review: PR #927 — Upgrade reth 1.9.3 → 1.11.3

  • Identify changes since last review
  • Review new integration-test CI setup (int test. test)
  • Status of remaining open items

New since last review — int test. test

Only CI/integration-test wiring changed (checks.yaml + new playground.yaml). No rbuilder source changes since the recycle cache pass.

  • The switch from builder-playground start l1 --use-native-reth to start <recipe.yaml> is the right move: the built-in l1 recipe hardcodes reth v1.9.3, whose on-disk DB can't be opened by the new 1.11 ProviderFactory, so pinning reth via a custom recipe is necessary for the integration job to be meaningful. The release override → UseHostExecution correctly replaces --use-native-reth.
  • One thing to confirm: playground.yaml pins the reth release tag v1.11.3 (paradigmxyz binary), while rbuilder is built against rev d6324d6. If d6324d6 is exactly the v1.11.3 tag this is fine; if it's an intermediate commit with DB-format or RPC differences, the host reth binary and rbuilder could disagree. Worth a quick check that the tagged release and the pinned rev match.
  • builder-playground: v0.3.3 bump is required for the release recipe override — matches the inline comment. (Note: I can't edit files under .github/workflows/ due to GitHub App permissions, so this is review-only.)

Remaining items from prior reviews (still open — all non-blocking)

  • conflict_resolvers.rs provider-per-sequence-attempt open — acknowledged ("parallel builder is not working anyway"). Deferred.
  • utils/mod.rs:40 reth_task_runtimeOnceLock still captures the first caller's ambient Handle permanently. The doc comment now documents the attach-vs-standalone behavior (good), but the first-caller-wins flakiness risk for the test suite (a #[tokio::test] seeding the global before a plain #[test] like test_reth_db) remains. Doc note helps; not a blocker.
  • v2/trie/mod.rs:19 & v_experimental unpack_nibblesbuf[i] on a [0u8; 64] still panics OOB if a key exceeds 64 nibbles. Invariant holds for trie keys; a debug_assert!(nibbles.len() <= 64) would make it explicit. Fix this →
  • roothash/mod.rs:144 ChangesetCache::new() per computation — confirm reth doesn't intend cross-call reuse.

No new correctness or security issues. The CI change is sound; the only thing I'd verify before merge is the v1.11.3 tag ↔ rev d6324d6 equivalence so the host reth binary matches what rbuilder links against.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant