Standalone pebble db at <home>/data/trace_cache so writes don't share LSM with the chain state (the lesson from 42b7077, where the sentinel-pointer experiment regressed avgTotal ~32% due to compaction contention with chain pebble). Key shape: "ts/" || height(BE,8) || tracerLen(1) || tracer || txHash(32). Height is leading so Prune is a single range-delete by height window. Tx hashes are globally unique on this chain, so (height, tracer, txHash) collisions are impossible. Also defines TraceEnqueuer + a tiny indirection (SetTraceEnqueuer / Enqueue) so the keeper can hold one *TraceCache field that owns both the cache and the forwarder, without taking a hard dep on the baker that lives in evmrpc. All methods are nil-safe: callers can hold a single field and skip init when the feature is off. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pre-bakes debug_trace results so trace RPCs become a single PK lookup in the trace cache instead of full re-execution. The baker is a bounded-queue worker pool that pulls heights enqueued from EndBlock, calls the existing tracers.API.TraceBlockByNumber for each configured tracer, and writes the per-tx JSON into TraceCache. Hard guarantee on consensus impact: Enqueue is a non-blocking channel send (drops on full queue with sparse logging); all re-execution happens on baker goroutines; reads from chain pebble go through versioned MVCC (no locks); writes go to a separate pebble db. If the baker falls behind, dropped blocks fall through to today's on-demand re-execution at trace time. No correctness loss. Tracer indirection (blockTracer interface) keeps the baker testable without standing up a real EVM/keeper. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
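The non-blocking enqueue guarantee can be sketched like this (type and field names are illustrative): Enqueue is a select with a default arm, so a full queue bumps a drop counter instead of stalling the caller, and the consensus path never waits on baker progress.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// traceBaker sketches the bounded-queue worker pool's front end.
type traceBaker struct {
	queue   chan int64
	dropped atomic.Int64
}

func newTraceBaker(queueSize int) *traceBaker {
	return &traceBaker{queue: make(chan int64, queueSize)}
}

// Enqueue never blocks: on a full queue the height is dropped and the
// block falls through to on-demand re-execution at trace time.
func (b *traceBaker) Enqueue(height int64) bool {
	select {
	case b.queue <- height:
		return true
	default:
		b.dropped.Add(1)
		return false
	}
}

func main() {
	b := newTraceBaker(2)
	fmt.Println(b.Enqueue(10), b.Enqueue(11), b.Enqueue(12)) // third drops
	fmt.Println(b.dropped.Load())
}
```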
Adds a single *TraceCache field on the Keeper (nil-safe) plus an Enqueue call from EndBlock that forwards the just-committed height to the trace baker if one is registered. Skipped during tracing (re-entry guard) so debug_trace replays don't recursively re-enqueue. The Enqueue call is a non-blocking channel send via TraceCache (which forwards to the registered TraceEnqueuer). When the baker queue is full, the height is dropped and the block falls through to today's on-demand re-execution at trace time. Consensus latency is unaffected in any case. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a cache lookup at the top of TraceTransaction. On hit (the baker already produced the result for this tx + tracer), returns the cached JSON directly. On miss (no cache, unbakeable tracer config, missing receipt, or absent row) falls through to today's tracersAPI re-execution path with no behavior change. bakeableTracerName decides whether a config can be served from cache. We only bake the standard named tracers (callTracer / prestateTracer / flatCallTracer) without per-call TracerConfig — anything else (struct logger, raw JS, custom config) misses by design so we can't return a false hit. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
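The cacheability gate can be sketched as below (function shape is illustrative): only the standard named tracers without a per-call TracerConfig are servable from cache; everything else must miss so a false hit is impossible.

```go
package main

import "fmt"

// bakeableTracerName sketches the decision: named standard tracers
// without per-call config can hit the cache; custom config, struct
// logger, or raw JS always fall through to live re-execution.
func bakeableTracerName(tracer string, hasTracerConfig bool) (string, bool) {
	if hasTracerConfig {
		return "", false // custom config could change the output shape
	}
	switch tracer {
	case "callTracer", "prestateTracer", "flatCallTracer":
		return tracer, true
	default:
		return "", false
	}
}

func main() {
	name, ok := bakeableTracerName("callTracer", false)
	fmt.Println(name, ok)
	_, ok = bakeableTracerName("callTracer", true) // per-call config: miss
	fmt.Println(ok)
	_, ok = bakeableTracerName("", false) // struct logger (no tracer name)
	fmt.Println(ok)
}
```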
Adds five new evm.* config knobs (all default-off / sane defaults):
trace_bake_enabled (bool, default false)
trace_bake_workers (int, default 1)
trace_bake_queue_size (int, default 4096)
trace_bake_tracers ([]string, default ["callTracer"])
trace_bake_window_blocks (int64, default 0 = disabled)
When trace_bake_enabled=true:
- app.go opens the trace cache pebble db at <home>/data/trace_cache
and attaches it to the EVM keeper (so EndBlock can Enqueue heights).
- The HTTP server constructs a TraceBaker that re-executes blocks via
the existing tracers.API, registers it as the keeper's enqueuer, and
starts the workers.
Validators leave it off and pay nothing. RPC nodes flip it on. The
keeper-side EndBlock enqueue is a non-blocking channel send that
short-circuits to a counter when the queue is full, so consensus
latency is bounded regardless of baker progress.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
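A hypothetical Go-side shape of the five knobs with the defaults listed above (struct and function names are illustrative, not the actual config code):

```go
package main

import "fmt"

// TraceBakeConfig mirrors the five evm.* knobs described above.
type TraceBakeConfig struct {
	Enabled      bool
	Workers      int
	QueueSize    int
	Tracers      []string
	WindowBlocks int64 // 0 disables the retention window (and pruning)
}

func DefaultTraceBakeConfig() TraceBakeConfig {
	return TraceBakeConfig{
		Enabled:      false, // validators leave it off and pay nothing
		Workers:      1,
		QueueSize:    4096,
		Tracers:      []string{"callTracer"},
		WindowBlocks: 0,
	}
}

func main() {
	fmt.Printf("%+v\n", DefaultTraceBakeConfig())
}
```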
TraceBlockByNumber, TraceBlockByHash, and the *ExcludeTraceFail variants now check the trace cache before falling through to live re-execution. The cache lookup is "all-or-nothing": if every tx in the block has a cached entry under the requested tracer, return the assembled list; if any tx misses, fall through to the existing path (no partial results to keep the live path simple and deterministic). Cached entries are never errored (the baker skips errored traces), so the ExcludeTraceFail filter applied to live traces is a no-op for cache hits. The inner cache lookup is a free function over (cache, height, txHashes, config) so it stays unit-testable without standing up an EVM backend. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
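The all-or-nothing lookup can be sketched as a free function over a small getter interface (names are illustrative): results are returned only when every tx in the block hits, and a single miss returns false so the caller falls through to live re-execution with no partial results.

```go
package main

import "fmt"

// traceGetter stands in for the pebble-backed cache.
type traceGetter interface {
	Get(height int64, tracer string, txHash string) ([]byte, bool)
}

// lookupBlockTraces is all-or-nothing: any miss aborts the whole block.
func lookupBlockTraces(c traceGetter, height int64, tracer string, txHashes []string) ([][]byte, bool) {
	out := make([][]byte, 0, len(txHashes))
	for _, h := range txHashes {
		bz, ok := c.Get(height, tracer, h)
		if !ok {
			return nil, false // no partial results
		}
		out = append(out, bz)
	}
	return out, true
}

// mapCache is a test double keyed by tx hash only.
type mapCache map[string][]byte

func (m mapCache) Get(_ int64, _ string, txHash string) ([]byte, bool) {
	bz, ok := m[txHash]
	return bz, ok
}

func main() {
	c := mapCache{"0xaa": []byte(`{}`)}
	_, ok := lookupBlockTraces(c, 5, "callTracer", []string{"0xaa", "0xbb"})
	fmt.Println(ok) // 0xbb missing: the whole block misses
	res, ok := lookupBlockTraces(c, 5, "callTracer", []string{"0xaa"})
	fmt.Println(ok, len(res))
}
```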
Two follow-ups in one commit because both depend on TipFn:
last_baked_height watermark
TraceCache gains SetLastBakedHeight (atomic-max under a small lock,
out-of-order workers can't roll the watermark backwards) and
LastBakedHeight (read). Stored under "meta/last_baked_height" so
Prune's "ts/" range delete leaves it alone. The bakeBlock worker
updates the watermark after every successful (block, tracer) bake.
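The atomic-max under a small lock can be sketched as follows (names are illustrative): a worker finishing an older block out of order can never roll the watermark backwards.

```go
package main

import (
	"fmt"
	"sync"
)

// watermark sketches the last_baked_height tracking.
type watermark struct {
	mu     sync.Mutex
	height int64
}

// SetLastBakedHeight is max-under-lock: lower heights are ignored.
func (w *watermark) SetLastBakedHeight(h int64) {
	w.mu.Lock()
	defer w.mu.Unlock()
	if h > w.height {
		w.height = h
	}
}

func (w *watermark) LastBakedHeight() int64 {
	w.mu.Lock()
	defer w.mu.Unlock()
	return w.height
}

func main() {
	var w watermark
	w.SetLastBakedHeight(12)
	w.SetLastBakedHeight(9) // late worker for an older block: ignored
	fmt.Println(w.LastBakedHeight())
}
```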
Catch-up sweep
When TipFn is set, Start() spawns a one-shot catchUpLoop that walks
last_baked+1 .. tip, baking each height directly (bypasses the
bounded queue so backfill can't drop). Bounded by WindowBlocks so a
long-stopped node doesn't try to bake from genesis. Skipped when
no prior watermark exists (operators who want a one-shot full
backfill run it explicitly).
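The catch-up range computation can be sketched like this (function name is illustrative): walk lastBaked+1 .. tip, with the start clamped by windowBlocks so a long-stopped node doesn't try to bake from genesis.

```go
package main

import "fmt"

// catchUpHeights returns the heights the one-shot sweep should bake,
// bounded to the most recent windowBlocks when a window is configured.
func catchUpHeights(lastBaked, tip, windowBlocks int64) []int64 {
	start := lastBaked + 1
	if windowBlocks > 0 && tip-windowBlocks+1 > start {
		start = tip - windowBlocks + 1
	}
	var heights []int64
	for h := start; h <= tip; h++ {
		heights = append(heights, h)
	}
	return heights
}

func main() {
	fmt.Println(catchUpHeights(10, 13, 0))  // short gap: [11 12 13]
	fmt.Println(catchUpHeights(0, 1000, 3)) // clamped: [998 999 1000]
}
```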
Periodic prune
When TipFn is set AND WindowBlocks > 0, Start() spawns a pruneLoop
ticking on PruneInterval (default 1m). Each tick calls
cache.Prune(tip - WindowBlocks) — one DeleteRange on pebble, cheap.
Wiring: server.go passes TipFn := func() int64 { return
ctxProvider(LatestCtxHeight).BlockHeight() } and forwards
TraceBakeWindowBlocks from config.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- evmrpc/tracers.go: drop the redundant json.RawMessage(bz) conversion flagged by unconvert. cache.Get already returns json.RawMessage so the result is the same byte sequence wrapped in the same type.
- x/evm/keeper/trace_cache.go: annotate the int64 -> uint64 conversion in traceCacheKey with //nolint:gosec; block heights are non-negative, matching the same annotation already used elsewhere in the file.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- evmrpc/tracers.go: drop the second redundant json.RawMessage(bz) conversion in tryTraceCache (cache.Get already returns json.RawMessage).
- evmrpc/tracers.go: extract callTracerName / prestateTracerName / flatCallTracerName constants so the tracer names appear in one place (goconst was flagging "callTracer" with 3 occurrences).
- x/evm/keeper/trace_cache.go: handle the closer.Close() return value in lastBakedHeightUnlocked via "_ = closer.Close()" inside a deferred closure (matches the existing pattern in Get).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1. Off-by-one in EndBlock enqueue.
When EndBlock(N) fires, height N isn't yet "safe latest" for
geth tracer queries — the watermark sits at N-1. The baker was
consistently failing every block with:
"requested height N is not yet available; safe latest is N-1"
Fix: enqueue (height - 1) from EndBlock; skip the genesis tick
where height-1 wouldn't exist.
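The fix can be sketched with a hypothetical helper: EndBlock(N) enqueues N-1, the newest height the tracer backend considers safe, and skips the genesis tick where N-1 doesn't exist.

```go
package main

import "fmt"

// enqueueTarget sketches the EndBlock-side off-by-one fix.
func enqueueTarget(endBlockHeight int64) (int64, bool) {
	if endBlockHeight <= 1 {
		return 0, false // no height-1 to bake at genesis
	}
	return endBlockHeight - 1, true
}

func main() {
	fmt.Println(enqueueTarget(1)) // genesis tick: skip
	fmt.Println(enqueueTarget(42))
}
```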
2. Trace cache wasn't closed on graceful shutdown.
Baker writes use pebble.NoSync, so SIGTERM lost in-memory data
because nothing flushed the WAL on the way out. HandleClose now
closes the cache before falling through to the receipt store
close, mirroring the existing pattern.
Plus minor additions: log a debug-level "trace cache hit" line on the
read path and a startup banner from the baker, so this kind of e2e bug
is visible to operators the next time they debug.
Verified end-to-end against a local sei-chain at -chain-id sei-chain:
- bake "n_results=1" log line for the block carrying our test tx
- "trace cache hit" log line on the matching debug_traceTransaction
- graceful shutdown flushed 13 WAL batches; reopened db shows
last_baked_height advanced and the tx's row at "ts/<height>/<tracer>/<txHash>"
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a parallel "tb/<height,8>/<tracerLen,1><tracer>" keyspace for the assembled per-block trace result. Same height ordering as the per-tx "ts/" keyspace so Prune is still cheap — one DeleteRange per prefix, both bounded work regardless of row count. Block-level reads (debug_traceBlockBy*) can now be a single PK seek into "tb/" instead of N seeks under "ts/" + assembly. The baker (next commit) writes both rows when the new flag is on so per-tx and per-block paths each hit at one seek. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
When debug_traceBlockBy* dominates the trace traffic, caching only the per-tx rows costs N seeks per block lookup. With CacheBlockResults the baker additionally writes the assembled JSON to a "tb/<height>/<tracer>" row, so block-level reads hit at one PK seek instead of N. Per-tx "ts/" rows are still written either way — the new flag is purely additive. Reader fast-path: tryBlockResultCache checks tb/ first; on miss falls back to today's per-tx assembly. Per-tx hits are unchanged. Unbakeable tracer configs (struct logger, custom JS, per-call TracerConfig) short-circuit before touching either keyspace. Empty blocks are skipped on the write side — per-tx assembly already returns [] for them at zero cache cost, and json.Marshal(nil)="null" would have been a format mismatch with the live path. Verified live: tx in block 0xdf gets a tb/ row written; per-block RPC returns the cached JSON; empty blocks fall through to the per-tx path and return [] correctly. Final state: ts/ rows = 1 (one tx), tb/ rows = 1 (one tx-bearing block), no empty-block garbage. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Describe your changes and provide context
and serves debug_traceTransaction / debug_traceBlockBy* from cache
Testing performed to validate your change