
Cache Debug Trace #3339 (Draft)

Kbhat1 wants to merge 12 commits into main from trace-baker

Conversation

Kbhat1 (Contributor) commented Apr 29, 2026

Describe your changes and provide context

  • Cache debug_trace* results off the consensus path in a separate pebble cache so RPC nodes serve traces from cache instead of re-executing blocks in full
  • This brings debug_trace response times down to single-digit milliseconds versus the full re-execution cost today (especially helpful for indexers and other heavy trace consumers)
  • Add a background trace-baker that re-runs each committed block under callTracer on a worker goroutine
    and serves debug_traceTransaction / debug_traceBlockBy* from cache
  • Fully configurable (five new evm.* knobs, all default-off)
  • Cache debug_trace results during normal block flow, not on demand

Testing performed to validate your change

  • Ran fully on local node
  • Verifying on mainnet node
  • Unit tests

Kbhat1 and others added 7 commits April 29, 2026 14:10
Standalone pebble db at <home>/data/trace_cache so writes don't share
LSM with the chain state (the lesson from 42b7077, where the
sentinel-pointer experiment regressed avgTotal ~32% due to compaction
contention with chain pebble).

Key shape: "ts/" || height(BE,8) || tracerLen(1) || tracer || txHash(32).
Height is leading so Prune is a single range-delete by height window.
Tx hashes are globally unique on this chain, so (height, tracer, txHash)
collisions are impossible.
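
A minimal sketch of that key layout (a traceCacheKey helper is referenced by a later lint commit; the exact signature here is an assumption):

```go
package keeper

import "encoding/binary"

// traceCacheKey sketch: "ts/" || height(BE,8) || tracerLen(1) || tracer || txHash(32).
// Big-endian height leads so a height window is one contiguous key range
// and Prune can drop it with a single range delete.
func traceCacheKey(height int64, tracer string, txHash [32]byte) []byte {
	key := make([]byte, 0, 3+8+1+len(tracer)+32)
	key = append(key, "ts/"...)
	var h [8]byte
	binary.BigEndian.PutUint64(h[:], uint64(height)) //nolint:gosec // heights are non-negative
	key = append(key, h[:]...)
	key = append(key, byte(len(tracer))) // 1-byte length prefix keeps the key self-delimiting
	key = append(key, tracer...)
	key = append(key, txHash[:]...)
	return key
}
```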

Also defines TraceEnqueuer + a tiny indirection (SetTraceEnqueuer /
Enqueue) so the keeper can hold one *TraceCache field that owns both
the cache and the forwarder, without taking a hard dep on the baker
that lives in evmrpc.

All methods are nil-safe: callers can hold a single field and skip
init when the feature is off.
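
A hedged sketch of that indirection; everything beyond the TraceEnqueuer / SetTraceEnqueuer / Enqueue names is an assumption:

```go
package keeper

import "github.com/cockroachdb/pebble"

// TraceEnqueuer is the one-method seam the baker (in evmrpc) implements,
// so this package never has to import it.
type TraceEnqueuer interface {
	Enqueue(height int64)
}

// TraceCache owns the standalone pebble db and the forwarder.
type TraceCache struct {
	db       *pebble.DB
	enqueuer TraceEnqueuer // registered late by the RPC server
}

// SetTraceEnqueuer registers the baker. Nil-safe: a nil receiver means
// the feature is off and the call is a no-op.
func (tc *TraceCache) SetTraceEnqueuer(e TraceEnqueuer) {
	if tc == nil {
		return
	}
	tc.enqueuer = e
}

// Enqueue forwards a height to the registered baker, if any.
func (tc *TraceCache) Enqueue(height int64) {
	if tc == nil || tc.enqueuer == nil {
		return
	}
	tc.enqueuer.Enqueue(height)
}
```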

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pre-bakes debug_trace results so trace RPCs become a single PK lookup
in the trace cache instead of full re-execution. The baker is a
bounded-queue worker pool that pulls heights enqueued from EndBlock,
calls the existing tracers.API.TraceBlockByNumber for each configured
tracer, and writes the per-tx JSON into TraceCache.

Hard guarantee on consensus impact: Enqueue is a non-blocking channel
send (drops on full queue with sparse logging); all re-execution
happens on baker goroutines; reads from chain pebble go through
versioned MVCC (no locks); writes go to a separate pebble db.
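
A sketch of that non-blocking send; the struct fields and the every-Nth drop logging are assumptions:

```go
package evmrpc

import (
	"log/slog"
	"sync/atomic"
)

type TraceBaker struct {
	queue   chan int64 // bounded; sized by the queue-size config knob
	dropped atomic.Int64
	logger  *slog.Logger
}

// Enqueue never blocks the caller: a full queue drops the height, and the
// block falls through to on-demand re-execution at trace time.
func (b *TraceBaker) Enqueue(height int64) {
	select {
	case b.queue <- height:
	default:
		// Sparse logging: surface a stalled baker without flooding
		// consensus-path logs.
		if n := b.dropped.Add(1); n%1000 == 1 {
			b.logger.Info("trace baker queue full; dropping height",
				"height", height, "dropped_total", n)
		}
	}
}
```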

If the baker falls behind, dropped blocks fall through to today's
on-demand re-execution at trace time. No correctness loss.

Tracer indirection (blockTracer interface) keeps the baker testable
without standing up a real EVM/keeper.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a single *TraceCache field on the Keeper (nil-safe) plus an
Enqueue call from EndBlock that forwards the just-committed height to
the trace baker if one is registered. Skipped during tracing (re-entry
guard) so debug_trace replays don't recursively re-enqueue.

The Enqueue call is a non-blocking channel send via TraceCache (which
forwards to the registered TraceEnqueuer). When the baker queue is
full, the height is dropped and the block falls through to today's
on-demand re-execution at trace time. Consensus latency is unaffected
in any case.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a cache lookup at the top of TraceTransaction. On hit (the baker
already produced the result for this tx + tracer), returns the cached
JSON directly. On miss (no cache, unbakeable tracer config, missing
receipt, or absent row) falls through to today's tracersAPI re-execution
path with no behavior change.

bakeableTracerName decides whether a config can be served from cache.
We only bake the standard named tracers (callTracer / prestateTracer /
flatCallTracer) without per-call TracerConfig — anything else (struct
logger, raw JS, custom config) misses by design so we can't return a
false hit.
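
A sketch of that gate, assuming go-ethereum's eth/tracers.TraceConfig shape (the name constants match a later lint commit; the function signature is an assumption):

```go
package evmrpc

import "github.com/ethereum/go-ethereum/eth/tracers"

const (
	callTracerName     = "callTracer"
	prestateTracerName = "prestateTracer"
	flatCallTracerName = "flatCallTracer"
)

// bakeableTracerName returns the tracer name to look up in the cache, or
// false when the request can't be served from cache: struct logger (no
// named tracer), raw JS, or any per-call TracerConfig. Missing by design
// beats returning a false hit.
func bakeableTracerName(cfg *tracers.TraceConfig) (string, bool) {
	if cfg == nil || cfg.Tracer == nil || cfg.TracerConfig != nil {
		return "", false
	}
	switch *cfg.Tracer {
	case callTracerName, prestateTracerName, flatCallTracerName:
		return *cfg.Tracer, true
	default:
		return "", false
	}
}
```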

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds five new evm.* config knobs (all default-off / sane defaults):

  trace_bake_enabled         (bool, default false)
  trace_bake_workers         (int, default 1)
  trace_bake_queue_size      (int, default 4096)
  trace_bake_tracers         ([]string, default ["callTracer"])
  trace_bake_window_blocks   (int64, default 0 = disabled)
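
For illustration, the matching config struct might look like this; the field and tag spellings are assumptions inferred from the knob names:

```go
package config

// TraceBakeConfig mirrors the five evm.* knobs (hypothetical shape).
type TraceBakeConfig struct {
	Enabled      bool     `mapstructure:"trace_bake_enabled"`       // default false
	Workers      int      `mapstructure:"trace_bake_workers"`       // default 1
	QueueSize    int      `mapstructure:"trace_bake_queue_size"`    // default 4096
	Tracers      []string `mapstructure:"trace_bake_tracers"`       // default ["callTracer"]
	WindowBlocks int64    `mapstructure:"trace_bake_window_blocks"` // default 0 = disabled
}
```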

When trace_bake_enabled=true:
  - app.go opens the trace cache pebble db at <home>/data/trace_cache
    and attaches it to the EVM keeper (so EndBlock can Enqueue heights).
  - The HTTP server constructs a TraceBaker that re-executes blocks via
    the existing tracers.API, registers it as the keeper's enqueuer, and
    starts the workers.

Validators leave it off and pay nothing. RPC nodes flip it on. The
keeper-side EndBlock enqueue is a non-blocking channel send that
short-circuits to a counter when the queue is full, so consensus
latency is bounded regardless of baker progress.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
TraceBlockByNumber, TraceBlockByHash, and the *ExcludeTraceFail
variants now check the trace cache before falling through to live
re-execution. The cache lookup is "all-or-nothing": if every tx in
the block has a cached entry under the requested tracer, return the
assembled list; if any tx misses, fall through to the existing path
(no partial results to keep the live path simple and deterministic).

Cached entries are never errored (the baker skips errored traces),
so the ExcludeTraceFail filter applied to live traces is a no-op for
cache hits.

The inner cache lookup is a free function over (cache, height, txHashes,
config) so it stays unit-testable without standing up an EVM backend.
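
A hedged sketch of that free function; the cache accessor (Get) signature and the hash type are assumptions:

```go
package evmrpc

import (
	"encoding/json"

	"github.com/ethereum/go-ethereum/common"
)

// tryTraceCacheBlock returns the assembled per-tx results only when every
// tx in the block hits under the requested tracer. Any miss returns false
// so the caller falls through to live re-execution: no partial results.
func tryTraceCacheBlock(cache *TraceCache, height int64, txHashes []common.Hash, tracer string) ([]json.RawMessage, bool) {
	if cache == nil {
		return nil, false
	}
	out := make([]json.RawMessage, 0, len(txHashes))
	for _, h := range txHashes {
		bz, ok := cache.Get(height, tracer, h) // assumed accessor shape
		if !ok {
			return nil, false
		}
		out = append(out, bz)
	}
	return out, true
}
```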

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three related changes in one commit; the catch-up sweep and periodic prune both depend on TipFn, and both build on the new watermark:

last_baked_height watermark
  TraceCache gains SetLastBakedHeight (atomic-max under a small lock,
  out-of-order workers can't roll the watermark backwards) and
  LastBakedHeight (read). Stored under "meta/last_baked_height" so
  Prune's "ts/" range delete leaves it alone. The bakeBlock worker
  updates the watermark after every successful (block, tracer) bake.
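
A sketch of the atomic-max; the lock and field names are assumptions, and the pebble write under "meta/last_baked_height" is elided to a comment:

```go
package keeper

import "sync"

type traceWatermark struct {
	mu        sync.Mutex
	lastBaked int64
}

// SetLastBakedHeight advances the watermark monotonically. Workers can
// finish out of order (height N before N-1), so anything at or below the
// current watermark is ignored rather than rolling it backwards.
func (w *traceWatermark) SetLastBakedHeight(h int64) {
	w.mu.Lock()
	defer w.mu.Unlock()
	if h <= w.lastBaked {
		return
	}
	w.lastBaked = h
	// Persist under "meta/last_baked_height": outside the "ts/" prefix,
	// so Prune's range delete never touches it.
}
```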

Catch-up sweep
  When TipFn is set, Start() spawns a one-shot catchUpLoop that walks
  last_baked+1 .. tip, baking each height directly (bypasses the
  bounded queue so backfill can't drop). Bounded by WindowBlocks so a
  long-stopped node doesn't try to bake from genesis. Skipped when
  no prior watermark exists (operators who want a one-shot full
  backfill run it explicitly).

Periodic prune
  When TipFn is set AND WindowBlocks > 0, Start() spawns a pruneLoop
  ticking on PruneInterval (default 1m). Each tick calls
  cache.Prune(tip - WindowBlocks) — one DeleteRange on pebble, cheap.
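
A hedged sketch of that single range delete over the "ts/" keyspace, reusing the TraceCache sketch above (the db field is an assumption):

```go
package keeper

import (
	"encoding/binary"

	"github.com/cockroachdb/pebble"
)

// Prune drops every per-tx row strictly below cutoff with one range
// tombstone. Height is the leading key component, so [start, end) is a
// contiguous span; "meta/" keys sort outside "ts/" and are untouched.
func (tc *TraceCache) Prune(cutoff int64) error {
	if tc == nil {
		return nil
	}
	start := []byte("ts/")
	end := make([]byte, 0, 3+8)
	end = append(end, "ts/"...)
	var h [8]byte
	binary.BigEndian.PutUint64(h[:], uint64(cutoff)) //nolint:gosec // heights are non-negative
	end = append(end, h[:]...)
	return tc.db.DeleteRange(start, end, pebble.NoSync)
}
```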

Wiring: server.go passes TipFn := func() int64 { return
ctxProvider(LatestCtxHeight).BlockHeight() } and forwards
TraceBakeWindowBlocks from config.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
github-actions (Bot) commented Apr 29, 2026

The latest Buf updates on your PR. Results from workflow Buf / buf (pull_request).

Build: ✅ passed | Format: ✅ passed | Lint: ✅ passed | Breaking: ✅ passed | Updated (UTC): Apr 30, 2026, 6:28 PM

Kbhat1 and others added 5 commits April 29, 2026 14:52
- evmrpc/tracers.go: drop the redundant json.RawMessage(bz) conversion
  flagged by unconvert. cache.Get already returns json.RawMessage so the
  result is the same byte sequence wrapped in the same type.
- x/evm/keeper/trace_cache.go: annotate the int64 -> uint64 conversion
  in traceCacheKey with //nolint:gosec; block heights are non-negative,
  matching the same annotation already used elsewhere in the file.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- evmrpc/tracers.go: drop the second redundant json.RawMessage(bz)
  conversion in tryTraceCache (cache.Get already returns json.RawMessage).
- evmrpc/tracers.go: extract callTracerName / prestateTracerName /
  flatCallTracerName constants so the tracer names appear in one place
  (goconst was flagging "callTracer" with 3 occurrences).
- x/evm/keeper/trace_cache.go: handle the closer.Close() return value
  in lastBakedHeightUnlocked via "_ = closer.Close()" inside a deferred
  closure (matches the existing pattern in Get).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1. Off-by-one in EndBlock enqueue.
   When EndBlock(N) fires, height N isn't yet "safe latest" for
   geth tracer queries — the watermark sits at N-1. The baker was
   consistently failing every block with:
     "requested height N is not yet available; safe latest is N-1"
   Fix: enqueue (height - 1) from EndBlock; skip the genesis tick
   where height-1 wouldn't exist (sketched after this list).

2. Trace cache wasn't closed on graceful shutdown.
   Baker writes use pebble.NoSync, so SIGTERM lost in-memory data
   because nothing flushed the WAL on the way out. HandleClose now
   closes the cache before falling through to the receipt store
   close, mirroring the existing pattern.
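
A sketch of the fixed enqueue site, reusing the TraceCache sketch above; the isTracing helper is an assumed stand-in for the PR's re-entry guard:

```go
package keeper

import sdk "github.com/cosmos/cosmos-sdk/types"

// Minimal stand-in for the real keeper; only the field used here.
type Keeper struct {
	traceCache *TraceCache // nil when trace baking is off
}

// isTracing reports whether this call runs inside a debug_trace replay
// (assumed helper; the real guard is keeper-specific).
func isTracing(ctx sdk.Context) bool { return false /* keeper-specific check */ }

// Called from EndBlock(N). The safe-latest watermark for geth tracer
// queries sits at N-1 while EndBlock(N) runs, so enqueue N-1, not N.
func (k *Keeper) enqueueForTraceBake(ctx sdk.Context) {
	if isTracing(ctx) {
		return // debug_trace replays must not recursively re-enqueue
	}
	prev := ctx.BlockHeight() - 1
	if prev < 1 {
		return // genesis tick: height-1 doesn't exist
	}
	k.traceCache.Enqueue(prev) // nil-safe, non-blocking; drops when the baker queue is full
}
```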

Plus minor: log a debug-level "trace cache hit" line on the read
path and a startup banner from the baker, so this class of e2e bug
is visible to operators the next time they debug.

Verified end-to-end against a local sei-chain node (-chain-id sei-chain):
  - bake "n_results=1" log line for the block carrying our test tx
  - "trace cache hit" log line on the matching debug_traceTransaction
  - graceful shutdown flushed 13 WAL batches; reopened db shows
    last_baked_height advanced and the tx's row at "ts/<height>/<tracer>/<txHash>"

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a parallel "tb/<height,8>/<tracerLen,1><tracer>" keyspace for the
assembled per-block trace result. Same height ordering as the per-tx
"ts/" keyspace so Prune is still cheap — one DeleteRange per prefix,
both bounded work regardless of row count.

Block-level reads (debug_traceBlockBy*) can now be a single PK seek
into "tb/" instead of N seeks under "ts/" + assembly. The baker (next
commit) writes both rows when the new flag is on so per-tx and
per-block paths each hit at one seek.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
When debug_traceBlockBy* dominates the trace traffic, caching only the
per-tx rows costs N seeks per block lookup. With CacheBlockResults the
baker additionally writes the assembled JSON to a "tb/<height>/<tracer>"
row, so block-level reads hit at one PK seek instead of N. Per-tx
"ts/" rows are still written either way — the new flag is purely
additive.

Reader fast-path: tryBlockResultCache checks tb/ first; on miss falls
back to today's per-tx assembly. Per-tx hits are unchanged. Unbakeable
tracer configs (struct logger, custom JS, per-call TracerConfig)
short-circuit before touching either keyspace.
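
A hedged sketch of the reader fast-path; GetBlock is an assumed accessor over "tb/", and tryTraceCacheBlock is the per-tx sketch from the all-or-nothing commit above:

```go
package evmrpc

import (
	"encoding/json"

	"github.com/ethereum/go-ethereum/common"
)

// tryBlockResultCache: one seek into "tb/" for the assembled block JSON.
// On miss, fall back to the N-seek per-tx assembly over "ts/"; if that
// also misses, the caller re-executes live.
func tryBlockResultCache(cache *TraceCache, height int64, tracer string, txHashes []common.Hash) (json.RawMessage, bool) {
	if cache == nil {
		return nil, false
	}
	if bz, ok := cache.GetBlock(height, tracer); ok { // assumed accessor over "tb/"
		return bz, true
	}
	perTx, ok := tryTraceCacheBlock(cache, height, txHashes, tracer)
	if !ok {
		return nil, false
	}
	bz, err := json.Marshal(perTx)
	return bz, err == nil
}
```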

Empty blocks are skipped on the write side — per-tx assembly already
returns [] for them at zero cache cost, and json.Marshal(nil)="null"
would have been a format mismatch with the live path.

Verified live: tx in block 0xdf gets a tb/ row written; per-block
RPC returns the cached JSON; empty blocks fall through to the per-tx
path and return [] correctly. Final state: ts/ rows = 1 (one tx),
tb/ rows = 1 (one tx-bearing block), no empty-block garbage.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>