
cmd/integration: commitment rebuild with history support #19016

Merged
sudeepdino008 merged 88 commits into main from commitment_history_regen on Mar 21, 2026
Conversation

sudeepdino008 (Member) commented Feb 6, 2026

Summary

Extends integration commitment rebuild to support regenerating commitment history (.v and .vi files) alongside the commitment domain (.kv files). This is the only practical way to regenerate commitment history for an existing synced node — stage_exec re-executes all blocks from scratch which is ~2-4x slower and requires full EVM re-execution.

Closes #18954

Usage

```bash
# Rebuild commitment (with history if enabled in DB)
integration commitment rebuild --datadir=<dir> --chain=<chain>

# If commitment history is enabled, you will be prompted:
#   "commitment history is enabled. Rebuild with history? (yes/no)"

# Clear commitment data without rebuilding (useful for cleanup)
integration commitment rebuild --datadir=<dir> --chain=<chain> --clear-commitment

# Resume a previously interrupted rebuild
integration commitment rebuild --datadir=<dir> --chain=<chain> --resume
```

Flags

| Flag | Description |
|------|-------------|
| `--clear-commitment` | Remove commitment data from DB and delete state files, then exit without rebuilding |
| `--resume` | Resume a previously interrupted commitment rebuild (requires commitment history enabled) |
| `--squeeze` | Enable squeeze pass for `ReplaceKeysInValues` (default: true) |

Prerequisites

  • The node must be synced (accounts/storage/code domain files must exist)
  • MaxTxNum table must be populated. If not, run: integration stage_headers --reset --datadir=<dir> --chain=<chain>

How It Works

When commitment history is not enabled, the existing RebuildPatriciaTrieBasedOnFiles path is used — it reads the latest state from domain files and recomputes the commitment trie in a single pass.

When commitment history is enabled, the new RebuildCommitmentFilesWithHistory path:

  1. Collects history keys — for each batch of blocks, queries AccountsDomain and StorageDomain history to find which keys changed in each block
  2. Replays block-by-block — for each block, touches the changed keys in the commitment trie using TouchKey, then computes the commitment root via ComputeCommitment
  3. Verifies state roots — after each block, compares the computed root against the canonical header's state root
  4. Flushes per-step — when the accumulated data crosses a step boundary (stepSize txNums), flushes domains to MDBX, builds snapshot files via BuildFiles2, waits for completion, then prunes
  5. Merges and squeezes — after all blocks are processed, runs the merge loop and squeeze migration to produce final compressed files

Key design decisions

  • Step-based batching: flushes exactly at step boundaries to produce clean per-step .kv files that merge correctly
  • ETL-based history collection: uses ETL collectors to sort history keys by block number, avoiding expensive per-block history queries
  • Discards non-commitment writes: account/storage/code domain writes are discarded since those files already exist — only commitment domain writes are persisted
  • Warmup cache: pre-warms the commitment trie cache for better read performance (disable with ERIGON_REBUILD_NO_WARMUP_CACHE=1)
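The ETL-based collection described above can be sketched like this: gather `(blockNum, key)` change pairs for a whole batch, sort by block number, and bucket them per block, instead of issuing a history query per block. Names here are illustrative; the real code streams through Erigon's `etl` collectors rather than an in-memory slice.

```go
package main

import (
	"fmt"
	"sort"
)

// change is a toy record of "key k changed in block b".
type change struct {
	block uint64
	key   string
}

// bucketByBlock sorts a batch of changes by block number and groups the
// keys per block, so the replay loop can consume them block-by-block
// without re-querying history for every block.
func bucketByBlock(changes []change) map[uint64][]string {
	sort.Slice(changes, func(i, j int) bool { return changes[i].block < changes[j].block })
	out := map[uint64][]string{}
	for _, c := range changes {
		out[c.block] = append(out[c.block], c.key)
	}
	return out
}

func main() {
	buckets := bucketByBlock([]change{{7, "a"}, {5, "b"}, {5, "a"}})
	fmt.Println(len(buckets[5]), len(buckets[7])) // keys changed in blocks 5 and 7
}
```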

Performance (Hoodi testnet, 2.4M blocks)

| Metric | `commitment rebuild` (with history) | `stage_exec` (from scratch) |
|--------|-------------------------------------|-----------------------------|
| Block processing | ~38 blk/s | ~8 blk/s |
| Block processing time | ~17.5 hours | ~80+ hours (estimated) |
| Total wall clock (incl. merge/squeeze) | ~33.5 hours | ~80+ hours (estimated) |
| Peak memory (RSS) | ~90 GB (mostly mmap) | ~6 GB |
| Chaindata growth | Stable at 27 GB | Growing |
| EVM re-execution | No | Yes |

The commitment rebuild approach is ~2-4x faster because it reads pre-computed state from existing snapshot files instead of re-executing every transaction through the EVM.

Files Changed

  • cmd/integration/commands/commitment.go — adds --clear-commitment and --resume flags, interactive prompt for history mode, routes to appropriate rebuild function
  • cmd/integration/commands/flags.go — new flag definitions
  • db/state/squeeze.go — core RebuildCommitmentFilesWithHistory function with ETL-based history collection, block-by-block replay, step-based flushing, and merge/squeeze
  • db/kv/rawdbv3/txnum.go — adds IsMaxTxNumPopulated helper
  • db/state/execctx/domain_shared.go — exposes ClearWarmupCache
  • execution/stagedsync/stage_commit_rebuild.go — adds RebuildPatriciaTrieWithHistory entry point
  • execution/commitment/commitmentdb/ — reader/context changes for rebuild state reader

sudeepdino008 force-pushed the commitment_history_regen branch 4 times, most recently from 594b302 to 018864d on February 6, 2026 17:36
sudeepdino008 (Member, Author) commented:

The current iteration (with parallel prefetch for each block) works but is still very slow:

```
INFO[02-10|08:57:55.138] [rebuild_commitment_history] progress    block=931/10217115 blk/s=0.5 keys=946 root=65fd94b55b37f5dbbc465e568f2e547b64484059a3b0725c4c69be7c4d5a1b6b memBatch="27.5 KB" alloc=2.2GB sys=2.5GB
INFO[02-10|08:58:19.187] [rebuild_commitment_history] progress    block=950/10217115 blk/s=0.8 keys=965 root=d38909ecd417a8355f7a7dc7f16a40727fe2ec8455320c8ef140f8f830f571b1 memBatch="27.5 KB" alloc=2.2GB sys=2.5GB
INFO[02-10|08:58:38.398] [rebuild_commitment_history] progress    block=963/10217115 blk/s=0.7 keys=978 root=ad6fbe34ab300eec89c12f79bd78744d45cefdac4bf7cc0ca88fee62dfe43d9c memBatch="27.5 KB" alloc=2.2GB sys=2.5GB
INFO[02-10|08:58:49.222] [mem] memory stats                       Rss=12.8GB Size=0B Pss=12.8GB SharedClean=2.6MB SharedDirty=0B PrivateClean=10.4GB PrivateDirty=2.5GB Referenced=12.8GB Anonymous=2.5GB Swap=0B alloc=2.2GB sys=2.5GB
INFO[02-10|08:58:55.655] [rebuild_commitment_history] progress    block=973/10217115 blk/s=0.6 keys=988 root=e72a597e68642190e06ebafe7da19bd986954fa95781841e14d6fa605e081a09 memBatch="27.5 KB" alloc=2.2GB sys=2.5GB
INFO[02-10|08:59:17.290] [rebuild_commitment_history] progress    block=985/10217115 blk/s=0.6 keys=1.00k root=c965378c4a9d4529c4307ae2f1d35ede0a8ea1cda756123b03edbaef7d780bfb memBatch="27.5 KB" alloc=2.2GB sys=2.5GB
INFO[02-10|08:59:39.166] [rebuild_commitment_history] progress    block=1001/10217115 blk/s=0.7 keys=1.01k root=876c5b4fee1d2c04b45d8f0181aa6f186460cee8a1c44909b3bc529f35b445a2 memBatch="27.5 KB" alloc=2.2GB sys=2.5GB
INFO[02-10|08:59:55.994] [rebuild_commitment_history] progress    block=1006/10217115 blk/s=0.3 keys=1.02k root=426f9cee16e451ea101c19fd41fe07450e356bc78838e4b2baedb2ce04abc1f9 memBatch="27.5 KB" alloc=2.2GB sys=2.5GB
INFO[02-10|09:00:15.173] [rebuild_commitment_history] progress    block=1019/10217115 blk/s=0.7 keys=1.03k root=6f66f3027522aea46417904c739f09a283b601ccae885a8b43ec6ef5f24b2572 memBatch="27.5 KB" alloc=2.2GB sys=2.5GB
INFO[02-10|09:00:39.710] [rebuild_commitment_history] progress    block=1033/10217115 blk/s=0.6 keys=1.04k root=f6ccc81518aeddc148ac5765d4034453182d8d9c84325ce499f926d5eaa3b6ab memBatch="27.5 KB" alloc=2.2GB sys=2.5GB
```
  • pprof says most time is spent in `HistoryKeyRange` — which makes sense: for each block, we essentially read the entire .ef file and collect the keys whose txNums lie in the block's range.
  • what is needed: get the keys for 100 blocks at once (key + txNum), bucket the keys across those 100 blocks, then process them.
  • but `HistoryRange`/`HistoryKeyRange` de-duplicate keys; instead we need the duplicated values to come through.

sudeepdino008 and others added 15 commits March 7, 2026 15:57
Windows doesn't allow renaming files that are still open. The defer
comp.Close() was happening after Compress() which does the rename,
but the decompressor was being opened before the compressor was closed.
…transaction

The Clone() method was returning itself without updating the transaction,
causing stale transaction usage when trieContext() clones the state reader.

This affected:
- Commitment history regeneration (RebuildCommitmentFilesWithHistory)
- RPC commitment verification (eth_getProof)
- Receipts generation with state root computation
Add RebuildStateReader to commitmentdb that stores SharedDomains reference
for proper Clone() behavior. This reader:
- Reads commitment from SharedDomains in-memory batch (LatestStateReader)
- Reads plain state from history (HistoryStateReader)
- Clone() creates new reader with new tx while preserving sd and plainStateAsOf

Use NewRebuildStateReader in RebuildCommitmentFilesWithHistory instead of
CommitmentReplayStateReader.
Plain state reads come from disk history via HistoryStateReader, not from
in-memory batch. Disabling inMemHistoryReads avoids accumulating unnecessary
history data in memory.
The variable 'keyPos' used to track key offsets in accessor files was
declared outside the retry loop. When a recsplit collision occurred and
the loop retried, 'keyPos' retained its value from the previous iteration,
causing incorrect key offset tracking in the index.

Fix: Move keyPos initialization inside the retry loop so it's reset on
each attempt.

Similar to the fix in history.go (PR #19697).
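The bug described in this commit can be illustrated with a minimal sketch. `buildOffsets` and the collision simulation are hypothetical; the real code builds recsplit accessor indexes, but the failure mode is the same: an offset accumulator declared outside a retry loop keeps its stale value across retries.

```go
package main

import "fmt"

// buildOffsets records the byte offset of each key. keyPos is declared
// inside the retry loop -- this is the fix; declaring it outside the loop
// made the second attempt start from the previous attempt's final offset.
func buildOffsets(keys []string, failFirstAttempt bool) []int {
	var offsets []int
	for attempt := 0; attempt < 2; attempt++ {
		keyPos := 0 // reset on every retry
		offsets = offsets[:0]
		for _, k := range keys {
			offsets = append(offsets, keyPos)
			keyPos += len(k)
		}
		if failFirstAttempt && attempt == 0 {
			continue // simulate a recsplit collision -> retry
		}
		break
	}
	return offsets
}

func main() {
	// Even after a simulated retry, offsets restart from 0.
	fmt.Println(buildOffsets([]string{"ab", "cde"}, true)) // [0 2]
}
```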
Conflicts resolved:
- db/state/domain.go: keep testHook from main, remove redundant outer var
- db/state/entity_integrity_check.go: keep disableInterDomain field from HEAD, drop unused dirs field
- execution/commitment/commitmentdb/reader.go: use main's CommitmentReplayStateReader.Clone fix (don't replace plainStateReader), keep RebuildStateReader from HEAD

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Without this, BuildFiles2 runs in a background goroutine and the code
immediately proceeds to prune. Since files aren't built yet, the prune
sees stale commitFilesEndTxNum and the DB keeps growing (1.1TB+).

Use the existing WaitForFiles() to block until file building completes,
ensuring data is moved from DB to snapshot files before pruning.
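The shape of this fix — kick off an asynchronous build, but block on a completion signal before pruning — can be sketched with a plain channel. `buildFiles` here is a stand-in for `BuildFiles2`, and the channel receive plays the role of `WaitForFiles()`; the real code uses Erigon's own synchronization.

```go
package main

import "fmt"

// buildFiles simulates the background snapshot-file build; it signals
// completion by closing done.
func buildFiles(done chan<- struct{}) {
	// ... build snapshot files ...
	close(done)
}

func main() {
	done := make(chan struct{})
	go buildFiles(done)
	<-done // analogous to WaitForFiles(): block until files exist
	// Only now is it safe to prune the DB -- pruning before this point
	// would race the build and leave the DB growing.
	fmt.Println("pruning after files are built")
}
```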
Replace adaptive block-based batching with step-based batching.
Each iteration processes exactly 1 step worth of blocks, then
flushes and builds files. This ensures predictable memory usage
and file sizes, since the old memBatch size metric only tracked
latest state (~2GB) while history data (5-12GB per step) was
invisible to the adaptive logic.

- Remove batchSize, batchBlockCount, adaptive grow/shrink logic
- Each iteration: find current step from blockFrom's txNum,
  compute step boundary block, process blocks, flush, build files
- Last step may be partial (fewer blocks), which is fine
lastToTxNum is inclusive (last txNum of the step, e.g. 781249 for step 1
with stepSize 390625). Integer division 781249/390625=1 means toStep=1,
so BuildFiles2 loop 'for step := fromStep; step < 1' skips building step 1.

Fix: use (lastToTxNum+1)/stepSize to get the exclusive step boundary.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
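The off-by-one arithmetic from the commit above, as a runnable sketch (the constant and helper name are illustrative; only the formula comes from the commit):

```go
package main

import "fmt"

const stepSize = 390625

// exclusiveToStep converts an inclusive last txNum into the exclusive
// step boundary the build loop expects: (lastToTxNum+1)/stepSize.
func exclusiveToStep(lastToTxNum uint64) uint64 {
	return (lastToTxNum + 1) / stepSize
}

func main() {
	// Step 1 ends at txNum 781249 (inclusive). The buggy 781249/390625
	// gives toStep=1, so `for step := fromStep; step < toStep` skips
	// building step 1; the fix gives toStep=2.
	fmt.Println(781249/stepSize, exclusiveToStep(781249))
}
```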
This reverts commit 82788f1.
- Replace RebuildStateReader with CommitmentReplayStateReader
- Revert integrity_checker_test changes (belongs in separate PR)
- Fix gofmt in domain.go
sudeepdino008 (Member, Author) commented:

✅ Successful Hoodi commitment history regeneration

Branch: commitment_history_regen (commit abf8ea12a0)
Chain: Hoodi (2,426,826 blocks, 87.89M txs)
Command: integration commitment rebuild --datadir=... --chain=hoodi
Machine: 16 cores, 128GB RAM

Timeline

| Phase | Duration | Notes |
|-------|----------|-------|
| Block processing | 17h 31m | 743.45M keys, ~27-35 blk/s, 1-step batching |
| .kv merges | ~2h 7m | 0-128: 1h26m, 128-192: 31m, 192-224: 10m |
| History (.v) builds | ~13h 12m | 0-128: 8h8m (86GB), 128-192: 3h7m (33GB), 192-224: 1h1m (13GB) |
| Index (.vi) builds | ~4m | Fast |
| Squeeze migration | ~19m | Saved 12.1 GB across 4 files (44.2% on 0-128.kv) |
| Total wall time | ~33h | Start 2026-03-18 16:04 → End 2026-03-20 01:37 CET |

Final disk usage

| Component | Size |
|-----------|------|
| Total datadir | 284 GB |
| Chaindata (mdbx) | 27 GB (stable throughout) |
| Snapshots | 251 GB |
| commitment.0-128.kv | 12 GB (post-squeeze) |
| commitment.0-128.v | 86 GB |
| commitment.128-192.v | 33 GB |
| commitment.192-224.v | 13 GB |

State root

955a9e0c3052b0c0faf80199468f4b11ab32104c5087a2cf0304f95243238976

Integrity check

`erigon seg integrity` — all checks passed

Checks run: Blocks, HeaderNoGaps, BlocksTxnID, InvertedIndex, HistoryNoSystemTxs, CommitmentKvi, ReceiptsNoDups, RCacheNoDups, CommitmentRoot, CommitmentHistVal, StateRootVerifyByHistory (257 sampled blocks, 6m5s), Publishable ("All snapshots are publishable").

Key fixes applied

  1. BuildFiles2 race: Added WaitForFiles() after BuildFiles2 to prevent pruning from racing ahead
  2. Step-based batching: Replaced adaptive SizeEstimate batching with 1-step-per-flush (SizeEstimate only tracks latest state, not history)
  3. Off-by-one fix: (lastToTxNum + 1) / stepSize to avoid missing last step

CommitmentReplayStateReader.Clone() does not clone the plainStateReader,
which causes warmup goroutines to share the write transaction's
HistoryStateReader. RebuildStateReader.Clone() creates a fresh reader
pair with the new tx, avoiding the issue.

This reverts the reader swap from cleanup 2 (abf8ea1) while keeping
CommitmentReplayStateReader intact for its other callers.
Flush every 2 steps instead of 1, halving the number of mdbx writes
and BuildFiles2 calls during the block processing phase.
Resolved conflicts in DeleteStateSnapshots API (positional args → struct).
sudeepdino008 changed the title from "[wip] Commitment history regen tool" to "commitment history regen tool" on Mar 20, 2026
sudeepdino008 changed the title from "commitment history regen tool" to "[wip] commitment history regen tool" on Mar 20, 2026
sudeepdino008 changed the title from "[wip] commitment history regen tool" to "cmd/integration, db/state: commitment rebuild with history support" on Mar 20, 2026
sudeepdino008 changed the title from "cmd/integration, db/state: commitment rebuild with history support" to "cmd/integration: commitment rebuild with history support" on Mar 20, 2026
sudeepdino008 added this pull request to the merge queue Mar 21, 2026
Merged via the queue into main with commit b772ea8 Mar 21, 2026
34 checks passed
sudeepdino008 deleted the commitment_history_regen branch March 21, 2026 15:00
sudeepdino008 added a commit that referenced this pull request Apr 1, 2026