Skip to content

feat: commit current fork#21908

Open
deffrian wants to merge 40 commits intomerge-train/spartanfrom
nikita/commit-current-fork
Open

feat: commit current fork#21908
deffrian wants to merge 40 commits intomerge-train/spartanfrom
nikita/commit-current-fork

Conversation

@deffrian
Copy link
Copy Markdown
Collaborator

@deffrian deffrian commented Mar 23, 2026

Ref: A-311

  • New commit_fork operation in C++ world state that commits a fork's tree state directly to LMDB instead of recalculating via SYNC_BLOCK. Validates canonical tip hasn't moved, syncs
    pruning metadata, and destroys the fork after commit.
  • registerForkForBlock API on the TS side allows callers to associate a fork with a block's archive root. When handleL2BlockAndMessages is called for that block, it uses COMMIT_FORK
    instead of SYNC_BLOCK, with automatic fallback if the commit fails.
  • Validator flow: re-executes the proposed block on a fork, registers it, and commits it when the block syncs — avoiding redundant tree recalculation.
  • Proposer flow: after each non-last block in a checkpoint, the fork is registered and committed via syncImmediate, then a new fork is created at the advanced tip for the next block.

@deffrian deffrian force-pushed the nikita/commit-current-fork branch 3 times, most recently from a7e9e38 to 59c8fd1 Compare March 25, 2026 15:16
@deffrian deffrian changed the title Nikita/commit current fork [WIP]Nikita/commit current fork Mar 30, 2026
@deffrian deffrian force-pushed the nikita/commit-current-fork branch from 73b1f3b to 86238e0 Compare March 30, 2026 16:52
@deffrian deffrian force-pushed the nikita/commit-current-fork branch from 01f788b to 6cbfdad Compare March 31, 2026 14:47
@deffrian deffrian changed the title [WIP]Nikita/commit current fork feat: commit current fork Apr 1, 2026
Copy link
Copy Markdown
Contributor

@spalladino spalladino left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed the ts side of things. Looks good, but left a few comments.

await worldState.handleL2BlockAndMessages(block, messages);
}

await tempFork.close();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need to close it explicitly because the commit_fork already does, right?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, in happy case we close them in commit_fork. Also we close them on TS in finally clause in checkpoint_proposal_job.ts or when we do setFork

// Register the fork so SYNC_BLOCK can commit it instead of recalculating.
this.worldState.registerForkForBlock(block.archive.root, checkpointBuilder.getForkId());

// If this is the last block, exit the loop so we can build the checkpoint and start collecting attestations.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stale comment?

// sync and reuse the current fork for the next block.
if (!this.config.skipPushProposedBlocksToArchiver) {
await this.worldState.syncImmediate(blockNumber);
const newFork = await this.worldState.fork(undefined, { closeDelayMs: 12_000 });
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's always be explicit on the blocknumber (and block hash) for forking, just in case. I wouldn't want to be in a situation where a reorg hits and changes world-state underneaths the checkpoint proposal job.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, who's responsibility is it to close this fork?

if (!this.config.skipPushProposedBlocksToArchiver) {
await this.worldState.syncImmediate(blockNumber);
const newFork = await this.worldState.fork(undefined, { closeDelayMs: 12_000 });
await checkpointBuilder.setFork(newFork);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I like this setFork call inbetween blocks. If now we have to supply a new fork for each block we build, can't we make the fork an argument of the buildBlock method, rather than a field that we need to re-set before each call?

Comment on lines +210 to +216
// COMMIT_FORK runs on the canonical queue, but we need to clean up the fork's queue
const actualForkId = (body as { forkId: number }).forkId;
const forkQueue = this.queues.get(actualForkId);
if (forkQueue) {
await forkQueue.stop();
this.queues.delete(actualForkId);
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this honor the closeDelay?

if (reexecutionResult?.block && this.config.skipPushProposedBlocksToArchiver === false) {
this.worldState.registerForkForBlock(reexecutionResult.block.archive.root, fork.forkId);
await this.blockSource.addBlock(reexecutionResult.block);
await this.worldState.syncImmediate(blockNumber);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Heads up this syncImmediate may fail. Given this is an optimization, let's have a trySyncImmediate that swallows the error so we can carry on if the sync fails.

Copy link
Copy Markdown
Collaborator Author

@deffrian deffrian Apr 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure it's needed. As I understand it can fail only if something is wrong with world state. I'm not sure we can just move on in this case

Comment on lines +503 to +508
// Close forks to release native resources. Already-committed forks silently handle "Fork not found".
const currentFork = checkpointBuilder?.getFork();
if (currentFork && currentFork !== fork) {
await currentFork.close();
}
await fork?.close();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd expect this to be handled by the checkpoint builder maybe? Looks odd such explicit handling of forks in here.

@spalladino
Copy link
Copy Markdown
Contributor

Also, in case it helps, some notes from Claude


Bugs: TypeScript

Bug 1 (Medium): WorldStateOpsQueue leak after COMMIT_FORK + subsequent close()

native_world_state_instance.ts:209-216 — After COMMIT_FORK succeeds, the fork's queue is stopped and removed from this.queues. Later, when MerkleTreesForkFacade.close() fires (from setFork() or the finally block), it calls DELETE_FORK which creates a new queue for that fork ID (since the old one was deleted). The native call fails with "Fork not found", and because the exception is thrown before the cleanup code at line 206, the newly created orphan queue is never stopped or deleted.

In a proposer building 3 blocks per checkpoint, at least 2 orphan queues are created per checkpoint. Over hours of operation, this grows unbounded.

Fix: Either wrap the post-execute cleanup at lines 205-217 in a finally block, or have MerkleTreesForkFacade.close() check if the queue still exists before calling DELETE_FORK.

Bug 2 (Low): registeredFork not cleared in clear()

native_world_state.ts — The clear() method resets the world state instance and clears cachedStatusSummary, but does not clear registeredFork. After clear(), a block with a coincidentally matching archive root would attempt COMMIT_FORK on a destroyed instance. Fallback saves correctness, but generates spurious error logs.

Fix: Add this.registeredFork = undefined to clear().

Bugs: C++

Bug 1 (High): No rollback on partial LMDB commit in commit_fork

world_state.cpp:304-309commit(fork, status) dispatches 5 independent commit_block operations to the thread pool, one per tree. Each tree writes its own LMDB transaction. If some trees succeed but others fail, the method throws without calling rollback() on canonical. Compare with sync_block which has a try/catch that calls rollback() before re-throwing.

The consequence: after a partial failure, LMDB contains a mix of block N+1 data for some trees and block N data for others. The next validate_trees_are_equally_synched() call will fail on every subsequent mutating operation, bricking the world state. Adding rollback() before the throw doesn't fix the LMDB inconsistency, but it at least makes the in-memory state match LMDB so the failure mode is consistent.

Fix: Wrap the commit call in try/catch with rollback() before re-throwing, matching the sync_block pattern.

Bug 2 (Medium): Fork permanently destroyed on recoverable validation failure

world_state.cpp:274-282retrieve_and_remove_fork(forkId) atomically removes the fork from _forks before the tip-moved validation check. If the validation throws, the fork (with all its tree data) is permanently destroyed. The caller cannot retry or inspect the fork. This is surprising behavior — a failed commit_fork is also an implicit delete_fork.

Fix: Either validate before removing from the map, or document this as intentional behavior and test that the fork is gone after a failed commit.

Architecture Improvements: TypeScript

1. registerForkForBlock is a temporal side-channel — pass the fork ID explicitly

I disagree with Claude on this one

The registration pattern creates invisible coupling between two unrelated call sites. Every new state-mutating path (like unwindBlocks) must remember to clear registeredFork. Consider adding an optional parameter to handleL2BlockAndMessages:

handleL2BlockAndMessages(block: L2Block, msgs: Fr[], opts?: { commitForkId?: number }): Promise<...>

This makes the commit-vs-sync decision local to a single call and eliminates cleanup in unwindBlocks.

2. forkId: number on MerkleTreeWriteOperations leaks a native detail

Every wrapper (GuardedMerkleTreeOperations, HintingMerkleWriteOperations) had to add pass-through getters for a value only two call sites use. The mock returns 0 (the canonical fork ID — a footgun). Move forkId to a narrower CommittableFork type returned by fork(), or remove it entirely if suggestion 1 is adopted.

3. setFork on ICheckpointBlockBuilder is fragile mutable state

The fork lifecycle is now split across 6 sites in 3 files: create in proposal job, use in builder, register in proposal job, commit in handleL2BlockAndMessages, create-new in proposal job, cleanup in finally. The finally block already has non-trivial logic distinguishing the initial fork from the current fork.

Consider making the builder own the fork lifecycle internally (create, register-for-commit, advance-to-next-fork), exposing only advanceToNextBlock() and close(). The finally block simplifies to await checkpointBuilder?.close().

4. Single registeredFork slot silently drops registrations

I think keeping a single one is fine

Use a Map<string, number> keyed by archive root instead of a single slot. This is future-proof for concurrent or batched block syncing and prevents silent overwrites.

5. setDb() on LightweightCheckpointBuilder breaks encapsulation

Allows the database to be swapped while accumulated state (lastArchives, spongeBlob, blocks) still references the old DB's state. Better modeled as an internal detail of advanceToNextBlock with consistency validation.

@fcarreiro fcarreiro removed their request for review April 16, 2026 10:52
@deffrian deffrian force-pushed the nikita/commit-current-fork branch 2 times, most recently from 2789222 to 3438b2d Compare April 22, 2026 16:18
refactor

simplify

buildOne

fix formatting
@deffrian deffrian force-pushed the nikita/commit-current-fork branch from 3438b2d to 72db43d Compare April 22, 2026 16:25
@deffrian
Copy link
Copy Markdown
Collaborator Author

Comments on Claude review:
Bug 1 (High): No rollback on partial LMDB commit in commit_fork
If partial commit happens and we fail on some trees this means that the db is corrupted. SYNC_BLOCK works the same way
2. forkId: number on MerkleTreeWriteOperations leaks a native detail
I think it's fine. It will create one more class for no obvious reason.

Agree on 1 and 4. All other issues are fixed

@deffrian deffrian requested a review from spalladino April 23, 2026 15:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants