fix(harness): route nightly state store to flatkv, not memiavl_only #444

Closed
bdchatham wants to merge 2 commits into main from fix/nightly-flatkv-ss-write-mode

@bdchatham

Collaborator

Problem

The nightly load suite hangs: TestBenchmark provisions the chain, logs network bench: ready, then the RPC follower wedges and never becomes EVM-serving, so seiload never runs and NightlyRunFailed fires.

Root cause (diagnosed on a live hung node): the RPC follower's seid parks all threads on futex_wait_queue right after logging Found 0 WALs and opens no listeners (/proc/net/tcp shows no :8545, :26657, or :26656). The follower runs ss-enable=true (full node) with ss-write-mode=memiavl_only; validators carry the same override but run ss-enable=false, so they are unaffected.

The latest nightly seid image constructs the FlatKV state store for full nodes. memiavl_only routes all EVM data away from that enabled store, so the SS-store open path deadlocks before listener bind. The override is weeks old and was benign on older images (which didn't build the SS store); the new image makes it fatal, but only for full nodes (RPC followers), not validators.
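The listener check from the root-cause analysis can be sketched in Go. This is a minimal illustration, not part of the PR: `listeningPorts` is a hypothetical helper, and it relies only on the standard Linux `/proc/net/tcp` layout (hex local address in column 2, socket state in column 4, with `0A` meaning LISTEN). It covers IPv4 sockets only; IPv6 listeners appear in `/proc/net/tcp6`.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// listeningPorts parses text in the /proc/net/tcp format and returns the
// set of local ports whose socket state is LISTEN (hex state 0A).
// Hypothetical helper for illustration; it is not taken from the harness.
func listeningPorts(procNetTCP string) map[int]bool {
	ports := make(map[int]bool)
	sc := bufio.NewScanner(strings.NewReader(procNetTCP))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		// Data rows look like "0: 00000000:2161 00000000:0000 0A ...".
		// The header row and non-LISTEN sockets fail this check.
		if len(fields) < 4 || fields[3] != "0A" {
			continue
		}
		i := strings.LastIndex(fields[1], ":")
		if i < 0 {
			continue
		}
		port, err := strconv.ParseUint(fields[1][i+1:], 16, 16)
		if err != nil {
			continue
		}
		ports[int(port)] = true
	}
	return ports
}

func main() {
	data, err := os.ReadFile("/proc/net/tcp")
	if err != nil {
		fmt.Println("cannot read /proc/net/tcp:", err)
		return
	}
	open := listeningPorts(string(data))
	// Ports named in the PR description.
	for _, p := range []int{8545, 26657, 26656} {
		fmt.Printf(":%d listening=%v\n", p, open[p])
	}
}
```

On a wedged follower as described above, all three ports would be expected to report `listening=false`.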

Fix

In flatkvStorageConfig (was memiavlStorageConfig), shared by the load/release/chaos suites:

  • storage.state_commit.write_mode: memiavl_only (unchanged — commitment tree stays memiavl; the controller default cosmos_only is rejected by the nightly image)
  • storage.state_store.write_mode: memiavl_only → flatkv_only (route state to the FlatKV store the image now builds)

The major-upgrade suite still omits this map (it tests the migration path itself). The variable is renamed to reflect that the state store is now FlatKV.
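As a rough sketch of the intended override, the shared map might look like the following. Only the two key names, their values, and the variable name come from this description; the `map[string]string` type and the way the suites consume it are assumptions, and the sketch does not verify that the seid image honors either key.

```go
package main

import "fmt"

// flatkvStorageConfig sketches the override map described above.
// ASSUMPTION: the real harness type may differ; only the keys and
// values are taken from the PR description.
var flatkvStorageConfig = map[string]string{
	// Commitment tree stays on memiavl.
	"storage.state_commit.write_mode": "memiavl_only",
	// State store is routed to FlatKV (was memiavl_only).
	"storage.state_store.write_mode": "flatkv_only",
}

func main() {
	keys := []string{
		"storage.state_commit.write_mode",
		"storage.state_store.write_mode",
	}
	for _, k := range keys {
		fmt.Printf("%s = %s\n", k, flatkvStorageConfig[k])
	}
}
```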

Validation

  • go vet -tags integration ./test/integration/ passes; gofmt clean; all 3 usages renamed.
  • Post-merge: new integration-harness image build → bump the platform nightly cronjob's harness image → re-trigger load → confirm seid opens :8545 and emits seiload_run_duration_seconds.

🤖 Generated with Claude Code

The load/release/chaos suites pinned both storage write-modes to memiavl_only.
The latest nightly seid image builds the FlatKV state store for full nodes
(ss-enable=true), and memiavl_only routes all EVM data away from that enabled
store — its open path deadlocks. The RPC followers (full nodes) wedge right
after WAL discovery, before binding any listener, so the benchmark's
EVM-readiness gate never passes and NightlyRunFailed fires. Validators are
unaffected (ss-enable=false makes the mode inert).

State commitment stays on memiavl (the controller default cosmos_only is
rejected by the nightly image); only the state-store mode moves to flatkv_only
so the enabled FlatKV store is the write target it expects. Renames the config
var to match.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@cursor

cursor Bot commented Jun 24, 2026

PR Summary

Low Risk
Only integration test provisioning defaults change; no production controller or runtime logic is modified.

Overview
Nightly integration harnesses (benchmark, chaos, release) now pin storage.state_store.write_mode to flatkv_only instead of memiavl_only, while storage.state_commit.write_mode stays memiavl_only. The shared config is renamed from memiavlStorageConfig to flatkvStorageConfig with comments explaining that current nightly seid builds FlatKV for full-node RPC followers and memiavl_only on the state store deadlocks startup before EVM listeners bind.

This is a test-harness / provisioning config change only; major-upgrade tests still omit this map on purpose.

Reviewed by Cursor Bugbot for commit 475bfef.

The "controller default cosmos_only" note conflated the chain-optimizations
write-mode enum with the migration enum SC actually uses; trim to the
verifiable fact (the controller default is rejected, so SC is pinned).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@bdchatham

Collaborator Author

Superseded by #445 (link adjusts if number differs). Grounding the direction in canonical sei-chain main (commit 51eb1fd) showed storage.state_store.write_mode is an unbound key on current sei-chain — StateStoreConfig has no write-mode field (EVM routing is the evm-split bool), so this PR's flatkv_only change is a silently-ignored no-op. The real lever is ss-enable; the replacement disables the SeiDB state store on the nightly nodes to match the validators.
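The failure mode described here, an override key that nothing reads, can be illustrated with a small check that compares override keys against the keys a config actually binds. This is a sketch only: `unboundKeys` and the `bound` set are hypothetical. Treating `storage.state_commit.write_mode` as bound is an assumption made for the example; only the unbound status of `storage.state_store.write_mode` comes from the comment above.

```go
package main

import (
	"fmt"
	"sort"
)

// unboundKeys returns, in sorted order, every override key that is not in
// the bound set. Such keys would be accepted but silently ignored.
// Hypothetical helper for illustration only.
func unboundKeys(overrides map[string]string, bound map[string]bool) []string {
	var out []string
	for k := range overrides {
		if !bound[k] {
			out = append(out, k)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	// ASSUMPTION: an illustrative bound-key set, not read from sei-chain.
	bound := map[string]bool{
		"storage.state_commit.write_mode": true,
	}
	overrides := map[string]string{
		"storage.state_commit.write_mode": "memiavl_only",
		"storage.state_store.write_mode":  "flatkv_only",
	}
	for _, k := range unboundKeys(overrides, bound) {
		fmt.Println("unbound override (would be ignored):", k)
	}
}
```

A check of this kind in the harness would surface a no-op override at provisioning time rather than after a hung nightly run.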

bdchatham closed this Jun 24, 2026