autobahn: add data_prune_after to bound data.State memory (CON-256)#3375

Open
wen-coding wants to merge 1 commit into main from wen/data_prune_after_for_autobahn

@wen-coding
Contributor

data.State.runPruning is a background goroutine that drops in-memory blocks/QCs/AppProposals older than a configurable duration, but its config knob (data.Config.PruneAfter) was never wired up: giga_router constructed data.NewState with only Committee set, so the pruner never spawned. data.State.PruneBefore (the giga_router-driven path based on cosmos-sdk RetainHeight) is also a no-op when the chain is configured with pruning="nothing" (the localnode default, common in test setups). As a result, in-memory data.State grew with the chain under sustained load until nodes were OOM-killed.
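To make the intended behaviour concrete, here is a minimal sketch of time-based pruning. It is not the actual data.State code: `entry`, `store`, `pruneOlderThan`, and `demoPrune` are hypothetical names invented for illustration, and the real pruner may track age and ordering differently. The sketch only shows the idea that a zero duration leaves everything in memory, while a positive duration bounds what is retained.

```go
package main

import (
	"fmt"
	"time"
)

// entry is a hypothetical stand-in for a block/QC/AppProposal held in memory.
type entry struct {
	height   uint64
	storedAt time.Time
}

// store is a hypothetical, heavily simplified stand-in for data.State.
type store struct {
	entries []entry // ordered oldest first
}

// pruneOlderThan drops every entry stored before now-pruneAfter and returns
// how many entries were removed. A non-positive pruneAfter disables pruning,
// which models the "knob never set, so nothing is pruned" situation.
func (s *store) pruneOlderThan(now time.Time, pruneAfter time.Duration) int {
	if pruneAfter <= 0 {
		return 0
	}
	cutoff := now.Add(-pruneAfter)
	n := 0
	for n < len(s.entries) && s.entries[n].storedAt.Before(cutoff) {
		n++
	}
	// Copy the survivors so the old backing array can be garbage collected.
	s.entries = append([]entry(nil), s.entries[n:]...)
	return n
}

// demoPrune stores one entry per minute for `minutes` minutes, prunes once
// with the given window (in minutes), and returns how many entries remain.
func demoPrune(minutes, windowMinutes int) int {
	start := time.Unix(0, 0)
	s := &store{}
	for i := 0; i < minutes; i++ {
		s.entries = append(s.entries, entry{
			height:   uint64(i + 1),
			storedAt: start.Add(time.Duration(i) * time.Minute),
		})
	}
	now := start.Add(time.Duration(minutes) * time.Minute)
	s.pruneOlderThan(now, time.Duration(windowMinutes)*time.Minute)
	return len(s.entries)
}

func main() {
	fmt.Println("30m window:", demoPrune(60, 30))
	fmt.Println("pruning disabled:", demoPrune(60, 0))
}
```

In the real node this would presumably run periodically from a goroutine; the sketch prunes once so the effect is easy to inspect.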

Plumb DataPruneAfter through:
`AutobahnFileConfig.data_prune_after` (json) → `GigaRouterConfig.DataPruneAfter` → `data.Config.PruneAfter` → `data.State.runPruning`.

Production default (gen-autobahn-config): 30m. This gives operators plenty of recent history for /block, /tx, /trace_*, etc. while bounding memory under load.

Localnode/test override (step4_config_override.sh): 1m. This keeps data.State small under sustained-throughput tests where cosmos pruning is "nothing".
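The plumbing described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: `fileConfig`, `routerConfig`, `dataConfig`, and `loadPruneAfter` are hypothetical stand-ins for AutobahnFileConfig, GigaRouterConfig, data.Config, and the pass-through in node/setup.go, and the sketch assumes the JSON value is a Go duration string such as "30m" (the real field type is not shown here).

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// fileConfig is a hypothetical stand-in for AutobahnFileConfig.
// Assumption: the duration is stored as a string like "30m" or "1m".
type fileConfig struct {
	DataPruneAfter string `json:"data_prune_after,omitempty"`
}

// routerConfig is a hypothetical stand-in for GigaRouterConfig.
type routerConfig struct {
	DataPruneAfter time.Duration
}

// dataConfig is a hypothetical stand-in for data.Config.
type dataConfig struct {
	PruneAfter time.Duration
}

// loadPruneAfter walks the chain file config -> router config -> data config.
// An absent field leaves PruneAfter at zero, i.e. time-based pruning stays off.
func loadPruneAfter(raw []byte) (dataConfig, error) {
	var fc fileConfig
	if err := json.Unmarshal(raw, &fc); err != nil {
		return dataConfig{}, fmt.Errorf("parse autobahn config: %w", err)
	}
	var rc routerConfig
	if fc.DataPruneAfter != "" {
		d, err := time.ParseDuration(fc.DataPruneAfter)
		if err != nil {
			return dataConfig{}, fmt.Errorf("parse data_prune_after: %w", err)
		}
		rc.DataPruneAfter = d
	}
	return dataConfig{PruneAfter: rc.DataPruneAfter}, nil
}

func main() {
	for _, raw := range []string{
		`{"data_prune_after":"30m"}`, // production default in this PR
		`{"data_prune_after":"1m"}`,  // localnode/test override in this PR
		`{}`,                         // field dropped: pruner stays disabled
	} {
		cfg, err := loadPruneAfter([]byte(raw))
		fmt.Println(raw, "->", cfg.PruneAfter, err)
	}
}
```

Treating an absent field as "disabled" matches the opt-out behaviour mentioned later in this thread, where operators can drop the field.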

Things done

  • AutobahnFileConfig.DataPruneAfter
  • GigaRouterConfig.DataPruneAfter + thread into data.Config
  • node/setup.go pass-through from autobahn.json
  • gen-autobahn-config production default (30m)
  • step4_config_override.sh localnode override (1m)
  • gofmt, vet clean

@github-actions

github-actions Bot commented May 4, 2026

The latest Buf updates on your PR. Results from workflow Buf / buf (pull_request).

| Build | Format | Lint | Breaking | Updated (UTC) |
| --- | --- | --- | --- | --- |
| ✅ passed | ✅ passed | ✅ passed | ✅ passed | May 4, 2026, 4:13 AM |

@codecov

codecov Bot commented May 4, 2026

Codecov Report

❌ Patch coverage is 66.66667% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 59.17%. Comparing base (6620fb6) to head (f3cf900).
⚠️ Report is 1 commits behind head on main.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| sei-tendermint/node/setup.go | 50.00% | 1 Missing and 1 partial ⚠️ |
| ...int/cmd/tendermint/commands/gen_autobahn_config.go | 0.00% | 1 Missing ⚠️ |

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #3375      +/-   ##
==========================================
- Coverage   59.17%   59.17%   -0.01%     
==========================================
  Files        2097     2097              
  Lines      172641   172648       +7     
==========================================
+ Hits       102163   102167       +4     
- Misses      61615    61617       +2     
- Partials     8863     8864       +1     

| Flag | Coverage Δ |
| --- | --- |
| sei-chain-pr | 68.98% <66.66%> (?) |
| sei-db | 70.41% <ø> (ø) |

Flags with carried forward coverage won't be shown.

| Files with missing lines | Coverage Δ |
| --- | --- |
| sei-tendermint/config/autobahn.go | 28.57% <ø> (ø) |
| sei-tendermint/internal/p2p/giga_router.go | 69.30% <100.00%> (+0.46%) ⬆️ |
| ...int/cmd/tendermint/commands/gen_autobahn_config.go | 18.18% <0.00%> (-0.34%) ⬇️ |
| sei-tendermint/node/setup.go | 69.23% <50.00%> (-0.35%) ⬇️ |


@wen-coding wen-coding changed the title autobahn: add data_prune_after to bound data.State memory (CON-257) autobahn: add data_prune_after to bound data.State memory (CON-256) May 4, 2026
@pompon0
Contributor

pompon0 commented May 4, 2026

fyi, pruneAfter in data was used in sei-v3, but in sei-chain it is the application that is responsible for pruning, via the retainHeight field in ResponseCommit. I currently don't know whether we want to change ownership of pruning from application to consensus. IMO it would make sense, given that the application should be solely concerned with the latest state at all times. However, the sei-chain app may make some assumptions about which blocks are available (I can imagine that it does, but I haven't looked into that yet).

@wen-coding
Contributor Author

> fyi, pruneAfter in data was used in sei-v3, but in sei-chain it is the application that is responsible for pruning, via the retainHeight field in ResponseCommit. I currently don't know whether we want to change ownership of pruning from application to consensus. IMO it would make sense, given that the application should be solely concerned with the latest state at all times. However, the sei-chain app may make some assumptions about which blocks are available (I can imagine that it does, but I haven't looked into that yet).

Sorry, I'm confused. I'm not planning to change pruning ownership in this PR (although we can discuss whether that should be done; I'm generally of the opinion that this is consensus cleanup, which should probably be controlled by consensus). I just want to set a smaller prune period in tests, so that long throughput tests can still run on less powerful machines (my Mac) without the validators being OOM-killed. The 30m default in gen-autobahn-config is a defensive cap, not a replacement for app-driven pruning, and it is still opt-out: operators can drop the field.

@pompon0
Contributor

pompon0 commented May 5, 2026

Currently pruning is driven by retainHeight computed via `func (app *BaseApp) GetBlockRetentionHeight(commitHeight int64) (int64, error)`. If pruning in tests does not work, we should first check whether this function actually advances retainHeight, and fix it if it doesn't (or if it doesn't keep up with the block rate).
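One way to run the check suggested above is a small helper that probes any function with the same signature over a range of commit heights. This is only a sketch: `retainHeightFunc`, `retainHeightAdvances`, `neverPrune`, and `keepLast` are hypothetical names, and the two stand-in functions do not reproduce the real BaseApp logic. They merely model "retain height stuck at 0" versus "retain height trails the tip by a fixed number of blocks".

```go
package main

import (
	"errors"
	"fmt"
)

// retainHeightFunc has the same shape as the GetBlockRetentionHeight
// signature quoted above, minus the receiver.
type retainHeightFunc func(commitHeight int64) (int64, error)

// retainHeightAdvances reports whether f ever returns a retain height larger
// than its value at `from` while commit heights grow from `from` to `to`.
func retainHeightAdvances(f retainHeightFunc, from, to int64) (bool, error) {
	if from > to {
		return false, errors.New("from must not exceed to")
	}
	first, err := f(from)
	if err != nil {
		return false, fmt.Errorf("height %d: %w", from, err)
	}
	for h := from + 1; h <= to; h++ {
		cur, err := f(h)
		if err != nil {
			return false, fmt.Errorf("height %d: %w", h, err)
		}
		if cur > first {
			return true, nil
		}
	}
	return false, nil
}

// neverPrune is a hypothetical stand-in that always returns 0,
// modelling a configuration in which nothing is ever pruned.
func neverPrune(int64) (int64, error) { return 0, nil }

// keepLast is a hypothetical stand-in that retains only the last n blocks.
func keepLast(n int64) retainHeightFunc {
	return func(h int64) (int64, error) {
		if h <= n {
			return 0, nil
		}
		return h - n, nil
	}
}

func main() {
	stuck, _ := retainHeightAdvances(neverPrune, 1, 100)
	moving, _ := retainHeightAdvances(keepLast(10), 1, 100)
	fmt.Println("neverPrune advances:", stuck)
	fmt.Println("keepLast(10) advances:", moving)
}
```

In a test, the real method could be passed as a method value (for example `app.GetBlockRetentionHeight`), assuming an initialized app is available. Comparing the returned height against the commit height would additionally show whether it keeps up with the block rate.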
