Skip to content

Autobahn hashvault integration#3602

Open
cody-littley wants to merge 13 commits into
mainfrom
cjl/hashvault-integration
Open

Autobahn hashvault integration#3602
cody-littley wants to merge 13 commits into
mainfrom
cjl/hashvault-integration

Conversation

@cody-littley

@cody-littley cody-littley commented Jun 16, 2026

Copy link
Copy Markdown
Contributor

Describe your changes and provide context

Wire in the hash vault to autobahn. This prevents the app hash from changing for a particular block without human intervention. This does not impact cosmos, and there is no intention to add this to cosmos.

@github-actions

github-actions Bot commented Jun 16, 2026

Copy link
Copy Markdown

The latest Buf updates on your PR. Results from workflow Buf / buf (pull_request).

BuildFormatLintBreakingUpdated (UTC)
✅ passed✅ passed✅ passed✅ passedJun 24, 2026, 2:42 PM

@codecov

codecov Bot commented Jun 16, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 54.21687% with 38 lines in your changes missing coverage. Please review.
✅ Project coverage is 58.27%. Comparing base (ba0fa95) to head (7812a81).
⚠️ Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
sei-tendermint/internal/p2p/giga_router_common.go 45.28% 23 Missing and 6 partials ⚠️
sei-db/state_db/sc/hashvault/noop_hashvault.go 0.00% 8 Missing ⚠️
sei-tendermint/node/setup.go 75.00% 0 Missing and 1 partial ⚠️
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #3602      +/-   ##
==========================================
- Coverage   58.98%   58.27%   -0.71%     
==========================================
  Files        2247     2149      -98     
  Lines      185002   175110    -9892     
==========================================
- Hits       109129   102054    -7075     
+ Misses      66118    63981    -2137     
+ Partials     9755     9075     -680     
Flag Coverage Δ
sei-chain-pr 70.46% <53.03%> (?)
sei-db 70.41% <ø> (ø)
sei-db-state-db ?
sei-db-state-db-pr 79.92% <42.85%> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
sei-db/state_db/sc/hashvault/pebble_hashvault.go 82.64% <100.00%> (+0.74%) ⬆️
sei-tendermint/config/config.go 75.97% <100.00%> (+1.38%) ⬆️
sei-tendermint/config/toml.go 59.25% <ø> (+4.25%) ⬆️
sei-tendermint/internal/p2p/giga_router.go 68.36% <ø> (-31.64%) ⬇️
sei-tendermint/node/setup.go 72.80% <75.00%> (+10.20%) ⬆️
sei-db/state_db/sc/hashvault/noop_hashvault.go 0.00% <0.00%> (ø)
sei-tendermint/internal/p2p/giga_router_common.go 58.25% <45.28%> (-4.17%) ⬇️

... and 200 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@cody-littley cody-littley marked this pull request as ready for review June 17, 2026 13:18
@cody-littley cody-littley requested a review from wen-coding June 17, 2026 13:19
@cursor

cursor Bot commented Jun 17, 2026

Copy link
Copy Markdown

PR Summary

High Risk
Touches validator block execution and equivocation protection; hash mismatches panic and misconfiguration or disabling the vault removes slash-relevant safeguards.

Overview
HashVault is wired into Autobahn GigaRouter block execution so each height’s app hash is durably recorded before app.Commit and before PushAppHash, blocking silent hash changes at the same height.

runExecute opens a Pebble vault under <PersistentStateDir>/hashvault (or a new NoopHashVault when there is no persistent dir or when hash-vault-disabled-unsafe is set). On restart it re-commits the last finalized app hash before re-pushing for AppQC; mismatches panic via commitHashToVault. Vault prune follows the data layer’s RetainHeight. Operators get a new top-level hash-vault-disabled-unsafe knob in config.toml (default off), louder unsafe-disable startup logs, and expanded Pebble mismatch logs (data dir + recovery/slashing warnings). TestCommitHashToVault covers idempotent match, divergent hash panic, and shutdown cancellation.

Reviewed by Cursor Bugbot for commit 7812a81. Bugbot is set up for automated code reviews on this repo. Configure here.

Comment thread sei-tendermint/internal/consensus/replay.go Outdated
Comment thread sei-tendermint/internal/state/execution.go Outdated

// Commit this block's hash to the equivocation guard before saving state. See commitHashToVault.
// A returned error is a benign shutdown cancellation; genuine faults panic inside the call.
if err := commitHashToVault(ctx, blockExec.hashVault, block.Height, block.Hash()); err != nil {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the intent is to vault AppHash? block.Hash() is the Tendermint block header hash.

I think you should do:

if err := commitHashToVault(ctx, blockExec.hashVault, block.Height, state.AppHash); err != nil {
    return state, err
}

instead.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

change made

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Side note, this file is now reverted. We decided not to integrate the hashvault into cosmos pathways.

h.eventBus,
sm.NopMetrics(),
h.consensusPolicy,
h.hashVault)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This only covers non-Autobahn part. Is the intent to do Autobahn part later? I think you would need to:

  1. GigaRouter struct — add the vault field:
type GigaRouter struct {
    // ... existing fields ...
    hashVault hashvault.HashVault
}
  1. NewGigaRouter — accept it as a parameter and set the field.

  2. executeBlock — vault resp.AppHash (the FinalizeBlock output, not the Autobahn header hash) right after FinalizeBlock succeeds, before Commit:

resp, err := app.FinalizeBlock(ctx, ...)
if err != nil { ... }

// Seal the computed AppHash before commit — same equivocation guard as the Cosmos path.
if err := commitHashToVault(ctx, r.hashVault, int64(b.GlobalNumber), resp.AppHash); err != nil {
    return nil, err
}

commitResp, err := app.Commit(ctx)
  1. buildGigaRouter in setup.go — thread the vault from makeNode through to NewGigaRouter.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, I think I did this backwards. I've reverted the cosmos changes and re-integrated with autobanhn.

return fmt.Errorf("hashvault CommitToHash aborted at height %d: %w", height, err)
}
if errors.Is(err, hashvault.ErrHashMismatch) {
logger.Error("FATAL: HashVault detected a block-hash mismatch — the node has equivocated. "+

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Chatted with Greg today, since re-execution of a block should only happen during restart, it's fine to just panic here. I think you can print previous and current hash, then hint if the human is really really sure, he can remove directory to proceed, but warn that this may lead to slashing etc

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code now logs both hashes, as well as instructions about how to bypass the problem.

"Halting. DO NOT RESTART WITHOUT HUMAN INTERVENTION.",
"height", height, "hash", fmt.Sprintf("%X", hash), "err", err)
} else {
logger.Error("FATAL: HashVault could not commit the block hash (operational error, not a "+

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could a transient I/O error land here?

What's the behavior if the on-disk file is corrupted? If the on-disk file is corrupted maybe we should just ask human to remove or just proceed? I just don't want one corrupted bit on disk to take down the whole validator. What do you feel, log an error and proceed, but don't overwrite the corrupted entry?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, and this is by design. If we can't read and write to the hashvault, we have no idea if the hash we are providing will cause slashing. The only way we can be 100% confident is if the hash vault is able to read/write the file system and we find no conflict in the data.

The challenge with random file system errors is that there isn't a clear way to recover from them automatically. If the files are corrupted, the correct solution is probably going to be for a human operator to delete the hashvault archive. But if the error is from a full disk or bad file permissions, the recovery pathway will look very different.

Let me know if you'd like to discuss.

@cursor cursor Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes using default effort and found 1 potential issue.

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit 5429566. Configure here.

Comment thread sei-tendermint/internal/state/execution.go Outdated
@cody-littley cody-littley changed the title integrate hashvault Autobahn hashvault integration Jun 24, 2026
@@ -217,9 +217,14 @@

func (p *PebbleHashVault) logHashMismatch(blockHeight uint64, existing, incoming []byte) {
logger.Error("Hashvault detected hash mismatch; node attempted to change its mind. "+

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: make it clear this is AppHash mismatch

# hash-vault-disabled-unsafe disables the block-hash equivocation guard (HashVault).
# DO NOT set this to true unless you are knowingly running an UNSAFE node as a last-resort
# recovery measure. A node with this enabled has NO protection against changing its mind about
# a committed block's hash, and will log error-level warnings on every startup.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: we can probably tell them something like:
It's safer to not disable hash vault, just remove hash vault files as instructed and let it run. They should only disable hash vault if they are very sure the hash vault is totally wrong and they will keep running into the same panic on new blocks.

# Placed here (as a top-level key, before any [section] header) so the TOML parser sees it at
# root scope where mapstructure expects it — viper would otherwise nest it under the
# immediately preceding section.
hash-vault-disabled-unsafe = {{ .HashVaultDisabledUnsafe }}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is "Advanced Configuration Options"? does this belong there? (I don't know what are Advanced Configuration Options either, this is a genuine question, not suggestion.)

inboundFullnodeCount atomic.Int64
inboundFullnodeCap int64

// hashVault is the block-hash equivocation guard. runExecute owns its lifecycle: it builds the

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: AppHash, we don't check blockHash


// hashVault is the block-hash equivocation guard. runExecute owns its lifecycle: it builds the
// vault on entry (durable under PersistentStateDir, or a no-op when disabled / in-memory) and
// closes it on exit. Set before executeBlock is ever called, which is the only other reader. See

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wait, are we setting it before executeBlock is ever called?

// before the hash is proposed for AppQC voting via PushAppHash below). On restart the block is
// re-executed and the identical hash is re-committed idempotently. A returned error is a benign
// shutdown cancellation; genuine faults panic inside the call. See commitHashToVault.
if err := commitHashToVault(ctx, r.hashVault, b.GlobalNumber, resp.AppHash); err != nil {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I'm thinking commitAppHashToVault may make the intent more obvious we never want to commit other Hash.

return commitResp, nil
}

// buildHashVault constructs the block-hash equivocation guard runExecute owns. By default it

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: AppHash, maybe find and replace?

if errors.Is(err, hashvault.ErrHashMismatch) {
// The HashVault has already logged the conflicting hashes, its data directory, and the
// bypass/slashing guidance immediately before returning this error; don't duplicate it.
logger.Error("FATAL: HashVault detected a block-hash equivocation; halting. See the preceding "+

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a tiny risk the logger doesn't flush before panicing, maybe either flush the log buffer or pass the error message to the panic string?

// before the hash is proposed for AppQC voting via PushAppHash below). On restart the block is
// re-executed and the identical hash is re-committed idempotently. A returned error is a benign
// shutdown cancellation; genuine faults panic inside the call. See commitHashToVault.
if err := commitHashToVault(ctx, r.hashVault, b.GlobalNumber, resp.AppHash); err != nil {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pompon0 we are now inserting commitHashToVault before commit. I think this is what we said we wanted during the March offsite, but double checking with you.

return nil, fmt.Errorf("failed to create hashvault data dir %q: %w", config.DataDir, err)
}

db, err := pebble.Open(config.DataDir, &pebble.Options{})

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have a log.Info if there are no data on the disk? Might be useful if it is a surprise.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants