Autobahn hashvault integration#3602
Conversation
|
The latest Buf updates on your PR. Results from workflow Buf / buf (pull_request).
|
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #3602 +/- ##
==========================================
- Coverage 58.98% 58.27% -0.71%
==========================================
Files 2247 2149 -98
Lines 185002 175110 -9892
==========================================
- Hits 109129 102054 -7075
+ Misses 66118 63981 -2137
+ Partials 9755 9075 -680
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
PR SummaryHigh Risk Overview
Reviewed by Cursor Bugbot for commit 7812a81. Bugbot is set up for automated code reviews on this repo. Configure here. |
|
|
||
| // Commit this block's hash to the equivocation guard before saving state. See commitHashToVault. | ||
| // A returned error is a benign shutdown cancellation; genuine faults panic inside the call. | ||
| if err := commitHashToVault(ctx, blockExec.hashVault, block.Height, block.Hash()); err != nil { |
There was a problem hiding this comment.
I think the intent is to vault AppHash? block.Hash() is the Tendermint block header hash.
I think you should do:
if err := commitHashToVault(ctx, blockExec.hashVault, block.Height, state.AppHash); err != nil {
return state, err
}
instead.
There was a problem hiding this comment.
Side note, this file is now reverted. We decided not to integrate the hashvault into cosmos pathways.
| h.eventBus, | ||
| sm.NopMetrics(), | ||
| h.consensusPolicy, | ||
| h.hashVault) |
There was a problem hiding this comment.
This only covers non-Autobahn part. Is the intent to do Autobahn part later? I think you would need to:
- GigaRouter struct — add the vault field:
type GigaRouter struct {
// ... existing fields ...
hashVault hashvault.HashVault
}
-
NewGigaRouter — accept it as a parameter and set the field.
-
executeBlock — vault
resp.AppHash(the FinalizeBlock output, not the Autobahn header hash) right after FinalizeBlock succeeds, before Commit:
resp, err := app.FinalizeBlock(ctx, ...)
if err != nil { ... }
// Seal the computed AppHash before commit — same equivocation guard as the Cosmos path.
if err := commitHashToVault(ctx, r.hashVault, int64(b.GlobalNumber), resp.AppHash); err != nil {
return nil, err
}
commitResp, err := app.Commit(ctx)
buildGigaRouterin setup.go — thread the vault from makeNode through to NewGigaRouter.
There was a problem hiding this comment.
Good point, I think I did this backwards. I've reverted the cosmos changes and re-integrated with autobanhn.
| return fmt.Errorf("hashvault CommitToHash aborted at height %d: %w", height, err) | ||
| } | ||
| if errors.Is(err, hashvault.ErrHashMismatch) { | ||
| logger.Error("FATAL: HashVault detected a block-hash mismatch — the node has equivocated. "+ |
There was a problem hiding this comment.
Chatted with Greg today, since re-execution of a block should only happen during restart, it's fine to just panic here. I think you can print previous and current hash, then hint if the human is really really sure, he can remove directory to proceed, but warn that this may lead to slashing etc
There was a problem hiding this comment.
Code now logs both hashes, as well as instructions about how to bypass the problem.
| "Halting. DO NOT RESTART WITHOUT HUMAN INTERVENTION.", | ||
| "height", height, "hash", fmt.Sprintf("%X", hash), "err", err) | ||
| } else { | ||
| logger.Error("FATAL: HashVault could not commit the block hash (operational error, not a "+ |
There was a problem hiding this comment.
Could a transient I/O error land here?
What's the behavior if the on-disk file is corrupted? If the on-disk file is corrupted maybe we should just ask human to remove or just proceed? I just don't want one corrupted bit on disk to take down the whole validator. What do you feel, log an error and proceed, but don't overwrite the corrupted entry?
There was a problem hiding this comment.
Yes, and this is by design. If we can't read and write to the hashvault, we have no idea if the hash we are providing will cause slashing. The only way we can be 100% confident is if the hash vault is able to read/write the file system and we find no conflict in the data.
The challenge with random file system errors is that there isn't a clear way to recover from them automatically. If the files are corrupted, the correct solution is probably going to be for a human operator to delete the hashvault archive. But if the error is from a full disk or bad file permissions, the recovery pathway will look very different.
Let me know if you'd like to discuss.
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes using default effort and found 1 potential issue.
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit 5429566. Configure here.
| @@ -217,9 +217,14 @@ | |||
|
|
|||
| func (p *PebbleHashVault) logHashMismatch(blockHeight uint64, existing, incoming []byte) { | |||
| logger.Error("Hashvault detected hash mismatch; node attempted to change its mind. "+ | |||
There was a problem hiding this comment.
nit: make it clear this is AppHash mismatch
| # hash-vault-disabled-unsafe disables the block-hash equivocation guard (HashVault). | ||
| # DO NOT set this to true unless you are knowingly running an UNSAFE node as a last-resort | ||
| # recovery measure. A node with this enabled has NO protection against changing its mind about | ||
| # a committed block's hash, and will log error-level warnings on every startup. |
There was a problem hiding this comment.
nit: we can probably tell them something like:
It's safer to not disable hash vault, just remove hash vault files as instructed and let it run. They should only disable hash vault if they are very sure the hash vault is totally wrong and they will keep running into the same panic on new blocks.
| # Placed here (as a top-level key, before any [section] header) so the TOML parser sees it at | ||
| # root scope where mapstructure expects it — viper would otherwise nest it under the | ||
| # immediately preceding section. | ||
| hash-vault-disabled-unsafe = {{ .HashVaultDisabledUnsafe }} |
There was a problem hiding this comment.
what is "Advanced Configuration Options"? does this belong there? (I don't know what are Advanced Configuration Options either, this is a genuine question, not suggestion.)
| inboundFullnodeCount atomic.Int64 | ||
| inboundFullnodeCap int64 | ||
|
|
||
| // hashVault is the block-hash equivocation guard. runExecute owns its lifecycle: it builds the |
There was a problem hiding this comment.
nit: AppHash, we don't check blockHash
|
|
||
| // hashVault is the block-hash equivocation guard. runExecute owns its lifecycle: it builds the | ||
| // vault on entry (durable under PersistentStateDir, or a no-op when disabled / in-memory) and | ||
| // closes it on exit. Set before executeBlock is ever called, which is the only other reader. See |
There was a problem hiding this comment.
wait, are we setting it before executeBlock is ever called?
| // before the hash is proposed for AppQC voting via PushAppHash below). On restart the block is | ||
| // re-executed and the identical hash is re-committed idempotently. A returned error is a benign | ||
| // shutdown cancellation; genuine faults panic inside the call. See commitHashToVault. | ||
| if err := commitHashToVault(ctx, r.hashVault, b.GlobalNumber, resp.AppHash); err != nil { |
There was a problem hiding this comment.
nit: I'm thinking commitAppHashToVault may make the intent more obvious we never want to commit other Hash.
| return commitResp, nil | ||
| } | ||
|
|
||
| // buildHashVault constructs the block-hash equivocation guard runExecute owns. By default it |
There was a problem hiding this comment.
nit: AppHash, maybe find and replace?
| if errors.Is(err, hashvault.ErrHashMismatch) { | ||
| // The HashVault has already logged the conflicting hashes, its data directory, and the | ||
| // bypass/slashing guidance immediately before returning this error; don't duplicate it. | ||
| logger.Error("FATAL: HashVault detected a block-hash equivocation; halting. See the preceding "+ |
There was a problem hiding this comment.
There is a tiny risk the logger doesn't flush before panicing, maybe either flush the log buffer or pass the error message to the panic string?
| // before the hash is proposed for AppQC voting via PushAppHash below). On restart the block is | ||
| // re-executed and the identical hash is re-committed idempotently. A returned error is a benign | ||
| // shutdown cancellation; genuine faults panic inside the call. See commitHashToVault. | ||
| if err := commitHashToVault(ctx, r.hashVault, b.GlobalNumber, resp.AppHash); err != nil { |
There was a problem hiding this comment.
@pompon0 we are now inserting commitHashToVault before commit. I think this is what we said we wanted during the March offsite, but double checking with you.
| return nil, fmt.Errorf("failed to create hashvault data dir %q: %w", config.DataDir, err) | ||
| } | ||
|
|
||
| db, err := pebble.Open(config.DataDir, &pebble.Options{}) |
There was a problem hiding this comment.
Do we have a log.Info if there are no data on the disk? Might be useful if it is a surprise.

Describe your changes and provide context
Wire in the hash vault to autobahn. This prevents the app hash from changing for a particular block without human intervention. This does not impact cosmos, and there is no intention to add this to cosmos.