Skip to content

Vault: canonicalize owner at storage and validate Ethereum addresses#22828

Open
prashantkumar1982 wants to merge 2 commits into
developfrom
fix/vault-owner-address-validation
Open

Vault: canonicalize owner at storage and validate Ethereum addresses#22828
prashantkumar1982 wants to merge 2 commits into
developfrom
fix/vault-owner-address-validation

Conversation

@prashantkumar1982

@prashantkumar1982 prashantkumar1982 commented Jun 12, 2026

Copy link
Copy Markdown
Contributor

Summary

This PR hardens vault secret identifier handling around workflow owner addresses (M2/L3 follow-up).

Defense in depth (M2): ValidateSecretIdentifier now requires owner to be a valid Ethereum address (common.IsHexAddress). List requests validate owner with a placeholder key instead of incorrectly passing owner as the key field (which broke once addresses became 42-byte strings).

Storage-only canonicalization (L3): Owner addresses are normalized to EIP-55 checksum form at the storage boundary only:

  • vaultutils.KeyFor and CanonicalSecretIdentifier for secret/metadata keys
  • OCR plugin state transitions canonicalize identifiers before create/update/delete writes
  • KV store normalizes owner strings on metadata read/write

Request payloads are not rewritten in gateway handlers. We explicitly avoided stamping owner onto fan-out requests, because mutating params after user-gateway auth would break vault-node JWT re-auth (request digest includes params).

Refactor: Utility functions moved out of vaulttypes into vaultutils:

  • identifier.go — namespace/owner normalization, KeyFor, CanonicalSecretIdentifier
  • signatures.goValidateSignatures (OCR signed response verification)

No migration or dual-read: Secrets written with non-canonical owner casing were already unreadable from all primary read paths — workflows (normalizeOwner), confidential relay, and confidential-compute (util.HexToAddress(...).String() in executor.go) all canonicalize owner before calling vault. A lowercase write therefore never matched a checksum lookup on read. We are not adding a migration or dual-key fallback for legacy OCR state.

No feature flag: We do not believe any current production use case writes secrets with non-canonical owner strings (authorized owner binding already uses canonical addresses from allowlist/JWT). The KV store changes align writes with existing read behavior and should be a no-op for correctly written data.

@prashantkumar1982 prashantkumar1982 requested review from a team as code owners June 12, 2026 04:03
@prashantkumar1982 prashantkumar1982 marked this pull request as draft June 12, 2026 04:04
@github-actions

Copy link
Copy Markdown
Contributor

✅ No conflicts with other open PRs targeting develop

@github-actions

Copy link
Copy Markdown
Contributor

I see you updated files related to core. Please run make gocs in the root directory to add a changeset as well as in the text include at least one of the following tags:

  • #added For any new functionality added.
  • #breaking_change For any functionality that requires manual action for the node to boot.
  • #bugfix For bug fixes.
  • #changed For any change to the existing functionality.
  • #db_update For any feature that introduces updates to database schema.
  • #deprecation_notice For any upcoming deprecation functionality.
  • #internal For changesets that need to be excluded from the final changelog.
  • #nops For any feature that is NOP facing and needs to be in the official Release Notes for the release.
  • #removed For any functionality/config that is removed.
  • #updated For any functionality that is updated.
  • #wip For any change that is not ready yet and external communication about it should be held off till it is feature complete.

@cl-sonarqube-production

Copy link
Copy Markdown

@trunk-io

trunk-io Bot commented Jun 12, 2026

Copy link
Copy Markdown

Static BadgeStatic BadgeStatic BadgeStatic Badge

View Full Report ↗︎Docs

@prashantkumar1982 prashantkumar1982 marked this pull request as ready for review June 12, 2026 06:08
@cedric-cordenier

Copy link
Copy Markdown
Contributor

@prashantkumar1982 Is L3 safe to do? This is effectively going to modify all requests made to the gateway so that addresses are fetched under the checksummed representation; but what if the data isn't stored in that format?

if !isValidIDComponent(idKey) || !isValidIDComponent(idOwner) || (idNamespace != "" && !isValidIDComponent(idNamespace)) {
return errors.New("key, owner and namespace must only contain alphanumeric characters")
}
if !common.IsHexAddress(idOwner) {

@cedric-cordenier cedric-cordenier Jun 12, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See my comment on the overall PR: the fact that this doesn't exist already worries me a little. Might be worth adding a feature flag around this is case it causes issue with the rollout

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little uncomfortable with some of these changes: KeyFor + CanonicalSecretIdentifier both have the possibility of causing some non-determinism issues depending on what format the address is actually in today.

Is it worth being more cautious with the rollout and putting this behind a feature flag?

I also think it would be better if we handled the transformation to a checksummed addresses in a central place -- for example before adding the message to the incoming queue, rather than forcing the change throughout the plugin. That makes any bugs a lot harder to track and reason about.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we realistically have to worry about hitting non-determinism. If you created a secret with a lowercase address it would get stored with that lowercase address and then be unreadable because all of the workflows send the checksummed address. So we can be reasonably sure only people not using the CLI could cause a non-determinism issue on rollout.

With that said, we also agreed we want to put everything behind flags to be cautious so +1 for a feature flag and +1 for centralizing it

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants