Bind vMCP sessions to OIDC identity, not raw token bytes#5378
Open
jhrozek wants to merge 4 commits into
Open
Conversation
The vMCP session-binding format needs to be parsed and produced in two places: pkg/vmcp/session/types (from ShouldAllowAnonymous) and pkg/vmcp/session/internal/security (from BindSession / validateCaller). A private helper in each would drift over time, so this commit introduces a single-owner leaf package. Format encodes a bound identity as iss + "\x00" + sub, rejecting empty halves and stray NULs. Parse mirrors Format strictly — including rejecting trailing NULs in the sub half so values that did not pass through Format (e.g. direct writes to Redis) fail loudly. A literal "unauthenticated" sentinel covers sessions created without an authenticated identity. No callers wired yet — those land in the next commits. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The hijack-prevention decorator needs a stable per-identity key in session metadata. Add MetadataKeyIdentityBinding alongside the existing token-hash keys (the legacy keys stay temporarily so already-running sessions can be invalidated cleanly during the migration; final removal happens in the operator-side follow-up). ShouldAllowAnonymous treated every empty-token identity as anonymous, which lumped LocalUserMiddleware identities into the same equivalence class even when they carried distinct (iss, sub) claims — letting one local user reuse another's session ID. Tighten the rule so any identity with a valid (iss, sub) pair from Claims goes through the bound path, even when Token is empty. Pull iss and sub from Identity.Claims rather than Identity.Subject so the introspection path and the JWT path canonicalize against the same source. Fail-closed on non-string iss/sub claims: a misbehaving validator that stores a numeric or array value is treated as bound (not anonymous), with a WARN logged so the misbehavior surfaces to operators. The new key is unused until the security and factory layers are wired in the next commit. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The hijack-prevention decorator hashed the incoming bearer token at session creation and rejected any subsequent request whose token hashed differently. Every legitimate OAuth refresh produces a new access token with different bytes — same identity, same iss, same sub — so the decorator misclassified the refresh as a hijack and terminated the session with Unauthorized: caller identity does not match session owner. Drop the HMAC plumbing entirely and bind to a stable (iss, sub) tuple extracted from the OIDC identity's Claims. The binding lives in session metadata under MetadataKeyIdentityBinding, written exclusively by the new BindSession constructor (renamed from PreventSessionHijacking). Validation reads the caller's claims through the same path so the JWT and introspection code paths canonicalize against the same source. The session-upgrade defense (anonymous session, caller presents a token) moves to the unauthenticated-sentinel branch and works exactly as before. RestoreSession reconstructs identity from the stored binding rather than from a separate identity-subject key. Sessions written under the legacy token-hash schema return the bare transportsession.ErrSessionNotFound sentinel so the client receives the standard "re-initialise" signal one forced re-auth at deploy is preferable to a rebind-on-first-use window that would re-introduce the hijack the check exists to block. Downstream callers of the removed WithHMACSecret option (cli/serve.go) and the Phase-2 marker switch (sessionmanager) follow in the next commits; tests catch up after that. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> Use MetadataKeyIdentityBinding as the Phase-2 marker BindSession writes MetadataKeyIdentityBinding on every successful session creation (sentinel for anonymous, real binding for authenticated), so its presence is the new way to distinguish a fully-initialised Phase-2 session from a Generate()-only placeholder. Without this swap, the now-unwritten MetadataKeyTokenHash would always read as absent and Terminate would take the placeholder path on every session, breaking termination semantics. The behaviour for legacy sessions still in Redis from before the migration is intentional and documented in the plan: Terminate writes the placeholder-terminated marker (Update with MetadataKeyTerminated) which sits for the TTL; loadSession returns transportsession.ErrSessionNotFound so the client transparently re-initialises. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
createSessionFactory no longer takes an HMAC secret or a Kubernetes detection flag. VMCP_SESSION_HMAC_SECRET stays readable from the environment for one deploy cycle (DEBUG-logged and ignored) so the operator-side env-var injection can be removed in a follow-up PR without forcing a coordinated cut-over. Add a startup WARN when incoming auth is configured as "anonymous": AnonymousMiddleware populates the same (iss, sub) for every request, so all callers collide on one identity binding and per-identity hijack prevention degrades to dev-only behaviour. Surface that to operators rather than letting them assume the binding still scopes per user. Document the trust boundary in docs/arch/13-vmcp-scalability.md: the new scheme stores plaintext (iss, sub) at rest in Redis/Valkey, which trades the HMAC's at-rest opacity for refresh correctness. Operators must layer Redis ACLs or NetworkPolicies if a Redis dump revealing identity is unacceptable. Add a TODO on extractBindingID for the forward-looking RFC 7662 case (probe IdP introspection responses for iss + sub) so it surfaces if that becomes a top-level incoming-auth type. The five HMAC-specific serve_test.go cases collapse to one table-driven test of the new signature; the removed cases targeted HMAC validation that no longer exists. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Contributor
There was a problem hiding this comment.
Large PR Detected
This PR exceeds 1000 lines of changes and requires justification before it can be reviewed.
How to unblock this PR:
Add a section to your PR description with the following format:
## Large PR Justification
[Explain why this PR must be large, such as:]
- Generated code that cannot be split
- Large refactoring that must be atomic
- Multiple related changes that would break if separated
- Migration or data transformationAlternative:
Consider splitting this PR into smaller, focused changes (< 1000 lines each) for easier review and reduced risk.
See our Contributing Guidelines for more details.
This review will be automatically dismissed once you add the justification section.
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #5378 +/- ##
==========================================
- Coverage 68.74% 68.73% -0.02%
==========================================
Files 626 627 +1
Lines 63496 63498 +2
==========================================
- Hits 43650 43645 -5
- Misses 16602 16608 +6
- Partials 3244 3245 +1 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
vMCP's session-hijack-prevention decorator hashed the bearer token at session
creation and rejected requests whose token hashed differently. Every legitimate
OAuth refresh produces new token bytes for the same
(iss, sub)identity, sothe decorator misclassified each refresh as a hijack and tore the session down
with
Unauthorized: caller identity does not match session owner. Users wereforced to re-authenticate on every access-token rotation — minutes in production
deployments using short ATs.
(iss, sub)tuple extractedfrom OIDC
Claims, stored underMetadataKeyIdentityBinding. Format-and-parseencapsulated in a new leaf package
pkg/vmcp/session/binding/.introspection code paths, so token rotation produces identical bindings.
via the
unauthenticatedsentinel branch.RestoreSessionreconstructs identity from the stored binding instead of aseparate identity-subject key. Sessions written under the legacy token-hash
schema return
transportsession.ErrSessionNotFound(forced re-init), avoidingthe rebind-on-first-use window that would re-introduce the hijack vector.
WithHMACSecret, salt generation, hex hashing) and its k8ssecret requirement removed end-to-end.
Closes #5306
Type of change
Test plan
task test) — full vmcp tree greentask lint-fix) — 0 issuesOAuth refresh; vMCP logs `WARN reason=token_hash_mismatch`.
`identity_binding_mismatch` warnings on the refresh path. Bug trigger
condition (`same sub=True, same tsid=True, same bytes=False`) met and
request succeeded.
API Compatibility
Does this introduce a user-facing change?
Yes. Two operator-relevant behaviors:
All existing vMCP sessions invalidate on upgrade across this version.
The session-binding storage key changes from `vmcp.token.hash` to
`vmcp.identity.binding`. Sessions written by the previous version return
`ErrSessionNotFound` on first read by the new version — clients re-initialise
transparently (one forced re-auth pulse during the rollout window).
Operators wanting a zero-disruption upgrade should use a blue/green deploy
or a maintenance window. Mid-rollout (mixed old/new pods) is safe in both
directions: sessions invalidate symmetrically, never corrupt.
The `--vmcp-session-hmac-secret` flag / `VMCP_SESSION_HMAC_SECRET` env
var are no longer used. Both are accepted silently (DEBUG log only) for
one release to ease migration; remove from your deployment manifests at
convenience.
Implementation plan
Approved implementation plan
Design RFC: THV-0073 — vMCP Session Binding by Identity Tuple
Commit-by-commit shape on this branch:
(`Format` / `Parse` / `IsUnauthenticated` + `UnauthenticatedSentinel`).
the new metadata key + tightened anonymous-classification logic that
reads `iss`/`sub` from `Claims` (fail-closed on non-string claims).
rewrite (`security.go`), factory wiring (`factory.go`), and Phase-2
marker swap in `sessionmanager`. Bundled because the production code
doesn't compile cleanly across them in isolation.
removes the now-unused k8s secret requirement and adds operator
guidance on Redis access control.
Special notes for reviewers
doc states the trust boundary — the decorator trusts that
`identity.Claims["iss"]` / `Claims["sub"]` came from a validated token.
Token validation happens upstream in the OIDC middleware; this layer does
not re-validate.
and documented inline — both `iss` (issuer URL, public via discovery) and
`sub` (per-issuer opaque identifier) are not secret.
sessions reconstruct `*auth.Identity` with empty `Token` and nil
`UpstreamTokens`. Outgoing-auth strategies that need a live token
(`tokenexchange`, `aws_sts`, `upstream_inject`) fail backend init on
restore and the affected backends are skipped — same behavior as the
pre-patch code (HMAC binding also never persisted the token). Tracked
separately; not in scope for this PR.
10-file guideline. Bundled because the binding-format change is one
logical refactor that doesn't decompose into smaller PRs that build
standalone (the security decorator and the factory wiring are tightly
coupled around the on-the-wire format).
🤖 Generated with Claude Code