Skip to content

feat: add per-key key-scope option for Windows machine vs user key store#1021

Draft
joshdrake wants to merge 2 commits intomasterfrom
feat/add-tpm-key-scope
Draft

feat: add per-key key-scope option for Windows machine vs user key store#1021
joshdrake wants to merge 2 commits intomasterfrom
feat/add-tpm-key-scope

Conversation

@joshdrake
Copy link
Copy Markdown
Contributor

Summary

  • Adds key-scope=machine|user URI parameter to kms/tpmkms, controlling whether the underlying private key lives in the local machine key store (NCRYPT_MACHINE_KEY_FLAG) or the current user's key store. Independent of store-location, so callers can pair a machine-store certificate with a user-owned key (e.g., a LocalSystem service that wants its cert visible machine-wide but its key only openable by SYSTEM) or vice versa.
  • Threads MachineKey per-key through tpm.CreateKeyConfig / tpm.AttestKeyConfig / tpm.CreateAKConfig and into the underlying NCrypt PCP create call. Per-key, not per-TPM-instance — a single tpm.TPM handle can mint or open keys in either scope.
  • t.open() reads the requested scope from context and applies it to attestConfig.MachineKey under the lock just before attest.OpenTPM. Safe because attestTPM is recreated on every open().
  • Persists MachineKey in storage.AK and storage.Key so reload after process restart uses the matching scope without the URI needing to carry it. JSON omitempty so existing on-disk stores remain readable and default to false (= existing user-scope behavior).
  • tpmkms.CreateKey preserves key-scope=machine in the returned URI so downstream CreateSigner/DeleteKey calls don't lose scope (closes a returned-URI bug we hit during testing).
  • AttestKey reads the AK's persisted scope and uses it for the attest session — attested keys inherit their AK's scope; conflicting AttestKeyConfig.MachineKey is rejected with a clear error.
  • DeleteKey/DeleteAK always clean up the file-store entry even if the in-TPM (NCrypt) deletion fails, preventing repeated retries from re-encountering the same failing entry. When the file-store cleanup succeeded but the in-TPM step did not, returns a typed *PartialDeleteError so callers can distinguish a partial delete from total failure.
  • Bumps github.com/smallstep/go-attestation to v0.4.9 for the OpenConfig.MachineKey field this change drives.

Defaults: when key-scope is unset, derive from store-location for back-compat (store-location=machinekey-scope=machine).

Background

Code review and a Windows test matrix (interactive admin + LocalSystem via psexec -s -i 0, both x86_64) showed that the agent's TPM-backed device-identity keys land under LocalSystem's user-profile PCPKSP scope rather than the machine scope, even when store-location=machine is set. The kms/capi machine-key fix from #802 doesn't apply because the agent uses kms/tpmkms (with enable-cng=true) — that path goes through tpm/internal/key/pcp_windows.go, where nCryptCreatePersistedKey was hardcoded to dwFlags=0. Fixing it requires (a) plumbing a MachineKey flag through the tpm package and (b) deciding how to express user intent at the URI layer.

This PR is an alternative shape to #1006. Differences:

  • Per-key, not per-TPM-instance. Herman's review feedback on Add store-location=machine URI option for Windows machine key store #1006 pushed back on WithMachineKey() being a per-TPM option (would affect all keys going through that handle); the discussion converged on per-key being the right model. This PR puts MachineKey on the per-key configs (CreateKeyConfig, AttestKeyConfig, CreateAKConfig) and uses a context value to thread it into t.open() for the operations that need it.
  • MachineKey lives on options, not attestConfig. Also from Herman's review.
  • key-scope is decoupled from store-location. Cert location and key ownership are orthogonal Windows concepts. There are real scenarios where a service wants its cert in LocalMachine\My for visibility but its key restricted to one identity (e.g., LocalSystem-only) — key-scope=user with store-location=machine expresses that. Default-derive from store-location for back-compat.
  • Typed *PartialDeleteError for DeleteKey/DeleteAK. Also from Herman's review.

Test plan

  • go build ./... clean on linux/amd64 and windows/amd64
  • go vet ./... clean (one pre-existing master warning in kms/capi/ncrypt_windows.go:569, unrelated to this change)
  • go test ./... green across all packages
  • Empirical Windows matrix (throwaway diagnostic harness, run interactive admin + psexec -s -i 0 LocalSystem):
    • key-scope=machine lands keys at C:\ProgramData\Microsoft\Crypto\PCPKSP\<sid-hash>\<file>.PCPKEY (interactive admin + LocalSystem).
    • key-scope=user lands keys at <userprofile>\AppData\Local\Microsoft\Crypto\PCPKSP\<sid-hash>\<file>.PCPKEY (LocalSystem profile when running as SYSTEM).
    • Decoupling proven: key-scope=machine + store-location=user puts cert in user store and key in machine PCPKSP; key-scope=user + store-location=machine puts cert in machine store and key in user PCPKSP.
    • Back-compat: key-scope unset + store-location=machine correctly resolves to machine scope.
    • CreateSigner via the URI returned by CreateKey round-trips successfully in every TPM scenario; sign+verify ok.

Out of scope

  • Agent-side change to set key-scope=machine on the device-identity TPM URI; lands separately in the agent repo.
  • Equivalent fix for kms/capi's returned-URI bug (different root cause, different file). Tracked separately.

🤖 Generated with Claude Code

Introduces a key-scope=machine|user URI parameter on tpmkms that
controls whether the underlying private key lives in the local machine
key store (NCRYPT_MACHINE_KEY_FLAG) or in the current user's key store.
The parameter is independent of store-location, which controls cert
location, so callers can request machine-store certs paired with
user-owned keys (or vice versa) for security-hardening scenarios such
as a service whose cert needs machine visibility but whose key should
only be openable by the service identity.

Architecture:
  - MachineKey is a per-key concern, not a per-TPM-instance one. Threaded
    through CreateKeyConfig, AttestKeyConfig, and CreateAKConfig, then
    via context into TPM.open() which sets attestConfig.MachineKey under
    the lock just before attest.OpenTPM. Persisted in storage so reload
    after restart uses the matching scope without the URI carrying it.
  - tpmkms.CreateKey now preserves key-scope=machine in the returned URI
    so downstream CreateSigner/DeleteKey calls don't lose scope.
  - AttestKey reads the AK's persisted scope and uses it for the attest
    session; the new key inherits the AK's scope. Conflicting explicit
    AttestKeyConfig.MachineKey is rejected with a clear error.
  - Defaults for back-compat: when key-scope is unset, derive from
    store-location (machine cert -> machine key) so existing URIs that
    happened to set store-location=machine continue to work.

Also fixes related delete-path issues:
  - DeleteKey/DeleteAK now always clean up the file-store entry even if
    the in-TPM (NCrypt) deletion fails, preventing repeated retries from
    re-encountering the same failing entry.
  - Returns a typed *PartialDeleteError when the in-TPM deletion failed
    but the file-store cleanup succeeded, so callers can distinguish
    that case from a complete failure.

Bumps github.com/smallstep/go-attestation to v0.4.9, which adds the
OpenConfig.MachineKey field this change drives.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@CLAassistant
Copy link
Copy Markdown

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

Previously the check only fired when the attested key was requested as
machine-scope while the AK was user-scope; the reverse case (AK
machine-scope, attested key requested user-scope) silently inherited
the AK's scope and produced a machine-scope key, which is surprising
behaviour for a caller who explicitly asked for user. Reject both
mismatch directions so the inheritance rule is loud, not silent.

Documented in the error message that attested keys inherit the AK's
scope and that callers must set AttestKeyConfig.MachineKey to match
the AK's scope explicitly.

Caught by the capi-diag attest-roundtrip matrix: the
ak-machine+key-user_xfail row was producing a successful machine-scope
attested key instead of erroring.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants