broker: mint-aws-creds is broken after cloud-setup §4 federation lands #71

@hanwencheng

Context

After applying docs/cloud-setup.md §4 (OIDC federation), the legacy POST /v1/mint-aws-creds endpoint stops working — and any path that revives it silently undoes the cloud-enforced per-user isolation that §4 just bought us.

This isn't a "we should optimize this away someday" issue — it's a correctness gap operators will hit the moment they finish §4.

Where it breaks

Three independent failure points, all introduced by §4:

  1. §4.2 — trust-policy replacement. agentkeys-data-role's trust policy flips from Principal: {AWS: agentkeys-daemon} to Principal: {Federated: oidc-provider}. It's a replace, not an append. The broker's daemon IAM user can no longer call sts:AssumeRole on the role at all — every mint-aws-creds call returns 502 sts_error (visible in the audit log).
  2. §4.4 — bucket policy assumes PrincipalTag. Even if §4.2's trust policy were augmented to allow both principals, the resulting AssumeRole (non-WebIdentity) session has no PrincipalTag, so ${aws:PrincipalTag/agentkeys_user_wallet}/* expands to empty → AccessDenied on every S3 op.
  3. §4.4.1 — inline-policy strip. §4.4.1 explicitly removes the broad s3:* grant from agentkeys-data-role-inline. That grant was the only thing that made the untagged AssumeRole session usable. Reintroducing it silently undoes §4's whole isolation property.
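Failure point 2 can be illustrated with a toy model of the policy-variable expansion. This is only a sketch, not how IAM is implemented: the `Session` type, the helper names, and the bucket name `example-bucket` are hypothetical, and a missing tag is modelled here as a statement that never matches. The observable result is the same denial described above.

```rust
use std::collections::HashMap;

/// Toy stand-in for an STS session; only principal tags matter here.
/// (Hypothetical type, not part of the broker.)
pub struct Session {
    pub principal_tags: HashMap<String, String>,
}

/// Expand `${aws:PrincipalTag/<key>}` inside a resource pattern.
/// Returns None when the session lacks the tag, so the statement cannot match.
pub fn expand_resource(pattern: &str, session: &Session) -> Option<String> {
    const OPEN: &str = "${aws:PrincipalTag/";
    let start = match pattern.find(OPEN) {
        Some(i) => i,
        None => return Some(pattern.to_string()), // no variable to expand
    };
    let key_start = start + OPEN.len();
    let end = key_start + pattern[key_start..].find('}')?;
    let key = &pattern[key_start..end];
    let value = session.principal_tags.get(key)?;
    Some(format!("{}{}{}", &pattern[..start], value, &pattern[end + 1..]))
}

/// Allow only when the expanded pattern covers the object ARN.
/// Only a trailing `*` wildcard is supported in this toy model.
pub fn is_allowed(pattern: &str, session: &Session, object_arn: &str) -> bool {
    match expand_resource(pattern, session) {
        Some(expanded) => match expanded.strip_suffix('*') {
            Some(prefix) => object_arn.starts_with(prefix),
            None => object_arn == expanded,
        },
        None => false, // untagged session: nothing matches
    }
}
```

In this model a WebIdentity session tagged with `agentkeys_user_wallet` reaches only its own prefix, while a plain AssumeRole session with no tags is denied everywhere, which is why augmenting the trust policy alone does not help.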

Migration options

| Option | What changes | Recommendation |
| --- | --- | --- |
| A. Drop the endpoint | Retire the route. Rust daemon/CLI calls mint-oidc-jwt itself, does AssumeRoleWithWebIdentity, injects raw AWS_* env vars into the scraper subprocess. ~30 lines of Rust. | Cleanest. Pick once no external caller depends on the wire shape. |
| B. Pivot internals, keep wire shape | mint-aws-creds body becomes: validate session → mint internal JWT → AssumeRoleWithWebIdentity → return creds. ~50-line patch in crates/agentkeys-broker-server/src/handlers/mint.rs. Audit row labeled oidc_jwt+sts (or similar). | Pick this first. Backward-compat for scrapers / CI that already hit the endpoint. |
| C. Augment trust policy to allow both principals | Principal: {AWS: agentkeys-daemon, Federated: oidc-provider} | Do not pick. Leaves the untagged-session loophole open; defeats §4. |
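For option B, the pipeline (validate session → mint internal JWT → AssumeRoleWithWebIdentity → return creds) might look roughly like the sketch below. Everything here is an assumption for illustration: the traits `SessionStore`, `JwtSigner`, and `WebIdentitySts`, the stub implementations, and the function signature are hypothetical and do not reflect the actual contents of mint.rs. A real handler would call STS through an AWS SDK rather than a stub.

```rust
/// Temporary credentials returned to the caller (wire shape unchanged).
#[derive(Debug, Clone, PartialEq)]
pub struct AwsCreds {
    pub access_key_id: String,
    pub secret_access_key: String,
    pub session_token: String,
}

#[derive(Debug, PartialEq)]
pub enum MintError {
    InvalidSession,
    Sts(String),
}

// Hypothetical seams so the flow can be exercised without AWS.
pub trait SessionStore {
    /// Wallet bound to the session token, if the session is valid.
    fn wallet_for(&self, session_token: &str) -> Option<String>;
}
pub trait JwtSigner {
    fn mint(&self, wallet: &str, audience: &str) -> String;
}
pub trait WebIdentitySts {
    fn assume_role_with_web_identity(&self, role_arn: &str, jwt: &str)
        -> Result<AwsCreds, String>;
}

/// Pivoted flow: returns the creds plus the audit label for this mint.
pub fn mint_aws_creds(
    sessions: &dyn SessionStore,
    signer: &dyn JwtSigner,
    sts: &dyn WebIdentitySts,
    role_arn: &str,
    session_token: &str,
) -> Result<(AwsCreds, &'static str), MintError> {
    let wallet = sessions
        .wallet_for(session_token)
        .ok_or(MintError::InvalidSession)?;
    let jwt = signer.mint(&wallet, "sts.amazonaws.com");
    let creds = sts
        .assume_role_with_web_identity(role_arn, &jwt)
        .map_err(MintError::Sts)?;
    Ok((creds, "oidc_jwt+sts"))
}

// In-memory stubs, for illustration only.
pub struct StubSessions;
impl SessionStore for StubSessions {
    fn wallet_for(&self, token: &str) -> Option<String> {
        if token == "valid-session" { Some("0xabc".to_string()) } else { None }
    }
}
pub struct StubSigner;
impl JwtSigner for StubSigner {
    fn mint(&self, wallet: &str, audience: &str) -> String {
        format!("stub-jwt:{}:{}", wallet, audience)
    }
}
pub struct StubSts;
impl WebIdentitySts for StubSts {
    fn assume_role_with_web_identity(&self, _role_arn: &str, jwt: &str)
        -> Result<AwsCreds, String> {
        if jwt.starts_with("stub-jwt:") {
            Ok(AwsCreds {
                access_key_id: "ASIAEXAMPLE".to_string(),
                secret_access_key: "secret".to_string(),
                session_token: "token".to_string(),
            })
        } else {
            Err("InvalidIdentityToken".to_string())
        }
    }
}
```

Note that nothing in this flow needs a static AWS credential on the broker side: the only secret involved is whatever the signer holds.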

Side benefit of option B

After the pivot, the broker no longer needs any static AWS credentials at runtime. It signs JWTs; that's the entire IAM surface. Today it still needs agentkeys-daemon's long-lived access key (or instance-profile equivalent) to do AssumeRole. Post-pivot, broker compromise leaks only the OIDC signing key — no STS-callable IAM principal. That's a real reduction in blast radius worth doing on its own merits.

Suggested sequence

  1. Land option B (mint-aws-creds keeps its public contract, internally pivots to AssumeRoleWithWebIdentity).
  2. Update docs/stage7-wip.md and docs/dev-setup.md to describe the new internal flow + the dropped IAM-principal requirement.
  3. Migrate provisioner-scripts / CLI to fetch JWT directly (option A path).
  4. Once option A is the only caller path, retire mint-aws-creds and remove the route.
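For step 3, the env-injection half of the option A path could be sketched with `std::process::Command` as below. The `TempCreds` struct and `scraper_command` helper are hypothetical names; the three environment variable names are the standard ones read by AWS SDKs and the AWS CLI. Fetching the JWT and calling AssumeRoleWithWebIdentity are assumed to have happened already.

```rust
use std::process::Command;

/// Temporary credentials obtained via AssumeRoleWithWebIdentity.
/// (Hypothetical struct for this sketch.)
pub struct TempCreds {
    pub access_key_id: String,
    pub secret_access_key: String,
    pub session_token: String,
}

/// Build (but do not spawn) the scraper subprocess with AWS_* env vars set.
/// The parent's other environment variables are inherited as usual.
pub fn scraper_command(program: &str, creds: &TempCreds) -> Command {
    let mut cmd = Command::new(program);
    cmd.env("AWS_ACCESS_KEY_ID", &creds.access_key_id)
        .env("AWS_SECRET_ACCESS_KEY", &creds.secret_access_key)
        .env("AWS_SESSION_TOKEN", &creds.session_token);
    cmd
}
```

The caller would then run `scraper_command("scraper", &creds).status()` or similar; the credentials live only in the child's environment and never touch the broker's wire format.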

Out of scope for this issue

  • Generalizing aud beyond sts.amazonaws.com for cross-cloud (GCP WIF, Tencent CAM) — separate follow-up.
  • TEE-derived signer swap — already tracked in stage7-wip §4.6.

Acceptance criteria

  • POST /v1/mint-aws-creds succeeds end-to-end against a §4-federated cloud account (verifiable via the existing docs/cloud-setup.md §4.5 walk-through, but starting from mint-aws-creds instead of mint-oidc-jwt).
  • Returned creds carry the agentkeys_user_wallet PrincipalTag (provable: listing your own wallet prefix succeeds, while listing 0xdeadbeef/ returns AccessDenied).
  • The broker host can run with no AWS access key in its environment after the pivot (instance-profile-only is fine; what's gone is the sts:AssumeRole requirement).
  • Audit DB distinguishes pivoted mints from legacy mints (e.g., requested_role = "data-role-via-jwt" vs the historical role ARN).
