[comp] Production Deploy #3057

Merged: tofikwest merged 6 commits into release from main on Jun 8, 2026

github-actions Bot commented Jun 8, 2026

This is an automated pull request to release the candidate branch into production, which will trigger a deployment.
It was created by the [Production PR] action.


Summary by cubic

Retries STS AssumeRole calls and resolves AWS check-path sessions in ECS via a new internal endpoint. This prevents first-scan “Could not assume role” false failures and lets Trigger.dev checks run without base AWS creds.

  • New Features

    • Added a service-token-only internal endpoint to resolve short-lived AWS credentials in ECS; org-scoped and returns not_configured/assume_failed as needed.
    • Trigger check tasks call this endpoint and inject temp creds; assumeAwsSession consumes injected creds or surfaces the injected error, keeping the cross-tenant roleAssumer in ECS.
  • Bug Fixes

    • Wrapped both AssumeRole hops in checks and scans with retryAssume from @trycompai/integration-platform (capped backoff + jitter for transient IAM/STS, 5xx, and network errors).
    • Exported the helper and added tests for retry logic, classification, and the new session-resolution path.

Written for commit 245f66b. Summary will update on new commits.

github-actions Bot and others added 4 commits June 5, 2026 22:53
…"could not assume role")

Customer-reported: the first AWS scan fails with a "Could not assume AWS role"
finding; switching the scan engine and back, then re-running, works — on every
account, with no change in AWS. Customer correctly concluded it's not the role
(it assumes fine the second time).

Root cause (verified): assuming the customer's cross-account role is a two-hop
STS flow with NO application-level retry, and the AWS SDK's default retry only
covers throttling/5xx/network — NOT the classes that actually occur here:
- IAM/STS eventual consistency on a new or just-edited role/trust-policy returns
  AccessDenied on the first assume and succeeds seconds later (no AWS change).
- The base/roleAssumer session can briefly return ExpiredToken.
So a single transient failure surfaced as a sticky finding. The engine switch
is INCIDENTAL: updateMode only writes metadata (idempotent, no cache/cred
refresh); what actually fixes it is the time that elapses before the re-run.

The visible finding is written by the integration-platform checks path
(resolveAwsSessionOrFail in manifests/aws/checks/shared.ts), not the Cloud
Tests scan (which returns success:false with no finding) — so both assume
implementations are fixed.

- new shared helper retryAssume / isRetryableAssumeError
  (manifests/aws/checks/assume-retry.ts): bounded exponential backoff + jitter,
  retries only transient / eventual-consistency classes (AccessDenied,
  ExpiredToken, IDPCommunicationError, throttling, 5xx, network). Hard config
  errors (bad ARN, ValidationError) and a persistently-denied role still
  propagate; backoff capped so a genuinely broken role fails within seconds.
- wrap both AssumeRole hops in shared.ts (checks path) and
  aws-security.service.ts assumeRole (Cloud Tests scan path).
- export retryAssume from the package barrel for apps/api.
- tests: retry-then-succeed, no-retry on config errors, give-up after N, classifier.
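
The helper described in the list above can be sketched roughly as follows. This is a minimal illustration, not the code in assume-retry.ts: only the names retryAssume and isRetryableAssumeError come from the commit message. The option names, default values, network error codes, the exact error-name strings, and the helper backoffDelayMs are assumptions made for the example. The sketch assumes errors carry a `name`, an optional Node-style `code`, and an SDK-v3-style `$metadata.httpStatusCode`.

```typescript
// Assumed error shape: AWS SDK v3 service errors expose `name` and
// `$metadata.httpStatusCode`; Node network errors expose `code`.
type AssumeError = {
  name?: string;
  code?: string;
  $metadata?: { httpStatusCode?: number };
};

// Transient / eventual-consistency classes. The exact strings are assumptions.
const RETRYABLE_NAMES = new Set<string>([
  "AccessDenied",
  "ExpiredToken",
  "IDPCommunicationError",
  "Throttling",
  "ThrottlingException",
]);
const NETWORK_CODES = new Set<string>(["ECONNRESET", "ETIMEDOUT", "EAI_AGAIN"]);

function isRetryableAssumeError(err: unknown): boolean {
  if (typeof err !== "object" || err === null) return false;
  const e = err as AssumeError;
  if (e.name !== undefined && RETRYABLE_NAMES.has(e.name)) return true;
  if (e.code !== undefined && NETWORK_CODES.has(e.code)) return true;
  const status = e.$metadata?.httpStatusCode;
  return status !== undefined && status >= 500;
}

// Hypothetical options; the real helper's signature may differ.
interface RetryOptions {
  maxAttempts?: number;
  baseDelayMs?: number;
  maxDelayMs?: number;
  sleep?: (ms: number) => Promise<void>;
  random?: () => number;
}

// Capped exponential backoff with full jitter: a random delay in
// [0, min(cap, base * 2^(attempt - 1))).
function backoffDelayMs(
  attempt: number,
  baseMs: number,
  capMs: number,
  random: () => number,
): number {
  const ceiling = Math.min(capMs, baseMs * Math.pow(2, attempt - 1));
  return Math.floor(random() * ceiling);
}

async function retryAssume<T>(
  fn: () => Promise<T>,
  opts: RetryOptions = {},
): Promise<T> {
  const maxAttempts = opts.maxAttempts ?? 4;
  const baseDelayMs = opts.baseDelayMs ?? 500;
  const maxDelayMs = opts.maxDelayMs ?? 4000;
  const random = opts.random ?? Math.random;
  const sleep =
    opts.sleep ??
    ((ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms)));

  for (let attempt = 1; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      // Config errors and exhausted budgets propagate unchanged.
      if (attempt >= maxAttempts || !isRetryableAssumeError(err)) throw err;
      await sleep(backoffDelayMs(attempt, baseDelayMs, maxDelayMs, random));
    }
  }
}
```

In this sketch each AssumeRole hop would be wrapped separately, for example `retryAssume(() => sts.send(assumeRoleCommand))`, so a transient failure on the second hop does not repeat the first. With the assumed defaults, a persistently denied role gives up after four attempts and at most about 3.5 seconds of total backoff.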

Note: not yet live-verified; the exact STS class (AccessDenied vs ExpiredToken)
is already captured in evidence.error on the failing finding — the fix covers
both. cloud-security suite green (311); helper tests green; typecheck clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…er.dev

The integration-platform AWS checks run inside the Trigger.dev runtime and
assumed the cross-account role there via process.env.SECURITY_HUB_ROLE_ASSUMER_ARN.
That env var (and any base AWS credentials) only exist on the ECS API task role,
not in Trigger.dev, so every AWS connection's checks failed with "Could not
assume AWS role" — while scans worked, because the scan task calls back to ECS.

Resolve the session in ECS instead: a new internal, service-token-only
resolve-session endpoint performs the partition-aware two-hop assume (reusing
the scan path's logic) and returns only short-lived, read-only, single-customer
temp creds. The check task injects those into the check credentials, and
assumeAwsSession uses them directly. The cross-tenant roleAssumer credential
never leaves ECS. The in-runtime two-hop is kept as a fallback for the ECS
controller callers and local dev.
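
The hand-off described above might look roughly like this on the check side. It is a sketch under stated assumptions: the statuses not_configured and assume_failed come from the PR summary, but the payload fields, the CheckCredentials shape, and the function names injectResolvedSession and planSession are hypothetical. The PR does not show how assumeAwsSession is structured internally, and treating not_configured as a surfaced error rather than a fallback is also an assumption.

```typescript
// Hypothetical payload for the temp creds returned by the ECS endpoint.
interface TempCredentials {
  accessKeyId: string;
  secretAccessKey: string;
  sessionToken: string;
}

// Statuses are named in the PR; the field layout is assumed.
type ResolveSessionResult =
  | { status: "ok"; credentials: TempCredentials }
  | { status: "not_configured" }
  | { status: "assume_failed"; message: string };

// Hypothetical check credentials with slots for the injected session.
interface CheckCredentials {
  roleArn?: string;
  injectedSession?: TempCredentials;
  injectedSessionError?: string;
}

type SessionPlan =
  | { kind: "injected"; credentials: TempCredentials }
  | { kind: "error"; message: string }
  | { kind: "fallback" };

// Trigger.dev side: fold the endpoint's answer into the check credentials.
function injectResolvedSession(
  base: CheckCredentials,
  result: ResolveSessionResult,
): CheckCredentials {
  if (result.status === "ok") {
    return { ...base, injectedSession: result.credentials };
  }
  const message =
    result.status === "assume_failed"
      ? result.message
      : "AWS role assumer is not configured";
  return { ...base, injectedSessionError: message };
}

// Check side: prefer injected creds, then the injected error, and only
// fall back to the in-runtime two-hop assume when nothing was injected.
function planSession(creds: CheckCredentials): SessionPlan {
  if (creds.injectedSession) {
    return { kind: "injected", credentials: creds.injectedSession };
  }
  if (creds.injectedSessionError) {
    return { kind: "error", message: creds.injectedSessionError };
  }
  return { kind: "fallback" };
}
```

Keeping the decision in a small pure function like this makes the three paths (injected creds, injected error, in-runtime fallback) easy to unit-test without touching STS.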

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
fix(cloud-security): retry transient AssumeRole failures (first-scan "could not assume role")
vercel Bot commented Jun 8, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| app (staging) | Ready | Preview, Comment | Jun 8, 2026 8:08pm |
| comp-framework-editor (staging) | Ready | Preview, Comment | Jun 8, 2026 8:08pm |

1 Skipped Deployment

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| portal (staging) | Skipped | | Jun 8, 2026 8:08pm |

cubic-dev-ai Bot left a comment

No issues found across 5 files

Confidence score: 5/5

  • Automated review surfaced no issues in the provided summaries.
  • No files require special attention.

@vercel vercel Bot temporarily deployed to staging – portal June 8, 2026 20:05 Inactive
@tofikwest tofikwest merged commit dbf5552 into release Jun 8, 2026
14 checks passed
@claudfuen

🎉 This PR is included in version 3.73.2 🎉

The release is available on GitHub release

Your semantic-release bot 📦🚀
