…"could not assume role")

Customer-reported: the first AWS scan fails with a "Could not assume AWS role" finding; switching the scan engine and back, then re-running, works. This happens on every account, with no change in AWS. The customer correctly concluded it's not the role (it assumes fine the second time).

Root cause (verified): assuming the customer's cross-account role is a two-hop STS flow with NO application-level retry, and the AWS SDK's default retry only covers throttling/5xx/network, NOT the classes that actually occur here:

- IAM/STS eventual consistency on a new or just-edited role/trust-policy returns AccessDenied on the first assume and succeeds seconds later (no AWS change).
- The base/roleAssumer session can briefly return ExpiredToken.

So a single transient failure surfaced as a sticky finding. The "switch switch" is INCIDENTAL: updateMode only writes metadata (idempotent, no cache/cred refresh); what actually fixes it is the elapsed time before the retry.

The visible finding is written by the integration-platform checks path (resolveAwsSessionOrFail in manifests/aws/checks/shared.ts), not the Cloud Tests scan (which returns success:false with no finding), so both assume implementations are fixed.

- new shared helper retryAssume / isRetryableAssumeError (manifests/aws/checks/assume-retry.ts): bounded exponential backoff + jitter, retries only transient / eventual-consistency classes (AccessDenied, ExpiredToken, IDPCommunicationError, throttling, 5xx, network). Hard config errors (bad ARN, ValidationError) and a persistently-denied role still propagate; backoff capped so a genuinely broken role fails within seconds.
- wrap both AssumeRole hops in shared.ts (checks path) and aws-security.service.ts assumeRole (Cloud Tests scan path).
- export retryAssume from the package barrel for apps/api.
- tests: retry-then-succeed, no-retry on config errors, give-up after N, classifier.

Note: not yet live-verified; the exact STS class (AccessDenied vs ExpiredToken) is already captured in evidence.error on the failing finding, and the fix covers both.

cloud-security suite green (311); helper tests green; typecheck clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
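The retry behaviour described in this commit message can be illustrated with a minimal sketch. This is not the code in manifests/aws/checks/assume-retry.ts: only the names retryAssume and isRetryableAssumeError come from the commit message. The error shape, the lists of names and network codes, the RetryOptions fields, the default values, the backoffDelayMs helper and the choice of full jitter are all assumptions made for illustration.

```typescript
// Hypothetical sketch of a bounded, jittered retry around an assume call.
// All defaults and the exact error lists below are assumptions.

/** Error names treated as transient (list taken from the commit message). */
const RETRYABLE_NAMES = new Set<string>([
  "AccessDenied", // IAM/STS eventual consistency on a new or edited role
  "ExpiredToken", // base/roleAssumer session briefly stale
  "IDPCommunicationError",
  "Throttling",
]);

/** Node-style network error codes (assumed list). */
const NETWORK_CODES = new Set<string>(["ECONNRESET", "ETIMEDOUT", "ENOTFOUND", "EAI_AGAIN"]);

/** Assumed error shape: a name, an optional code, and SDK-style response metadata. */
interface AssumeErrorLike {
  name?: string;
  code?: string;
  $metadata?: { httpStatusCode?: number };
}

/** Classifier: true only for transient classes; config errors fall through to false. */
export function isRetryableAssumeError(err: unknown): boolean {
  if (typeof err !== "object" || err === null) return false;
  const e = err as AssumeErrorLike;
  if (e.name !== undefined && RETRYABLE_NAMES.has(e.name)) return true;
  if (e.code !== undefined && NETWORK_CODES.has(e.code)) return true;
  const status = e.$metadata?.httpStatusCode;
  return status !== undefined && status >= 500;
}

export interface RetryOptions {
  maxAttempts?: number;
  baseDelayMs?: number;
  maxDelayMs?: number;
  sleep?: (ms: number) => Promise<void>; // injectable so tests do not wait
  random?: () => number; // injectable so tests are deterministic
}

/** Capped exponential backoff with full jitter; attempt is 1-based. */
export function backoffDelayMs(
  attempt: number,
  baseDelayMs: number,
  maxDelayMs: number,
  random: () => number = Math.random,
): number {
  const ceiling = Math.min(maxDelayMs, baseDelayMs * Math.pow(2, attempt - 1));
  return Math.floor(random() * ceiling);
}

/** Runs fn, retrying only retryable errors, at most maxAttempts times in total. */
export async function retryAssume<T>(fn: () => Promise<T>, opts: RetryOptions = {}): Promise<T> {
  const maxAttempts = opts.maxAttempts ?? 4;
  const baseDelayMs = opts.baseDelayMs ?? 500;
  const maxDelayMs = opts.maxDelayMs ?? 4000;
  const sleep =
    opts.sleep ?? ((ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms)));
  const random = opts.random ?? Math.random;

  for (let attempt = 1; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      // Hard config errors and exhausted budgets propagate unchanged.
      if (attempt >= maxAttempts || !isRetryableAssumeError(err)) throw err;
      await sleep(backoffDelayMs(attempt, baseDelayMs, maxDelayMs, random));
    }
  }
}
```

Under these assumed defaults the total wait is bounded by 500 + 1000 + 2000 ms, so a role that is genuinely denied still fails within a few seconds, which matches the intent stated above. Each AssumeRole hop would be wrapped separately, for example `await retryAssume(() => assumeHop(roleArn))`, where `assumeHop` is a placeholder for the real call.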
…er.dev

The integration-platform AWS checks run inside the Trigger.dev runtime and assumed the cross-account role there via process.env.SECURITY_HUB_ROLE_ASSUMER_ARN. That env var (and any base AWS credentials) exists only on the ECS API task role, not in Trigger.dev, so every AWS connection's checks failed with "Could not assume AWS role", while scans worked because the scan task calls back to ECS.

Resolve the session in ECS instead: a new internal, service-token-only resolve-session endpoint performs the partition-aware two-hop assume (reusing the scan path's logic) and returns only short-lived, read-only, single-customer temp creds. The check task injects those into the check credentials, and assumeAwsSession uses them directly. The cross-tenant roleAssumer credential never leaves ECS. The in-runtime two-hop is kept as a fallback for the ECS controller callers and local dev.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
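The resolution order described in this commit message (injected error first, then injected creds, then the in-runtime two-hop as a fallback) can be sketched roughly as below. This is an illustration under assumptions, not the real assumeAwsSession: the function name assumeAwsSessionSketch, the types, and the field names injectedSession and injectedSessionError are hypothetical, and the two-hop assume is passed in as a function so the sketch stays self-contained.

```typescript
// Hypothetical sketch: prefer creds resolved in ECS, fall back to an in-runtime assume.

/** Short-lived credentials as returned by an STS assume (field names assumed). */
interface TempCredentials {
  accessKeyId: string;
  secretAccessKey: string;
  sessionToken: string;
}

/** Check credentials with optional values injected by the check task (assumed fields). */
interface CheckCredentials {
  roleArn: string;
  injectedSession?: TempCredentials; // set when ECS resolved the session
  injectedSessionError?: string; // set when ECS tried and failed
}

/** Stand-in for the in-runtime two-hop assume used by ECS callers and local dev. */
type TwoHopAssume = (roleArn: string) => Promise<TempCredentials>;

async function assumeAwsSessionSketch(
  creds: CheckCredentials,
  fallbackTwoHop: TwoHopAssume,
): Promise<TempCredentials> {
  // An injected error means ECS already attempted the assume; surface it as-is
  // instead of retrying in a runtime that has no base AWS credentials.
  if (creds.injectedSessionError !== undefined) {
    throw new Error(`Could not assume AWS role: ${creds.injectedSessionError}`);
  }
  // Injected creds are used directly; no roleAssumer credential is needed here.
  if (creds.injectedSession !== undefined) {
    return creds.injectedSession;
  }
  // Nothing injected: the caller is assumed to have base creds available.
  return fallbackTwoHop(creds.roleArn);
}
```

Checking the injected error before the injected creds is a design choice of this sketch: it keeps a failure reported by ECS from being masked by a fallback attempt that cannot succeed in Trigger.dev.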
fix(cloud-security): retry transient AssumeRole failures (first-scan "could not assume role")
…ion-via-ecs fix(cloud-security): resolve AWS check-path session in ECS, not Trigger.dev
🎉 This PR is included in version 3.73.2 🎉
The release is available on GitHub release.
Your semantic-release bot 📦🚀
This is an automated pull request to release the candidate branch into production, which will trigger a deployment.
It was created by the [Production PR] action.
Summary by cubic
Retry STS role assumes and resolve AWS check-path sessions in ECS via a new internal endpoint. Prevents first-scan “Could not assume role” false failures and lets Trigger.dev checks run without base AWS creds.
New Features
- assumeAwsSession consumes injected creds or surfaces the injected error, keeping the cross-tenant roleAssumer in ECS.

Bug Fixes
- retryAssume from @trycompai/integration-platform (capped backoff + jitter for transient IAM/STS, 5xx, and network errors).

Written for commit 245f66b. Summary will update on new commits.