fix(cloud-security): run AWS integration checks on our server (scheduled false-fails) by tofikwest · Pull Request #3133 · trycompai/comp

tofikwest · 2026-06-12T20:30:44Z

Problem

AWS integration checks (S3 Block Public Access, encryption, EC2, RDS, KMS, CloudTrail, IAM) false-fail on the scheduled / auto-run paths but pass on the manual "Run" — repeatedly (CS reports over the last ~2 weeks). Customer-visible error:

AccessDenied: ... is not authorized to perform: s3:GetBucketPublicAccessBlock ...
because no VPC endpoint policy allows the s3:GetBucketPublicAccessBlock action

Root cause

It is not a code bug — it's the same check code with the same credentials. The only difference is where it executes, and therefore which network the AWS API call exits through:

	Executes in	S3 call exits via	Result
Manual "Run"	our server (ECS), in `comp-production-vpc`	our S3 endpoint (allows `s3:Get*`)	✅
Scheduled / auto-run	Trigger.dev's runtime (their VPC)	Trigger.dev's S3 endpoint (blocks cross-account)	❌

AWS evaluates allow/deny based on the VPC the request exits from — which, for the scheduled path, lives in Trigger.dev's AWS account and is not ours to change. (Confirmed Trigger.dev is Cloud, not self-hosted: our infra only whitelists Trigger.dev's static IPs; there's no Trigger VPC in our account. Their PrivateLink/static-IP features don't cover public-S3 egress.)

This is AWS-specific: GCP/Azure/dynamic/legacy checks make plain HTTPS calls to public APIs with no VPC endpoint involved, so they work identically in either environment.

Fix (AWS only)

When providerSlug === 'aws', the scheduled task (run-task-integration-checks) and the auto-run task (run-connection-checks) delegate execution to our server via a new internal, service-token-only endpoint, then persist the returned result with the existing shared logic. Trigger.dev still owns scheduling, retries, and the long-run tolerance — only where the AWS API calls run changed.

New ConnectionCheckRunnerService.runChecks() — runs a connection's checks on our server and returns the raw result (no persistence).
New POST /v1/integrations/internal/run-connection-checks/:connectionId (HybridAuthGuard + ServiceTokenOnlyGuard + PermissionGuard) — excluded from public docs/MCP.
New run-checks-on-server.ts — the thin fetch helper the AWS branch calls.
The two Trigger tasks gain a single providerSlug === 'aws' ? runChecksOnServer(...) : runAllChecks(...) branch. The else branch is the original code, unchanged (the diff is just re-indentation).

Explicitly NOT touched

GCP, Azure, dynamic, and legacy integrations — byte-for-byte unchanged; keep executing in Trigger.dev.
The manual "Run" path (already on our server), the orchestrator, and the shared persistence / task-status / email logic.
Remediation ("Find & Apply") is a separate flow that already runs in the API (ECS) — not affected.

Verification

✅ 16 tests pass (new service/controller/helper specs + existing trigger specs).
✅ Changed files are type-clean (remaining repo typecheck errors are pre-existing on main, in untouched files).
✅ openapi.json unchanged — the internal endpoint is excluded from public docs.

Rollout

⚠️ This changes where AWS scheduled checks execute for all orgs. The AWS-on-our-server path is the same one the manual "Run" and the Cloud Tests scan already use in prod, but it isn't end-to-end tested locally. Deploy to the staging Trigger.dev env first, trigger integration-checks-schedule manually, confirm AWS checks pass and the API handles the load, then prod.

Optional tidy-up (left out to keep the diff minimal): for AWS the task still makes two now-unused pre-flight ECS calls (ensure-valid-credentials + resolve-session) — harmless, removable in a follow-up.

🤖 Generated with Claude Code

Summary by cubic

Run AWS integration checks on our server for scheduled and auto-run paths to stop false failures caused by Trigger.dev VPC endpoint policies. This aligns scheduled runs with manual runs, removes AccessDenied errors, and avoids 429s during fan-out.

Bug Fixes
- Root cause: scheduled/auto runs executed in Trigger.dev’s VPC, whose S3 endpoint blocked cross-account reads (e.g., s3:GetBucketPublicAccessBlock).
- Fix (AWS only): Trigger tasks call a new internal service-token endpoint on our API to execute checks in our VPC; results are persisted by existing logic.
- New: ConnectionCheckRunnerService, POST /v1/integrations/internal/run-connection-checks/:connectionId (internal-only, throttle-exempt), and a thin run-checks-on-server helper; excluded /v1/integrations/internal from public OpenAPI.
- Reliability: skipped Trigger-side credential/session preflight for AWS, added a 10‑minute AbortController timeout on the internal fetch, validated credentials by auth type on the server-run path (oauth2/api_key/basic/custom), and handled per-check transport errors individually on AWS so one blip doesn’t abort sibling checks; non‑AWS errors still throw as before.
- Unchanged: GCP, Azure, dynamic/legacy providers, and manual “Run”.
Migration
- Deploy to staging Trigger.dev first; manually trigger the schedule and verify AWS checks pass, then roll out to production.
- Ensure SERVICE_TOKEN_TRIGGER is set for the Trigger runtime.

^{Written for commit 91ca27b. Summary will update on new commits.}

…led false-fails) AWS integration checks (S3, EC2, RDS, KMS, CloudTrail, IAM) false-failed on the scheduled and auto-run paths while the manual "Run" passed. Same code, same credentials — the only difference is where the check executes: - Manual runs on our server (ECS), inside our VPC, whose S3 endpoint allows our cross-account audit reads. - Scheduled/auto-run execute in the Trigger.dev runtime, whose VPC S3 endpoint policy blocks cross-account reads ("no VPC endpoint policy allows ..."). AWS allows/denies based on the VPC the request exits from, which lives in Trigger.dev's account and isn't ours to change. Fix (AWS only): when providerSlug === 'aws', the scheduled and auto-run Trigger tasks delegate execution to our server via a new service-token endpoint (POST /v1/integrations/internal/run-connection-checks/:connectionId) and persist the returned result with the existing shared logic. Trigger.dev still handles scheduling and retries; only where the AWS API calls run changed. GCP, Azure, dynamic and legacy integrations are untouched — they make plain HTTPS calls (no VPC endpoint) and keep executing in Trigger.dev exactly as before. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

vercel · 2026-06-12T20:30:48Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
comp-framework-editor	Ready	Preview, Comment	Jun 12, 2026 8:51pm

2 Skipped Deployments

Project	Deployment	Actions	Updated (UTC)
app	Skipped		Jun 12, 2026 8:51pm
portal	Skipped		Jun 12, 2026 8:51pm

cubic-dev-ai

3 issues found across 10 files

Confidence score: 3/5

The highest risk is in apps/api/src/trigger/integration-platform/run-task-integration-checks.ts and apps/api/src/trigger/integration-platform/run-connection-checks.ts: the AWS path still performs Trigger-side credential/session preflight calls before server-side checks, so transient preflight failures can incorrectly fail scheduled AWS runs. This weakens the intended server-run-only flow and can create user-visible false negatives—skip those preflight calls when providerSlug === 'aws' before merging.
In apps/api/src/trigger/integration-platform/run-checks-on-server.ts, the internal fetch has no timeout, so a network stall can hang AWS scheduled checks instead of failing fast and retrying. Add an abort timeout to bound request duration and let retry/error handling recover predictably.

_{Reply with feedback, questions, or to request a fix.

Fix all with cubic | Re-trigger cubic}

Cubic (and an independent adversarial review) flagged three issues on the AWS server-run path: 1. The AWS branch still ran the Trigger-side credential/session preflight (requestValidCredentials + injectAwsResolvedSession) before delegating to the server. Those calls are unused on the server path (it decrypts creds and assumes the role itself), so a transient preflight failure would falsely fail an AWS run. Skip the preflight entirely for AWS and drop the now-dead injectAwsResolvedSession call (both Trigger tasks). 2. The internal fetch had no timeout, so a hung connection could block the task until maxDuration. Add a generous AbortController timeout (10m — well below the 15m maxDuration but high enough never to abort a legitimately long run) so a stalled socket surfaces as an error and the task retries. 3. (Review) The scheduled per-check loop was inside one outer try/catch; because runChecksOnServer throws on transport failure and several AWS checks share a task, a blip on one check aborted its siblings and skipped lastSyncAt/status. Catch the throw per-check, record that check as failed, and continue — matching runAllChecks' per-check resilience (hasExecutionErrors keeps integrationLastRunAt unwritten so the next tick retries). Non-AWS (GCP/Azure/dynamic/legacy) paths remain unchanged. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

tofikwest · 2026-06-12T20:45:37Z

@cubic-dev-ai review it

cubic-dev-ai · 2026-06-12T20:45:47Z

@cubic-dev-ai review it

@tofikwest I have started the AI code review. It will take a few minutes to complete.

cubic-dev-ai

1 issue found across 10 files

Confidence score: 3/5

In apps/api/src/trigger/integration-platform/run-task-integration-checks.ts, the shared catch path appears to treat non-AWS exceptions like AWS transport blips, which can mask real runAllChecks failures and let unexpected integration-check errors be silently downgraded; before merging, re-throw non-AWS errors so only true AWS transient failures follow the retry/degraded path.

_{Reply with feedback, questions, or to request a fix.

Fix all with cubic | Re-trigger cubic}

The per-check catch added for AWS server-run resilience wrapped both branches, so a (rare) runAllChecks throw on a non-AWS provider would be downgraded to a per-check failure instead of propagating. Re-throw when providerSlug !== 'aws' so non-AWS behavior is unchanged from before this PR; only AWS transport blips get the degrade-and-continue treatment. (cubic review) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

tofikwest · 2026-06-12T20:51:47Z

@cubic-dev-ai ultrareview it

cubic-dev-ai · 2026-06-12T20:51:53Z

@cubic-dev-ai ultrareview it

@tofikwest Starting ultrareview. I'll post findings when complete.

cubic-dev-ai

Ultrareview completed in 28m 1s

2 issues found across 10 files

Confidence score: 3/5

In apps/api/src/integration-platform/controllers/internal-checks.controller.ts, the new internal AWS check-run endpoint still goes through global rate limiting, so bursty scheduled runs can hit 429s and repeat the same check failures after merge — exempt this internal route from the global limiter (or add a dedicated higher-limit policy) before merging.
In apps/api/src/integration-platform/services/connection-check-runner.service.ts, the server-run path skips auth-type credential validation used by the existing run flow, which can let malformed credentials execute and produce inconsistent check outcomes — align server-run validation with the existing path before merging.

_{Reply with feedback, questions, or to request a fix.

Fix all with cubic | Re-trigger cubic}

… + validate creds by auth type cubic Ultrareview findings: 1. The internal run-connection-checks endpoint went through the global ThrottlerGuard, so the 6 AM AWS fan-out could hit 429s and re-fail checks. Add @SkipThrottle() (matching the Trigger-called resolve-session endpoint). 2. ConnectionCheckRunnerService only checked for missing credentials; the in-app run paths also validate by auth type. Align it (oauth2 / api_key / basic / custom) so a server-run rejects malformed credentials up front with a clear error instead of executing the check on bad input. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

tofikwest · 2026-06-12T21:27:12Z

@cubic-dev-ai review it

cubic-dev-ai · 2026-06-12T21:27:20Z

@cubic-dev-ai review it

@tofikwest I have started the AI code review. It will take a few minutes to complete.

cubic-dev-ai

No issues found across 10 files

Confidence score: 5/5

Automated review surfaced no issues in the provided summaries.
No files require special attention.

_{Re-trigger cubic}

claudfuen · 2026-06-12T21:39:22Z

🎉 This PR is included in version 3.82.0 🎉

The release is available on GitHub release

Your semantic-release bot 📦🚀

vercel Bot temporarily deployed to Preview – app June 12, 2026 20:30 Inactive

vercel Bot temporarily deployed to Preview – portal June 12, 2026 20:30 Inactive

vercel Bot deployed to Preview – comp-framework-editor June 12, 2026 20:31 View deployment

cubic-dev-ai Bot reviewed Jun 12, 2026

View reviewed changes

Comment thread apps/api/src/trigger/integration-platform/run-task-integration-checks.ts Outdated

Comment thread apps/api/src/trigger/integration-platform/run-checks-on-server.ts Outdated

Comment thread apps/api/src/trigger/integration-platform/run-connection-checks.ts

Merge branch 'main' into tofik/aws-checks-on-server

918d95a

vercel Bot deployed to Preview – comp-framework-editor June 12, 2026 20:37 View deployment

vercel Bot deployed to Preview – portal June 12, 2026 20:39 View deployment

vercel Bot deployed to Preview – app June 12, 2026 20:39 View deployment

vercel Bot temporarily deployed to Preview – portal June 12, 2026 20:44 Inactive

vercel Bot temporarily deployed to Preview – app June 12, 2026 20:44 Inactive

vercel Bot deployed to Preview – comp-framework-editor June 12, 2026 20:46 View deployment

cubic-dev-ai Bot reviewed Jun 12, 2026

View reviewed changes

Comment thread apps/api/src/trigger/integration-platform/run-task-integration-checks.ts

vercel Bot temporarily deployed to Preview – portal June 12, 2026 20:50 Inactive

vercel Bot temporarily deployed to Preview – app June 12, 2026 20:50 Inactive

vercel Bot deployed to Preview – comp-framework-editor June 12, 2026 20:51 View deployment

cubic-dev-ai Bot reviewed Jun 12, 2026

View reviewed changes

Comment thread apps/api/src/integration-platform/services/connection-check-runner.service.ts

Comment thread apps/api/src/integration-platform/controllers/internal-checks.controller.ts

cubic-dev-ai Bot reviewed Jun 12, 2026

View reviewed changes

tofikwest merged commit a7399d8 into main Jun 12, 2026
7 checks passed

tofikwest deleted the tofik/aws-checks-on-server branch June 12, 2026 21:33

claudfuen added the released label Jun 12, 2026

vercel Bot temporarily deployed to Preview – portal June 12, 2026 22:09 Inactive

vercel Bot temporarily deployed to Preview – app June 12, 2026 22:09 Inactive

vercel Bot deployed to Preview – comp-framework-editor June 12, 2026 22:10 View deployment

Conversation

tofikwest commented Jun 12, 2026 • edited by cubic-dev-ai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Problem

Root cause

Fix (AWS only)

Explicitly NOT touched

Verification

Rollout

Summary by cubic

Uh oh!

vercel Bot commented Jun 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

cubic-dev-ai Bot left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

tofikwest commented Jun 12, 2026

Uh oh!

cubic-dev-ai Bot commented Jun 12, 2026

Uh oh!

cubic-dev-ai Bot left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

tofikwest commented Jun 12, 2026

Uh oh!

cubic-dev-ai Bot commented Jun 12, 2026

Uh oh!

cubic-dev-ai Bot left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

tofikwest commented Jun 12, 2026

Uh oh!

cubic-dev-ai Bot commented Jun 12, 2026

Uh oh!

cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

claudfuen commented Jun 12, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

tofikwest commented Jun 12, 2026 •

edited by cubic-dev-ai Bot

Loading

vercel Bot commented Jun 12, 2026 •

edited

Loading

cubic-dev-ai Bot left a comment •

edited

Loading

cubic-dev-ai Bot left a comment •

edited

Loading

cubic-dev-ai Bot left a comment •

edited

Loading