diff --git a/docs/guides/SLACK_SETUP_GUIDE.md b/docs/guides/SLACK_SETUP_GUIDE.md
index b56fc5c3..03e7d42b 100644
--- a/docs/guides/SLACK_SETUP_GUIDE.md
+++ b/docs/guides/SLACK_SETUP_GUIDE.md
@@ -6,7 +6,7 @@ This guide walks through setting up the ABCA Slack integration. Once configured,

 - ABCA CDK stack deployed (see [Developer guide](./DEVELOPER_GUIDE.md))
 - A Cognito user account configured (see [User guide](./USER_GUIDE.md))
-- A Slack workspace where you can install apps (use a personal free workspace if your corporate Slack restricts app installs)
+- A Slack workspace where you are authorized to install apps (if your workspace requires admin approval for app installs, request it through your Slack administrator)
 - AWS CLI configured with credentials for your ABCA account

 ## Quick start
diff --git a/docs/public/imgs/enable-events-after.png b/docs/public/imgs/enable-events-after.png
new file mode 100644
index 00000000..83b3a5fd
Binary files /dev/null and b/docs/public/imgs/enable-events-after.png differ
diff --git a/docs/public/imgs/enable-events-before.png b/docs/public/imgs/enable-events-before.png
new file mode 100644
index 00000000..2aee2da4
Binary files /dev/null and b/docs/public/imgs/enable-events-before.png differ
diff --git a/docs/public/imgs/find-credentials.png b/docs/public/imgs/find-credentials.png
new file mode 100644
index 00000000..74bd252c
Binary files /dev/null and b/docs/public/imgs/find-credentials.png differ
diff --git a/docs/public/imgs/find-even-subscriptions.png b/docs/public/imgs/find-even-subscriptions.png
new file mode 100644
index 00000000..8629b5de
Binary files /dev/null and b/docs/public/imgs/find-even-subscriptions.png differ
diff --git a/docs/scripts/sync-starlight.mjs b/docs/scripts/sync-starlight.mjs
index 1a1a1be1..4ce7705c 100644
--- a/docs/scripts/sync-starlight.mjs
+++ b/docs/scripts/sync-starlight.mjs
@@ -99,7 +99,17 @@ function ensureFrontmatter(content, title) {
     .replaceAll('../diagrams/', `${docsBase}/diagrams/`)
     .replace(/\[([^\]]+)\]\(([^)]+)\)/g, (match, label, target) => {
       const rewritten = rewriteDocsLinkTarget(target);
-      return rewritten ? `[${label}](${rewritten})` : match;
+      if (!rewritten) {
+        return match;
+      }
+      // The site is served under `base` (docsBase), so root-relative routes
+      // must carry that prefix — otherwise they resolve to the domain root
+      // and 404. Starlight prefixes its own nav links automatically, but our
+      // rewritten body links are raw markdown and need it added explicitly
+      // (same reason the image rewrites above include docsBase). Anchors
+      // (`#…`) stay untouched. (Fixes the broken in-body design-doc links.)
+      const prefixed = rewritten.startsWith('#') ? rewritten : `${docsBase}${rewritten}`;
+      return `[${label}](${prefixed})`;
     });

   const trimmed = normalized.trimStart();
@@ -156,6 +166,25 @@ function mirrorDirectory(sourceDir, targetDirRelative) {
   }
 }

+// Recursively copy a source asset directory into the site's public/ tree so
+// every committed image/diagram is served at its rewritten absolute URL.
+// Filenames are preserved verbatim (markdown references them as-is).
+function copyAssetDir(sourceDir, targetDir) {
+  if (!fs.existsSync(sourceDir)) {
+    return;
+  }
+  fs.mkdirSync(targetDir, { recursive: true });
+  for (const entry of fs.readdirSync(sourceDir, { withFileTypes: true })) {
+    const from = path.join(sourceDir, entry.name);
+    const to = path.join(targetDir, entry.name);
+    if (entry.isDirectory()) {
+      copyAssetDir(from, to);
+    } else if (entry.isFile()) {
+      fs.copyFileSync(from, to);
+    }
+  }
+}
+
 function splitGuide(sourcePath, targetDirRelative, introTitle) {
   if (!fs.existsSync(sourcePath)) {
     return;
@@ -279,5 +308,12 @@ mirrorDirectory(path.join(docsRoot, 'design'), path.join('src', 'content', 'docs
 // --- Decision records (ADRs): mirror to decisions/ ---
 mirrorDirectory(path.join(docsRoot, 'decisions'), path.join('src', 'content', 'docs', 'decisions'));

+// --- Static assets: copy source image dir into the site's public/ ---
+// Guides reference images as `../imgs/foo.png`; ensureFrontmatter() turns
+// those into absolute `//imgs/foo.png` URLs, which Astro serves from
+// public/. Copy the source dir here so every committed image is published —
+// otherwise a new image lands in docs/imgs/ but 404s on the site (#90).
+copyAssetDir(path.join(docsRoot, 'imgs'), path.join(docsRoot, 'public', 'imgs'));
+
 // Guardrail: ensure target tree exists when running in a clean checkout.
 fs.mkdirSync(targetRoot, { recursive: true });
diff --git a/docs/src/content/docs/architecture/Api-contract.md b/docs/src/content/docs/architecture/Api-contract.md
index 13433ee3..6358630d 100644
--- a/docs/src/content/docs/architecture/Api-contract.md
+++ b/docs/src/content/docs/architecture/Api-contract.md
@@ -7,7 +7,7 @@ title: Api contract
 The REST API is the single entry point for all platform interactions. The CLI, webhook integrations, and any future clients use this API to submit tasks, check status, and manage integrations. This is a design-level specification; the source of truth for types is `cdk/src/handlers/shared/types.ts`.

 - **Use this doc for:** endpoint paths, payload shapes, auth requirements, and error codes.
-- **Related docs:** [INPUT_GATEWAY.md](/architecture/input-gateway) for the gateway's role, [ORCHESTRATOR.md](/architecture/orchestrator) for the task state machine, [SECURITY.md](/architecture/security) for the authentication model.
+- **Related docs:** [INPUT_GATEWAY.md](/sample-autonomous-cloud-coding-agents/architecture/input-gateway) for the gateway's role, [ORCHESTRATOR.md](/sample-autonomous-cloud-coding-agents/architecture/orchestrator) for the task state machine, [SECURITY.md](/sample-autonomous-cloud-coding-agents/architecture/security) for the authentication model.

 ## Base URL

@@ -310,7 +310,7 @@ Returns a summary subset of fields. Use `GET /v1/tasks/{task_id}` for full detai
 DELETE /v1/tasks/{task_id}
 ```

-Cancels a task. See [ORCHESTRATOR.md](/architecture/orchestrator) for cancellation behavior by state.
+Cancels a task. See [ORCHESTRATOR.md](/sample-autonomous-cloud-coding-agents/architecture/orchestrator) for cancellation behavior by state.

 **Response: `200 OK`** — a compact confirmation body (not a full `TaskDetail`):

diff --git a/docs/src/content/docs/architecture/Architecture.md b/docs/src/content/docs/architecture/Architecture.md
index 477efe17..5a35da10 100644
--- a/docs/src/content/docs/architecture/Architecture.md
+++ b/docs/src/content/docs/architecture/Architecture.md
@@ -10,7 +10,7 @@ This document outlines the overall architecture of the platform. Each component

 ## Design principles

-For long-term direction and review tenets, see [VISION.md](/architecture/vision).
+For long-term direction and review tenets, see [VISION.md](/sample-autonomous-cloud-coding-agents/architecture/vision).

 - **Extensibility** - Extend the system without modifying core code. Critical components are accessed through internal interfaces (ComputeStrategy, MemoryStore) so implementations can be swapped.
 - **Flexibility** - This field moves fast. Components should be replaceable as better options emerge.
@@ -39,13 +39,13 @@ flowchart LR

 The orchestrator and agent are deliberately separated. The orchestrator handles everything deterministic (cheap Lambda invocations); the agent handles everything that needs LLM reasoning (expensive compute + tokens). This separation provides reliability (crashed agents don't leave orphaned state), cost efficiency (bookkeeping doesn't burn tokens), security (the agent can't bypass platform invariants), and testability (deterministic steps are unit-tested without LLM calls).

-For the full orchestrator design, see [ORCHESTRATOR.md](/architecture/orchestrator). For the API contract, see [API_CONTRACT.md](/architecture/api-contract).
+For the full orchestrator design, see [ORCHESTRATOR.md](/sample-autonomous-cloud-coding-agents/architecture/orchestrator). For the API contract, see [API_CONTRACT.md](/sample-autonomous-cloud-coding-agents/architecture/api-contract).

 ## Repository onboarding

 Onboarding is CDK-based. Each repository is an instance of the `Blueprint` construct in the stack. The construct writes a `RepoConfig` record to DynamoDB; the orchestrator reads it at task time.

-Blueprints configure how the orchestrator executes steps for each repo: compute strategy, model selection, turn limits, GitHub token, and optional custom steps. See [REPO_ONBOARDING.md](/architecture/repo-onboarding) for the full design.
+Blueprints configure how the orchestrator executes steps for each repo: compute strategy, model selection, turn limits, GitHub token, and optional custom steps. See [REPO_ONBOARDING.md](/sample-autonomous-cloud-coding-agents/architecture/repo-onboarding) for the full design.

 ## Model selection

@@ -68,7 +68,7 @@ The dominant cost is Bedrock inference + compute, not infrastructure. Memory, La
 | Medium (small team) | 200-500 | $500-3,000 |
 | High (org-wide) | 2,000-5,000 | $5,000-30,000 |

-For the full breakdown, see [COST_MODEL.md](/architecture/cost-model).
+For the full breakdown, see [COST_MODEL.md](/sample-autonomous-cloud-coding-agents/architecture/cost-model).

 ## Known architectural risks

diff --git a/docs/src/content/docs/architecture/Attachments.md b/docs/src/content/docs/architecture/Attachments.md
index ddafd2cb..e92b6477 100644
--- a/docs/src/content/docs/architecture/Attachments.md
+++ b/docs/src/content/docs/architecture/Attachments.md
@@ -7,7 +7,7 @@ title: Attachments
 End-to-end support for attaching files, images, and URLs to agent tasks. Attachments let users provide non-text context — screenshots of bugs, design mockups, CSV data, log files, code snippets — that the agent can reference during execution. Every channel (CLI, webhook, Slack, Linear) feeds the same schema; every attachment passes through security screening before reaching the agent.

 - **Use this doc for:** understanding the attachment data model, upload flow, security screening pipeline, storage layout, agent consumption, and per-channel behaviour.
-- **Related docs:** [API_CONTRACT.md](/architecture/api-contract) for the `attachments` request schema (must be updated in tandem — see [API contract sync](#api-contract-sync)), [ORCHESTRATOR.md](/architecture/orchestrator) for the task lifecycle this extends, [SECURITY.md](/architecture/security) for guardrail and Cedar context, [ARCHITECTURE.md](/architecture/architecture) for the platform overview.
+- **Related docs:** [API_CONTRACT.md](/sample-autonomous-cloud-coding-agents/architecture/api-contract) for the `attachments` request schema (must be updated in tandem — see [API contract sync](#api-contract-sync)), [ORCHESTRATOR.md](/sample-autonomous-cloud-coding-agents/architecture/orchestrator) for the task lifecycle this extends, [SECURITY.md](/sample-autonomous-cloud-coding-agents/architecture/security) for guardrail and Cedar context, [ARCHITECTURE.md](/sample-autonomous-cloud-coding-agents/architecture/architecture) for the platform overview.

 ## Motivation

@@ -197,7 +197,7 @@ The `` segment ensures uniqueness even if multiple attachments sh
 | Max inline data per attachment | 500 KB decoded | The Lambda synchronous invocation payload limit is **6 MB**. At 500 KB decoded (~667 KB base64) per attachment, even 5 inline attachments plus request JSON stays under 6 MB. The presigned path handles anything larger. |
 | Max total inline data per request | 3 MB decoded | Hard cap on total base64-decoded bytes in a single request. Even with base64 overhead (~4 MB encoded) plus JSON fields, this stays under the 6 MB Lambda payload limit. |
 | Max total size per task | 50 MB | Prevents abuse; bounds total screening and transfer time |
-| Max task_description length | 10,000 chars | Increased from 2,000. **This is a standalone API change that affects all tasks** (not just attachment tasks). Rationale: (a) attachments need rich explanatory context ("implement this design per the attached mockup, paying attention to the header layout"), (b) multiple users have reported the 2K limit as a friction point for complex task descriptions even without attachments, (c) the guardrail screening cost increase is minimal (text screening is cheap), (d) DynamoDB item size impact is negligible (~8 KB vs ~2 KB for the description field). **Requires updating [API_CONTRACT.md](/architecture/api-contract) line 82 in tandem.** |
+| Max task_description length | 10,000 chars | Increased from 2,000. **This is a standalone API change that affects all tasks** (not just attachment tasks). Rationale: (a) attachments need rich explanatory context ("implement this design per the attached mockup, paying attention to the header layout"), (b) multiple users have reported the 2K limit as a friction point for complex task descriptions even without attachments, (c) the guardrail screening cost increase is minimal (text screening is cheap), (d) DynamoDB item size impact is negligible (~8 KB vs ~2 KB for the description field). **Requires updating [API_CONTRACT.md](/sample-autonomous-cloud-coding-agents/architecture/api-contract) line 82 in tandem.** |
 | Allowed image MIME types | `image/png`, `image/jpeg` | Bedrock-supported formats; GIF/WebP removed to eliminate native image processing dependency |
 | Allowed file MIME types | `text/plain`, `text/csv`, `text/markdown`, `application/json`, `application/pdf`, `text/x-log` | Useful for code/data context; no executables |
 | Max URL fetch size | 10 MB | Same per-attachment limit for fetched content |
@@ -1437,7 +1437,7 @@ The `upload_url` and `upload_expires_at` fields are only present in the initial

 ## API contract sync

-This design introduces changes that conflict with the current [API_CONTRACT.md](/architecture/api-contract). The following updates must be made to API_CONTRACT.md in tandem with implementation:
+This design introduces changes that conflict with the current [API_CONTRACT.md](/sample-autonomous-cloud-coding-agents/architecture/api-contract). The following updates must be made to API_CONTRACT.md in tandem with implementation:

 | Section | Current value | New value |
 |---|---|---|
diff --git a/docs/src/content/docs/architecture/Cedar-hitl-gates.md b/docs/src/content/docs/architecture/Cedar-hitl-gates.md
index f52c1d8b..2b87d83c 100644
--- a/docs/src/content/docs/architecture/Cedar-hitl-gates.md
+++ b/docs/src/content/docs/architecture/Cedar-hitl-gates.md
@@ -5,7 +5,7 @@ title: Cedar hitl gates
 # Cedar HITL Approval Gates

 > **Status:** Core implemented; this document remains the authoritative design reference.
-> **Companion:** [`INTERACTIVE_AGENTS.md`](/architecture/interactive-agents) §9.3 (pointing here), §7 (state machine).
+> **Companion:** [`INTERACTIVE_AGENTS.md`](/sample-autonomous-cloud-coding-agents/architecture/interactive-agents) §9.3 (pointing here), §7 (state machine).
 > **Visual:** [`/sample-autonomous-cloud-coding-agents/diagrams/phase3-cedar-hitl.drawio`](/sample-autonomous-cloud-coding-agents/diagrams/phase3-cedar-hitl.drawio) (12 pages; supplemented by inline Mermaid diagrams below).
 > **Design locked:** 2026-04-23 (Sam ↔ assistant discussion).
 > **Rev:** 5 (2026-05-06 — fold in parallel adversarial + advocate review of the timeout design: late-approval re-read on TIMED_OUT ConditionCheckFailed; user-visible timeout-cap milestones; ceiling-shrink milestone; Runtime JWT bound verified as auto-refreshed IAM; three new tuning metrics; explicit off-hours trade-off section; notification-delivery-failure boundary. IMPL-24 through IMPL-28 added.).
diff --git a/docs/src/content/docs/architecture/Compute.md b/docs/src/content/docs/architecture/Compute.md
index c22fd5c3..c7f230df 100644
--- a/docs/src/content/docs/architecture/Compute.md
+++ b/docs/src/content/docs/architecture/Compute.md
@@ -7,7 +7,7 @@ title: Compute
 Every task runs in an isolated cloud compute environment. Nothing runs on the user's machine. The agent clones the repo, writes code, runs tests, and opens a PR inside a MicroVM that is created for the task and destroyed when it ends.

 - **Use this doc for:** understanding the compute environment, agent harness, network architecture, and the constraints that shape the platform's design.
-- **Related docs:** [ORCHESTRATOR.md](/architecture/orchestrator) for session management and liveness monitoring, [SECURITY.md](/architecture/security) for isolation and egress controls, [REPO_ONBOARDING.md](/architecture/repo-onboarding) for per-repo compute configuration.
+- **Related docs:** [ORCHESTRATOR.md](/sample-autonomous-cloud-coding-agents/architecture/orchestrator) for session management and liveness monitoring, [SECURITY.md](/sample-autonomous-cloud-coding-agents/architecture/security) for isolation and egress controls, [REPO_ONBOARDING.md](/sample-autonomous-cloud-coding-agents/architecture/repo-onboarding) for per-repo compute configuration.

 ## Compute options

@@ -25,7 +25,7 @@ The default runtime is **Amazon Bedrock AgentCore Runtime**, which runs each ses
 | **Cost model** | vCPU-hrs + GB-hrs | vCPU + mem/sec | EC2 + EBS | EKS control + EC2 | Underlying compute | Request + duration | EC2 metal + your ops |
 | **Fit** | **Default choice** | Repos > 2 GB image | GPU, heavy toolchains | Max flexibility | Queued batch jobs | **Poor** (15 min cap) | Best potential, highest cost |

-The backend is selected per repo via `compute_type` in the Blueprint config. The orchestrator resolves the strategy and delegates session start, polling, and termination to the strategy implementation. See [REPO_ONBOARDING.md](/architecture/repo-onboarding) for the `ComputeStrategy` interface.
+The backend is selected per repo via `compute_type` in the Blueprint config. The orchestrator resolves the strategy and delegates session start, polling, and termination to the strategy implementation. See [REPO_ONBOARDING.md](/sample-autonomous-cloud-coding-agents/architecture/repo-onboarding) for the `ComputeStrategy` interface.

 ## What runs in the session

@@ -75,7 +75,7 @@ The platform works around this by splitting storage:
 | Max session duration | 8 hours | Hard limit enforced by AgentCore |
 | Idle timeout | 8 hours (configured) | Overridden from the default via `idleRuntimeSessionTimeout: Duration.hours(8)` so sessions blocked on long approval waits or heavy builds are not evicted while idle. Agent reports `HealthyBusy` via `/ping` to stay alive |

-See [ORCHESTRATOR.md](/architecture/orchestrator) for how the orchestrator handles these timeouts.
+See [ORCHESTRATOR.md](/sample-autonomous-cloud-coding-agents/architecture/orchestrator) for how the orchestrator handles these timeouts.

 ## Agent harness

@@ -109,7 +109,7 @@ The harness enforces tool-call policy via Cedar-based hooks:
 - **PreToolUse** (`agent/src/hooks.py` + `agent/src/policy.py`) - Evaluates tool calls before execution. `pr_review` agents cannot use `Write`/`Edit`. Writes to `.git/*` are blocked. Destructive bash commands are denied. Fail-closed: if Cedar is unavailable, all calls are denied.
 - **PostToolUse** (`agent/src/hooks.py` + `agent/src/output_scanner.py`) - Screens tool outputs for secrets and redacts before re-entering agent context.

-Per-repo custom Cedar policies are supported via Blueprint `security.cedarPolicies`. See [SECURITY.md](/architecture/security) for the full policy enforcement model.
+Per-repo custom Cedar policies are supported via Blueprint `security.cedarPolicies`. See [SECURITY.md](/sample-autonomous-cloud-coding-agents/architecture/security) for the full policy enforcement model.

 ## Network architecture

diff --git a/docs/src/content/docs/architecture/Cost-model.md b/docs/src/content/docs/architecture/Cost-model.md
index d77c959f..e683e92e 100644
--- a/docs/src/content/docs/architecture/Cost-model.md
+++ b/docs/src/content/docs/architecture/Cost-model.md
@@ -4,7 +4,7 @@ title: Cost model

 # Cost model

-This document provides an order-of-magnitude cost model for the platform. Cost efficiency is a first-class design principle (see [ARCHITECTURE.md](/architecture/architecture)). The model covers infrastructure baseline costs, per-task variable costs, and cost attribution guidance.
+This document provides an order-of-magnitude cost model for the platform. Cost efficiency is a first-class design principle (see [ARCHITECTURE.md](/sample-autonomous-cloud-coding-agents/architecture/architecture)). The model covers infrastructure baseline costs, per-task variable costs, and cost attribution guidance.

 Detailed cost management (per-user budgets, cost attribution dashboards, token budget enforcement) builds on this baseline analysis and focuses on the dominant cost drivers.

@@ -14,7 +14,7 @@ These costs are incurred regardless of task volume:

 | Component | Estimated cost | Notes |
 |---|---|---|
-| NAT Gateway (1×) | ~$32/month | Fixed hourly cost + data processing. Single AZ (see [COMPUTE.md - Network architecture](/architecture/compute)). |
+| NAT Gateway (1×) | ~$32/month | Fixed hourly cost + data processing. Single AZ (see [COMPUTE.md - Network architecture](/sample-autonomous-cloud-coding-agents/architecture/compute)). |
 | VPC Interface Endpoints (7×, 2 AZs) | ~$102/month | $0.01/hr × 7 endpoints × 2 AZs × 730 hrs. |
 | VPC Flow Logs | ~$3/month | CloudWatch ingestion. |
 | DynamoDB (on-demand, idle) | ~$0/month | Pay-per-request; 7 core tables (Tasks, Events, Nudges, Approvals, UserConcurrency, Webhooks, Repo). Integration tables add more when enabled (Slack: installation, user-mapping; Linear: project-mapping, user-mapping, workspace-registry, webhook-dedup). No cost when idle. |
@@ -27,7 +27,7 @@ These costs are incurred regardless of task volume:

 ### Scale-to-zero characteristics

-Most platform components are fully serverless and incur zero cost when idle: DynamoDB (PAY_PER_REQUEST, 7 core tables plus integration tables when Slack/Linear are enabled), Lambda, API Gateway, S3 (trace artifacts auto-expire in 7 days), SQS (fanout DLQ), ECS Fargate (cluster is free, when enabled), AgentCore Runtime (per-session), Bedrock (per-token), and Cognito (free tier). The stranded task reconciler adds <$0.01/month even when idle (288 Lambda invocations/day, early-exit). The always-on cost floor (~$140–150/month) is dominated by VPC networking infrastructure (NAT Gateway + 7 interface endpoints across 2 AZs) which is required for private subnet connectivity to AWS services and GitHub. See the [Deployment guide](/getting-started/deployment-guide) for the full scale-to-zero breakdown.
+Most platform components are fully serverless and incur zero cost when idle: DynamoDB (PAY_PER_REQUEST, 7 core tables plus integration tables when Slack/Linear are enabled), Lambda, API Gateway, S3 (trace artifacts auto-expire in 7 days), SQS (fanout DLQ), ECS Fargate (cluster is free, when enabled), AgentCore Runtime (per-session), Bedrock (per-token), and Cognito (free tier). The stranded task reconciler adds <$0.01/month even when idle (288 Lambda invocations/day, early-exit). The always-on cost floor (~$140–150/month) is dominated by VPC networking infrastructure (NAT Gateway + 7 interface endpoints across 2 AZs) which is required for private subnet connectivity to AWS services and GitHub. See the [Deployment guide](/sample-autonomous-cloud-coding-agents/getting-started/deployment-guide) for the full scale-to-zero breakdown.

 ## Per-task variable costs

@@ -53,7 +53,7 @@ Assuming a typical task: 1–2 hours, Claude Sonnet, ~100K input tokens, ~20K ou

 ### Optional: deploy-preview screenshots

-The screenshot pipeline (see [Deploy preview screenshots guide](/using/deploy-preview-screenshots-guide)) is opt-in per repo and deterministic — no LLM, no agent runtime. Only fires when a connected deploy provider posts `deployment_status: success`.
+The screenshot pipeline (see [Deploy preview screenshots guide](/sample-autonomous-cloud-coding-agents/using/deploy-preview-screenshots-guide)) is opt-in per repo and deterministic — no LLM, no agent runtime. Only fires when a connected deploy provider posts `deployment_status: success`.

 | Component | Estimated cost per screenshot | Notes |
 |---|---|---|
@@ -92,7 +92,7 @@ These estimates assume Claude Sonnet with prompt caching enabled and average tas

 For multi-user deployments, cost should be attributable to individual users and repositories:

-- **Per-task:** Token usage and compute duration are captured in task metadata (`agent.cost_usd`, `agent.turns` - see [OBSERVABILITY.md](/architecture/observability)).
+- **Per-task:** Token usage and compute duration are captured in task metadata (`agent.cost_usd`, `agent.turns` - see [OBSERVABILITY.md](/sample-autonomous-cloud-coding-agents/architecture/observability)).
 - **Per-user:** Aggregate task costs by `user_id`.
 - **Per-repo:** Aggregate task costs by `repo`.
 - **Dashboard:** Cost attribution dashboards should be built from the same task-level metrics.
@@ -116,8 +116,8 @@ For multi-user deployments, cost should be attributable to individual users and

 ## Reference

-- [COMPUTE.md](/architecture/compute) -- Compute option billing models and network architecture.
-- [ORCHESTRATOR.md](/architecture/orchestrator) -- Polling cost analysis.
-- [OBSERVABILITY.md](/architecture/observability) -- Cost-related metrics (`agent.cost_usd`, token usage).
-- [Deployment guide](/getting-started/deployment-guide) -- Deployment choices, scale-to-zero analysis, AWS services inventory.
-- [DEPLOYMENT_ROLES.md](/architecture/deployment-roles) -- Least-privilege IAM policies for deployment.
+- [COMPUTE.md](/sample-autonomous-cloud-coding-agents/architecture/compute) -- Compute option billing models and network architecture.
+- [ORCHESTRATOR.md](/sample-autonomous-cloud-coding-agents/architecture/orchestrator) -- Polling cost analysis.
+- [OBSERVABILITY.md](/sample-autonomous-cloud-coding-agents/architecture/observability) -- Cost-related metrics (`agent.cost_usd`, token usage).
+- [Deployment guide](/sample-autonomous-cloud-coding-agents/getting-started/deployment-guide) -- Deployment choices, scale-to-zero analysis, AWS services inventory.
+- [DEPLOYMENT_ROLES.md](/sample-autonomous-cloud-coding-agents/architecture/deployment-roles) -- Least-privilege IAM policies for deployment.
diff --git a/docs/src/content/docs/architecture/Deployment-roles.md b/docs/src/content/docs/architecture/Deployment-roles.md
index d48b144f..bcc8499e 100644
--- a/docs/src/content/docs/architecture/Deployment-roles.md
+++ b/docs/src/content/docs/architecture/Deployment-roles.md
@@ -710,6 +710,6 @@ These policies are conservative-but-scoped starting points. To tighten further:

 ## Reference

-- [SECURITY.md](/architecture/security) -- Runtime IAM, memory isolation, custom step trust boundaries.
-- [COMPUTE.md](/architecture/compute) -- Compute backend options (AgentCore vs ECS Fargate).
-- [COST_MODEL.md](/architecture/cost-model) -- Infrastructure baseline costs and scale-to-zero analysis.
+- [SECURITY.md](/sample-autonomous-cloud-coding-agents/architecture/security) -- Runtime IAM, memory isolation, custom step trust boundaries.
+- [COMPUTE.md](/sample-autonomous-cloud-coding-agents/architecture/compute) -- Compute backend options (AgentCore vs ECS Fargate).
+- [COST_MODEL.md](/sample-autonomous-cloud-coding-agents/architecture/cost-model) -- Infrastructure baseline costs and scale-to-zero analysis.
diff --git a/docs/src/content/docs/architecture/Evaluation.md b/docs/src/content/docs/architecture/Evaluation.md
index 94fc06f6..6a53b0a2 100644
--- a/docs/src/content/docs/architecture/Evaluation.md
+++ b/docs/src/content/docs/architecture/Evaluation.md
@@ -7,7 +7,7 @@ title: Evaluation
 The evaluation pipeline measures agent performance and feeds learnings back into prompts, memory, and configuration. In MVP, evaluation is manual (inspect PRs and logs). Automated evaluation is built incrementally across iterations.

 - **Use this doc for:** understanding what gets evaluated, the tiered validation pipeline, memory effectiveness metrics, and the feedback loop.
-- **Related docs:** [MEMORY.md](/architecture/memory) for how evaluation insights are stored, [OBSERVABILITY.md](/architecture/observability) for telemetry data sources, [ORCHESTRATOR.md](/architecture/orchestrator) for prompt versioning in the data model.
+- **Related docs:** [MEMORY.md](/sample-autonomous-cloud-coding-agents/architecture/memory) for how evaluation insights are stored, [OBSERVABILITY.md](/sample-autonomous-cloud-coding-agents/architecture/observability) for telemetry data sources, [ORCHESTRATOR.md](/sample-autonomous-cloud-coding-agents/architecture/orchestrator) for prompt versioning in the data model.

 ## What to evaluate

@@ -32,7 +32,7 @@ Evaluation consumes the same data that observability and code attribution captur
 | Agent logs and traces | CloudWatch logs, X-Ray spans, tool calls, reasoning steps |
 | Code artifacts | PR description, commits, diff, repo/branch/issue links |
 | PR outcome signals | Merged vs. closed-without-merge (via GitHub webhooks). Positive/negative signal on task episodes. |
-| Review feedback | PR review comments captured via the review feedback memory loop (see [MEMORY.md](/architecture/memory)) |
+| Review feedback | PR review comments captured via the review feedback memory loop (see [MEMORY.md](/sample-autonomous-cloud-coding-agents/architecture/memory)) |

 ## Agent self-feedback

diff --git a/docs/src/content/docs/architecture/Input-gateway.md b/docs/src/content/docs/architecture/Input-gateway.md
index 48ee3b0c..d6f4a7d4 100644
--- a/docs/src/content/docs/architecture/Input-gateway.md
+++ b/docs/src/content/docs/architecture/Input-gateway.md
@@ -42,7 +42,7 @@ In short: **every input channel connects through this central point; the gateway
   Every channel-specific payload must be transformed into the same internal message structure. The rest of the system only ever sees this normalized form.

 - **Validation**
-  The gateway must validate normalized messages (required fields, types, allowed actions, target repo/issue refs, size limits) and reject malformed or invalid requests with clear errors. Task descriptions are additionally screened by Amazon Bedrock Guardrails for prompt injection at submission time (fail-closed). See [SECURITY.md](/architecture/security).
+  The gateway must validate normalized messages (required fields, types, allowed actions, target repo/issue refs, size limits) and reject malformed or invalid requests with clear errors. Task descriptions are additionally screened by Amazon Bedrock Guardrails for prompt injection at submission time (fail-closed). See [SECURITY.md](/sample-autonomous-cloud-coding-agents/architecture/security).
 - **Access control**
   The gateway enforces who can do what (e.g. only the task owner can cancel; only authenticated users can create tasks). This may be defined per channel or globally.

diff --git a/docs/src/content/docs/architecture/Interactive-agents.md b/docs/src/content/docs/architecture/Interactive-agents.md
index 05d70d66..aee15775 100644
--- a/docs/src/content/docs/architecture/Interactive-agents.md
+++ b/docs/src/content/docs/architecture/Interactive-agents.md
@@ -23,7 +23,7 @@ This document describes the interactivity surfaces layered on top of that model
 3. **Watch** — `bgagent watch ` polls `TaskEventsTable` with an adaptive interval (500 ms when events are arriving, back-off to 5 s when idle). Same endpoint used under the hood for foreground-block UX on `ask` and for HITL approval waits.
 4. **Nudge** — `bgagent nudge ""` writes a row into `TaskNudgesTable`. The agent reads pending nudges between turns, acknowledges with a `nudge_acknowledged` milestone event, and integrates the nudge on its next turn.
 5. **Ask** — `bgagent ask ""` (Phase 2) writes a question row. The agent answers at the next between-turns boundary; the answer surfaces as a `status_response` event. CLI default is foreground block-and-poll with a spinner; task and answer are both durable if the CLI disconnects.
-6. **Approval gates** — Phase 3 Cedar-driven hard gates. Agent emits `approval_requested`, waits for a decision from `bgagent approve` / `bgagent deny` or a Slack button-press. Detailed design in [`CEDAR_HITL_GATES.md`](/architecture/cedar-hitl-gates).
+6. **Approval gates** — Phase 3 Cedar-driven hard gates. Agent emits `approval_requested`, waits for a decision from `bgagent approve` / `bgagent deny` or a Slack button-press. Detailed design in [`CEDAR_HITL_GATES.md`](/sample-autonomous-cloud-coding-agents/architecture/cedar-hitl-gates).

 ### Core architectural choices

@@ -226,7 +226,7 @@ Consumer: agent between-turns hook reads pending nudges, emits `nudge_acknowledg

 ### 3.7 TaskApprovalsTable (Phase 3)

-Phase 3 approval-request spine. Detailed schema in [`CEDAR_HITL_GATES.md`](/architecture/cedar-hitl-gates). Semantics summary:
+Phase 3 approval-request spine. Detailed schema in [`CEDAR_HITL_GATES.md`](/sample-autonomous-cloud-coding-agents/architecture/cedar-hitl-gates). Semantics summary:
 - Agent writes an approval row with the request context.
 - Agent transitions `RUNNING → AWAITING_APPROVAL` and enters a poll loop.
 - User responds via REST (`POST /tasks/{id}/approvals/{request_id}`) or via a Slack button dispatched by the notification plane.
@@ -413,7 +413,7 @@ Flags:

 ### 5.6 `bgagent approve` / `deny` / `pending` / `policies` (Phase 3)

-HITL approval commands. All flows are REST + DDB; no streaming. Detailed design in [`CEDAR_HITL_GATES.md`](/architecture/cedar-hitl-gates). Summary:
+HITL approval commands. All flows are REST + DDB; no streaming. Detailed design in [`CEDAR_HITL_GATES.md`](/sample-autonomous-cloud-coding-agents/architecture/cedar-hitl-gates). Summary:

 - Agent emits `approval_required` with the tool context.
- Notification plane dispatches the event (Slack with action buttons, email, GitHub). @@ -554,7 +554,7 @@ RUNNING ──▶ AWAITING_APPROVAL ──▶ RUNNING (approve or deny-with-s └──▶ FAILED (stranded reconciler catches abandoned approval) ``` -The `AWAITING_APPROVAL` state holds the user's concurrency slot (paused but alive). See [`CEDAR_HITL_GATES.md`](/architecture/cedar-hitl-gates) for full semantics. +The `AWAITING_APPROVAL` state holds the user's concurrency slot (paused but alive). See [`CEDAR_HITL_GATES.md`](/sample-autonomous-cloud-coding-agents/architecture/cedar-hitl-gates) for full semantics. ### 8.3 Write rules @@ -737,7 +737,7 @@ Opt-in per task: 4 KB previews + full trajectory to S3 with TTL. - Hard-gate approval gates with Cedar policy evaluation - `bgagent approve` / `deny` / `pending` / `policies` - `AWAITING_APPROVAL` state + orchestrator handling -- Full design in [`CEDAR_HITL_GATES.md`](/architecture/cedar-hitl-gates) +- Full design in [`CEDAR_HITL_GATES.md`](/sample-autonomous-cloud-coding-agents/architecture/cedar-hitl-gates) ### Phase 4 — Dispatcher polish @@ -749,8 +749,8 @@ Opt-in per task: 4 KB previews + full trajectory to S3 with TTL. ### Deferred -- **LLM-synthesized status summary** — `bgagent ask` without targeting the agent; Lambda calls an LLM to narrate state. Cost + hallucination trade-offs; revisit if v1 feedback warrants. Tracked on the product roadmap as **LLM-synthesized status summary (optional)** under **Smart progress updates** ([ROADMAP.md](/roadmap/roadmap)). -- **Cedar `effect: "advise"` tier** — non-blocking FYI policy tier for post-v1. Design sketch in [`CEDAR_HITL_GATES.md`](/architecture/cedar-hitl-gates). +- **LLM-synthesized status summary** — `bgagent ask` without targeting the agent; Lambda calls an LLM to narrate state. Cost + hallucination trade-offs; revisit if v1 feedback warrants. 
Tracked on the product roadmap as **LLM-synthesized status summary (optional)** under **Smart progress updates** ([ROADMAP.md](/sample-autonomous-cloud-coding-agents/roadmap/roadmap)). +- **Cedar `effect: "advise"` tier** — non-blocking FYI policy tier for post-v1. Design sketch in [`CEDAR_HITL_GATES.md`](/sample-autonomous-cloud-coding-agents/architecture/cedar-hitl-gates). - **Outbound WebSocket from agent** — only if a concrete sub-200 ms latency requirement surfaces. Agent-initiated egress avoids dual-auth problems and works on any compute. - **Multi-user watch** — multiple users attached to the same task's live event stream (teams). diff --git a/docs/src/content/docs/architecture/Memory.md b/docs/src/content/docs/architecture/Memory.md index 272f0941..9712f344 100644 --- a/docs/src/content/docs/architecture/Memory.md +++ b/docs/src/content/docs/architecture/Memory.md @@ -7,7 +7,7 @@ title: Memory Agents are stateless by default: each task starts from scratch with no knowledge of what happened before. The memory system fixes this by giving agents access to repository knowledge, past task episodes, and review feedback across sessions. A well-configured `CLAUDE.md` in the repository is often more impactful than any external memory, but external memory fills gaps the repo cannot: execution history, reviewer preferences, operational quirks, and cross-task patterns. - **Use this doc for:** understanding what memory stores, how it flows through the pipeline, the security threat model, and the tiered implementation plan. -- **Related docs:** [SECURITY.md](/architecture/security) for prompt injection and memory poisoning mitigations, [EVALUATION.md](/architecture/evaluation) for how memory quality is measured, [ORCHESTRATOR.md](/architecture/orchestrator) for context hydration. 
+- **Related docs:** [SECURITY.md](/sample-autonomous-cloud-coding-agents/architecture/security) for prompt injection and memory poisoning mitigations, [EVALUATION.md](/sample-autonomous-cloud-coding-agents/architecture/evaluation) for how memory quality is measured, [ORCHESTRATOR.md](/sample-autonomous-cloud-coding-agents/architecture/orchestrator) for context hydration. ## Design principles @@ -72,7 +72,7 @@ After the PR is opened, the agent writes: 1. **Task episode** - Structured summary: approach, files changed, PR number, difficulties, outcome 2. **Repo learnings** - New knowledge discovered about the codebase -3. **Self-feedback** - What context was missing that would have helped (see [EVALUATION.md](/architecture/evaluation)) +3. **Self-feedback** - What context was missing that would have helped (see [EVALUATION.md](/sample-autonomous-cloud-coding-agents/architecture/evaluation)) If the agent crashes before writing memory, the orchestrator writes a minimal episode as fallback (also fail-open). @@ -110,7 +110,7 @@ The most novel component and the primary feedback loop between human reviewers a - **Reviewer authority** - Maintainer feedback should carry more weight than contributor feedback - **Rule expiry** - Rules not relevant in N tasks may be stale. Consider TTL or relevance checks. - **Extraction quality** - The LLM prompt that extracts rules is critical. Vague extraction produces vague rules that match poorly on retrieval. -- **Security** - PR review comments are attacker-controlled input. See [SECURITY.md](/architecture/security). +- **Security** - PR review comments are attacker-controlled input. See [SECURITY.md](/sample-autonomous-cloud-coding-agents/architecture/security). ### User preference memory @@ -210,4 +210,4 @@ flowchart TB | 5. Write-ahead validation | Open | Planned: separate model evaluates proposed memory updates before commit | | 6. 
Anomaly detection | Open | Planned: write pattern monitoring, behavioral drift detection, automatic halt | -See [ROADMAP.md](/roadmap/roadmap) for the phased implementation plan and [SECURITY.md](/architecture/security) for the broader security context. +See [ROADMAP.md](/sample-autonomous-cloud-coding-agents/roadmap/roadmap) for the phased implementation plan and [SECURITY.md](/sample-autonomous-cloud-coding-agents/architecture/security) for the broader security context. diff --git a/docs/src/content/docs/architecture/Observability.md b/docs/src/content/docs/architecture/Observability.md index 98d03b04..73b9a211 100644 --- a/docs/src/content/docs/architecture/Observability.md +++ b/docs/src/content/docs/architecture/Observability.md @@ -7,7 +7,7 @@ title: Observability For a system where agents run for hours and burn tokens autonomously, observability is load-bearing infrastructure. The platform captures task lifecycle, agent reasoning, tool use, and outcomes so operators can monitor health, debug failures, and improve agent performance over time. - **Use this doc for:** understanding what the platform observes, how telemetry flows, metrics, dashboards, alarms, and deployment safety. -- **Related docs:** [ORCHESTRATOR.md](/architecture/orchestrator) for task state machine, [MEMORY.md](/architecture/memory) for code attribution and cross-session learning, [EVALUATION.md](/architecture/evaluation) for agent performance measurement. +- **Related docs:** [ORCHESTRATOR.md](/sample-autonomous-cloud-coding-agents/architecture/orchestrator) for task state machine, [MEMORY.md](/sample-autonomous-cloud-coding-agents/architecture/memory) for code attribution and cross-session learning, [EVALUATION.md](/sample-autonomous-cloud-coding-agents/architecture/evaluation) for agent performance measurement. 
## Telemetry architecture @@ -134,7 +134,7 @@ The CloudWatch GenAI Observability console provides additional views: per-sessio Every agent commit carries `Task-Id:` and `Prompt-Version:` trailers (via a git hook installed during repo setup). This links code changes to the task and prompt that produced them, enabling queries like "what prompt led to this change?" and supporting the evaluation pipeline. -Task conversations, tool calls, decisions, and outcomes are persisted with metadata (`task_id`, `session_id`, `repo`, `branch`, `commit SHAs`, `pr_url`) in a searchable store. The agent retrieves relevant past context via memory search at task start. See [MEMORY.md](/architecture/memory) for the memory lifecycle and retrieval strategy. +Task conversations, tool calls, decisions, and outcomes are persisted with metadata (`task_id`, `session_id`, `repo`, `branch`, `commit SHAs`, `pr_url`) in a searchable store. The agent retrieves relevant past context via memory search at task start. See [MEMORY.md](/sample-autonomous-cloud-coding-agents/architecture/memory) for the memory lifecycle and retrieval strategy. ## Audit and retention diff --git a/docs/src/content/docs/architecture/Orchestrator.md b/docs/src/content/docs/architecture/Orchestrator.md index 331609a8..350cfe8c 100644 --- a/docs/src/content/docs/architecture/Orchestrator.md +++ b/docs/src/content/docs/architecture/Orchestrator.md @@ -9,7 +9,7 @@ The orchestrator drives the task lifecycle from submission to completion. It run The orchestrator is implemented as a Lambda Durable Function. Durable execution provides checkpoint/replay across process restarts, suspension without compute charges during long waits, and condition-based polling for session completion. See the Implementation section for details. - **Use this doc for:** task state machine, admission/finalization flow, cancellation behavior, failure recovery, and concurrency management. 
-- **Related docs:** [ARCHITECTURE.md](/architecture/architecture) for the high-level blueprint model, [COMPUTE.md](/architecture/compute) for the session runtime, [MEMORY.md](/architecture/memory) for context sources, [REPO_ONBOARDING.md](/architecture/repo-onboarding) for per-repo customization. +- **Related docs:** [ARCHITECTURE.md](/sample-autonomous-cloud-coding-agents/architecture/architecture) for the high-level blueprint model, [COMPUTE.md](/sample-autonomous-cloud-coding-agents/architecture/compute) for the session runtime, [MEMORY.md](/sample-autonomous-cloud-coding-agents/architecture/memory) for context sources, [REPO_ONBOARDING.md](/sample-autonomous-cloud-coding-agents/architecture/repo-onboarding) for per-repo customization. ## API and agent contracts @@ -42,11 +42,11 @@ The orchestrator is deliberately scoped. It handles coordination and bookkeeping | Component | Owner | Reference | |---|---|---| -| Request authentication | Input gateway | [INPUT_GATEWAY.md](/architecture/input-gateway) | -| Agent logic (clone, code, test, PR) | Agent runtime | [COMPUTE.md](/architecture/compute) | -| Compute session lifecycle (VM, image pull) | AgentCore Runtime | [COMPUTE.md](/architecture/compute) | -| Memory storage and retrieval | AgentCore Memory | [MEMORY.md](/architecture/memory) | -| Repository onboarding | Blueprint construct | [REPO_ONBOARDING.md](/architecture/repo-onboarding) | +| Request authentication | Input gateway | [INPUT_GATEWAY.md](/sample-autonomous-cloud-coding-agents/architecture/input-gateway) | +| Agent logic (clone, code, test, PR) | Agent runtime | [COMPUTE.md](/sample-autonomous-cloud-coding-agents/architecture/compute) | +| Compute session lifecycle (VM, image pull) | AgentCore Runtime | [COMPUTE.md](/sample-autonomous-cloud-coding-agents/architecture/compute) | +| Memory storage and retrieval | AgentCore Memory | [MEMORY.md](/sample-autonomous-cloud-coding-agents/architecture/memory) | +| Repository onboarding | Blueprint construct | 
[REPO_ONBOARDING.md](/sample-autonomous-cloud-coding-agents/architecture/repo-onboarding) | ## Task state machine @@ -151,7 +151,7 @@ Multiple timeout mechanisms work together to prevent runaway tasks. Time-based l ## Blueprint execution -Every task follows a blueprint: a sequence of deterministic steps wrapping one agentic step. The default blueprint is the sequence described in [ARCHITECTURE.md](/architecture/architecture). Per-repo customization (see [REPO_ONBOARDING.md](/architecture/repo-onboarding)) changes which steps run without affecting the framework guarantees. +Every task follows a blueprint: a sequence of deterministic steps wrapping one agentic step. The default blueprint is the sequence described in [ARCHITECTURE.md](/sample-autonomous-cloud-coding-agents/architecture/architecture). Per-repo customization (see [REPO_ONBOARDING.md](/sample-autonomous-cloud-coding-agents/architecture/repo-onboarding)) changes which steps run without affecting the framework guarantees. ```mermaid flowchart LR @@ -243,7 +243,7 @@ Every step in the pipeline satisfies these properties: ### Extension points -Per [REPO_ONBOARDING.md](/architecture/repo-onboarding), blueprints customize execution through three layers: +Per [REPO_ONBOARDING.md](/sample-autonomous-cloud-coding-agents/architecture/repo-onboarding), blueprints customize execution through three layers: 1. **Parameterized strategies** - Select built-in implementations without code. Example: `compute.type: 'agentcore'` vs `compute.type: 'ecs'`. 2. **Lambda-backed custom steps** - Inject custom logic at `pre-agent` or `post-agent` phases. Example: SAST scan before the agent, custom lint after. @@ -413,7 +413,7 @@ Three DynamoDB tables back the orchestrator: one for task state, one for the aud ### TaskEvents table -Append-only audit log. See [OBSERVABILITY.md](/architecture/observability). +Append-only audit log. See [OBSERVABILITY.md](/sample-autonomous-cloud-coding-agents/architecture/observability). 
| Field | Type | Description | |---|---|---| diff --git a/docs/src/content/docs/architecture/Repo-onboarding.md b/docs/src/content/docs/architecture/Repo-onboarding.md index ca784a10..3a74f3d9 100644 --- a/docs/src/content/docs/architecture/Repo-onboarding.md +++ b/docs/src/content/docs/architecture/Repo-onboarding.md @@ -7,8 +7,8 @@ title: Repo onboarding Before users can submit tasks for a repository, that repository must be onboarded to the platform. Onboarding registers the repo and produces a per-repo configuration that the orchestrator uses at task time: compute strategy, model, credentials, networking, and pipeline customizations. If a user submits a task for a non-onboarded repo, the API returns `422 REPO_NOT_ONBOARDED`. - **Use this doc for:** the Blueprint construct interface, RepoConfig schema, override precedence, compute strategy interface, and pipeline customization model. -- **For practical usage:** see [Quick Start](/getting-started/quick-start) for onboarding your first repo and [User Guide](/using/overview) for per-repo overrides. -- **Related docs:** [ORCHESTRATOR.md](/architecture/orchestrator) for how the orchestrator consumes blueprint config, [COMPUTE.md](/architecture/compute) for compute backends, [SECURITY.md](/architecture/security) for custom step trust boundaries. +- **For practical usage:** see [Quick Start](/sample-autonomous-cloud-coding-agents/getting-started/quick-start) for onboarding your first repo and [User Guide](/sample-autonomous-cloud-coding-agents/using/overview) for per-repo overrides. +- **Related docs:** [ORCHESTRATOR.md](/sample-autonomous-cloud-coding-agents/architecture/orchestrator) for how the orchestrator consumes blueprint config, [COMPUTE.md](/sample-autonomous-cloud-coding-agents/architecture/compute) for compute backends, [SECURITY.md](/sample-autonomous-cloud-coding-agents/architecture/security) for custom step trust boundaries. ## Why onboarding? 
@@ -130,9 +130,9 @@ The orchestrator reads `RepoConfig` at task time. Each pipeline step consumes sp ## Pipeline customization -Blueprints customize the orchestrator pipeline through three progressively powerful layers. See [ORCHESTRATOR.md](/architecture/orchestrator) for how the framework enforces invariants regardless of customization. +Blueprints customize the orchestrator pipeline through three progressively powerful layers. See [ORCHESTRATOR.md](/sample-autonomous-cloud-coding-agents/architecture/orchestrator) for how the framework enforces invariants regardless of customization. -> **Implementation status:** Only **Layer 1** is shipped today. The Blueprint construct's `pipeline` prop currently exposes a single override, `pollIntervalMs` (`cdk/src/constructs/blueprint.ts`); there is no `customSteps`/`stepSequence` support, no `CustomStepConfig`/`StepRef` wiring, and no `INVALID_STEP_SEQUENCE` validation in code. **Layer 2 (Lambda-backed custom steps)** and **Layer 3 (custom step sequences)** below describe a planned design — see the "Blueprint custom steps and step sequences" item in [ROADMAP.md](/roadmap/roadmap). The interfaces and validation rules in those subsections are forward-looking, not current behavior. +> **Implementation status:** Only **Layer 1** is shipped today. The Blueprint construct's `pipeline` prop currently exposes a single override, `pollIntervalMs` (`cdk/src/constructs/blueprint.ts`); there is no `customSteps`/`stepSequence` support, no `CustomStepConfig`/`StepRef` wiring, and no `INVALID_STEP_SEQUENCE` validation in code. **Layer 2 (Lambda-backed custom steps)** and **Layer 3 (custom step sequences)** below describe a planned design — see the "Blueprint custom steps and step sequences" item in [ROADMAP.md](/sample-autonomous-cloud-coding-agents/roadmap/roadmap). The interfaces and validation rules in those subsections are forward-looking, not current behavior. 
### Layer 1: Parameterized strategies @@ -250,4 +250,4 @@ The onboarding pipeline can produce two kinds of customization artifacts that he **Dynamic artifacts** are generated by the pipeline when repo hygiene is weak: codebase summaries, dependency graphs, suggested rules from the repo layout. These compensate for missing documentation and are attached to the repo's agent configuration. -For prompt writing guidelines, see the [Prompt Guide](/customizing/prompt-engineering). +For prompt writing guidelines, see the [Prompt Guide](/sample-autonomous-cloud-coding-agents/customizing/prompt-engineering). diff --git a/docs/src/content/docs/architecture/Security.md b/docs/src/content/docs/architecture/Security.md index 5546e061..bb1f5cd7 100644 --- a/docs/src/content/docs/architecture/Security.md +++ b/docs/src/content/docs/architecture/Security.md @@ -7,7 +7,7 @@ title: Security ABCA agents execute code with repository access. This document describes how the platform contains that risk: isolated sessions, scoped credentials, input screening, policy enforcement, and memory integrity controls. The design aligns with [AWS prescriptive guidance for agentic AI security](https://docs.aws.amazon.com/prescriptive-guidance/latest/agentic-ai-security/best-practices.html). - **Use this doc for:** understanding the security boundaries, what can go wrong, and how the platform mitigates each threat. -- **Related docs:** [COMPUTE.md](/architecture/compute) for runtime isolation details, [MEMORY.md](/architecture/memory) for memory threat analysis, [REPO_ONBOARDING.md](/architecture/repo-onboarding) for per-repo security configuration, [INPUT_GATEWAY.md](/architecture/input-gateway) for authentication flows. 
+- **Related docs:** [COMPUTE.md](/sample-autonomous-cloud-coding-agents/architecture/compute) for runtime isolation details, [MEMORY.md](/sample-autonomous-cloud-coding-agents/architecture/memory) for memory threat analysis, [REPO_ONBOARDING.md](/sample-autonomous-cloud-coding-agents/architecture/repo-onboarding) for per-repo security configuration, [INPUT_GATEWAY.md](/sample-autonomous-cloud-coding-agents/architecture/input-gateway) for authentication flows. ## Design principle @@ -40,7 +40,7 @@ Two authentication mechanisms protect the platform, matching the two input chann **Authorization** is user-scoped: any authenticated user can submit tasks, but users can only view and cancel their own tasks (`user_id` enforcement). Webhook management enforces ownership with 404 (not 403) to avoid leaking webhook existence. -**Agent credentials** - GitHub access currently uses a PAT stored in Secrets Manager. The orchestrator reads the secret at hydration time and passes it to the agent runtime. The model never receives the token in its context. Planned: replace the shared PAT with a GitHub App via AgentCore Identity Token Vault, providing per-task, repo-scoped, short-lived tokens (see [ROADMAP.md](/roadmap/roadmap)). +**Agent credentials** - GitHub access currently uses a PAT stored in Secrets Manager. The orchestrator reads the secret at hydration time and passes it to the agent runtime. The model never receives the token in its context. Planned: replace the shared PAT with a GitHub App via AgentCore Identity Token Vault, providing per-task, repo-scoped, short-lived tokens (see [ROADMAP.md](/sample-autonomous-cloud-coding-agents/roadmap/roadmap)). **Per-session IAM scoping** - The agent does not use its long-lived compute role (the AgentCore Runtime `ExecutionRole` or the ECS Fargate task role) for tenant data. 
Instead, at task startup it assumes a per-task **SessionRole** via `sts:AssumeRole` with session tags `{user_id, repo, task_id}`, and uses the resulting short-lived credentials for all DynamoDB and S3 tenant-data access. The SessionRole's policies self-constrain on those tags: @@ -59,7 +59,7 @@ Input screening happens at two points in the pipeline, forming a defense-in-dept - **Input validation** - Required fields, types, and size limits are enforced before any processing. Task descriptions are capped at 10,000 characters. - **Bedrock Guardrails** - A `PROMPT_ATTACK` content filter at `MEDIUM` input strength screens task descriptions for prompt injection. -- **Attachment screening** - All attachments (images, text files, URLs) pass through security screening before reaching the agent. Images (PNG and JPEG only) are validated via magic bytes and dimension checks, then screened through Bedrock Guardrails (image content blocks). Text files and PDFs are extracted and screened through Bedrock Guardrails text content screening. URL attachments undergo SSRF protection (DNS resolution pinning, private IP blocking, redirect validation) and content screening during hydration. See [ATTACHMENTS.md](/architecture/attachments) for the full screening pipeline. +- **Attachment screening** - All attachments (images, text files, URLs) pass through security screening before reaching the agent. Images (PNG and JPEG only) are validated via magic bytes and dimension checks, then screened through Bedrock Guardrails (image content blocks). Text files and PDFs are extracted and screened through Bedrock Guardrails text content screening. URL attachments undergo SSRF protection (DNS resolution pinning, private IP blocking, redirect validation) and content screening during hydration. See [ATTACHMENTS.md](/sample-autonomous-cloud-coding-agents/architecture/attachments) for the full screening pipeline. - **Fail-closed** - If the Bedrock API is unavailable, submissions are rejected (HTTP 503). 
Unscreened content never reaches the agent. ### Hydration-time screening @@ -82,11 +82,11 @@ Per-repo tool profiles are stored in onboarding config and loaded during context ## Blueprint custom steps -The blueprint framework ([REPO_ONBOARDING.md](/architecture/repo-onboarding)) allows per-repo custom Lambda steps in the orchestrator pipeline. These are a trust boundary that requires specific attention. +The blueprint framework ([REPO_ONBOARDING.md](/sample-autonomous-cloud-coding-agents/architecture/repo-onboarding)) allows per-repo custom Lambda steps in the orchestrator pipeline. These are a trust boundary that requires specific attention. **Deployment control** - Custom steps are defined in the `Blueprint` CDK construct and deployed via `cdk deploy`. Only principals with CDK deployment permissions can add or modify them. There is no runtime API for custom step CRUD. -The **same deploy-only property extends to `Blueprint.security.cedarPolicies`** — user-authored Cedar policies live in the CDK source, are typed as `readonly string[]` on the construct, and reach `RepoTable` only through a CloudFormation custom resource invoked at deploy time. The Cedar-driven HITL approval gates feature (see [`CEDAR_HITL_GATES.md`](/architecture/cedar-hitl-gates)) is load-bearing on this property: the engine treats Cedar policies loaded at task start as trusted content. If the blueprint model ever changes to accept user-uploaded policy text via an API path, the §12 trust model in that doc must be re-evaluated (add per-blueprint policy count cap, per-eval timeout, size cap). +The **same deploy-only property extends to `Blueprint.security.cedarPolicies`** — user-authored Cedar policies live in the CDK source, are typed as `readonly string[]` on the construct, and reach `RepoTable` only through a CloudFormation custom resource invoked at deploy time. 
The Cedar-driven HITL approval gates feature (see [`CEDAR_HITL_GATES.md`](/sample-autonomous-cloud-coding-agents/architecture/cedar-hitl-gates)) is load-bearing on this property: the engine treats Cedar policies loaded at task start as trusted content. If the blueprint model ever changes to accept user-uploaded policy text via an API path, the §12 trust model in that doc must be re-evaluated (add per-blueprint policy count cap, per-eval timeout, size cap). **Input filtering** - The framework strips credential ARNs (`github_token_secret_arn`) and networking configuration (`egress_allowlist`) from the config before passing it to custom Lambda steps. If a custom step needs secrets, it must declare them explicitly and the operator must grant IAM permissions. @@ -116,11 +116,11 @@ The platform is self-hosted in the customer's AWS account. No code or repo data | Audit | Bedrock model invocation logging (90-day retention) | Prompt injection investigation, compliance | | Deployment | CDK infrastructure as code | Consistent, auditable deployments | -**DNS Firewall note:** Currently in observation mode (non-allowlisted domains are logged as ALERT but not blocked). Per-repo `egressAllowlist` entries are aggregated into the platform-wide policy. DNS Firewall does not block direct IP connections, which is acceptable for the "confused agent" threat model but not for sophisticated adversaries. See [COMPUTE.md](/architecture/compute) for the enforcement rollout process. +**DNS Firewall note:** Currently in observation mode (non-allowlisted domains are logged as ALERT but not blocked). Per-repo `egressAllowlist` entries are aggregated into the platform-wide policy. DNS Firewall does not block direct IP connections, which is acceptable for the "confused agent" threat model but not for sophisticated adversaries. See [COMPUTE.md](/sample-autonomous-cloud-coding-agents/architecture/compute) for the enforcement rollout process. 
## Policy enforcement -The platform enforces policies at multiple points in the task lifecycle. Today, these are implemented inline across handlers, constructs, and agent code. A centralized Cedar-based policy framework is planned (see [ROADMAP.md](/roadmap/roadmap)). +The platform enforces policies at multiple points in the task lifecycle. Today, these are implemented inline across handlers, constructs, and agent code. A centralized Cedar-based policy framework is planned (see [ROADMAP.md](/sample-autonomous-cloud-coding-agents/roadmap/roadmap)). ### Current enforcement map @@ -161,7 +161,7 @@ flowchart LR | Finalization | Build/lint verification | `agent/src/post_hooks.py` | Task record and PR body | | Infrastructure | DNS Firewall, WAF | CDK constructs | CloudWatch logs | -**Audit gap:** Submission-time rejections currently return HTTP errors without structured audit events. Planned: a unified `PolicyDecisionEvent` schema across all phases (see [ROADMAP.md](/roadmap/roadmap)). +**Audit gap:** Submission-time rejections currently return HTTP errors without structured audit events. Planned: a unified `PolicyDecisionEvent` schema across all phases (see [ROADMAP.md](/sample-autonomous-cloud-coding-agents/roadmap/roadmap)). ### Mid-execution enforcement @@ -176,7 +176,7 @@ Once an agent session starts, two mechanisms enforce policy without requiring an ## Memory threats -The platform's memory system ([MEMORY.md](/architecture/memory)) faces threats from both intentional attacks and emergent corruption. OWASP classifies memory poisoning as **ASI06** in the 2026 Top 10 for Agentic Applications, recognizing that persistent memory attacks are fundamentally different from single-session prompt injection: poisoned entries influence every subsequent interaction. +The platform's memory system ([MEMORY.md](/sample-autonomous-cloud-coding-agents/architecture/memory)) faces threats from both intentional attacks and emergent corruption. 
OWASP classifies memory poisoning as **ASI06** in the 2026 Top 10 for Agentic Applications, recognizing that persistent memory attacks are fundamentally different from single-session prompt injection: poisoned entries influence every subsequent interaction. ### Attack vectors @@ -198,7 +198,7 @@ The platform's memory system ([MEMORY.md](/architecture/memory)) faces threats f 5. **Review feedback quorum** - Only promote feedback to persistent rules if the same pattern appears from multiple trusted reviewers across multiple PRs. Single review comments never become permanent rules. 6. **Blast radius containment** - Even if poisoned rules get through, the agent cannot modify CI/CD pipelines, change branch protection, access secrets beyond its scoped token, or push to protected branches. -**Planned:** Trust-scored retrieval with temporal decay, anomaly detection on write patterns, and write-ahead guardian validation (see [ROADMAP.md](/roadmap/roadmap)). +**Planned:** Trust-scored retrieval with temporal decay, anomaly detection on write patterns, and write-ahead guardian validation (see [ROADMAP.md](/sample-autonomous-cloud-coding-agents/roadmap/roadmap)). ## Data protection diff --git a/docs/src/content/docs/architecture/Vision.md b/docs/src/content/docs/architecture/Vision.md index 6cfc8842..0438f585 100644 --- a/docs/src/content/docs/architecture/Vision.md +++ b/docs/src/content/docs/architecture/Vision.md @@ -7,7 +7,7 @@ title: Vision This document states the long-term direction of **ABCA (Autonomous Background Coding Agents on AWS)** and the **tenets** that should guide design, implementation, and review. Use it when evaluating pull requests, RFCs, and ADRs: if a change clearly advances the vision and respects the tenets, it belongs; if it trades tenets away without an explicit, documented rationale, it needs more discussion. 

 - **Use this doc for:** alignment checks in review — “does this fit where we are going?”
-- **Not a substitute for:** [ARCHITECTURE.md](/architecture/architecture) (system shape), [ROADMAP.md](/roadmap/roadmap) (what ships when), or [docs/decisions/](../decisions/) (specific accepted choices).
+- **Not a substitute for:** [ARCHITECTURE.md](/sample-autonomous-cloud-coding-agents/architecture/architecture) (system shape), [ROADMAP.md](/sample-autonomous-cloud-coding-agents/roadmap/roadmap) (what ships when), or [docs/decisions/](../decisions/) (specific accepted choices).

 ## Vision

@@ -17,7 +17,7 @@ We are building toward **lights-sparse**, **graduated** autonomy (defined below)

 ### What "lights-sparse" means

-**Lights-sparse** is project vocabulary (not general industry jargon): it names the autonomy posture ABCA targets today, drawn from the **software dark factory** analogy in the [introduction](/architecture/index).
+**Lights-sparse** is project vocabulary (not general industry jargon): it names the autonomy posture ABCA targets today, drawn from the **software dark factory** analogy in the [introduction](/sample-autonomous-cloud-coding-agents/architecture/index).

 - **Lights-out** (the analogy’s end state): humans set goals, policy, and constraints; production runs without people on the floor.
 - **Lights-sparse** (where teams are now): the **implementation loop** — edit code, run tests, open pull requests — is increasingly **unattended**, while **governance, merge authority, and production release** stay **supervised**. Humans are not at the keyboard for every step; they are still accountable for what ships.
@@ -26,7 +26,7 @@ ABCA is built for that posture: asynchronous tasks, policy-gated escalation when

 ### What "graduated" means here

-**Graduated** autonomy is not a single on/off switch: operators tighten or loosen gates over time (Cedar policies, `--pre-approve`, per-repo posture) without forking the deployment. See tenet 2 and [CEDAR_HITL_GATES.md](/architecture/cedar-hitl-gates).
+**Graduated** autonomy is not a single on/off switch: operators tighten or loosen gates over time (Cedar policies, `--pre-approve`, per-repo posture) without forking the deployment. See tenet 2 and [CEDAR_HITL_GATES.md](/sample-autonomous-cloud-coding-agents/architecture/cedar-hitl-gates).

 Success looks like teams that can **submit work and walk away**, trust that doomed work fails fast and cheaply, inspect every important decision in an audit trail, and see **measurable improvement** over time (fewer revision cycles, higher first-review merge rates, predictable cost).

@@ -39,7 +39,7 @@ Tenets are durable preferences. They can conflict; resolving conflict is a desig
 **The normal path is asynchronous and unattended** — submit a task, leave, and come back to a PR, review, or failure reason. Human involvement during a run is **by exception and policy-driven**, not the default way to “drive” the agent.

 - Progress and outcomes surface through **status, events, and notifications** (GitHub comment, Slack, email) — the platform reaches the human; the human does not babysit a terminal.
-- **Human-in-the-loop (HITL)** is how we escalate when autonomy must pause: Cedar **soft-deny** rules become approval gates; **hard-deny** rules still fail closed; **`--pre-approve`** scopes let trusted work proceed without repeated gates. See [CEDAR_HITL_GATES.md](/architecture/cedar-hitl-gates) and [INTERACTIVE_AGENTS.md](/architecture/interactive-agents).
+- **Human-in-the-loop (HITL)** is how we escalate when autonomy must pause: Cedar **soft-deny** rules become approval gates; **hard-deny** rules still fail closed; **`--pre-approve`** scopes let trusted work proceed without repeated gates. See [CEDAR_HITL_GATES.md](/sample-autonomous-cloud-coding-agents/architecture/cedar-hitl-gates) and [INTERACTIVE_AGENTS.md](/sample-autonomous-cloud-coding-agents/architecture/interactive-agents).
 - Real-time steering (**nudge**, **watch**) is for **operator intervention**, not the primary product shape.
 - **In review:** Do not conflate “background agent” with “no human ever.” Ask whether the change preserves fire-and-forget for the submitter while making escalation **reachable, attributable, and policy-gated** when risk warrants it.

@@ -48,7 +48,7 @@ Tenets are durable preferences. They can conflict; resolving conflict is a desig
 **The same deployment should support different autonomy postures** so teams can adopt incrementally: tight gates early, broader pre-approval and fewer interrupts later — without forking the platform.

 - Autonomy is expressed through **configuration and policy** (Blueprint, Cedar policies, submit-time `--pre-approve`, per-repo overrides) — not hard-coded per customer in core orchestrator logic.
-- A repo can run **fully gated** (many soft-deny rules, narrow pre-approve), **mostly autonomous** (`all_session` pre-approve with hard-deny still enforced), or anywhere between; platform maturity moves along the [ROADMAP.md](/roadmap/roadmap) scorecard, not a single global on/off switch.
+- A repo can run **fully gated** (many soft-deny rules, narrow pre-approve), **mostly autonomous** (`all_session` pre-approve with hard-deny still enforced), or anywhere between; platform maturity moves along the [ROADMAP.md](/sample-autonomous-cloud-coding-agents/roadmap/roadmap) scorecard, not a single global on/off switch.
 - **Merge and release authority** stay human regardless of autonomy level; raising autonomy means fewer *in-run* interruptions, not unsupervised production promotion.
 - **In review:** Prefer knobs that let operators tighten or loosen autonomy per repo/task; flag designs that lock everyone to one posture or that bypass policy to “make demos easier.”

@@ -143,12 +143,12 @@ These are out of scope for the project vision. Proposals that primarily serve th

 | Document | Role |
 |----------|------|
-| [ARCHITECTURE.md](/architecture/architecture) | Component design and design principles for the current system |
-| [ROADMAP.md](/roadmap/roadmap) | Sequenced delivery and maturity scorecard |
-| [SECURITY.md](/architecture/security) | Threat model and controls (tenets 4–5 in depth) |
-| [CEDAR_HITL_GATES.md](/architecture/cedar-hitl-gates) | HITL approval gates, pre-approve scopes, graduated in-run autonomy |
-| [INTERACTIVE_AGENTS.md](/architecture/interactive-agents) | Async UX, watch/nudge, notification plane, approval state machine |
+| [ARCHITECTURE.md](/sample-autonomous-cloud-coding-agents/architecture/architecture) | Component design and design principles for the current system |
+| [ROADMAP.md](/sample-autonomous-cloud-coding-agents/roadmap/roadmap) | Sequenced delivery and maturity scorecard |
+| [SECURITY.md](/sample-autonomous-cloud-coding-agents/architecture/security) | Threat model and controls (tenets 4–5 in depth) |
+| [CEDAR_HITL_GATES.md](/sample-autonomous-cloud-coding-agents/architecture/cedar-hitl-gates) | HITL approval gates, pre-approve scopes, graduated in-run autonomy |
+| [INTERACTIVE_AGENTS.md](/sample-autonomous-cloud-coding-agents/architecture/interactive-agents) | Async UX, watch/nudge, notification plane, approval state machine |
 | [docs/decisions/](../decisions/) | Recorded choices when tenets conflict or ambiguity is resolved |
-| [docs/src/content/docs/index.md](/architecture/index) (synced intro) | Public-facing narrative including dark-factory attribute table |
+| [docs/src/content/docs/index.md](/sample-autonomous-cloud-coding-agents/architecture/index) (synced intro) | Public-facing narrative including dark-factory attribute table |

 When tenets and architecture principles overlap, **tenets win for review judgment**; **architecture and ADRs win for implementation detail** once a direction is chosen.
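Every `-`/`+` pair in the hunks above makes the same mechanical change: a root-relative docs route gains the site base `/sample-autonomous-cloud-coding-agents`, while in-page anchors (`#…`), relative targets such as `../decisions/`, and external GitHub URLs stay as they were. The Python sketch below reproduces that rule so the expected output of a sync can be checked by hand. It is a simplified stand-in, not the `sync-starlight.mjs` logic (which first maps each target through `rewriteDocsLinkTarget` and only prefixes what that function rewrites); `prefix_root_links` and `DOCS_BASE` are hypothetical names, and the base value is taken from the `+` lines.

```python
import re

# Assumption: the site base visible in the `+` lines of this diff.
DOCS_BASE = "/sample-autonomous-cloud-coding-agents"

# Matches inline markdown links of the form [label](target).
MARKDOWN_LINK = re.compile(r"\[([^\]]+)\]\(([^)]+)\)")


def prefix_root_links(markdown: str, base: str = DOCS_BASE) -> str:
    """Prefix root-relative link targets with `base` (hypothetical helper).

    Anchors (#...), relative paths (../...), absolute URLs, and targets that
    already carry the base are returned unchanged.
    """

    def rewrite(match: re.Match) -> str:
        label, target = match.group(1), match.group(2)
        # "/route" is root-relative; "//host/..." is protocol-relative, not a route.
        is_root_relative = target.startswith("/") and not target.startswith("//")
        already_prefixed = target == base or target.startswith(base + "/")
        if is_root_relative and not already_prefixed:
            return f"[{label}]({base}{target})"
        return match.group(0)

    return MARKDOWN_LINK.sub(rewrite, markdown)


if __name__ == "__main__":
    sample = (
        "See [ROADMAP.md](/roadmap/roadmap), [Naming](#naming-workflow-vs-blueprint), "
        "and [docs/decisions/](../decisions/)."
    )
    # Only the ROADMAP.md link changes.
    print(prefix_root_links(sample))
```

The `already_prefixed` guard is a choice of this sketch: it makes the rewrite idempotent, so running it twice over the same output cannot produce a doubled base.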
diff --git a/docs/src/content/docs/architecture/Workflows.md b/docs/src/content/docs/architecture/Workflows.md
index cb5a4b20..7d36ec98 100644
--- a/docs/src/content/docs/architecture/Workflows.md
+++ b/docs/src/content/docs/architecture/Workflows.md
@@ -9,8 +9,8 @@ A **workflow** is a versioned, declarative document that describes how the agent
 The three shipped task types — `new_task`, `pr_iteration`, `pr_review` — become the first three first-party workflows, and **the `task_type` enum is removed**: this work replaces it, it does not coexist with it. New domains (research, document drafting, data analysis) are new workflow files, not new orchestrator branches. Crucially, a workflow can declare `requires_repo: false`, unlocking **repo-optional tasks**: knowledge work with no GitHub clone and no PR scaffolding.

 - **Use this doc for:** the workflow file schema, step-kind catalog, the agent-side step runner model, how a workflow reference flows from API to agent, and the plan for replacing the `task_type` enum with workflows (it is removed, not aliased).
-- **Related docs:** [ARCHITECTURE.md](/architecture/architecture) for the deterministic-steps-wrapping-one-agentic-step model, [ORCHESTRATOR.md](/architecture/orchestrator) for the durable lifecycle the workflow runs inside, [REPO_ONBOARDING.md](/architecture/repo-onboarding) for the per-repo **Blueprint** (a distinct concept — see [Naming](#naming-workflow-vs-blueprint)), [CEDAR_HITL_GATES.md](/architecture/cedar-hitl-gates) for the policy engine a workflow's `agent_config` feeds, [SECURITY.md](/architecture/security) for tool tiers, and [API_CONTRACT.md](/architecture/api-contract) for the `workflow_ref` wire field.
-- **Decision record:** [ADR-014](/architecture/adr-014-workflow-driven-tasks).
+- **Related docs:** [ARCHITECTURE.md](/sample-autonomous-cloud-coding-agents/architecture/architecture) for the deterministic-steps-wrapping-one-agentic-step model, [ORCHESTRATOR.md](/sample-autonomous-cloud-coding-agents/architecture/orchestrator) for the durable lifecycle the workflow runs inside, [REPO_ONBOARDING.md](/sample-autonomous-cloud-coding-agents/architecture/repo-onboarding) for the per-repo **Blueprint** (a distinct concept — see [Naming](#naming-workflow-vs-blueprint)), [CEDAR_HITL_GATES.md](/sample-autonomous-cloud-coding-agents/architecture/cedar-hitl-gates) for the policy engine a workflow's `agent_config` feeds, [SECURITY.md](/sample-autonomous-cloud-coding-agents/architecture/security) for tool tiers, and [API_CONTRACT.md](/sample-autonomous-cloud-coding-agents/architecture/api-contract) for the `workflow_ref` wire field.
+- **Decision record:** [ADR-014](/sample-autonomous-cloud-coding-agents/architecture/adr-014-workflow-driven-tasks).
 - **Tracking issue:** [#248](https://github.com/aws-samples/sample-autonomous-cloud-coding-agents/issues/248). Pairs with the agent asset registry ([#246](https://github.com/aws-samples/sample-autonomous-cloud-coding-agents/issues/246)) and attribution ([#245](https://github.com/aws-samples/sample-autonomous-cloud-coding-agents/issues/245)). Scoped-down, current-architecture track of the broader AKW vision ([#99](https://github.com/aws-samples/sample-autonomous-cloud-coding-agents/issues/99)).

 ## The problem
@@ -105,7 +105,7 @@ A workflow file has the following top-level fields. (Full machine-readable schem
 | `steps` | Step[] | ✓ | Ordered pipeline phases (see [Step kinds](#step-kinds)). |
 | `required_inputs` | object | – | Validation contract, e.g. `{ one_of: [issue_number, task_description] }` or `{ all_of: [pr_number] }`. Replaces the scattered required-input checks. |
 | `terminal_outcomes` | object | ✓ | What "done" *produces* — `pr_url` \| `review_posted` \| `artifact` \| `comment`. Records the expected artifact; it does **not** override success inference (see [Success inference](#success-inference-and-terminal-outcomes)). |
-| `limits` | object | – | `{ max_turns, max_budget_usd }` defaults (per-task / per-repo still override, per [override precedence](/architecture/repo-onboarding#override-precedence)). |
+| `limits` | object | – | `{ max_turns, max_budget_usd }` defaults (per-task / per-repo still override, per [override precedence](/sample-autonomous-cloud-coding-agents/architecture/repo-onboarding#override-precedence)). |
 | `promotion_gate` | object | – | The check contract a version must pass to reach `production` (see [Promotion is earned, not set](#promotion-is-earned-not-set)). `{ requires: [] }` — pre-#236 a concrete test target (`tests:agent/new_task`); post-#236 an eval id (`eval:web-research-quality`). Optional until #236; absent ⇒ test-tier fallback. |
 | `status` | enum | ✓ | `draft` \| `validated` \| `production` \| `deprecated`. Only `production` resolves for normal tasks. |

@@ -206,7 +206,7 @@ This second example is the **target shape** for repo-less execution — the acce

 ## The agent-side step runner

-Per [ADR-014](/architecture/adr-014-workflow-driven-tasks), the runner is **agent-side**: it lives in the container and interprets `workflow.steps`. The orchestrator's durable shape (`admission-control → pre-flight → hydrate-context → start-session → await-agent-completion → finalize`) is unchanged — the workflow drives *what happens inside* the `RUNNING` state, not the platform lifecycle. This keeps the blast radius off durable orchestration and matches the issue's "executes steps in order *inside the container*."
+Per [ADR-014](/sample-autonomous-cloud-coding-agents/architecture/adr-014-workflow-driven-tasks), the runner is **agent-side**: it lives in the container and interprets `workflow.steps`. The orchestrator's durable shape (`admission-control → pre-flight → hydrate-context → start-session → await-agent-completion → finalize`) is unchanged — the workflow drives *what happens inside* the `RUNNING` state, not the platform lifecycle. This keeps the blast radius off durable orchestration and matches the issue's "executes steps in order *inside the container*."

 ```python
 # agent/src/workflow/runner.py (shape, not final code)
@@ -226,12 +226,12 @@ def run_workflow(workflow: Workflow, config: TaskConfig, hc: HydratedContext) ->

 ### Step execution semantics

-The step runner runs inside the compute substrate, which is **not** a throwaway container: AgentCore provides persistent session storage — a per-session filesystem at `/mnt/workspace` that survives stop/resume cycles (14-day TTL, see [COMPUTE.md](/architecture/compute)) — and the Claude Agent SDK supports resuming a prior session by its session UUID (the runner already captures that UUID from the first `ResultMessage`). So the durability model the runner should target is **resume from where the workflow stopped**, not replay from the beginning. The runner is designed resume-aware from the start so the structured "steps" become the natural checkpoint boundaries:
+The step runner runs inside the compute substrate, which is **not** a throwaway container: AgentCore provides persistent session storage — a per-session filesystem at `/mnt/workspace` that survives stop/resume cycles (14-day TTL, see [COMPUTE.md](/sample-autonomous-cloud-coding-agents/architecture/compute)) — and the Claude Agent SDK supports resuming a prior session by its session UUID (the runner already captures that UUID from the first `ResultMessage`). So the durability model the runner should target is **resume from where the workflow stopped**, not replay from the beginning. The runner is designed resume-aware from the start so the structured "steps" become the natural checkpoint boundaries:

 - **Step completion is checkpointed; resume skips completed steps.** The runner records each step's outcome to a small `workflow_state.json` on the persistent mount (`/mnt/workspace`) as it goes. On resume (orchestrator re-invokes the same session, or — per the roadmap — a replacement worker rehydrates from the [S3-backed SDK session store](#relationship-to-the-resume-roadmap)), the runner reads that checkpoint, **skips already-completed deterministic steps** (`clone_repo` need not re-clone a populated `/workspace`; a completed `verify_build` is not re-run), and **resumes the agent loop** via the persisted SDK session UUID rather than restarting it from turn 0. This is the same property the orchestrator already relies on for session start being idempotent (pre-generated, reused session id).
 - **Side-effecting steps remain idempotent.** Independent of resume, `clone_repo`, `ensure_pr`, `post_review`, and `deliver_artifact` must tolerate a partial prior run (a resume can re-enter the step that was in flight when the worker died). Each documents its idempotency key — PR branch, review id, artifact S3 key = `task_id` — so re-entry reconciles rather than duplicates (today's `ensure_pr` already does this: it checks `gh pr view` before creating).
 - **`on_failure: continue` is forbidden after side effects** (validation rule 10). A failed `ensure_pr` (commits pushed, PR-create failed) must not reach a *succeeded* terminal — committed work with no PR and no compensation. `continue` is permitted only for non-side-effecting, advisory steps (e.g. an informational `verify_lint`). `skip_remaining` ends the workflow cleanly and runs terminal-outcome resolution against whatever completed; `fail` (default) is terminal `FAILED`.
-- **Granularity boundary.** Resume is *workflow-step granular on the agent side*, not a new orchestrator-side durable checkpoint per step — the orchestrator still treats the whole session as one `await-agent-completion` step, so platform invariants stay agent-external ([ADR-014](/architecture/adr-014-workflow-driven-tasks)). What changes versus today is that the agent-side runner makes its *own* progress recoverable across a stop/resume, which today's monolithic `run_task` does not.
+- **Granularity boundary.** Resume is *workflow-step granular on the agent side*, not a new orchestrator-side durable checkpoint per step — the orchestrator still treats the whole session as one `await-agent-completion` step, so platform invariants stay agent-external ([ADR-014](/sample-autonomous-cloud-coding-agents/architecture/adr-014-workflow-driven-tasks)). What changes versus today is that the agent-side runner makes its *own* progress recoverable across a stop/resume, which today's monolithic `run_task` does not.

 #### Relationship to the resume roadmap

@@ -247,7 +247,7 @@ Until the S3 session store lands, resume is bounded to what persistent session s

 ### Agent configuration: the three planes

-A Claude Agent SDK session here is shaped by more than tools — it loads **skills, plugins, subagents, rules/prompt-fragments, MCP servers, settings, and Cedar policy**. Today these arrive from two places: the agent's own code (`runner.py` hardcodes `allowed_tools`; `setting_sources=["project"]`) and the **cloned repo** (`prompt_builder.discover_project_config` reads `CLAUDE.md`, `.claude/rules/*.md`, `.claude/agents/*.md`, `.claude/settings.json`, `.mcp.json`). A workflow adds a third plane. The model is three layers, lowest-to-highest precedence — deliberately parallel to the existing platform/repo/task [override precedence](/architecture/repo-onboarding#override-precedence):
+A Claude Agent SDK session here is shaped by more than tools — it loads **skills, plugins, subagents, rules/prompt-fragments, MCP servers, settings, and Cedar policy**. Today these arrive from two places: the agent's own code (`runner.py` hardcodes `allowed_tools`; `setting_sources=["project"]`) and the **cloned repo** (`prompt_builder.discover_project_config` reads `CLAUDE.md`, `.claude/rules/*.md`, `.claude/agents/*.md`, `.claude/settings.json`, `.mcp.json`). A workflow adds a third plane. The model is three layers, lowest-to-highest precedence — deliberately parallel to the existing platform/repo/task [override precedence](/sample-autonomous-cloud-coding-agents/architecture/repo-onboarding#override-precedence):

 | Plane | Source | What it carries | Precedence |
 |---|---|---|---|
@@ -295,13 +295,13 @@ Scope discipline for #248: **`github` is the only implemented provider** — add
 Read-only is enforced by Cedar hard-deny rules. **As of #248 Phase 2a** these key off the `context.read_only` attribute (`read_only_forbid_write`, `read_only_forbid_edit`), not a principal literal — and `read_only: true` *also* makes the runner drop `Write`/`Edit` from the SDK `allowed_tools` list. Two layers:

 - **Defense in depth.** `read_only: true` makes the runner drop `Write`/`Edit` from `allowed_tools` *and* sends `context.read_only == true` on every Cedar request — closing the earlier gap where read-only was enforced only by a Cedar string-match on the principal, not by the tool list.
-- **Property-keyed enforcement (security-relevant — was precise, not hand-waved).** Read-only enforcement attaches to the *property*, not a per-task-type literal: the principal keeps the legacy `Agent::TaskAgent::""` identity scheme (audit/attribution only), while the two hard-deny rules forbid `Write`/`Edit` **whenever `context.read_only == true`**. So the deny applies uniformly to *every* read-only workflow — not just `coding/pr-review` — and there is no literal a new read-only workflow could fail to match. This was a deliberate, recorded behavior change (see [ADR-014](/architecture/adr-014-workflow-driven-tasks) addendum 2026-06-08), gated by the `contracts/cedar-parity/` fixtures (`read-only-forbid-write`, `read-only-forbid-edit`, `read-only-false-permits-write`) run against *both* the `cedarpy` and `cedar-wasm` engines.
+- **Property-keyed enforcement (security-relevant — was precise, not hand-waved).** Read-only enforcement attaches to the *property*, not a per-task-type literal: the principal keeps the legacy `Agent::TaskAgent::""` identity scheme (audit/attribution only), while the two hard-deny rules forbid `Write`/`Edit` **whenever `context.read_only == true`**. So the deny applies uniformly to *every* read-only workflow — not just `coding/pr-review` — and there is no literal a new read-only workflow could fail to match. This was a deliberate, recorded behavior change (see [ADR-014](/sample-autonomous-cloud-coding-agents/architecture/adr-014-workflow-driven-tasks) addendum 2026-06-08), gated by the `contracts/cedar-parity/` fixtures (`read-only-forbid-write`, `read-only-forbid-edit`, `read-only-false-permits-write`) run against *both* the `cedarpy` and `cedar-wasm` engines.

 This is the migration step where an error *silently weakens* enforcement (the rule stops matching) rather than failing loudly. The original plan was to ship it as an isolated PR ahead of the Phase 2b workflow migrations; because 2b shipped first behind a `read_only ⇒ "pr_review"` principal bridge (so read-only was never unprotected), Phase 2a instead removes that bridge and lands the property-keyed rules + parity fixtures together on the #248 branch. See the ADR-014 addendum and [Phasing](#phasing).

 **Policy floor (no privilege escalation by config).** `agent_config` and its `cedar_policy_modules` are author-supplied, so the schema/validator must enforce a floor rather than trusting the file:

-- Built-in **hard-deny is always on** and not selectable (per [CEDAR_HITL_GATES.md](/architecture/cedar-hitl-gates)).
+- Built-in **hard-deny is always on** and not selectable (per [CEDAR_HITL_GATES.md](/sample-autonomous-cloud-coding-agents/architecture/cedar-hitl-gates)).
 - Built-in **soft-deny (`builtin/soft_deny`) is mandatory** for any workflow that can write (`read_only: false`); a workflow may *add* modules but may not drop the soft-deny floor. Removing it (e.g. to suppress the force-push / write-credentials HITL gates) requires an admin-approved exception, not a field edit. (Validation rule added below.)
 - `tier: elevated` + `read_only: false` + a permissive `allowed_tools` (or an `mcp_servers`/`plugins`/`skills` set granting reach) is exactly the shape that warrants governance — see [Authorship & governance](#authorship--governance). `tier` is the ceiling: the validator rejects an `agent_config` whose declared reach exceeds its `tier`.

@@ -309,7 +309,7 @@ Registry-sourced `cedar_policy_modules` / `mcp_servers` are trusted content load

 ### Authorship & governance

-A workflow file selects the agent's tool surface and policy posture, so **who may publish a `production` workflow is a trust decision, not a convenience**. Per [ADR-003](/architecture/adr-003-contribution-governance), publishing or promoting a first-party workflow follows the same issue → approval → review → merge path as any code change — a workflow YAML in `agent/workflows/**` is reviewed like code, and the synth-time validator (the [validation rules](#validation-rules)) is a required CI gate. When the registry (#246) makes workflows publishable out-of-band, publish/promote ACLs are Cedar-governed per #246 Phase 3; until then, the only way a `production` workflow exists is through a reviewed merge. The `description`/`guidance` discovery fields are author-controlled free text; when they feed an agent's workflow-*selection* context (Phase 4), they are treated as untrusted-external input and screened like other hydrated content.
+A workflow file selects the agent's tool surface and policy posture, so **who may publish a `production` workflow is a trust decision, not a convenience**. Per [ADR-003](/sample-autonomous-cloud-coding-agents/architecture/adr-003-contribution-governance), publishing or promoting a first-party workflow follows the same issue → approval → review → merge path as any code change — a workflow YAML in `agent/workflows/**` is reviewed like code, and the synth-time validator (the [validation rules](#validation-rules)) is a required CI gate. When the registry (#246) makes workflows publishable out-of-band, publish/promote ACLs are Cedar-governed per #246 Phase 3; until then, the only way a `production` workflow exists is through a reviewed merge. The `description`/`guidance` discovery fields are author-controlled free text; when they feed an agent's workflow-*selection* context (Phase 4), they are treated as untrusted-external input and screened like other hydrated content.

 ## Wire contract: `workflow_ref` from API to agent

@@ -333,7 +333,7 @@ A workflow file selects the agent's tool surface and policy posture, so **who ma

 ## Replacing task types

-This work **removes** the `task_type` enum; it is not preserved as a legacy alias. After this change, `workflow_ref` is the only task-selection field. This is an intentional **breaking API change** — acceptable because the platform is pre-1.0 (per [ORCHESTRATOR.md](/architecture/orchestrator), the API surface is not frozen) and because carrying a dual `task_type`/`workflow_ref` surface would defeat the whole point of centralizing per-task-type behavior in one place.
+This work **removes** the `task_type` enum; it is not preserved as a legacy alias. After this change, `workflow_ref` is the only task-selection field. This is an intentional **breaking API change** — acceptable because the platform is pre-1.0 (per [ORCHESTRATOR.md](/sample-autonomous-cloud-coding-agents/architecture/orchestrator), the API surface is not frozen) and because carrying a dual `task_type`/`workflow_ref` surface would defeat the whole point of centralizing per-task-type behavior in one place.

 What is removed, repo-wide:

@@ -349,7 +349,7 @@ What is removed, repo-wide:

 So a submission with no `workflow_ref` lands on the repo default or the platform default — never coerced into the heavyweight `new_task` (clone + build + open-PR) path that the old `task_type` default implied.

-**Migration for callers.** Existing callers that send `task_type` must move to `workflow_ref`. The mapping is one-to-one and published in [API_CONTRACT.md](/architecture/api-contract): `new_task → coding/new-task-v1`, `pr_iteration → coding/pr-iteration-v1`, `pr_review → coding/pr-review-v1`. The CLI's `--pr ` / `--review-pr ` flags are reworked to set `workflow_ref` (plus `pr_number`) instead of inferring a `task_type`; `--workflow [@]` is the general form. Because each migrated workflow must pass its promotion gate (see [Promotion is earned, not set](#promotion-is-earned-not-set)) before shipping, functional fidelity of the three coding paths is verified by tests/eval — but it is a *goal*, not a hard constraint: where a migrated workflow deliberately does the *right* thing differently from today (e.g. tighter read-only enforcement), that divergence is a recorded decision in the migration PR, not a regression to avoid.
+**Migration for callers.** Existing callers that send `task_type` must move to `workflow_ref`. The mapping is one-to-one and published in [API_CONTRACT.md](/sample-autonomous-cloud-coding-agents/architecture/api-contract): `new_task → coding/new-task-v1`, `pr_iteration → coding/pr-iteration-v1`, `pr_review → coding/pr-review-v1`. The CLI's `--pr ` / `--review-pr ` flags are reworked to set `workflow_ref` (plus `pr_number`) instead of inferring a `task_type`; `--workflow [@]` is the general form. Because each migrated workflow must pass its promotion gate (see [Promotion is earned, not set](#promotion-is-earned-not-set)) before shipping, functional fidelity of the three coding paths is verified by tests/eval — but it is a *goal*, not a hard constraint: where a migrated workflow deliberately does the *right* thing differently from today (e.g. tighter read-only enforcement), that divergence is a recorded decision in the migration PR, not a regression to avoid.

 ### The default workflow (`default/agent-v1`)

@@ -425,7 +425,7 @@ Enforced at author time (CDK synth / CI lint over `agent/workflows/**`) and at r

 ### Single source of truth and validator parity

-A workflow file is validated on more than one side of the platform (CDK synth-time over `agent/workflows/**`, the Python runtime loader, and — Phase 4 — registry publish), so without discipline the cross-field rules would be re-implemented per side and **drift** — the same `(workflow file) → (valid? / which error)` hazard the repo already learned the hard way with the two Cedar engines (see the cedar-parity note in `CLAUDE.md` / [CEDAR_HITL_GATES.md](/architecture/cedar-hitl-gates) §15.6). The defense is deliberately the same:
+A workflow file is validated on more than one side of the platform (CDK synth-time over `agent/workflows/**`, the Python runtime loader, and — Phase 4 — registry publish), so without discipline the cross-field rules would be re-implemented per side and **drift** — the same `(workflow file) → (valid? / which error)` hazard the repo already learned the hard way with the two Cedar engines (see the cedar-parity note in `CLAUDE.md` / [CEDAR_HITL_GATES.md](/sample-autonomous-cloud-coding-agents/architecture/cedar-hitl-gates) §15.6). The defense is deliberately the same:

 1. **The JSON Schema is the one canonical *shape* contract.** `agent/workflows/schema/workflow.schema.json` is the single artifact for field shape and for the schema-expressible conditionals (rules 3, 4, 7 via `allOf`). Both sides consume *that same file* through a standard library — `ajv` in TypeScript at synth, `jsonschema`/`check-jsonschema` in Python at load — so shape validation is never re-implemented, only re-run.
 2. **The cross-field rules are implemented once, at author/CI time — not duplicated at runtime.** Rules not expressible in JSON Schema (1, 2, 5, 6, 8, 9, 11, 12, 13, 14) live in a **single validator module** that runs at CDK synth / CI lint over `agent/workflows/**`. In Phases 1–3 every workflow is a first-party file baked into the image and already cleared by that CI gate, so the **runtime Python loader performs only JSON-Schema shape validation** (defense-in-depth against a corrupt bundle) and *trusts* the CI-gated cross-field verdict rather than re-deriving it. There is therefore exactly **one** cross-field implementation in Phases 1–3, eliminating the drift surface before it exists.
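The hunk above splits validation into JSON-Schema shape checks and one validator module run at synth/CI time. Below is a minimal Python sketch of two author-time checks this diff states in prose: validation rule 10 (`on_failure: continue` only on non-side-effecting steps) and the soft-deny policy floor for writable workflows. Rule 10 is absent from the hunk's list of non-schema rules, so the project may express it in the JSON Schema instead; the sketch only illustrates the logic. The field layout is an assumption (a `kind` key per step; `read_only` and `cedar_policy_modules` under `agent_config`), and `workflow_rule_errors` is a hypothetical name, not the project's validator.

```python
from __future__ import annotations

from typing import Any

# Step kinds the design lists as side-effecting (they must be idempotent on resume).
SIDE_EFFECTING_KINDS = frozenset({"clone_repo", "ensure_pr", "post_review", "deliver_artifact"})
SOFT_DENY_MODULE = "builtin/soft_deny"


def workflow_rule_errors(workflow: dict[str, Any]) -> list[str]:
    """Return rule violations for an already shape-validated workflow (hypothetical)."""
    errors: list[str] = []

    # Rule 10: `continue` is allowed only on non-side-effecting, advisory steps.
    for index, step in enumerate(workflow.get("steps", [])):
        kind = step.get("kind")  # assumed field name for the step kind
        if step.get("on_failure") == "continue" and kind in SIDE_EFFECTING_KINDS:
            errors.append(f"steps[{index}] ({kind}): on_failure=continue is forbidden after side effects")

    # Policy floor: any workflow that can write must keep the built-in soft-deny module.
    agent_config = workflow.get("agent_config", {})
    if not agent_config.get("read_only", False):  # a missing flag counts as writable
        if SOFT_DENY_MODULE not in agent_config.get("cedar_policy_modules", []):
            errors.append(f"agent_config: writable workflow must include {SOFT_DENY_MODULE}")

    return errors


if __name__ == "__main__":
    bad = {
        "steps": [
            {"kind": "verify_lint", "on_failure": "continue"},  # advisory: allowed
            {"kind": "ensure_pr", "on_failure": "continue"},  # side effect: rejected
        ],
        "agent_config": {"read_only": False, "cedar_policy_modules": []},
    }
    for message in workflow_rule_errors(bad):
        print(message)
```

Treating a missing `read_only` as writable is deliberate in this sketch: the floor fails closed, so a workflow that omits the flag still has to carry `builtin/soft_deny`.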
@@ -435,7 +435,7 @@ So: JSON Schema = canonical shape, consumed not copied; cross-field rules = one

 ## Promotion is earned, not set

-`status: production` is not a label an author flips — it is a state a version *earns* by passing its declared `promotion_gate`. This makes the promotion lifecycle (`draft → validated → production → deprecated`) a machine-checked quality gate rather than a human's say-so, and it slots directly onto the existing [tiered validation pyramid](/architecture/adr-013-tiered-validation-pyramid):
+`status: production` is not a label an author flips — it is a state a version *earns* by passing its declared `promotion_gate`. This makes the promotion lifecycle (`draft → validated → production → deprecated`) a machine-checked quality gate rather than a human's say-so, and it slots directly onto the existing [tiered validation pyramid](/sample-autonomous-cloud-coding-agents/architecture/adr-013-tiered-validation-pyramid):

 | Workflow status | Gate that must pass | Validation tier (ADR-013) |
 |---|---|---|
@@ -476,10 +476,10 @@ Adapted from the issue's phases (the issue framed Phase 1 as a `task_type` *alia

 | Phase | Deliverable | Primary files |
 |---|---|---|
-| 0 | This design doc + [ADR-014](/architecture/adr-014-workflow-driven-tasks) + JSON Schema + step-runner skeleton | `docs/design/WORKFLOWS.md`, `docs/decisions/`, `agent/workflows/schema/` |
+| 0 | This design doc + [ADR-014](/sample-autonomous-cloud-coding-agents/architecture/adr-014-workflow-driven-tasks) + JSON Schema + step-runner skeleton | `docs/design/WORKFLOWS.md`, `docs/decisions/`, `agent/workflows/schema/` |
 | 1 | Step runner + `default/agent-v1` + migrate `new_task` to a workflow file; introduce `workflow_ref` and **remove the `task_type` enum** end-to-end (API/CLI/agent); the single workflow validator + `contracts/workflow-validation/` golden corpus | `agent/src/workflow/`, `agent/workflows/coding/new-task-v1.yaml`, `cdk/src/handlers/`, `cli/src/`, `contracts/workflow-validation/` |
 | 2b | Migrate `pr_iteration`, `pr_review` onto workflows behind a `read_only ⇒ "pr_review"` principal bridge (read-only stays enforced by the existing literal rules throughout) | `agent/workflows/coding/*`, `agent/tests/` |
-| 2a | **Cedar property-keyed read-only migration** — literal `"pr_review"` hard-deny → `context.read_only == true` rules (`read_only_forbid_write/edit`), threaded via `context.read_only`; removes the 2b bridge; adds `read-only-*` `contracts/cedar-parity/` fixtures verified on *both* engines. (Originally planned as an isolated PR ahead of 2b; reordered after 2b shipped first behind the bridge — see [ADR-014](/architecture/adr-014-workflow-driven-tasks) addendum.) | `agent/policies/`, `cdk/src/handlers/shared/builtin-policies.ts`, `contracts/cedar-parity/`, `agent/src/policy.py`, `agent/src/workflow/loader.py` |
+| 2a | **Cedar property-keyed read-only migration** — literal `"pr_review"` hard-deny → `context.read_only == true` rules (`read_only_forbid_write/edit`), threaded via `context.read_only`; removes the 2b bridge; adds `read-only-*` `contracts/cedar-parity/` fixtures verified on *both* engines. (Originally planned as an isolated PR ahead of 2b; reordered after 2b shipped first behind the bridge — see [ADR-014](/sample-autonomous-cloud-coding-agents/architecture/adr-014-workflow-driven-tasks) addendum.) | `agent/policies/`, `cdk/src/handlers/shared/builtin-policies.ts`, `contracts/cedar-parity/`, `agent/src/policy.py`, `agent/src/workflow/loader.py` |
 | 3 | Repo-optional `web_research` workflow (the repo-optional refactor — see [the requires_repo note](#domain--requiresrepo)) | `cdk/src/handlers/`, `agent/workflows/knowledge/` |
 | 4 | Registry-native workflows (#246); Blueprint workflow allow-list + `default_workflow`; inline/repo-local for dev | depends on #246 |

@@ -489,9 +489,9 @@ Per #248, the following remain out of scope (deferred to #99 / separate issues):

 ## Open questions

-These are genuine forks; the repo-optional items (1–2) were **prerequisites for Phase 3** and have been **resolved as recorded decisions in the [ADR-014](/architecture/adr-014-workflow-driven-tasks) addendum (2026-06-08)**, with the one implied schema reshape applied — so the Phase-0 schema is now **frozen**. They are kept here (struck-through) for traceability.
+These are genuine forks; the repo-optional items (1–2) were **prerequisites for Phase 3** and have been **resolved as recorded decisions in the [ADR-014](/sample-autonomous-cloud-coding-agents/architecture/adr-014-workflow-driven-tasks) addendum (2026-06-08)**, with the one implied schema reshape applied — so the Phase-0 schema is now **frozen**. They are kept here (struck-through) for traceability.

-1. ~~**Memory actorId for repo-less tasks.**~~ **RESOLVED (ADR-014 addendum):** per-user `actorId = user:{cognito_sub}` (caller-scoped, no cross-tenant bleed; mirrors the per-user trace prefix). Cross-workflow knowledge pooling is explicitly not adopted. **No schema field added** (fixed platform fallback, not author-configurable) — a Phase-3 `memory.py` change keys on `user:{user_id}` when `repo` is absent. Coordinate with [MEMORY.md](/architecture/memory).
+1. ~~**Memory actorId for repo-less tasks.**~~ **RESOLVED (ADR-014 addendum):** per-user `actorId = user:{cognito_sub}` (caller-scoped, no cross-tenant bleed; mirrors the per-user trace prefix). Cross-workflow knowledge pooling is explicitly not adopted. **No schema field added** (fixed platform fallback, not author-configurable) — a Phase-3 `memory.py` change keys on `user:{user_id}` when `repo` is absent. Coordinate with [MEMORY.md](/sample-autonomous-cloud-coding-agents/architecture/memory).
 2. ~~**Artifact delivery contract.**~~ **RESOLVED (ADR-014 addendum):** `deliver_artifact.target` is an **open string naming a registered Python deliverer** (`workflow/deliverers.py` → `DELIVERERS`), not a closed enum — new delivery methods are registered deliverers, not schema changes. Shared plumbing is **pinned**: task-scoped key `artifacts/{task_id}/`, a prefix-scoped SessionRole IAM grant, a per-artifact size limit, and `TaskDetail` URL surfacing; the SessionRole `repo` tenant tag gains a `workflow:{id}` repo-less form. Each deliverer declares the outcomes it `produces`; validator rule 11 consults that registry. Implementations land in Phase 3; only the contract is frozen here.
 3. **Inline vs registry-only refs.** Should dev-time tasks accept an *inline* workflow body in the request (sandboxed, never `production`), or only refs? Leaning ref-only for production with an inline escape hatch gated behind a feature flag.
 4. **Hydration ownership for steps.** `hydrate_context` is largely orchestrator-side today (and appears as both an orchestrator box and a step in the [Concepts](#concepts) diagram — intentionally, pending this decision). Keep it orchestrator-side with the step as a no-op consumer, or move source-specific fetching (esp. repo-less `urls`) into agent-side handlers? Current lean: orchestrator hydrates declared sources; the agent step only consumes.
@@ -504,4 +504,4 @@ This design is a scoped-down reconciliation of the unmerged AKW port on `origin/ Two refinements layered on top of that port are worth calling out, because they shape the schema: 1. **Discovery is separate from execution.** A workflow carries optional `description` / `guidance` fields — a human- and agent-readable selection surface for registry search and workflow-selection (#246) — kept distinct from the machine-facing `prompt`. -2. **Promotion is earned, not set** — see [Promotion is earned, not set](#promotion-is-earned-not-set). `production` is gated by a declared `promotion_gate`, reusing the [ADR-013](/architecture/adr-013-tiered-validation-pyramid) validation pyramid rather than being a label an author flips. +2. **Promotion is earned, not set** — see [Promotion is earned, not set](#promotion-is-earned-not-set). `production` is gated by a declared `promotion_gate`, reusing the [ADR-013](/sample-autonomous-cloud-coding-agents/architecture/adr-013-tiered-validation-pyramid) validation pyramid rather than being a label an author flips. diff --git a/docs/src/content/docs/customizing/Cedar-policies.md b/docs/src/content/docs/customizing/Cedar-policies.md index f16a7a94..6fe17e11 100644 --- a/docs/src/content/docs/customizing/Cedar-policies.md +++ b/docs/src/content/docs/customizing/Cedar-policies.md @@ -6,9 +6,9 @@ title: Cedar policy guide This guide is for **blueprint authors** — repo owners writing the Cedar policies that govern what tool calls the agent can make unattended versus which ones pause for human approval. -> **If you are a task submitter** looking for how approvals work at the CLI, see [User guide — Approval gates](/using/overview#approval-gates-cedar-hitl). This guide is about *writing* the rules that cause approvals. +> **If you are a task submitter** looking for how approvals work at the CLI, see [User guide — Approval gates](/sample-autonomous-cloud-coding-agents/using/overview#approval-gates-cedar-hitl). 
This guide is about *writing* the rules that cause approvals. > -> **For the full design** (fail-closed posture, engine internals, concurrency), see [Cedar HITL gates design doc](/architecture/cedar-hitl-gates). +> **For the full design** (fail-closed posture, engine internals, concurrency), see [Cedar HITL gates design doc](/sample-autonomous-cloud-coding-agents/architecture/cedar-hitl-gates). ## Two tiers, one language @@ -165,6 +165,6 @@ For unit coverage of your own rules without the cross-engine guarantee, add a ca ## Where to look next -- [`docs/design/CEDAR_HITL_GATES.md`](/architecture/cedar-hitl-gates) — full design: engine internals, fail-closed posture, late-approval races, concurrency. +- [`docs/design/CEDAR_HITL_GATES.md`](/sample-autonomous-cloud-coding-agents/architecture/cedar-hitl-gates) — full design: engine internals, fail-closed posture, late-approval races, concurrency. - [`agent/policies/hard_deny.cedar`](../../agent/policies/hard_deny.cedar) + [`agent/policies/soft_deny.cedar`](../../agent/policies/soft_deny.cedar) — the built-in rule set, good starting point for copy-paste. -- [User guide — Approval gates](/using/overview#approval-gates-cedar-hitl) — the CLI side (`bgagent pending` / `approve` / `deny` / `policies`). +- [User guide — Approval gates](/sample-autonomous-cloud-coding-agents/using/overview#approval-gates-cedar-hitl) — the CLI side (`bgagent pending` / `approve` / `deny` / `policies`). diff --git a/docs/src/content/docs/customizing/Prompt-engineering.md b/docs/src/content/docs/customizing/Prompt-engineering.md index 0c2ada32..bbfd4e5d 100644 --- a/docs/src/content/docs/customizing/Prompt-engineering.md +++ b/docs/src/content/docs/customizing/Prompt-engineering.md @@ -10,7 +10,7 @@ Writing effective task descriptions for ABCA. ABCA agents are unattended - once a task is submitted, the agent works autonomously from start to finish. It cannot ask clarifying questions or pause for feedback. 
Every decision is made based on what you provide upfront, so prompt quality directly determines task success. -This guide covers how to write descriptions that lead to good pull requests. For submission mechanics (CLI flags, API fields, webhook setup), see the [User guide](/using/overview). +This guide covers how to write descriptions that lead to good pull requests. For submission mechanics (CLI flags, API fields, webhook setup), see the [User guide](/sample-autonomous-cloud-coding-agents/using/overview). ## Choosing the right input mode diff --git a/docs/src/content/docs/customizing/Repository-onboarding.md b/docs/src/content/docs/customizing/Repository-onboarding.md index 4a9a0727..d4ae5512 100644 --- a/docs/src/content/docs/customizing/Repository-onboarding.md +++ b/docs/src/content/docs/customizing/Repository-onboarding.md @@ -15,4 +15,4 @@ If you submit a task against a repository that has not been onboarded, the API r } ``` -Contact your platform administrator to onboard a new repository. For details on how administrators register repositories, see the [Developer guide](/developer-guide/introduction#repository-onboarding). \ No newline at end of file +Contact your platform administrator to onboard a new repository. For details on how administrators register repositories, see the [Developer guide](/sample-autonomous-cloud-coding-agents/developer-guide/introduction#repository-onboarding). 
\ No newline at end of file diff --git a/docs/src/content/docs/decisions/Adr-002-least-privilege-bootstrap-policies.md b/docs/src/content/docs/decisions/Adr-002-least-privilege-bootstrap-policies.md index 8255f458..52ab2eb7 100644 --- a/docs/src/content/docs/decisions/Adr-002-least-privilege-bootstrap-policies.md +++ b/docs/src/content/docs/decisions/Adr-002-least-privilege-bootstrap-policies.md @@ -68,7 +68,7 @@ The implementation is decomposed into 8 sub-issues, each independently reviewabl ## References -- [ADR-001](/architecture/adr-001-stacked-pull-requests) — delivery methodology (stacked PRs) +- [ADR-001](/sample-autonomous-cloud-coding-agents/architecture/adr-001-stacked-pull-requests) — delivery methodology (stacked PRs) - RFC #120 — parent issue with full design and sub-issue breakdown - `docs/design/DEPLOYMENT_ROLES.md` — current documentation (will become generated) - PR #46 — original policy derivation and validation methodology diff --git a/docs/src/content/docs/decisions/Adr-012-operational-knowledge-stack.md b/docs/src/content/docs/decisions/Adr-012-operational-knowledge-stack.md index 41c6e251..5c3399b3 100644 --- a/docs/src/content/docs/decisions/Adr-012-operational-knowledge-stack.md +++ b/docs/src/content/docs/decisions/Adr-012-operational-knowledge-stack.md @@ -155,7 +155,7 @@ Organized by persona: ```markdown # Contributor Workflow -> Operationalizes [ADR-003](/architecture/adr-003-contribution-governance) +> Operationalizes [ADR-003](/sample-autonomous-cloud-coding-agents/architecture/adr-003-contribution-governance) ## For Planners - Issue quality bar (what makes an issue "ready") diff --git a/docs/src/content/docs/decisions/Adr-014-workflow-driven-tasks.md b/docs/src/content/docs/decisions/Adr-014-workflow-driven-tasks.md index 61e6e623..75d4722f 100644 --- a/docs/src/content/docs/decisions/Adr-014-workflow-driven-tasks.md +++ b/docs/src/content/docs/decisions/Adr-014-workflow-driven-tasks.md @@ -26,7 +26,7 @@ Two forces constrain the 
design: ## Decision -Introduce **workflows**: versioned, declarative YAML files describing how the agent executes one kind of task (ordered `steps`, system prompt, `agent_config` (tools, MCP servers, skills, plugins, rules/prompt-fragments, Cedar policy — mirroring the #246 registry asset kinds), how repo-discovered config is layered/gated, hydration sources, terminal outcomes, `domain`, `requires_repo`, `read_only`). The three shipped task types become the first three first-party workflows. The full schema and worked examples live in [docs/design/WORKFLOWS.md](/architecture/workflows). +Introduce **workflows**: versioned, declarative YAML files describing how the agent executes one kind of task (ordered `steps`, system prompt, `agent_config` (tools, MCP servers, skills, plugins, rules/prompt-fragments, Cedar policy — mirroring the #246 registry asset kinds), how repo-discovered config is layered/gated, hydration sources, terminal outcomes, `domain`, `requires_repo`, `read_only`). The three shipped task types become the first three first-party workflows. The full schema and worked examples live in [docs/design/WORKFLOWS.md](/sample-autonomous-cloud-coding-agents/architecture/workflows). Four sub-decisions: @@ -81,7 +81,7 @@ What does **not** change: read-only is still enforced by *both* `allowed_tools` ## Addendum (2026-06-08): repo-optional open questions resolved — schema freeze -[WORKFLOWS.md](/architecture/workflows) open questions #1 (memory actorId for repo-less tasks) and #2 (artifact-delivery contract) were flagged as **blocking Phase 3 and requiring resolution before the Phase-0 schema is frozen**, because either might add or reshape a schema field. Both are now decided. The schema field reshape implied by #2 is applied in the same change as this addendum, so the schema can be treated as frozen. 
+[WORKFLOWS.md](/sample-autonomous-cloud-coding-agents/architecture/workflows) open questions #1 (memory actorId for repo-less tasks) and #2 (artifact-delivery contract) were flagged as **blocking Phase 3 and requiring resolution before the Phase-0 schema is frozen**, because either might add or reshape a schema field. Both are now decided. The schema field reshape implied by #2 is applied in the same change as this addendum, so the schema can be treated as frozen. **Decision 1 — Memory actorId for repo-less tasks: per-user (`user:{id}`).** A repo-less task uses `actorId = user:{cognito_sub}` (the platform user id already threaded as `TaskConfig.user_id`), not the `repo` used by coding tasks (`memory.py`). Rationale: it is caller-scoped (no cross-tenant knowledge bleed — the same isolation property the per-user trace prefix already relies on), and it matches the platform's existing user-scoping pattern. Cross-*workflow* knowledge pooling (e.g. "all `web_research` tasks share learnings") is explicitly **not** adopted now — it mixes tenants in one namespace and is a larger privacy decision deferrable to the registry phase. - **Schema impact: none.** This is a fixed platform fallback, not author-configurable, so it adds **no** `actor_namespace` selector to the workflow schema. (The earlier note that it "may introduce an `actor_namespace` selector" is resolved in the negative — keeping the schema smaller.) It is a Phase-3 `memory.py` change: when `repo` is absent, key on `user:{user_id}`; coding tasks are unchanged. 
@@ -100,10 +100,10 @@ With both resolved and the one schema reshape applied, the Phase-0 schema is **f - Issue [#245](https://github.com/aws-samples/sample-autonomous-cloud-coding-agents/issues/245) — attribution on resolved capability - Issue [#236](https://github.com/aws-samples/sample-autonomous-cloud-coding-agents/issues/236) — E2E verification (parity coverage) - Issue [#99](https://github.com/aws-samples/sample-autonomous-cloud-coding-agents/issues/99) — AKW integration (broader vision; out-of-scope items defer here) -- [docs/design/WORKFLOWS.md](/architecture/workflows) — the workflow schema, step catalog, and step-runner design -- [docs/design/ORCHESTRATOR.md](/architecture/orchestrator) — durable lifecycle and extension points -- [docs/design/REPO_ONBOARDING.md](/architecture/repo-onboarding) — the Blueprint construct and `step_sequence` model -- [docs/design/CEDAR_HITL_GATES.md](/architecture/cedar-hitl-gates) — policy engine the `agent_config` feeds +- [docs/design/WORKFLOWS.md](/sample-autonomous-cloud-coding-agents/architecture/workflows) — the workflow schema, step catalog, and step-runner design +- [docs/design/ORCHESTRATOR.md](/sample-autonomous-cloud-coding-agents/architecture/orchestrator) — durable lifecycle and extension points +- [docs/design/REPO_ONBOARDING.md](/sample-autonomous-cloud-coding-agents/architecture/repo-onboarding) — the Blueprint construct and `step_sequence` model +- [docs/design/CEDAR_HITL_GATES.md](/sample-autonomous-cloud-coding-agents/architecture/cedar-hitl-gates) — policy engine the `agent_config` feeds - Prior art: `origin/merge/akw-integration` (commit `9d066a8`) — AKW YAML registry and models (reconciled, scoped down) -- [ADR-013](/architecture/adr-013-tiered-validation-pyramid) — the validation pyramid the `promotion_gate` layers onto -- [ADR-005](/architecture/adr-005-feedback-loop) — the feedback loop that workflow trajectory-evolution would extend (future, out of scope) +- 
[ADR-013](/sample-autonomous-cloud-coding-agents/architecture/adr-013-tiered-validation-pyramid) — the validation pyramid the `promotion_gate` layers onto +- [ADR-005](/sample-autonomous-cloud-coding-agents/architecture/adr-005-feedback-loop) — the feedback loop that workflow trajectory-evolution would extend (future, out of scope) diff --git a/docs/src/content/docs/developer-guide/Contributing.md b/docs/src/content/docs/developer-guide/Contributing.md index 3e1d2da5..e780a032 100644 --- a/docs/src/content/docs/developer-guide/Contributing.md +++ b/docs/src/content/docs/developer-guide/Contributing.md @@ -18,9 +18,9 @@ Describe what you intend to contribute. This avoids duplicate work and gives mai ### 2. Set up your environment -Follow the [Quick Start](/getting-started/quick-start) to clone, install, and build the project. See the [Developer guide](/developer-guide/introduction) for local testing and the development workflow. +Follow the [Quick Start](/sample-autonomous-cloud-coding-agents/getting-started/quick-start) to clone, install, and build the project. See the [Developer guide](/sample-autonomous-cloud-coding-agents/developer-guide/introduction) for local testing and the development workflow. -Use **[AGENTS.md](/architecture/agents)** to understand where to make changes (CDK vs CLI vs agent vs docs), which tests to extend, and common pitfalls (generated docs, mirrored API types, `mise` tasks). +Use **[AGENTS.md](/sample-autonomous-cloud-coding-agents/architecture/agents)** to understand where to make changes (CDK vs CLI vs agent vs docs), which tests to extend, and common pitfalls (generated docs, mirrored API types, `mise` tasks). ### 3. Implement your change @@ -32,7 +32,7 @@ Guidelines: - If you change API types in `cdk/src/handlers/shared/types.ts`, update `cli/src/types.ts` to match. - If you change docs sources (`docs/guides/`, `docs/design/`), run `mise //docs:sync` so generated content stays in sync. 
- For significant features, add a design document to `docs/design/`. -- For cross-cutting or hard-to-reverse decisions, add an ADR to `docs/decisions/` (see [ADR README](/architecture/readme)). +- For cross-cutting or hard-to-reverse decisions, add an ADR to `docs/decisions/` (see [ADR README](/sample-autonomous-cloud-coding-agents/architecture/readme)). ### 4. Commit diff --git a/docs/src/content/docs/developer-guide/Installation.md b/docs/src/content/docs/developer-guide/Installation.md index 6d6bae9f..94d50c39 100644 --- a/docs/src/content/docs/developer-guide/Installation.md +++ b/docs/src/content/docs/developer-guide/Installation.md @@ -2,7 +2,7 @@ title: Installation --- -Follow the [Quick Start](/getting-started/quick-start) to clone, install, deploy, and submit your first task. It covers prerequisites, toolchain setup, deployment, PAT configuration, Cognito user creation, and a smoke test. +Follow the [Quick Start](/sample-autonomous-cloud-coding-agents/getting-started/quick-start) to clone, install, deploy, and submit your first task. It covers prerequisites, toolchain setup, deployment, PAT configuration, Cognito user creation, and a smoke test. This section covers what the Quick Start does not: troubleshooting, local testing, and the development workflow. @@ -152,7 +152,7 @@ For the full list, see `agent/README.md`. ### Deployment -Follow the [Quick Start](/getting-started/quick-start) steps 3-6 for first-time deployment. For subsequent deploys after code changes: +Follow the [Quick Start](/sample-autonomous-cloud-coding-agents/getting-started/quick-start) steps 3-6 for first-time deployment. 
For subsequent deploys after code changes: ```bash mise run build diff --git a/docs/src/content/docs/developer-guide/Introduction.md b/docs/src/content/docs/developer-guide/Introduction.md index e52b1fe2..5b259c4c 100644 --- a/docs/src/content/docs/developer-guide/Introduction.md +++ b/docs/src/content/docs/developer-guide/Introduction.md @@ -14,6 +14,6 @@ The repository is organized around four main pieces: - **Infrastructure as code** in AWS CDK under `cdk/src/` - stacks, constructs, and handlers that define and deploy the platform on AWS. - **Documentation site** under `docs/` - source guides/design docs plus the generated Astro/Starlight documentation site. - **CLI package** under `cli/` - the `bgagent` command-line client used to authenticate, submit tasks, and inspect task status/events. -- **Claude Code plugin** under `docs/abca-plugin/` - a [Claude Code plugin](https://docs.anthropic.com/en/docs/claude-code/plugins) with guided skills and agents for setup, deployment, task submission, and troubleshooting. See the [plugin README](/architecture/readme) for details. +- **Claude Code plugin** under `docs/abca-plugin/` - a [Claude Code plugin](https://docs.anthropic.com/en/docs/claude-code/plugins) with guided skills and agents for setup, deployment, task submission, and troubleshooting. See the [plugin README](/sample-autonomous-cloud-coding-agents/architecture/readme) for details. > **Tip:** If you use Claude Code, run `claude --plugin-dir docs/abca-plugin` from the repo root. The plugin's `/setup` skill walks you through the entire setup process interactively. 
\ No newline at end of file diff --git a/docs/src/content/docs/developer-guide/Repository-preparation.md b/docs/src/content/docs/developer-guide/Repository-preparation.md index ed63e478..fac7c6e9 100644 --- a/docs/src/content/docs/developer-guide/Repository-preparation.md +++ b/docs/src/content/docs/developer-guide/Repository-preparation.md @@ -2,7 +2,7 @@ title: Repository preparation --- -The [Quick Start](/getting-started/quick-start) covers the basic setup: forking a sample repo, creating a PAT, registering a Blueprint, and storing the token in Secrets Manager. This section covers what you need beyond that. +The [Quick Start](/sample-autonomous-cloud-coding-agents/getting-started/quick-start) covers the basic setup: forking a sample repo, creating a PAT, registering a Blueprint, and storing the token in Secrets Manager. This section covers what you need beyond that. ### Pre-flight checks @@ -13,7 +13,7 @@ Permission requirements vary by task type: - `new_task` and `pr_iteration` require Contents (read/write) and Pull requests (read/write). - `pr_review` only needs Triage or higher since it does not push branches. -Classic PATs with `repo` + `read:org` scopes also work and are required when fine-grained tokens cannot reach the target repo (collaborator access, cross-org repos). See [agent/README.md](/architecture/readme#github-pat--minimal-permissions) for when to use which token type. +Classic PATs with `repo` + `read:org` scopes also work and are required when fine-grained tokens cannot reach the target repo (collaborator access, cross-org repos). See [agent/README.md](/sample-autonomous-cloud-coding-agents/architecture/readme#github-pat--minimal-permissions) for when to use which token type. 
### Quick setup (single repo) @@ -57,7 +57,7 @@ new Blueprint(this, 'MyServiceBlueprint', { }); ``` -If you use a custom `compute.runtimeArn` or `credentials.githubTokenSecretArn`, pass the ARNs to `TaskOrchestrator` via `additionalRuntimeArns` and `additionalSecretArns` so the Lambda has IAM permission. See [Repo onboarding](/architecture/repo-onboarding) for the full model. +If you use a custom `compute.runtimeArn` or `credentials.githubTokenSecretArn`, pass the ARNs to `TaskOrchestrator` via `additionalRuntimeArns` and `additionalSecretArns` so the Lambda has IAM permission. See [Repo onboarding](/sample-autonomous-cloud-coding-agents/architecture/repo-onboarding) for the full model. Redeploy after changing Blueprints: `mise run //cdk:deploy`. @@ -69,9 +69,9 @@ The default image (`agent/Dockerfile`) includes Python, Node 24 (LTS), `git`, `g A blueprint can declare its own `security.cedarPolicies` rules on top of the built-in hard/soft-deny starter set. Hard-deny rules absolutely block a tool call; soft-deny rules pause the agent and ask a human before proceeding. -See the [Cedar policy guide](/customizing/cedar-policies) for the full authoring reference — vocabulary (`execute_bash`, `write_file`, `context.command`, `context.file_path`), annotations (`@rule_id`, `@tier`, `@approval_timeout_s`, `@severity`, `@category`), worked examples, multi-match rules, and cross-engine parity testing with [`contracts/cedar-parity/`](../../contracts/cedar-parity/) fixtures. +See the [Cedar policy guide](/sample-autonomous-cloud-coding-agents/customizing/cedar-policies) for the full authoring reference — vocabulary (`execute_bash`, `write_file`, `context.command`, `context.file_path`), annotations (`@rule_id`, `@tier`, `@approval_timeout_s`, `@severity`, `@category`), worked examples, multi-match rules, and cross-engine parity testing with [`contracts/cedar-parity/`](../../contracts/cedar-parity/) fixtures. 
### Other options - **Stack name** - The default is `backgroundagent-dev` (set in `cdk/src/main.ts`). If you rename it, update all `--stack-name` references. -- **Making repos agent-friendly** - Add `CLAUDE.md`, `.claude/rules/`, and clear build commands. See the [Prompt guide](/customizing/prompt-engineering#repo-level-instructions) for details. \ No newline at end of file +- **Making repos agent-friendly** - Add `CLAUDE.md`, `.claude/rules/`, and clear build commands. See the [Prompt guide](/sample-autonomous-cloud-coding-agents/customizing/prompt-engineering#repo-level-instructions) for details. \ No newline at end of file diff --git a/docs/src/content/docs/developer-guide/Where-to-make-changes.md b/docs/src/content/docs/developer-guide/Where-to-make-changes.md index 50256daa..b455b3bb 100644 --- a/docs/src/content/docs/developer-guide/Where-to-make-changes.md +++ b/docs/src/content/docs/developer-guide/Where-to-make-changes.md @@ -12,4 +12,4 @@ Before editing, decide which part of the monorepo owns the behavior. This keeps | Agent runtime | `agent/` | Bundled into the image CDK deploys; run `mise run quality` in `agent/` or root build. | | Docs (source) | `docs/guides/`, `docs/design/` | After edits, run **`mise //docs:sync`** or **`mise //docs:build`**. Do not edit `docs/src/content/docs/` directly. | -For a concise duplicate of this table, common pitfalls, and a CDK test file map, see **[AGENTS.md](/architecture/agents)** at the repo root (oriented toward automation-assisted contributors). \ No newline at end of file +For a concise duplicate of this table, common pitfalls, and a CDK test file map, see **[AGENTS.md](/sample-autonomous-cloud-coding-agents/architecture/agents)** at the repo root (oriented toward automation-assisted contributors). 
\ No newline at end of file diff --git a/docs/src/content/docs/getting-started/Deployment-guide.md b/docs/src/content/docs/getting-started/Deployment-guide.md index e74e33dc..5d8e7499 100644 --- a/docs/src/content/docs/getting-started/Deployment-guide.md +++ b/docs/src/content/docs/getting-started/Deployment-guide.md @@ -4,7 +4,7 @@ title: Deployment guide # Deployment guide -This guide covers deploying ABCA into an AWS account, including compute backend choices, scale-to-zero characteristics, and the complete AWS service inventory. For day-to-day development workflow, see the [Developer guide](/developer-guide/introduction). For a quick first deployment, see the [Quick start](/getting-started/quick-start). For least-privilege IAM deployment roles, see [DEPLOYMENT_ROLES.md](/architecture/deployment-roles). +This guide covers deploying ABCA into an AWS account, including compute backend choices, scale-to-zero characteristics, and the complete AWS service inventory. For day-to-day development workflow, see the [Developer guide](/sample-autonomous-cloud-coding-agents/developer-guide/introduction). For a quick first deployment, see the [Quick start](/sample-autonomous-cloud-coding-agents/getting-started/quick-start). For least-privilege IAM deployment roles, see [DEPLOYMENT_ROLES.md](/sample-autonomous-cloud-coding-agents/architecture/deployment-roles). ## Architecture overview @@ -53,7 +53,7 @@ ECS Fargate is currently **opt-in** -- the `EcsAgentCluster` construct is presen The dominant idle cost is VPC networking: 7 interface endpoints across 2 AZs (~$102/month) plus the NAT Gateway (~$32/month). -For the full cost model including per-task costs, see [COST_MODEL.md](/architecture/cost-model). +For the full cost model including per-task costs, see [COST_MODEL.md](/sample-autonomous-cloud-coding-agents/architecture/cost-model). ## AWS services inventory @@ -146,7 +146,7 @@ Triggers via `workflow_run` when `build.yml` completes successfully. 
The pipelin - `intent: "labels"` → reads PR labels against an allowlist - `intent: ""` → deploys the specified type (e.g., `agentcore`) 4. **Requires approval** — The `deploy` job uses a GitHub Environment with required reviewers. Approvals are logged and the self-review rule prevents unilateral deploys. -5. **Deploys via OIDC** — Assumes an IAM role via GitHub OIDC federation (no long-lived credentials). The role is scoped to the `cdk deploy` action with least-privilege policies per [DEPLOYMENT_ROLES.md](/architecture/deployment-roles). +5. **Deploys via OIDC** — Assumes an IAM role via GitHub OIDC federation (no long-lived credentials). The role is scoped to the `cdk deploy` action with least-privilege policies per [DEPLOYMENT_ROLES.md](/sample-autonomous-cloud-coding-agents/architecture/deployment-roles). ### Security controls @@ -228,9 +228,9 @@ For users without AWS CLI access. ## Related docs -- [Quick start](/getting-started/quick-start) -- Zero-to-first-PR in 6 steps. -- [Developer guide](/developer-guide/introduction) -- Local development, testing, repository onboarding. -- [User guide](/using/overview) -- API reference, CLI usage, task management. -- [DEPLOYMENT_ROLES.md](/architecture/deployment-roles) -- Least-privilege IAM policies for CloudFormation execution. -- [COST_MODEL.md](/architecture/cost-model) -- Per-task costs, cost guardrails, cost at scale. -- [COMPUTE.md](/architecture/compute) -- Compute backend architecture and trade-offs. +- [Quick start](/sample-autonomous-cloud-coding-agents/getting-started/quick-start) -- Zero-to-first-PR in 6 steps. +- [Developer guide](/sample-autonomous-cloud-coding-agents/developer-guide/introduction) -- Local development, testing, repository onboarding. +- [User guide](/sample-autonomous-cloud-coding-agents/using/overview) -- API reference, CLI usage, task management. 
+- [DEPLOYMENT_ROLES.md](/sample-autonomous-cloud-coding-agents/architecture/deployment-roles) -- Least-privilege IAM policies for CloudFormation execution. +- [COST_MODEL.md](/sample-autonomous-cloud-coding-agents/architecture/cost-model) -- Per-task costs, cost guardrails, cost at scale. +- [COMPUTE.md](/sample-autonomous-cloud-coding-agents/architecture/compute) -- Compute backend architecture and trade-offs. diff --git a/docs/src/content/docs/getting-started/Quick-start.md b/docs/src/content/docs/getting-started/Quick-start.md index 4b0dfbba..04814504 100644 --- a/docs/src/content/docs/getting-started/Quick-start.md +++ b/docs/src/content/docs/getting-started/Quick-start.md @@ -4,7 +4,7 @@ title: Quick start # Quick start -Go from zero to your first agent-created pull request in about 30 minutes. This guide covers only the minimum path - see the [Developer guide](/developer-guide/introduction) and [User guide](/using/overview) for the full details. +Go from zero to your first agent-created pull request in about 30 minutes. This guide covers only the minimum path - see the [Developer guide](/sample-autonomous-cloud-coding-agents/developer-guide/introduction) and [User guide](/sample-autonomous-cloud-coding-agents/using/overview) for the full details. ## Prerequisites @@ -61,7 +61,7 @@ The agent authenticates to GitHub using a **fine-grained personal access token ( Keep the token value - you will store it in AWS Secrets Manager after deploying. -> **Collaborator or cross-org repos?** Fine-grained tokens only work for repos you own (or orgs that have opted in). If you're a collaborator on someone else's repo, create a **classic PAT** with `repo` + `read:org` scopes instead. See [agent/README.md](/architecture/readme#github-pat--minimal-permissions) for details. +> **Collaborator or cross-org repos?** Fine-grained tokens only work for repos you own (or orgs that have opted in). 
If you're a collaborator on someone else's repo, create a **classic PAT** with `repo` + `read:org` scopes instead. See [agent/README.md](/sample-autonomous-cloud-coding-agents/architecture/readme#github-pat--minimal-permissions) for details. ### Register the repo in CDK @@ -266,7 +266,7 @@ node lib/bin/bgagent.js submit --repo owner/repo --issue 42 \ --pre-approve write_path:tests/** ``` -Hard-deny rules (no `@tier("soft")` annotation) are always enforced — `--pre-approve` only short-circuits soft-deny rules. For the full command reference see [User guide — Approval gates](/using/overview#approval-gates-cedar-hitl); for authoring your own rules see the [Cedar policy guide](/customizing/cedar-policies). +Hard-deny rules (no `@tier("soft")` annotation) are always enforced — `--pre-approve` only short-circuits soft-deny rules. For the full command reference see [User guide — Approval gates](/sample-autonomous-cloud-coding-agents/using/overview#approval-gates-cedar-hitl); for authoring your own rules see the [Cedar policy guide](/sample-autonomous-cloud-coding-agents/customizing/cedar-policies). ## What happened behind the scenes @@ -312,7 +312,7 @@ new Blueprint(this, 'MyServiceBlueprint', { ### Per-repo configuration -Blueprints accept optional overrides to customize agent behavior per repository: which model to use, how many turns the agent gets, cost budget limits, extra system prompt instructions, and network egress rules. See the [User guide - Per-repo overrides](/using/overview) for the full list. +Blueprints accept optional overrides to customize agent behavior per repository: which model to use, how many turns the agent gets, cost budget limits, extra system prompt instructions, and network egress rules. See the [User guide - Per-repo overrides](/sample-autonomous-cloud-coding-agents/using/overview) for the full list. 
```typescript new Blueprint(this, 'CustomBlueprint', { @@ -328,20 +328,20 @@ new Blueprint(this, 'CustomBlueprint', { ### Add a CLAUDE.md to your repository -The agent automatically loads project-level instructions from `CLAUDE.md` at the repository root (or `.claude/CLAUDE.md`). This is the most effective way to improve agent output for a specific repo - tell it your build commands, coding conventions, architecture boundaries, and constraints. See the [Prompt guide](/customizing/prompt-engineering) for examples and best practices. +The agent automatically loads project-level instructions from `CLAUDE.md` at the repository root (or `.claude/CLAUDE.md`). This is the most effective way to improve agent output for a specific repo - tell it your build commands, coding conventions, architecture boundaries, and constraints. See the [Prompt guide](/sample-autonomous-cloud-coding-agents/customizing/prompt-engineering) for examples and best practices. ### Set up webhook integrations -Webhooks let external systems (GitHub Actions, CI pipelines) create tasks without Cognito credentials, using HMAC-SHA256 authentication. This is useful for automating PR review on every PR, or triggering code changes from CI events. See the [User guide - Webhooks](/using/overview) for setup instructions. +Webhooks let external systems (GitHub Actions, CI pipelines) create tasks without Cognito credentials, using HMAC-SHA256 authentication. This is useful for automating PR review on every PR, or triggering code changes from CI events. See the [User guide - Webhooks](/sample-autonomous-cloud-coding-agents/using/overview) for setup instructions. 
## Next steps - **Try an issue-based task**: `node lib/bin/bgagent.js submit --repo owner/repo --issue 42` - **Iterate on a PR**: `node lib/bin/bgagent.js submit --repo owner/repo --pr 1` - **Review a PR**: `node lib/bin/bgagent.js submit --repo owner/repo --review-pr 1` -- **Pick a workflow explicitly**: `node lib/bin/bgagent.js submit --repo owner/repo --task "..." --workflow coding/new-task-v1` — see [User guide - Workflows](/using/workflows) +- **Pick a workflow explicitly**: `node lib/bin/bgagent.js submit --repo owner/repo --task "..." --workflow coding/new-task-v1` — see [User guide - Workflows](/sample-autonomous-cloud-coding-agents/using/workflows) - **Watch a task live**: `node lib/bin/bgagent.js watch ` — stream progress events - **Steer a running task**: `node lib/bin/bgagent.js nudge "focus on tests"` — mid-run guidance - **Enable tracing**: `node lib/bin/bgagent.js submit --repo owner/repo --issue 42 --trace` then `node lib/bin/bgagent.js trace download ` - **Manage webhooks**: `node lib/bin/bgagent.js webhook create --name "My CI"` — automate task submission from external systems -- **Run locally first**: Test with `./agent/run.sh` before deploying - see the [Developer guide](/developer-guide/introduction) +- **Run locally first**: Test with `./agent/run.sh` before deploying - see the [Developer guide](/sample-autonomous-cloud-coding-agents/developer-guide/introduction) diff --git a/docs/src/content/docs/roadmap/Roadmap.md b/docs/src/content/docs/roadmap/Roadmap.md index 8fcaf97c..55c6d711 100644 --- a/docs/src/content/docs/roadmap/Roadmap.md +++ b/docs/src/content/docs/roadmap/Roadmap.md @@ -20,7 +20,7 @@ What's shipped and what's coming next. ### Task types -- [x] **Workflow-driven tasks** - Task types are declarative, versioned **workflow** files (`agent/workflows/**`) interpreted by an agent-side step runner, not hardcoded `task_type` branches. Selected via `workflow_ref` (the `task_type` enum is removed). 
New task types are authored as YAML + registered step handlers, not core-code changes ([ADR-014](/architecture/adr-014-workflow-driven-tasks), [WORKFLOWS.md](/architecture/workflows)) +- [x] **Workflow-driven tasks** - Task types are declarative, versioned **workflow** files (`agent/workflows/**`) interpreted by an agent-side step runner, not hardcoded `task_type` branches. Selected via `workflow_ref` (the `task_type` enum is removed). New task types are authored as YAML + registered step handlers, not core-code changes ([ADR-014](/sample-autonomous-cloud-coding-agents/architecture/adr-014-workflow-driven-tasks), [WORKFLOWS.md](/sample-autonomous-cloud-coding-agents/architecture/workflows)) - [x] **`coding/new-task-v1`** - Branch, implement, build/test, open PR - [x] **`coding/pr-iteration-v1`** - Check out PR branch, read review feedback, address it, push - [x] **`coding/pr-review-v1`** - Read-only structured code review via GitHub Reviews API (no Write/Edit tools) @@ -39,10 +39,10 @@ What's shipped and what's coming next. - [x] **Input guardrails** - Bedrock Guardrails screen task descriptions and PR/issue content (fail-closed) - [x] **Output screening** - Regex-based secret/PII scanner with PostToolUse hook redaction - [x] **Content sanitization** - HTML stripping, injection pattern neutralization, control character removal -- [x] **Cedar policy engine and HITL gates** - Tool-call governance (allow / hard-deny / soft-deny requiring approval) with fail-closed default, per-repo Cedar policies, submit-time `initial_approvals`, `AWAITING_APPROVAL` state, `bgagent approve` / `deny` / `pending` / `policies`, and REST approval APIs. Stranded approvals in `AWAITING_APPROVAL` are cleared by the stranded-task reconciler. 
See [CEDAR_HITL_GATES.md](/architecture/cedar-hitl-gates) +- [x] **Cedar policy engine and HITL gates** - Tool-call governance (allow / hard-deny / soft-deny requiring approval) with fail-closed default, per-repo Cedar policies, submit-time `initial_approvals`, `AWAITING_APPROVAL` state, `bgagent approve` / `deny` / `pending` / `policies`, and REST approval APIs. Stranded approvals in `AWAITING_APPROVAL` are cleared by the stranded-task reconciler. See [CEDAR_HITL_GATES.md](/sample-autonomous-cloud-coding-agents/architecture/cedar-hitl-gates) - [x] **WAF** - Managed rule groups + rate-based rule (1,000 req/5 min/IP) - [x] **Pre-flight checks** - GitHub API reachability, repo access, token permissions (fail-closed) -- [x] **Per-session IAM scoping** - Agent assumes a per-task **SessionRole** via `sts:AssumeRole` with session tags `{user_id, repo, task_id}` and refreshable credentials (1-hour role-chaining cap; tasks up to 8 h). Tenant-data DynamoDB access uses `dynamodb:LeadingKeys = ${aws:PrincipalTag/task_id}`; S3 traces/attachments use a `${aws:PrincipalTag/user_id}` prefix. Bedrock model invocation still uses the compute role (see **Bedrock IAM session-tag attribution** under What's next). See [SECURITY.md](/architecture/security) +- [x] **Per-session IAM scoping** - Agent assumes a per-task **SessionRole** via `sts:AssumeRole` with session tags `{user_id, repo, task_id}` and refreshable credentials (1-hour role-chaining cap; tasks up to 8 h). Tenant-data DynamoDB access uses `dynamodb:LeadingKeys = ${aws:PrincipalTag/task_id}`; S3 traces/attachments use a `${aws:PrincipalTag/user_id}` prefix. Bedrock model invocation still uses the compute role (see **Bedrock IAM session-tag attribution** under What's next). See [SECURITY.md](/sample-autonomous-cloud-coding-agents/architecture/security) - [x] **Model invocation logging** - Full prompt/response audit trail (90-day retention) ### Memory and learning @@ -55,7 +55,7 @@ What's shipped and what's coming next. 
- [x] **Rich prompt assembly** - Task description + GitHub issue/PR content + memory context (~100K token budget) - [x] **Token budget management** - Oldest comments trimmed first; title/body always preserved -- [x] **Task attachments (multimodal)** - `attachments` on create-task: inline base64 (≤ 500 KB), presigned upload (up to 10 MB), and URL fetch with SSRF protection. Images (PNG, JPEG) and text files (TXT, CSV, MD, JSON, PDF, LOG) pass through Guardrail screening, magic-bytes validation, and re-encoding. CLI `--attachment`, Slack file uploads, and Linear image extraction share the same schema. See [ATTACHMENTS.md](/architecture/attachments) +- [x] **Task attachments (multimodal)** - `attachments` on create-task: inline base64 (≤ 500 KB), presigned upload (up to 10 MB), and URL fetch with SSRF protection. Images (PNG, JPEG) and text files (TXT, CSV, MD, JSON, PDF, LOG) pass through Guardrail screening, magic-bytes validation, and re-encoding. CLI `--attachment`, Slack file uploads, and Linear image extraction share the same schema. See [ATTACHMENTS.md](/sample-autonomous-cloud-coding-agents/architecture/attachments) ### Webhooks @@ -82,13 +82,13 @@ What's shipped and what's coming next. - [x] **GitHub edit-in-place** - Single status comment per task on the target PR, edited in place as progress events fire (phase, milestone, cost, link) - [x] **Routable agent milestones** - Named checkpoints (`pr_created`, `nudge_acknowledged`) unwrapped against allowlist for channel filter matching - [x] **Slack notification dispatcher** - FanOut Block Kit messages for Slack-origin tasks (lifecycle events, threaded replies, terminal dedup, in-thread cancel). Generic fallback text for unmapped event types (e.g. 
some milestones); richer milestone and approval-gate rendering is follow-up work -- [x] **Deploy-preview screenshots** - Listens for GitHub `deployment_status: success` events from any provider (Vercel, Amplify Hosting, Netlify, GitHub Actions); captures the preview URL via AgentCore Browser; posts a markdown image comment on the open PR (and on the linked Linear issue if Linear is configured). Lambda-only, deterministic, ~10–15 s post-deploy. See [Deploy preview screenshots guide](/using/deploy-preview-screenshots-guide). +- [x] **Deploy-preview screenshots** - Listens for GitHub `deployment_status: success` events from any provider (Vercel, Amplify Hosting, Netlify, GitHub Actions); captures the preview URL via AgentCore Browser; posts a markdown image comment on the open PR (and on the linked Linear issue if Linear is configured). Lambda-only, deterministic, ~10–15 s post-deploy. See [Deploy preview screenshots guide](/sample-autonomous-cloud-coding-agents/using/deploy-preview-screenshots-guide). - [ ] **Email dispatcher** - Log-only stub; pending SES integration ### Channels -- [x] **Slack integration** - @mention task submission, `bgagent slack link` / `setup`, file attachments on submit, threaded progress notifications. See [SLACK_SETUP_GUIDE.md](/using/slack-setup-guide) -- [x] **Linear integration** - Label-triggered tasks, `bgagent linear setup` / `link`, progress comments on issues. See [LINEAR_SETUP_GUIDE.md](/using/linear-setup-guide) +- [x] **Slack integration** - @mention task submission, `bgagent slack link` / `setup`, file attachments on submit, threaded progress notifications. See [SLACK_SETUP_GUIDE.md](/sample-autonomous-cloud-coding-agents/using/slack-setup-guide) +- [x] **Linear integration** - Label-triggered tasks, `bgagent linear setup` / `link`, progress comments on issues. See [LINEAR_SETUP_GUIDE.md](/sample-autonomous-cloud-coding-agents/using/linear-setup-guide) ### Observability @@ -176,8 +176,8 @@ Planned capabilities, grouped by theme. 
Items are independent and may ship in an | Capability | Description | |------------|-------------| -| **Smart progress updates (manager-style status)** | Extend check-in beyond the shipped deterministic snapshot: human-readable progress that answers what the agent completed, what it plans next, and which decisions or blockers matter—surfaced via `bgagent status`, notification channels (GitHub/Slack/email), and the future control panel. Prefer structured agent-emitted progress events in `TaskEventsTable` (e.g. done / next / decisions / blockers) so all readers stay consistent and auditable; complement with Phase 2 **`bgagent ask`** for on-demand Q&A and an optional read-path **LLM-synthesized summary** over events (no agent turn) where cost/latency trade-offs are acceptable. Distinct from raw `watch`/`events` streams and from post-mortem **LLM-assisted trace analysis**. Design context: [INTERACTIVE_AGENTS.md](/architecture/interactive-agents). | -| **`bgagent ask` (Phase 2)** | Mid-run questions to the agent (`POST /tasks/{id}/asks`); answers durable as `status_response` events with CLI block-and-poll. Enables interactive summaries (e.g. "what changed so far?") without a separate status API. Ships as part of the interactive check-in layer in [INTERACTIVE_AGENTS.md](/architecture/interactive-agents) Phase 2. | +| **Smart progress updates (manager-style status)** | Extend check-in beyond the shipped deterministic snapshot: human-readable progress that answers what the agent completed, what it plans next, and which decisions or blockers matter—surfaced via `bgagent status`, notification channels (GitHub/Slack/email), and the future control panel. Prefer structured agent-emitted progress events in `TaskEventsTable` (e.g. 
done / next / decisions / blockers) so all readers stay consistent and auditable; complement with Phase 2 **`bgagent ask`** for on-demand Q&A and an optional read-path **LLM-synthesized summary** over events (no agent turn) where cost/latency trade-offs are acceptable. Distinct from raw `watch`/`events` streams and from post-mortem **LLM-assisted trace analysis**. Design context: [INTERACTIVE_AGENTS.md](/sample-autonomous-cloud-coding-agents/architecture/interactive-agents). | +| **`bgagent ask` (Phase 2)** | Mid-run questions to the agent (`POST /tasks/{id}/asks`); answers durable as `status_response` events with CLI block-and-poll. Enables interactive summaries (e.g. "what changed so far?") without a separate status API. Ships as part of the interactive check-in layer in [INTERACTIVE_AGENTS.md](/sample-autonomous-cloud-coding-agents/architecture/interactive-agents) Phase 2. | | **LLM-synthesized status summary (optional)** | Optional `bgagent status` mode where a Lambda narrates recent `TaskEvents` without waking the agent—deferred in design due to cost and hallucination risk; pursue behind a flag only if agent-authored progress reports are insufficient. Complements, does not replace, **Smart progress updates**. | ### Channels and integrations @@ -217,7 +217,7 @@ Planned capabilities, grouped by theme. Items are independent and may ship in an | Capability | Description | |------------|-------------| -| **Bedrock IAM session-tag attribution** | Route Bedrock **InvokeModel** through assumed credentials that carry `{user_id, repo, task_id}` session tags. **Per-session IAM scoping** (#209) already tags the SessionRole for DynamoDB/S3; model calls still use the AgentCore/ECS compute role today. Extend `aws_session.py` (or equivalent) so inference is chargeable in Cost Explorer / CUR 2.0 by principal tag. Operator must activate IAM principal cost allocation tags (see [COST_MODEL.md](/architecture/cost-model#cost-attribution)). 
| +| **Bedrock IAM session-tag attribution** | Route Bedrock **InvokeModel** through assumed credentials that carry `{user_id, repo, task_id}` session tags. **Per-session IAM scoping** (#209) already tags the SessionRole for DynamoDB/S3; model calls still use the AgentCore/ECS compute role today. Extend `aws_session.py` (or equivalent) so inference is chargeable in Cost Explorer / CUR 2.0 by principal tag. Operator must activate IAM principal cost allocation tags (see [COST_MODEL.md](/sample-autonomous-cloud-coding-agents/architecture/cost-model#cost-attribution)). | | **Bedrock per-request metadata** | Pass `task_id`, `user_id`, and `repo` on each Bedrock call via request metadata / `X-Amzn-Bedrock-Request-Metadata` into model invocation logs. Complements IAM attribution; does not replace in-app `cost_usd`. Requires Claude Code / SDK support for metadata on InvokeModel. | | **Cost dashboard and export API** | Log Insights widgets on invocation logs; optional API or export for monthly spend roll-ups by `user_id` / `repo` from the task table. Operator dashboard today covers task-level cost aggregates, not Bedrock chargeback dimensions. | | **Optional tagged application inference profiles** | CDK-managed Bedrock application inference profiles per onboarded repo or environment; set `ANTHROPIC_MODEL` to tagged profile ARN for `resourceTags/*` billing when repo count is bounded. | @@ -228,7 +228,7 @@ Planned capabilities, grouped by theme. Items are independent and may ship in an | Capability | Description | |------------|-------------| -| **Deployed runtime E2E verification** | **Phase 0 landed:** `@aws-cdk/integ-tests-alpha` + `integ-runner` deploy a trimmed Task API stack to a real account, assert the create-and-persist happy path (task persists at `SUBMITTED`), then tear it down (`mise //cdk:integ`). 
In CI it runs per-PR via `workflow_run` when the diff touches `cdk/**` or `agent/**`, behind the `integ` environment's admin-approval gate, and posts a required `integ-smoke` status that blocks merge (`workflow_dispatch` retained for manual runs). Phase 1 (full lifecycle / real agent runs) and Phase 2 (channels) follow. See [ADR-013](/architecture/adr-013-tiered-validation-pyramid). | +| **Deployed runtime E2E verification** | **Phase 0 landed:** `@aws-cdk/integ-tests-alpha` + `integ-runner` deploy a trimmed Task API stack to a real account, assert the create-and-persist happy path (task persists at `SUBMITTED`), then tear it down (`mise //cdk:integ`). In CI it runs per-PR via `workflow_run` when the diff touches `cdk/**` or `agent/**`, behind the `integ` environment's admin-approval gate, and posts a required `integ-smoke` status that blocks merge (`workflow_dispatch` retained for manual runs). Phase 1 (full lifecycle / real agent runs) and Phase 2 (channels) follow. See [ADR-013](/sample-autonomous-cloud-coding-agents/architecture/adr-013-tiered-validation-pyramid). | | **Admission backlog observability** | Metric and alarm when `SUBMITTED` task depth exceeds an operator threshold (capacity and admission health). | | **Admission queue with deferred pickup** | When admission is at capacity, persist tasks in a durable queue instead of failing them. Automatically re-attempt admission and continue processing in FIFO order (with optional priority lanes) as concurrency becomes available. Preserve cancel/idempotency semantics and expose queue position/ETA in task status. | | **Safe orchestrator deploys** | Pre-deploy checks for active tasks (drain or warn); blue-green or canary Lambda deploy for the durable orchestrator with rollback on error regressions (`OBSERVABILITY.md`). | @@ -275,4 +275,4 @@ Planned capabilities, grouped by theme. 
Items are independent and may ship in an --- -Design docs to keep in sync: [ARCHITECTURE.md](/architecture/architecture), [ORCHESTRATOR.md](/architecture/orchestrator), [API_CONTRACT.md](/architecture/api-contract), [ATTACHMENTS.md](/architecture/attachments), [CEDAR_HITL_GATES.md](/architecture/cedar-hitl-gates), [INPUT_GATEWAY.md](/architecture/input-gateway), [INTERACTIVE_AGENTS.md](/architecture/interactive-agents), [REPO_ONBOARDING.md](/architecture/repo-onboarding), [MEMORY.md](/architecture/memory), [OBSERVABILITY.md](/architecture/observability), [COMPUTE.md](/architecture/compute), [SECURITY.md](/architecture/security), [EVALUATION.md](/architecture/evaluation). +Design docs to keep in sync: [ARCHITECTURE.md](/sample-autonomous-cloud-coding-agents/architecture/architecture), [ORCHESTRATOR.md](/sample-autonomous-cloud-coding-agents/architecture/orchestrator), [API_CONTRACT.md](/sample-autonomous-cloud-coding-agents/architecture/api-contract), [ATTACHMENTS.md](/sample-autonomous-cloud-coding-agents/architecture/attachments), [CEDAR_HITL_GATES.md](/sample-autonomous-cloud-coding-agents/architecture/cedar-hitl-gates), [INPUT_GATEWAY.md](/sample-autonomous-cloud-coding-agents/architecture/input-gateway), [INTERACTIVE_AGENTS.md](/sample-autonomous-cloud-coding-agents/architecture/interactive-agents), [REPO_ONBOARDING.md](/sample-autonomous-cloud-coding-agents/architecture/repo-onboarding), [MEMORY.md](/sample-autonomous-cloud-coding-agents/architecture/memory), [OBSERVABILITY.md](/sample-autonomous-cloud-coding-agents/architecture/observability), [COMPUTE.md](/sample-autonomous-cloud-coding-agents/architecture/compute), [SECURITY.md](/sample-autonomous-cloud-coding-agents/architecture/security), [EVALUATION.md](/sample-autonomous-cloud-coding-agents/architecture/evaluation). 
diff --git a/docs/src/content/docs/using/Approval-gates-cedar-hitl.md b/docs/src/content/docs/using/Approval-gates-cedar-hitl.md index daa28cbc..b28882b5 100644 --- a/docs/src/content/docs/using/Approval-gates-cedar-hitl.md +++ b/docs/src/content/docs/using/Approval-gates-cedar-hitl.md @@ -6,7 +6,7 @@ The platform evaluates every tool call the agent is about to make (Bash, Write, The mechanism is Cedar HITL gates — "Human-In-The-Loop." It is the same policy language you can already author at the blueprint level, with one added annotation (`@tier("soft")`) that flips a rule from hard-deny to require-approval. -For the full design and guarantees (atomicity, fail-closed posture, timeout semantics, late-approval handling), see [Cedar HITL gates design doc](/architecture/cedar-hitl-gates). For writing policies, see the [Cedar policy guide](/customizing/cedar-policies). +For the full design and guarantees (atomicity, fail-closed posture, timeout semantics, late-approval handling), see [Cedar HITL gates design doc](/sample-autonomous-cloud-coding-agents/architecture/cedar-hitl-gates). For writing policies, see the [Cedar policy guide](/sample-autonomous-cloud-coding-agents/customizing/cedar-policies). ### When a gate fires diff --git a/docs/src/content/docs/using/Authentication.md b/docs/src/content/docs/using/Authentication.md index 54e16ae3..d33ff98f 100644 --- a/docs/src/content/docs/using/Authentication.md +++ b/docs/src/content/docs/using/Authentication.md @@ -82,7 +82,7 @@ Three steps: You're in. `bgagent submit`, `bgagent list`, `bgagent status` work against the shared stack. Tasks you submit are attributed to your Cognito user; concurrency caps and budgets are scoped to you. -**You do not run** `bgagent linear setup` or `bgagent slack setup` — those are workspace-level operations performed once by the stack/workspace admin. 
If you want Linear-triggered tasks to be attributed to *you* (not auto-dropped), the admin needs to map your Linear identity to your Cognito user; ask them about [Linear user linking](/using/linear-setup-guide#step-6-link-your-linear-account). +**You do not run** `bgagent linear setup` or `bgagent slack setup` — those are workspace-level operations performed once by the stack/workspace admin. If you want Linear-triggered tasks to be attributed to *you* (not auto-dropped), the admin needs to map your Linear identity to your Cognito user; ask them about [Linear user linking](/sample-autonomous-cloud-coding-agents/using/linear-setup-guide#step-6-link-your-linear-account). If something looks broken (commands fail with `Not configured` or `401 Unauthorized`), re-paste the bundle and re-run `bgagent login`. The bundle holds no secrets — your password (separate) is the credential. diff --git a/docs/src/content/docs/using/Deploy-preview-screenshots-guide.md b/docs/src/content/docs/using/Deploy-preview-screenshots-guide.md index 53c12ee7..cd01304e 100644 --- a/docs/src/content/docs/using/Deploy-preview-screenshots-guide.md +++ b/docs/src/content/docs/using/Deploy-preview-screenshots-guide.md @@ -6,7 +6,7 @@ title: Deploy preview screenshots guide Wire your repo into ABCA so that every preview deploy gets screenshotted and posted as a comment on the open GitHub PR. If you also have Linear configured, the same screenshot is posted to the linked Linear issue as a bonus. -> The pipeline only needs GitHub. Linear posting is opt-in: present iff `LinearWorkspaceRegistryTable` has at least one active row (configured via [Linear setup guide](/using/linear-setup-guide)). Without Linear, the GitHub-side screenshot still works; the Linear-side just no-ops silently. +> The pipeline only needs GitHub. 
Linear posting is opt-in: present iff `LinearWorkspaceRegistryTable` has at least one active row (configured via [Linear setup guide](/sample-autonomous-cloud-coding-agents/using/linear-setup-guide)). Without Linear, the GitHub-side screenshot still works; the Linear-side just no-ops silently. ## Works with any provider that posts `deployment_status` diff --git a/docs/src/content/docs/using/Linear-pak-migration-runbook.md b/docs/src/content/docs/using/Linear-pak-migration-runbook.md index 1320acda..b83d63c6 100644 --- a/docs/src/content/docs/using/Linear-pak-migration-runbook.md +++ b/docs/src/content/docs/using/Linear-pak-migration-runbook.md @@ -4,7 +4,7 @@ title: Linear pak migration runbook # Linear PAK → OAuth migration runbook (Phase 2.0a → 2.0b) -> **Who needs this.** Operators who deployed Phase 2.0a (single Linear personal API key shared across all teammates) and need to upgrade to 2.0b (per-workspace OAuth). If you're starting fresh on 2.0b, read [LINEAR_SETUP_GUIDE.md](/using/linear-setup-guide) instead. +> **Who needs this.** Operators who deployed Phase 2.0a (single Linear personal API key shared across all teammates) and need to upgrade to 2.0b (per-workspace OAuth). If you're starting fresh on 2.0b, read [LINEAR_SETUP_GUIDE.md](/sample-autonomous-cloud-coding-agents/using/linear-setup-guide) instead. 2.0b is a **hard cutover** — no `--use-pak` fallback. Plan for a short maintenance window (~30 min for a single workspace). @@ -31,7 +31,7 @@ Run these BEFORE deploying 2.0b so the maintenance window is short: 2. **Deploy 2.0b**: `mise //cdk:deploy`. Adds `LinearWorkspaceRegistryTable`, removes `LinearApiTokenSecret` + IAM grants, adds the `bgagent-linear-oauth-*` prefix grant on the agent runtime, webhook processor, and orchestrator. -3. **For each Linear workspace**, follow the [setup walkthrough](/using/linear-setup-guide#setup-walkthrough) starting at step 2. Each workspace needs: +3. 
**For each Linear workspace**, follow the [setup walkthrough](/sample-autonomous-cloud-coding-agents/using/linear-setup-guide#setup-walkthrough) starting at step 2. Each workspace needs: - A new Linear OAuth app (scopes: `read,write,app:assignable,app:mentionable`) - `bgagent linear setup ` to run the OAuth dance and write the per-workspace secret - Webhook signing secret pasted into ABCA via `update-webhook-secret` diff --git a/docs/src/content/docs/using/Linear-setup-guide.md b/docs/src/content/docs/using/Linear-setup-guide.md index aa39d07c..31a3f5fa 100644 --- a/docs/src/content/docs/using/Linear-setup-guide.md +++ b/docs/src/content/docs/using/Linear-setup-guide.md @@ -8,8 +8,8 @@ Set up the ABCA Linear integration so that applying a label to a Linear issue tr ## Prerequisites -- ABCA CDK stack deployed (see [Developer guide](/developer-guide/introduction)) -- A Cognito user account configured (see [User guide](/using/overview)) +- ABCA CDK stack deployed (see [Developer guide](/sample-autonomous-cloud-coding-agents/developer-guide/introduction)) +- A Cognito user account configured (see [User guide](/sample-autonomous-cloud-coding-agents/using/overview)) - A Linear workspace where you have **admin** access - The `bgagent` CLI installed and logged in (`bgagent configure` + `bgagent login`) @@ -114,7 +114,7 @@ The CLI shows a picker of human Linear members in the workspace. After you pick The teammate needs their own ABCA account first (Cognito user + configured CLI). If they don't have one yet: -1. **Admin** runs `bgagent admin invite-user teammate@example.com` to create their Cognito user (see [User guide → Joining an existing deployment](/using/overview#joining-an-existing-deployment) for the full Cognito-side flow). +1. 
**Admin** runs `bgagent admin invite-user teammate@example.com` to create their Cognito user (see [User guide → Joining an existing deployment](/sample-autonomous-cloud-coding-agents/using/overview#joining-an-existing-deployment) for the full Cognito-side flow). 2. **Teammate** pastes the bundle + password from the admin into: ```bash diff --git a/docs/src/content/docs/using/Overview.md b/docs/src/content/docs/using/Overview.md index 564006f0..c270f96b 100644 --- a/docs/src/content/docs/using/Overview.md +++ b/docs/src/content/docs/using/Overview.md @@ -9,7 +9,7 @@ There are five ways to interact with the platform. You can use them independentl 1. **CLI** (recommended) - The `bgagent` CLI authenticates via Cognito and calls the Task API. Best for individual developers submitting tasks from the terminal. Handles login, token caching, and output formatting. 2. **REST API** (direct) - Call the Task API endpoints directly with a JWT token. Best for building custom integrations, dashboards, or internal tools on top of the platform. Full validation, audit logging, and idempotency support. 3. **Webhook** - External systems (CI pipelines, GitHub Actions) can create tasks via HMAC-authenticated HTTP requests. Best for automated workflows where tasks should be triggered by events (e.g., a new issue is labeled, a PR needs review). No Cognito credentials needed; uses a shared secret per integration. -4. **Slack** - Submit tasks by @mentioning the bot and receive threaded progress notifications with reaction-based status. See the [Slack setup guide](/using/slack-setup-guide). -5. **Linear** - Apply a label to a Linear issue to trigger a task; the agent posts progress comments back on the issue via Linear's MCP server. See the [Linear setup guide](/using/linear-setup-guide). +4. **Slack** - Submit tasks by @mentioning the bot and receive threaded progress notifications with reaction-based status. 
See the [Slack setup guide](/sample-autonomous-cloud-coding-agents/using/slack-setup-guide). +5. **Linear** - Apply a label to a Linear issue to trigger a task; the agent posts progress comments back on the issue via Linear's MCP server. See the [Linear setup guide](/sample-autonomous-cloud-coding-agents/using/linear-setup-guide). For example, a team might use the **CLI** for ad-hoc tasks, **webhooks** to auto-trigger `coding/pr-review-v1` on every new PR via GitHub Actions, **Slack** for quick team-wide requests, **Linear** for tickets that already live in the PM tool, and the **REST API** to build a dashboard that tracks task status across repositories. \ No newline at end of file diff --git a/docs/src/content/docs/using/Roles.md b/docs/src/content/docs/using/Roles.md index 9471b8d0..e14199f5 100644 --- a/docs/src/content/docs/using/Roles.md +++ b/docs/src/content/docs/using/Roles.md @@ -15,4 +15,4 @@ There are four lifecycle roles. They are often the same person early on, but the If you're a teammate joining an existing deployment, jump to [Joining an existing deployment](#joining-an-existing-deployment) below. -If you're standing up a new deployment from scratch, see the [Developer guide](/developer-guide/introduction) first, then come back here for the [admin onboarding flow](#get-stack-outputs). \ No newline at end of file +If you're standing up a new deployment from scratch, see the [Developer guide](/sample-autonomous-cloud-coding-agents/developer-guide/introduction) first, then come back here for the [admin onboarding flow](#get-stack-outputs). \ No newline at end of file diff --git a/docs/src/content/docs/using/Slack-setup-guide.md b/docs/src/content/docs/using/Slack-setup-guide.md index 90909790..85623fc4 100644 --- a/docs/src/content/docs/using/Slack-setup-guide.md +++ b/docs/src/content/docs/using/Slack-setup-guide.md @@ -8,9 +8,9 @@ This guide walks through setting up the ABCA Slack integration. 
Once configured, ## Prerequisites -- ABCA CDK stack deployed (see [Developer guide](/developer-guide/introduction)) -- A Cognito user account configured (see [User guide](/using/overview)) -- A Slack workspace where you can install apps (use a personal free workspace if your corporate Slack restricts app installs) +- ABCA CDK stack deployed (see [Developer guide](/sample-autonomous-cloud-coding-agents/developer-guide/introduction)) +- A Cognito user account configured (see [User guide](/sample-autonomous-cloud-coding-agents/using/overview)) +- A Slack workspace where you are authorized to install apps (if your workspace requires admin approval for app installs, request it through your Slack administrator) - AWS CLI configured with credentials for your ABCA account ## Quick start @@ -153,7 +153,7 @@ Run `/bgagent link` in Slack, then `bgagent slack link ` in your terminal. ### "Repository not onboarded" -The repo must be registered with a Blueprint before submitting tasks. See the [User guide](/using/overview). +The repo must be registered with a Blueprint before submitting tasks. See the [User guide](/sample-autonomous-cloud-coding-agents/using/overview). ### OAuth install fails (bad_client_secret) diff --git a/docs/src/content/docs/using/Task-lifecycle.md b/docs/src/content/docs/using/Task-lifecycle.md index fcb52f83..e416e934 100644 --- a/docs/src/content/docs/using/Task-lifecycle.md +++ b/docs/src/content/docs/using/Task-lifecycle.md @@ -81,7 +81,7 @@ If a task fails with a `preflight_failed` event, the platform rejected the run b - `INSUFFICIENT_GITHUB_REPO_PERMISSIONS` - The PAT lacks the required permissions for the workflow. For `coding/new-task-v1` and `coding/pr-iteration-v1`, you need Contents (read/write) and Pull requests (read/write). For `coding/pr-review-v1`, Triage or higher is enough. - `PR_NOT_FOUND_OR_CLOSED` - The specified PR does not exist or is already closed. 
-To fix permission issues, update the GitHub PAT in AWS Secrets Manager and submit a new task. See [Developer guide - Repository preparation](/developer-guide/repository-preparation) for the full permissions table. +To fix permission issues, update the GitHub PAT in AWS Secrets Manager and submit a new task. See [Developer guide - Repository preparation](/sample-autonomous-cloud-coding-agents/developer-guide/repository-preparation) for the full permissions table. ### Viewing logs @@ -109,4 +109,4 @@ If your repo is wired to a deploy provider that publishes GitHub `deployment_sta This runs independently of the agent: there's no LLM involved, just a Lambda that drives a headless browser via AgentCore Browser. End-to-end latency is typically 10–15 seconds after the deploy provider reports success. -Setup is opt-in and per-repo. See the [Deploy preview screenshots guide](/using/deploy-preview-screenshots-guide) for the wiring (one webhook on the repo, one secret pasted into AWS). \ No newline at end of file +Setup is opt-in and per-repo. See the [Deploy preview screenshots guide](/sample-autonomous-cloud-coding-agents/using/deploy-preview-screenshots-guide) for the wiring (one webhook on the repo, one secret pasted into AWS). \ No newline at end of file diff --git a/docs/src/content/docs/using/Tips-for-being-a-good-citizen.md b/docs/src/content/docs/using/Tips-for-being-a-good-citizen.md index 8449091f..68f830cf 100644 --- a/docs/src/content/docs/using/Tips-for-being-a-good-citizen.md +++ b/docs/src/content/docs/using/Tips-for-being-a-good-citizen.md @@ -9,8 +9,8 @@ The platform is a shared resource - compute, model tokens, and GitHub API calls The agent is only as good as the context it receives. A well-prepared repository leads to faster, higher-quality results. - **Onboard first** - Repositories must be registered via a Blueprint construct before tasks can target them. If you get a `REPO_NOT_ONBOARDED` error, contact your platform administrator. 
-- **Add a CLAUDE.md** - This is the single most impactful thing you can do. The agent loads project configuration from `CLAUDE.md`, `.claude/rules/*.md`, `.claude/settings.json`, and `.mcp.json` in your repository. Use these to document build commands, coding conventions, architecture decisions, and constraints. A good `CLAUDE.md` prevents the agent from guessing and reduces wasted turns. See the [Prompt guide](/customizing/prompt-engineering#repo-level-customization) for examples. -- **Keep your PAT aligned** - If tasks fail with `preflight_failed`, the GitHub PAT likely lacks the permissions the task type needs. Check the event's `reason` field and update the secret in Secrets Manager. See [Repository preparation](/developer-guide/repository-preparation) for the full permissions table. +- **Add a CLAUDE.md** - This is the single most impactful thing you can do. The agent loads project configuration from `CLAUDE.md`, `.claude/rules/*.md`, `.claude/settings.json`, and `.mcp.json` in your repository. Use these to document build commands, coding conventions, architecture decisions, and constraints. A good `CLAUDE.md` prevents the agent from guessing and reduces wasted turns. See the [Prompt guide](/sample-autonomous-cloud-coding-agents/customizing/prompt-engineering#repo-level-customization) for examples. +- **Keep your PAT aligned** - If tasks fail with `preflight_failed`, the GitHub PAT likely lacks the permissions the task type needs. Check the event's `reason` field and update the secret in Secrets Manager. See [Repository preparation](/sample-autonomous-cloud-coding-agents/developer-guide/repository-preparation) for the full permissions table. 
### Write effective task descriptions diff --git a/docs/src/content/docs/using/Using-the-rest-api.md b/docs/src/content/docs/using/Using-the-rest-api.md index c995e9c3..812399bf 100644 --- a/docs/src/content/docs/using/Using-the-rest-api.md +++ b/docs/src/content/docs/using/Using-the-rest-api.md @@ -82,7 +82,7 @@ curl -X POST "$API_URL/tasks" \ -d '{"repo": "owner/repo", "workflow_ref": "coding/pr-review-v1", "pr_number": 55, "task_description": "Focus on security implications and error handling"}' ``` -> **Selecting a workflow.** `workflow_ref` chooses which workflow runs the task, in the form `[@]` (e.g. `coding/new-task-v1`). It replaced the old `task_type` field (see [Workflows](/architecture/workflows)). Omit it and the platform resolves a default — the repo's Blueprint default if configured, otherwise the conservative `default/agent-v1`. The one-to-one mapping from the retired `task_type` values is `new_task → coding/new-task-v1`, `pr_iteration → coding/pr-iteration-v1`, `pr_review → coding/pr-review-v1`. +> **Selecting a workflow.** `workflow_ref` chooses which workflow runs the task, in the form `[@]` (e.g. `coding/new-task-v1`). It replaced the old `task_type` field (see [Workflows](/sample-autonomous-cloud-coding-agents/architecture/workflows)). Omit it and the platform resolves a default — the repo's Blueprint default if configured, otherwise the conservative `default/agent-v1`. The one-to-one mapping from the retired `task_type` values is `new_task → coding/new-task-v1`, `pr_iteration → coding/pr-iteration-v1`, `pr_review → coding/pr-review-v1`. 
**Request body fields:** diff --git a/docs/src/content/docs/using/Workflows.md b/docs/src/content/docs/using/Workflows.md index a1088cef..2031a2b7 100644 --- a/docs/src/content/docs/using/Workflows.md +++ b/docs/src/content/docs/using/Workflows.md @@ -2,7 +2,7 @@ title: Workflows --- -Every task runs a **workflow** — a named, versioned recipe that decides whether to clone a repo, which tools the agent may use, and how the result is delivered. You select one with `workflow_ref` (REST/webhook) or `--workflow` (CLI); the `--pr`/`--review-pr` flags select the coding PR workflows for you. If you specify nothing, the platform resolves a default (your repo's Blueprint default, or the conservative `default/agent-v1`). Workflows replace the old `task_type` field — see [Workflows](/architecture/workflows) for the full design and how to author your own. +Every task runs a **workflow** — a named, versioned recipe that decides whether to clone a repo, which tools the agent may use, and how the result is delivered. You select one with `workflow_ref` (REST/webhook) or `--workflow` (CLI); the `--pr`/`--review-pr` flags select the coding PR workflows for you. If you specify nothing, the platform resolves a default (your repo's Blueprint default, or the conservative `default/agent-v1`). Workflows replace the old `task_type` field — see [Workflows](/sample-autonomous-cloud-coding-agents/architecture/workflows) for the full design and how to author your own. The shipped workflows that cover the full lifecycle of a code change: