diff --git a/.github/agents/git-ape.agent.md b/.github/agents/git-ape.agent.md index d206482..584bc89 100644 --- a/.github/agents/git-ape.agent.md +++ b/.github/agents/git-ape.agent.md @@ -168,6 +168,72 @@ Coordinate the deployment of Azure resources by delegating to specialized subage - This is a brand new deployment (no existing resources) - User explicitly says "deploy fresh" or "new deployment" +## Session Resumption + +Before starting a new deployment workflow, check whether any previous deployment was interrupted. + +**On startup (before asking what to deploy):** +```bash +# List any deployments with a resumable phase state. +# Treat in-progress as resumable because an agent crash/kill may leave the +# phase-state file before it can be rewritten to suspended. +_jq_read() { local file=$1 field=$2 + jq -r ".$field" "$file" 2>/dev/null || \ + python3 -c "import json,sys; print(json.load(sys.stdin)['$field'])" < "$file" 2>/dev/null +} + +if [ -d .azure/deployments ]; then + for dir in .azure/deployments/*/; do + state_file="${dir}tracking/phase-state.json" + if [ -f "$state_file" ]; then + status=$(_jq_read "$state_file" status) + case "$status" in + in-progress|suspended|blocked|awaiting-confirmation) + dep=$(_jq_read "$state_file" deploymentId) + phase=$(_jq_read "$state_file" phase) + updated=$(_jq_read "$state_file" updatedAt) + echo "Found: $dep | $phase | $status | $updated" + ;; + esac + fi + done +fi +``` + +**If one or more resumable deployments are found**, show a resume prompt before asking what to deploy: + +```markdown +⏸️ In-Progress Deployment Found + +| Deployment | Last Stage | Status | Last Updated | +|------------|-----------|--------|--------------| +| {deploymentId} | {phase} | {status} | {updatedAt} | + +Would you like to: +R. Resume — continue from {phase} +N. Start a new deployment +V. View details of the interrupted deployment +``` + +**If user chooses R:** +1. Set `DEPLOYMENT_ID` to the found deployment's ID +2. Read existing artifacts from `.azure/deployments/$DEPLOYMENT_ID/` to reconstruct context +3. Determine the resume point from `completedPhases` in `phase-state.json`: + - Skip all phases listed in `completedPhases` + - Re-enter at the phase listed in `phase` with `status` of `in-progress`, `suspended`, `blocked`, or `awaiting-confirmation` +4. For `blocked` at the security gate: re-show the blocking findings from `security-gate.json` and present options A/B/C/D again +5. For `awaiting-confirmation`: re-display the deployment summary and wait for user approval +6. For `in-progress` or `suspended` mid-phase: re-invoke the subagent for that phase with the existing artifact context + +**If user chooses N:** Proceed with a fresh deployment. Do not modify the interrupted deployment's state. + +**If user chooses V:** Read and display `requirements.json` summary and the current `phase-state.json`, then ask R or N. + +**Skip resume check if:** +- User's opening message explicitly says "new deployment", "start fresh", "skip resume", or "ignore previous" +- No `.azure/deployments/` directory exists +- All found `phase-state.json` files have terminal statuses (`completed`, `failed`, `aborted`) + ## Workflow Stages ### Stage 1: Requirements Gathering @@ -489,10 +555,77 @@ For each deployment, save: **In Headless Mode:** Commit these files to the branch so the PR shows full deployment artifacts. The PR diff becomes the deployment review. +**Phase State Tracking:** + +Write `tracking/phase-state.json` at every stage boundary so interrupted sessions can be resumed. + +**Pattern — write BEFORE invoking subagent (`status: in-progress`), then AFTER it returns (`status: completed`):** + +```bash +# Before delegating to a subagent: +mkdir -p ".azure/deployments/$DEPLOYMENT_ID/tracking" +PHASE_STARTED_AT="$(date -u +%Y-%m-%dT%H:%M:%SZ)" +COMPLETED_PHASES_JSON="${COMPLETED_PHASES_JSON:-[]}" + +jq -n \ + --arg deploymentId "$DEPLOYMENT_ID" \ + --arg phase "$PHASE" \ + --arg status "in-progress" \ + --arg startedAt "$PHASE_STARTED_AT" \ + --arg updatedAt "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \ + --arg mode "$MODE" \ + --argjson completedPhases "$COMPLETED_PHASES_JSON" \ + '{ + deploymentId: $deploymentId, + phase: $phase, + status: $status, + startedAt: $startedAt, + updatedAt: $updatedAt, + completedPhases: $completedPhases, + mode: $mode + }' > ".azure/deployments/$DEPLOYMENT_ID/tracking/phase-state.json" + +# After the subagent returns successfully: +# Re-write with status: completed and the current phase added to completedPhases +``` + +**Phase names and transition events:** + +| Phase | Write `in-progress` when | Write `completed` when | +|-------|--------------------------|------------------------| +| `stage-1-requirements` | Gatherer invoked | `requirements.json` saved + user confirmed | +| `stage-2-template` | Generator invoked | `template.json`, `security-analysis.md`, `preflight-report.md`, `cost-estimate.json` saved | +| `stage-2.5-security-gate` | Gate evaluated | `security-gate.json` shows `PASSED` or `OVERRIDDEN` | +| `stage-2.75-waf-review` | Architect invoked | `waf-review.md` saved | +| `stage-2.85-confirmation` | Confirmation shown to user | User types yes/proceed/deploy | +| `stage-3-deployment` | Deployer invoked | `deployment.log` saved with succeeded status | +| `stage-4-validation` | Integration tests start | `tests.json` saved | + +**Terminal statuses (never resumed):** `completed`, `failed`, `aborted` + +**Resumable statuses:** `in-progress`, `suspended`, `blocked`, `awaiting-confirmation` + +`in-progress` is resumable because a crash, kill, or host interruption can stop the agent before it writes `suspended`. If `updatedAt` is very recent, tell the user it may represent a still-running session before offering R/N/V. + +**On any abort or unrecoverable error — write suspended state before ending:** +```bash +# Write status: suspended with a suspendReason before the session closes +# Include all phases completed so far in completedPhases +``` + +**Special case — Stage 2.5 Security Gate BLOCKED, user chose D (abort for now):** +Write `status: blocked` (not `suspended`) — resume will re-show the blocking findings from `security-gate.json` and offer A/B/C/D again. + +**Special case — Stage 2.85 awaiting confirmation:** +Write `status: awaiting-confirmation` immediately after showing the deployment summary, before waiting for the user's yes/no. Resume will re-display the summary. + +**Final state — all stages complete:** +Write `status: completed` with all 7 phase keys in `completedPhases`. This deployment is never offered for resumption. + **Before Starting:** ```bash DEPLOYMENT_ID="deploy-$(date +%Y%m%d-%H%M%S)" -mkdir -p .azure/deployments/$DEPLOYMENT_ID +mkdir -p .azure/deployments/$DEPLOYMENT_ID/tracking ``` **After Each Stage:**