diff --git a/.github/docs/state-machine.md b/.github/docs/state-machine.md
index 379cb9185c7..36e84070620 100644
--- a/.github/docs/state-machine.md
+++ b/.github/docs/state-machine.md
@@ -1,261 +1,560 @@
-# Agentic Workflow State Machine
-
-Auto-generated documentation of all agentic workflows in this repository.
-
-## Workflow Overview
-
-| Workflow | Trigger | Reads | Writes | Key Labels |
-|----------|---------|-------|--------|------------|
-| **repo-assist** | ⏰ every 12h, `/repo-assist` | Issues, PRs, code, tests | comment, PR, issue, labels | `AI-thinks-issue-fixed`, `AI-thinks-windows-only`, `AI-Issue-Regression-PR` |
-| **labelops-pr-maintenance** | ⏰ every 3h | PRs with AI-Auto-Resolve-* labels, CI status | comment, push, labels, dispatch | `AI-Auto-Resolve-CI`, `AI-Auto-Resolve-Conflicts`, `AI-needs-CI-fix-input` |
-| **regression-pr-shepherd** | ⏰ every 4h | PRs with `AI-Issue-Regression-PR` | comment, push, remove-labels | `AI-Issue-Regression-PR`, `AI-thinks-issue-fixed` |
-| **labelops-flake-fix** | 🤖 dispatched by labelops-pr-maintenance | Test results, PR diffs | PR, comment, issue | `Flaky`, `automation` |
-| **labelops-pr-security-scan** | ⏰ every 1h | PR diffs, file lists, repo rules | labels, comment | `AI-Tooling-Check-Scanned-Clean`, `AI-Tooling-Check-Bypassed`, `⚠️ Affects-*`, `⚠️ Suspicious-Prompting`, `⚠️ Scope-Review-Needed` |
-| **aw-auto-update** | ⏰ every 24h | `.github/workflows/*` files | agent-session | `automation` |
-
-## Issue Lifecycle
+# dotnet/fsharp — Agentic State Machine
+
+> **15 workflows documented.** Source: `.github/workflows/` · FULL_REWRITE (generator `d4fe5640de7eb85c`).
+
+> **What this doc is.** A map of the 15 GitHub Actions workflows and AI agents in this repository — their triggers, control flow, side effects, and how they hand work to each other. Use it to understand "what runs when, who acts on what, and where does this label come from."
+
+## Glossary
+
+- **gh-aw** — `gh aw`, a GitHub CLI extension that compiles agentic workflow `.md` files (frontmatter + prompt) into runnable `.lock.yml` GitHub Actions workflows. See [github/gh-aw](https://github.com/githubnext/gh-aw).
+- **gh-aw workflow (`*.md`)** — A workflow defined as a Markdown file with YAML frontmatter declaring triggers (`on:`), tools (`tools:`), and `safe-outputs:`. The body is the LLM prompt.
+- **safe-outputs** — gh-aw's runtime permission and rate-limit framework. Each safe-output key (`create-pull-request`, `add-comment`, `add-labels`, `push-to-pull-request-branch`, `dispatch-workflow`, `noop`, etc.) constrains what side effects an agent run can produce: maximum count per run (`max:`), allowed file globs (`allowed-files:`), required label set, fallback behavior on validation failure.
+- **noop** — A safe-output that lets an agent end a run cleanly with no side effects (no PR, no comment). `report-as-issue: false` suppresses opening an issue when the noop is the only output — i.e., a successful no-op is silent.
+- **CCA (Copilot Coding Agent)** — Microsoft's hosted coding agent invoked via `create-agent-session`. The calling workflow hands off a task; CCA executes it asynchronously and writes results back to the repo.
+- **state-store branch** — A dedicated git branch (e.g., `memory/repo-assist`, `safety/scanned-PRs`) used as persistent JSON storage between scheduled workflow runs. Avoids depending on external infrastructure.
+- **flaky-test-detector** — A repository tool / Copilot skill that confirms a test is flaky by checking for failure evidence across ≥3 distinct unrelated PRs (not the originating PR).
+- **Cat A/B/C** (regression-pr-shepherd) — Triage categories for regression-test PRs. **A**: new human review feedback since last bot comment. **B**: CI failure or merge conflict. **C**: healthy (skip this round).
+- **B0–B4** (regression-pr-shepherd, Category B subtypes) — **B0**: merge conflict (rebase + resolve). **B1**: infrastructure/flaky failure (retry CI). **B2**: test compilation or setup error (fix test code). **B3**: added test fails — bug NOT fixed (remove label + comment + close PR). **B4**: other failures (note + re-trigger).
+- **`.lock.yml`** — The compiled GitHub Actions YAML produced by `gh aw compile` from a gh-aw `.md` source. Checked in and executed by GitHub Actions directly; never hand-edited.
+- **`BSL` (baseline)** — F# compiler test baseline files (`.bsl`); tests diff against these. `BSL auto-accept` means automatically regenerating baselines instead of comparing — disallowed in some auto-resolve paths because it masks regressions.
+- **`dotnet/skills`** — A separate Microsoft repository hosting reusable Copilot skills (validators, tools, agent prompts). Some workflows (e.g., `skill-validation.yml`) download nightly-release binaries from there.
+- **`FCS` (F# Compiler Service)** — The F# compiler-as-a-library used by IDEs and tools. "FCS-testable" issues can be reproduced/fixed via the compiler service without needing Visual Studio.
+- **`has_ci`** (labelops-pr-maintenance) — PR has CI runs to evaluate.
+- **`has_conflicts`** (labelops-pr-maintenance) — PR has merge conflicts.
+- **`ci_blocked`** (labelops-pr-maintenance) — CI hasn't started yet (queued or blocked by other workflows).
+- **`12h stuck guard`** (labelops-pr-maintenance) — Skip the PR if LabelOps already committed within the last 12 hours AND checks are still red. Prevents retry storms.
+- **milestone 29 / `2026-05-12` cutoff** — Repo-specific operational constants: milestone 29 is applied by `add_to_project.yml` (internal milestone number; title unconfirmed from source); the `2026-05-12` cutoff in `labelops-pr-security-scan` is the date the scanner went live — PRs opened before that date are skipped to avoid re-scanning historical PRs.
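
The `12h stuck guard` entry above reduces to a two-part predicate. The following is a minimal sketch of that rule, not the workflow's actual implementation: the function name, its parameters, and the choice of a strict "less than 12 hours" comparison are all assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Window taken from the glossary entry; the exact boundary handling is an assumption.
STUCK_WINDOW = timedelta(hours=12)


def should_skip_stuck_pr(
    last_labelops_commit_at: Optional[datetime],
    checks_red: bool,
    now: datetime,
) -> bool:
    """Return True when the PR should be skipped this round.

    Hypothetical helper: skip only if LabelOps already committed inside the
    window AND the checks are still red, which is what prevents retry storms.
    """
    if last_labelops_commit_at is None:
        # LabelOps never pushed to this PR, so there is nothing to be stuck on.
        return False
    committed_recently = (now - last_labelops_commit_at) < STUCK_WINDOW
    return committed_recently and checks_red


now = datetime(2026, 6, 1, 12, 0, tzinfo=timezone.utc)
recent = now - timedelta(hours=3)
old = now - timedelta(hours=13)
```

With these sample timestamps, a red PR that LabelOps pushed to three hours ago is skipped, while the same PR is eligible again once the last LabelOps commit is older than the window or the checks are no longer red.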
+
+## Legend
+
+| Symbol | Meaning |
+|---|---|
+| ⏰ | scheduled trigger (cron) |
+| 👤 | human-initiated trigger (manual dispatch, PR/issue events, comment, reaction) |
+| ⚙️ | workflow-engine action (job condition, step logic, push event, internal evaluation) |
+| 🤖 | agent/bot action (safe-output emission, dispatch, autonomous write) |
+| `<<choice>>` | binary branch on a guard condition |
+| `<<fork>>` / `<<join>>` | parallel job split / join |
+
+---
+
+## Overview
+
+| Workflow | Trigger | Inputs | Primary Actions |
+|---|---|---|---|
+| `agentic-state-machine.md` | schedule 7d, dispatch | none | `noop`, `create-pull-request` |
+| `aw-auto-update.md` | schedule 24h, dispatch | none | `noop`, `create-agent-session` |
+| `labelops-flake-fix.md` | dispatch | `failing_test`, `affected_prs`, `originating_pr` (all req) | `create-pull-request`, `add-comment`, `create-issue` |
+| `labelops-pr-maintenance.md` | schedule 3h, dispatch | none | `noop`, `add-comment`, `push-to-pull-request-branch`, `add-labels`, `dispatch-workflow` |
+| `labelops-pr-security-scan.md` | schedule 1h, dispatch | none | `noop`, `add-labels`, `add-comment` |
+| `regression-pr-shepherd.md` | schedule 4h, dispatch | none | `noop`, `add-comment`, `push-to-pull-request-branch`, `remove-labels` |
+| `repo-assist.md` | schedule 12h, dispatch, slash_command `/repo-assist`, reaction `eyes` | none | `noop`, `messages`, `add-comment`, `create-pull-request`, `push-to-pull-request-branch`, `create-issue`, `update-issue`, `add-labels`, `remove-labels` |
+| `add_to_project.yml` | issues (opened, transferred), pull_request_target (opened, main) | — | add label `Needs-Triage`, set milestone 29 |
+| `backport.yml` | issue_comment (created), schedule (cron `0 13 * * *`) | — | delegates to `dotnet/arcade` backport-base.yml |
+| `branch-merge.yml` | push (main, release/\*) | — | delegates to `dotnet/arcade` inter-branch-merge-base.yml |
+| `check_release_notes.yml` | pull_request_target
(opened/sync/reopened/labeled/unlabeled; main, release/\*) | — | create or update PR comment |
+| `commands.yml` | issue_comment (created) | — | apply patch to PR branch, comment on PR |
+| `copilot-setup-steps.yml` | dispatch | none | build environment setup for Copilot agent |
+| `repository_lockdown_check.yml` | pull_request_target (opened/sync/reopened; main, release/\*) | — | create, update, or delete lockdown comment |
+| `skill-validation.yml` | pull_request (`.github/skills/**`, `.github/agents/**`), push (main), dispatch | none | validate skills and agents |
+
+---
+
+## Group A1 — Agentic Infrastructure
 
 ```mermaid
 stateDiagram-v2
     direction LR
-
-    [*] --> Open: 👤 contributor files issue
-    Open --> Investigated: 🤖 repo-assist (⏰ 12h)
-    Investigated --> FixedCandidate: 🤖 repo-assist adds AI-thinks-issue-fixed
-    Investigated --> WindowsOnly: 🤖 repo-assist adds AI-thinks-windows-only
-
-    state "Fix Verification" as FixVerify {
-        FixedCandidate --> TestExists: 🤖 repo-assist finds existing test
-        FixedCandidate --> TestPRCreated: 🤖 repo-assist creates regression test PR
-        FixedCandidate --> NotActuallyFixed: 🤖 repo-assist/shepherd removes label
+    state "agentic-state-machine" as ASM {
+        direction LR
+        [*] --> ASM_ReadManifest : ⏰ schedule (every 7d)
+        [*] --> ASM_ReadManifest : 👤 workflow_dispatch (inputs: none)
+        state ASM_ModeCheck <<choice>>
+        ASM_ReadManifest --> ASM_ModeCheck : ⚙️ mode detection via manifest header (generator SHA + source SHAs)
+        ASM_ModeCheck --> ASM_Noop : ⚙️ NOOP — INCREMENTAL + all SHAs unchanged
+        ASM_ModeCheck --> ASM_Generate : ⚙️ FULL_REWRITE or changed SHAs
+        ASM_Noop --> [*] : ⚙️ noop (report-as-issue: false)
+        ASM_Generate --> ASM_Validate : ⚙️ build model, draft diagrams, run all structural + behavioral audits
+        ASM_Validate --> [*] : 🤖 create-pull-request
     }
-
-    TestExists --> Closed: 👤 maintainer closes
-    TestPRCreated --> Closed: ⚙️ CI passes + 👤 maintainer merges PR
-
NotActuallyFixed --> Open: label removed, issue remains open
-
-    state "Windows-Only Reassessment" as WinReassess {
-        WindowsOnly --> Reclassified: 🤖 repo-assist removes AI-thinks-windows-only
-        WindowsOnly --> ConfirmedWinOnly: 🤖 repo-assist keeps label
+    state "aw-auto-update" as AWU {
+        direction LR
+        [*] --> AWU_Install : ⏰ schedule (every 24h)
+        [*] --> AWU_Install : 👤 workflow_dispatch (inputs: none)
+        state AWU_Installed <<choice>>
+        AWU_Install --> AWU_Installed : ⚙️ gh extension install github/gh-aw
+        AWU_Installed --> AWU_Noop : ⚙️ install failed
+        AWU_Installed --> AWU_Upgrade : ⚙️ installed
+        state AWU_Upgraded <<choice>>
+        AWU_Upgrade --> AWU_Upgraded : ⚙️ gh aw upgrade
+        AWU_Upgraded --> AWU_Noop : ⚙️ upgrade failed
+        AWU_Upgraded --> AWU_Compile : ⚙️ upgraded
+        state AWU_Compiled <<choice>>
+        AWU_Compile --> AWU_Compiled : ⚙️ gh aw compile
+        AWU_Compiled --> AWU_Noop : ⚙️ compile errors
+        AWU_Compiled --> AWU_Capture : ⚙️ compiled
+        AWU_Capture --> AWU_Reset : ⚙️ capture NEW_VERSION + DIFF_STAT + CHANGED_FILES
+        AWU_Reset --> AWU_Dedupe : ⚙️ git reset --hard && git clean -fd
+        state AWU_DupeExists <<choice>>
+        AWU_Dedupe --> AWU_DupeExists : ⚙️ gh pr list + gh issue list (title: "[Auto Update] Agentic workflows")
+        AWU_DupeExists --> AWU_Noop : ⚙️ open PR or issue found
+        AWU_DupeExists --> AWU_Decide : ⚙️ none found
+        state AWU_HasChanges <<choice>>
+        AWU_Decide --> AWU_HasChanges : ⚙️ CHANGED_FILES empty?
+        AWU_HasChanges --> AWU_Noop : ⚙️ empty (normal steady state)
+        AWU_HasChanges --> AWU_Session : ⚙️ non-empty — delegate to CCA
+        AWU_Session --> [*] : 🤖 create-agent-session (to CCA)
+        AWU_Noop --> [*] : ⚙️ noop (report-as-issue: false)
     }
-
-    Reclassified --> Investigated: 🤖 repo-assist re-investigates
 ```
 
-## Regression Test PR Lifecycle
+### Safe-outputs configuration
 
-```mermaid
-stateDiagram-v2
-    direction LR
-
-    [*] --> Created: 🤖 repo-assist creates PR (⏰ 12h)
+**gh-aw safe-output defaults (suppressed below):** `target: "*"`, `noop.report-as-issue: false`. Per-workflow blocks list overrides and distinguishing config. **`noop` rows omitted — all gh-aw workflows emit `noop | — | (defaults)`.**
 
-    state "Shepherd Loop (⏰ 4h)" as ShepherdLoop {
-        Created --> Categorize: 🤖 regression-pr-shepherd
+| Workflow | Output | Max | Key Constraints |
+|---|---|---|---|
+| `agentic-state-machine.md` | `create-pull-request` | 1 | title `[Agentic State Machine] `; labels `automation, NO_RELEASE_NOTES`; allowed-files `.github/docs/**`; protected-files allowed |
+| `aw-auto-update.md` | `create-agent-session` | 1 | base `main` |
 
-        state categorize <<choice>>
-        Categorize --> categorize
-        categorize --> HasFeedback: review comments exist
-        categorize --> CIFailing: checks failed
-        categorize --> Healthy: all green
+---
 
-        HasFeedback --> FixPushed: 🤖 shepherd addresses feedback
-        CIFailing --> InfraRetry: 🤖 shepherd retries (flaky/infra)
-        CIFailing --> CompileFixed: 🤖 shepherd fixes test code
-        CIFailing --> BugStillExists: 🤖 shepherd detects real failure
-        CIFailing --> MergeResolved: 🤖 shepherd rebases
+## Group A2 — LabelOps Agents
 
-        FixPushed --> Created: ⚙️ CI restarts
-        InfraRetry --> Created: ⚙️ CI restarts
-        CompileFixed --> Created: ⚙️ CI restarts
-        MergeResolved --> Created: ⚙️ CI restarts
+```mermaid
+stateDiagram-v2
+    direction LR
+    state "labelops-flake-fix" as LFF {
+        direction LR
+        [*] --> LFF_Validate : 👤
workflow_dispatch (inputs: failing_test, affected_prs, originating_pr)
+        state LFF_InputOK <<choice>>
+        LFF_Validate --> LFF_InputOK : ⚙️ validate inputs
+        LFF_InputOK --> [*] : ⚙️ invalid inputs — exit
+        LFF_InputOK --> LFF_Reverify : ⚙️ valid
+        state LFF_Confirmed <<choice>>
+        LFF_Reverify --> LFF_Confirmed : ⚙️ flaky-test-detector: evidence across ≥3 of affected_prs?
+        LFF_Confirmed --> LFF_NoopComment : ⚙️ not confirmed
+        LFF_Confirmed --> LFF_CheckAuthor : ⚙️ confirmed
+        state LFF_AuthorIntro <<choice>>
+        LFF_CheckAuthor --> LFF_AuthorIntro : ⚙️ originating PR introduced or modified this test (gh pr diff)?
+        LFF_AuthorIntro --> LFF_NoopComment : ⚙️ yes — skip (would defeat PR purpose)
+        LFF_AuthorIntro --> LFF_Reproduce : ⚙️ no
+        LFF_Reproduce --> LFF_AllFail : ⚙️ reproduce loop up to 20 iterations, 15 min cap
+        state LFF_AllFail <<choice>>
+        LFF_AllFail --> LFF_NoopComment : ⚙️ N/N failures — hard failure, not a flake
+        LFF_AllFail --> LFF_AnyFail : ⚙️ not all failed
+        state LFF_AnyFail <<choice>>
+        LFF_AnyFail --> LFF_Quarantine : ⚙️ 0/N — no local repro, prefer quarantine (Option B)
+        LFF_AnyFail --> LFF_DetermFix : ⚙️ 1-(N-1)/N — classic non-determinism, prefer fix (Option A)
+        LFF_DetermFix --> LFF_OpenPR : ⚙️ root cause fixed, 0/20 loop verified
+        LFF_Quarantine --> LFF_TrackingIssue : ⚙️ add skip marker referencing tracking issue
+        LFF_TrackingIssue --> LFF_OpenPR : 🤖 create-issue (labels: Flaky, automation, max: 1)
+        LFF_OpenPR --> LFF_Comment : 🤖 create-pull-request (fix)
+        LFF_Comment --> [*] : 🤖 add-comment on originating PR (max: 1)
+        LFF_NoopComment --> [*] : 🤖 add-comment explaining skip OR noop
+    }
+    state "labelops-pr-maintenance" as LPM {
+        direction LR
+        [*] --> LPM_SelectPRs : ⏰ schedule (every 3h)
+        [*] --> LPM_SelectPRs : 👤 workflow_dispatch (inputs: none)
+        state LPM_HasPRs <<choice>>
+        LPM_SelectPRs --> LPM_HasPRs : ⚙️ gh pr list (label AI-Auto-Resolve-*, ≤3)
+        LPM_HasPRs --> LPM_Noop :
βš™οΈ no eligible PRs + LPM_HasPRs --> LPM_ClassifyPR : βš™οΈ up to 3 PRs selected + LPM_ClassifyPR --> LPM_CICheck : βš™οΈ classify (has_ci/has_conflicts/ci_blocked, 12h stuck guard) + state LPM_NeedsCITriage <> + LPM_CICheck --> LPM_NeedsCITriage : βš™οΈ has_ci AND NOT ci_blocked? + LPM_NeedsCITriage --> LPM_CITriage : βš™οΈ yes + LPM_NeedsCITriage --> LPM_ConflictCheck : βš™οΈ no + state LPM_CIHealthy <> + LPM_CITriage --> LPM_CIHealthy : βš™οΈ all checks SUCCESS/SKIPPED/NEUTRAL? + LPM_CIHealthy --> LPM_ConflictCheck : βš™οΈ yes β€” healthy + LPM_CIHealthy --> LPM_CIFixable : βš™οΈ no β€” failures exist + state LPM_CIFixable <> + LPM_CIFixable --> LPM_Fix : βš™οΈ fixable (≀3 attempts, ≀500 LOC, no BSL auto-accept) + LPM_CIFixable --> LPM_ProvenFlake : βš™οΈ not fixable + LPM_Fix --> LPM_PushCIFix : βš™οΈ reproduce locally + fix + build + targeted tests + LPM_PushCIFix --> LPM_NextPR : πŸ€– push-to-pull-request-branch + add-comment β€” stop this PR (CI restarts) + state LPM_ProvenFlake <> + LPM_ProvenFlake --> LPM_ExistingFlakePRCheck : βš™οΈ flaky-test-detector >=3 distinct PRs AND test not introduced by this PR + LPM_ProvenFlake --> LPM_Escalate : βš™οΈ not proven flake + state LPM_ExistingFlakePRCheck <> + LPM_ExistingFlakePRCheck --> LPM_NextPR : βš™οΈ existing [LabelOps Flake] PR found (skip) + LPM_ExistingFlakePRCheck --> LPM_DispatchFlakeFix : βš™οΈ no existing [LabelOps Flake] PR + LPM_DispatchFlakeFix --> LPM_ConflictCheck : πŸ€– dispatch-workflow: labelops-flake-fix (max: 3) + LPM_Escalate --> LPM_ConflictCheck : πŸ€– add-labels AI-needs-CI-fix-input + add-comment + state LPM_NeedsConflictTriage <> + LPM_ConflictCheck --> LPM_NeedsConflictTriage : βš™οΈ has_conflicts AND Step 3 did NOT push? 
+        LPM_NeedsConflictTriage --> LPM_NextPR : ⚙️ no conflict work needed
+        LPM_NeedsConflictTriage --> LPM_MergeTree : ⚙️ yes
+        state LPM_ConflictsExist <<choice>>
+        LPM_MergeTree --> LPM_ConflictsExist : ⚙️ git merge-tree --write-tree --messages origin/main HEAD: any CONFLICT lines?
+        LPM_ConflictsExist --> LPM_NextPR : ⚙️ no CONFLICT lines — PR merges cleanly
+        LPM_ConflictsExist --> LPM_Resolve : ⚙️ conflicts exist
+        LPM_Resolve --> LPM_NextPR : 🤖 push-to-pull-request-branch + add-comment
+        state LPM_MorePRs <<choice>>
+        LPM_NextPR --> LPM_MorePRs : ⚙️ more PRs remaining in batch (max 3)?
+        LPM_MorePRs --> LPM_ClassifyPR : ⚙️ yes (next PR)
+        LPM_MorePRs --> LPM_Done : ⚙️ no
+        LPM_Done --> [*] : ⚙️ run complete
+        LPM_Noop --> [*] : ⚙️ noop (report-as-issue: false)
+    }
+    state "labelops-pr-security-scan" as LPSS {
+        direction LR
+        [*] --> LPSS_ReadRules : ⏰ schedule (every 1h)
+        [*] --> LPSS_ReadRules : 👤 workflow_dispatch (inputs: none)
+        LPSS_ReadRules --> LPSS_ReadMemory : ⚙️ get_file_contents .github/tooling-check-repo-rules.md from default branch
+        LPSS_ReadMemory --> LPSS_ListPRs : ⚙️ load state.json from safety/scanned-PRs branch
+        state LPSS_HasPRs <<choice>>
+        LPSS_ListPRs --> LPSS_HasPRs : ⚙️ paginate PRs (newest-first, skip isDraft, cutoff 2026-05-12)
+        LPSS_HasPRs --> LPSS_WriteMemory : ⚙️ no PRs to scan
+        LPSS_HasPRs --> LPSS_PruneMemory : ⚙️ PRs found
+        LPSS_PruneMemory --> LPSS_PerPR : ⚙️ remove closed PRs from state.json
+        state LPSS_AlreadyScanned <<choice>>
+        LPSS_PerPR --> LPSS_AlreadyScanned : ⚙️ state.json sha == current headRefOid?
+        LPSS_AlreadyScanned --> LPSS_NextPR : ⚙️ yes — already scanned at this commit
+        LPSS_AlreadyScanned --> LPSS_ForkCheck : ⚙️ no — scan needed
+        state LPSS_IsFork <<choice>>
+        LPSS_ForkCheck --> LPSS_IsFork : ⚙️ headRepository field: is fork?
+        LPSS_IsFork --> LPSS_Bypass : ⚙️ non-fork
+        LPSS_IsFork --> LPSS_Classify : ⚙️ fork — read file list + diff + title + body
+        LPSS_Bypass --> LPSS_NextPR : 🤖 add-labels: AI-Tooling-Check-Bypassed + update memory (no comment)
+        state LPSS_Flagged <<choice>>
+        LPSS_Classify --> LPSS_Flagged : ⚙️ classify against generic + repo-specific categories
+        LPSS_Flagged --> LPSS_ApplyClean : ⚙️ no categories matched
+        LPSS_Flagged --> LPSS_CatChanged : ⚙️ categories matched
+        LPSS_ApplyClean --> LPSS_NextPR : 🤖 add-labels: AI-Tooling-Check-Scanned-Clean + update memory (no comment)
+        state LPSS_CatChanged <<choice>>
+        LPSS_CatChanged --> LPSS_FlagsAndComment : ⚙️ category set changed or no previous entry
+        LPSS_CatChanged --> LPSS_FlagsOnly : ⚙️ category set identical to previous scan
+        LPSS_FlagsAndComment --> LPSS_NextPR : 🤖 add-labels (warning labels) + add-comment (hide) + update memory
+        LPSS_FlagsOnly --> LPSS_NextPR : 🤖 add-labels (warning labels) + update memory (no new comment)
+        state LPSS_MorePRs <<choice>>
+        LPSS_NextPR --> LPSS_MorePRs : ⚙️ more PRs?
+        LPSS_MorePRs --> LPSS_PerPR : ⚙️ yes (next PR)
+        LPSS_MorePRs --> LPSS_WriteMemory : ⚙️ no
+        LPSS_WriteMemory --> [*] : ⚙️ save state.json to safety/scanned-PRs branch
     }
+```
 
-    Healthy --> Merged: 👤 maintainer merges
-    BugStillExists --> Closed: 🤖 shepherd closes PR + removes AI-thinks-issue-fixed
+### Safe-outputs configuration
 
-    Merged --> [*]
-    Closed --> [*]
-```
+| Workflow | Output | Max | Key Constraints |
+|---|---|---|---|
+| `labelops-flake-fix.md` | `create-pull-request` | 1 | title `[LabelOps Flake] `; labels `automation, Flaky, NO_RELEASE_NOTES`; protected-files fallback-to-issue |
+| `labelops-flake-fix.md` | `add-comment` | 1 | (defaults) |
+| `labelops-flake-fix.md` | `create-issue` | 1 | title `[LabelOps Flake] `; labels `Flaky, automation` |
+| `labelops-pr-maintenance.md` | `add-comment` | 5 | hide-older-comments |
+| `labelops-pr-maintenance.md` | `push-to-pull-request-branch` | 5 | protected-files allowed |
+| `labelops-pr-maintenance.md` | `add-labels` | 3 | allowed `AI-needs-CI-fix-input` |
+| `labelops-pr-maintenance.md` | `dispatch-workflow` | 3 | workflows `[labelops-flake-fix]` |
+| `labelops-pr-security-scan.md` | `add-labels` | 50 | allowed: 11 security labels (see Labels) |
+| `labelops-pr-security-scan.md` | `add-comment` | 25 | hide-older-comments |
+
+---
 
-## PR Maintenance Lifecycle
+## Group A3 — Code Quality Agents
 
 ```mermaid
 stateDiagram-v2
     direction LR
+    state "regression-pr-shepherd" as RPS {
+        direction LR
+        [*] --> RPS_ListPRs : ⏰ schedule (every 4h)
+        [*] --> RPS_ListPRs : 👤 workflow_dispatch (inputs: none)
+        state RPS_HasEligible <<choice>>
+        RPS_ListPRs --> RPS_HasEligible : ⚙️ gh pr list (label AI-Issue-Regression-PR, title filter)
+        RPS_HasEligible --> RPS_Noop : ⚙️ no eligible PRs
+        RPS_HasEligible --> RPS_QuickTriage : ⚙️ up to 3 eligible PRs (priority: Cat A > Cat B > Cat C)
+        RPS_QuickTriage --> RPS_HasFeedback : ⚙️ quick triage: mergeable + checks + memory
+        state RPS_HasFeedback <<choice>>
+        RPS_HasFeedback --> RPS_FixFeedback : ⚙️ new human review comments since last bot comment? => Category A
+        RPS_HasFeedback --> RPS_HasCIFailure : ⚙️ no new review feedback
+        state RPS_HasCIFailure <<choice>>
+        RPS_HasCIFailure --> RPS_CITriage : ⚙️ yes => Category B (CI failure or merge conflict)
+        RPS_HasCIFailure --> RPS_NextPR : ⚙️ no => Category C (healthy, skip)
+        RPS_FixFeedback --> RPS_SameFixCheck : ⚙️ check: last commit matches feedback?
+        state RPS_SameFixCheck <<choice>>
+        RPS_SameFixCheck --> RPS_NextPR : ⚙️ already pushed (last commit matches feedback — skip)
+        RPS_SameFixCheck --> RPS_PushFix : ⚙️ not yet pushed
+        RPS_PushFix --> RPS_NextPR : 🤖 push-to-pull-request-branch + add-comment (reply to review thread)
+        RPS_CITriage --> RPS_B0Conflict : ⚙️ fetch failed job logs, analyze failure type
+        state RPS_B0Conflict <<choice>>
+        RPS_B0Conflict --> RPS_RebaseResolve : ⚙️ yes => B0: rebase + resolve (tests/ scope)
+        RPS_B0Conflict --> RPS_B1Infra : ⚙️ no
+        state RPS_B1Infra <<choice>>
+        RPS_B1Infra --> RPS_Retry : ⚙️ yes => B1: infrastructure or flaky failure — retry CI
+        RPS_B1Infra --> RPS_B2Error : ⚙️ no
+        state RPS_B2Error <<choice>>
+        RPS_B2Error --> RPS_FixTest : ⚙️ yes => B2: test compilation or setup error — fix test code
+        RPS_B2Error --> RPS_B3BugReproduced : ⚙️ no
+        state RPS_B3BugReproduced <<choice>>
+        RPS_B3BugReproduced --> RPS_BugStillExists : ⚙️ yes => B3: added test fails (bug NOT fixed)
+        RPS_B3BugReproduced --> RPS_B4Other : ⚙️ no => B4: other failures (note + re-trigger)
+        RPS_RebaseResolve --> RPS_NextPR : 🤖 push-to-pull-request-branch + add-comment
+        RPS_Retry --> RPS_NextPR : 🤖 push empty commit or re-run workflow
+        RPS_FixTest --> RPS_NextPR : 🤖 push-to-pull-request-branch + add-comment
+        RPS_B4Other --> RPS_NextPR : 🤖 add-comment
+        RPS_BugStillExists --> RPS_NextPR : 🤖 remove-labels AI-thinks-issue-fixed + add-comment + close PR
+        state RPS_MorePRs <<choice>>
+        RPS_NextPR --> RPS_MorePRs : ⚙️
more PRs in batch (max 3)?
+        RPS_MorePRs --> RPS_QuickTriage : ⚙️ yes (next PR)
+        RPS_MorePRs --> RPS_Done : ⚙️ no
+        RPS_Done --> [*] : ⚙️ run complete, update memory
+        RPS_Noop --> [*] : ⚙️ noop (report-as-issue: false)
+    }
+    state "repo-assist" as RA {
+        direction LR
+        [*] --> RA_PreStep : ⏰ schedule (every 12h)
+        [*] --> RA_PreStep : 👤 workflow_dispatch (inputs: none)
+        [*] --> RA_PreStep : 👤 slash_command: /repo-assist
+        [*] --> RA_PreStep : 👤 reaction: eyes
+        RA_PreStep --> RA_ReadMemory : ⚙️ fetch issues + PRs, compute weights, write task_selection.json
+        RA_ReadMemory --> RA_CommandMode : ⚙️ read state.json from memory/repo-assist branch
+        state RA_CommandMode <<choice>>
+        RA_CommandMode --> RA_RunInstructions : 👤 instructions non-empty (slash_command or reaction with text)
+        RA_CommandMode --> RA_T1 : ⚙️ instructions empty — non-command mode
+        state RA_InstructionResult <<choice>>
+        RA_RunInstructions --> RA_InstructionResult : ⚙️ follow user instructions, apply guidelines
+        RA_InstructionResult --> [*] : ⚙️ no actionable work => noop
+        RA_InstructionResult --> RA_CmdOutputs : ⚙️ work done
+        RA_CmdOutputs --> [*] : 🤖 any of repo-assist's 9 safe-outputs (see Safe-outputs below)
+        RA_T1 --> RA_T3 : ⚙️ Task 1: investigate issues, add AI-thinks-issue-fixed/windows-only
+        RA_T3 --> RA_T2 : ⚙️ Task 3: revisit AI-thinks-windows-only, remove-labels if FCS-testable
+        RA_T2 --> RA_T2_SkipCheck : ⚙️ Task 2: Step A — check 6 skip conditions
+        state RA_T2_SkipCheck <<choice>>
+        RA_T2_SkipCheck --> RA_TaskFinal : ⚙️ skip (closed|existing-PR|test-coverage|untestable|human-coverage)
+        RA_T2_SkipCheck --> RA_T2_CreatePR : ⚙️ no skip condition met
+        state RA_T2_CreatePR
+        RA_T2_CreatePR --> RA_TaskFinal : 🤖 create-pull-request (auto-merge, reviewers: abonie+T-Gro) or remove-labels
+        RA_TaskFinal --> RA_WriteMemory : ⚙️ Task FINAL: update Monthly Activity Summary issue (update-issue)
+        RA_WriteMemory --> [*] : 🤖
write state.json to memory/repo-assist branch, messages safe-output
+    }
+```
 
-    [*] --> Labelled: 👤 maintainer adds AI-Auto-Resolve-* label
-
-    state "Maintenance Loop (⏰ 3h)" as MaintLoop {
-        Labelled --> ClassifyPR: 🤖 labelops-pr-maintenance
+> **`repo-assist` Task 2 — Step A skip conditions** (any of these → no PR):
+> 1. The issue is **closed** — someone already resolved it
+> 2. A regression test PR exists (open or merged): `gh pr list --label "AI-Issue-Regression-PR" --search "{issue_number}" --state all`
+> 3. The issue body or comments link to a PR that addresses it
+> 4. Repo Assist already posted a test-link comment (a comment containing a GitHub permalink to a test file)
+> 5. Repo Assist already posted an "untestable" explanation (Outcome 3 from a previous run)
+> 6. A human already posted a comment with test coverage or a fix reference after the `AI-thinks-issue-fixed` label was applied
 
-        state classify <<choice>>
-        ClassifyPR --> classify
-        classify --> CICheck: has AI-Auto-Resolve-CI
-        classify --> ConflictCheck: has AI-Auto-Resolve-Conflicts
+> **`repo-assist` task ordering** — non-sequential by design: **Task 1 → Task 3 → Task 2 → Task FINAL**. Task 3 revisits `AI-thinks-windows-only` labels (which Task 1 may have just applied) BEFORE Task 2 attempts to write regression tests, because Task 3's findings determine whether Task 2 should skip.
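
The six Task 2 Step A skip conditions listed above behave like an ordered "first match wins" check. The sketch below illustrates that shape only: the `IssueFacts` type, its field names, and the reason strings are hypothetical, and the real agent evaluates these conditions from a prompt rather than from code like this.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class IssueFacts:
    """Hypothetical pre-computed facts about one issue (field names are illustrative)."""
    is_closed: bool = False
    regression_pr_exists: bool = False        # open or merged regression test PR
    linked_pr_addresses_it: bool = False      # body or comments link to a fixing PR
    bot_posted_test_link: bool = False        # earlier comment with a test permalink
    bot_posted_untestable: bool = False       # earlier "untestable" explanation
    human_coverage_after_label: bool = False  # human comment after the label was applied


def first_skip_reason(facts: IssueFacts) -> Optional[str]:
    """Return the first matching skip reason, or None when a PR may be created."""
    checks = [
        (facts.is_closed, "closed"),
        (facts.regression_pr_exists, "existing-regression-PR"),
        (facts.linked_pr_addresses_it, "linked-PR"),
        (facts.bot_posted_test_link, "test-link-comment"),
        (facts.bot_posted_untestable, "untestable"),
        (facts.human_coverage_after_label, "human-coverage"),
    ]
    for matched, reason in checks:
        if matched:
            return reason
    return None
```

Returning the reason rather than a bare boolean keeps the skip explainable, which matters when the outcome is summarized in a later comment or activity report.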
-        CICheck --> CIHealthy: ⚙️ all checks pass
-        CICheck --> CIFixable: 🤖 labelops fixes CI
-        CICheck --> ProvenFlake: 🤖 labelops detects flake
-        CICheck --> Escalated: 🤖 labelops adds AI-needs-CI-fix-input
+### Safe-outputs configuration
 
-        CIFixable --> Labelled: ⚙️ CI restarts after push
-        ProvenFlake --> FlakeDispatched: 🤖 labelops dispatches labelops-flake-fix
-        Escalated --> Blocked: 👤 maintainer needed
+| Workflow | Output | Max | Key Constraints |
+|---|---|---|---|
+| `regression-pr-shepherd.md` | `add-comment` | 5 | hide-older-comments |
+| `regression-pr-shepherd.md` | `push-to-pull-request-branch` | 10 | title `Add regression test: `; labels `AI-Issue-Regression-PR`; allowed-files `tests/**`, `vsintegration/tests/**`; protected-files fallback-to-issue |
+| `regression-pr-shepherd.md` | `remove-labels` | 5 | allowed `AI-thinks-issue-fixed` |
+| `repo-assist.md` | `messages` | — | footer, run-started, run-success, run-failure |
+| `repo-assist.md` | `add-comment` | 10 | hide-older-comments |
+| `repo-assist.md` | `create-pull-request` | 10 | title `Add regression test: `; labels `NO_RELEASE_NOTES, AI-Issue-Regression-PR`; reviewers abonie, T-Gro; auto-merge; allowed-files `tests/**`, `vsintegration/tests/**` |
+| `repo-assist.md` | `push-to-pull-request-branch` | 4 | title `[Repo Assist] `; protected-files fallback-to-issue |
+| `repo-assist.md` | `create-issue` | 4 | title `[Repo Assist] `; labels `automation, repo-assist` |
+| `repo-assist.md` | `update-issue` | 1 | title `[Repo Assist] ` |
+| `repo-assist.md` | `add-labels` | 30 | allowed `AI-thinks-issue-fixed, AI-thinks-windows-only` |
+| `repo-assist.md` | `remove-labels` | 10 | allowed `AI-thinks-issue-fixed, AI-thinks-windows-only` |
 
-        Blocked --> Labelled: 👤 maintainer pushes fix (unblocks)
+---
 
-        CIHealthy --> ConflictCheck
-        ConflictCheck --> NoConflicts: merge-tree clean
-        ConflictCheck --> ConflictResolved: 🤖 labelops resolves conflicts
-        ConflictCheck -->
ConflictFailed: 🤖 labelops cannot resolve
+## Group B — PR & Issue Triage
 
-        ConflictResolved --> Labelled: ⚙️ CI restarts after push
-        ConflictFailed --> Labelled: comment posted, no push
+```mermaid
+stateDiagram-v2
+    direction LR
+    state "add_to_project" as ATP {
+        direction LR
+        [*] --> ATP_TriggerGuard : 👤 issues (opened / transferred)
+        [*] --> ATP_TriggerGuard : 👤 pull_request_target (opened, branches: main)
+        state ATP_IsIssueEvent <<choice>>
+        ATP_TriggerGuard --> ATP_IsIssueEvent : ⚙️ job guard: github.event_name != 'pull_request_target'?
+        ATP_IsIssueEvent --> [*] : ⚙️ false — all jobs skipped (pull_request_target gated off)
+        ATP_IsIssueEvent --> ATP_Fork : ⚙️ true (issues event)
+        state ATP_Fork <<fork>>
+        state ATP_Join <<join>>
+        ATP_Fork --> ATP_CleanupRuns : ⚙️ cleanup_old_runs job (parallel)
+        ATP_Fork --> ATP_ApplyLabel : ⚙️ apply-label job (parallel)
+        ATP_Fork --> ATP_ApplyMilestone : ⚙️ apply-milestone job (parallel)
+        ATP_CleanupRuns --> ATP_Join : ⚙️ gh api DELETE all completed runs for this workflow
+        ATP_ApplyLabel --> ATP_Join : ⚙️ github.rest.issues.addLabels: Needs-Triage
+        ATP_ApplyMilestone --> ATP_Join : ⚙️ github.rest.issues.update: milestone=29
+        ATP_Join --> [*] : ⚙️ all parallel jobs complete
     }
-
-    NoConflicts --> Ready: PR is mergeable
-    Ready --> Merged: 👤 maintainer merges
-
-    state "Flake Fix Spinoff" as FlakeFix {
-        FlakeDispatched --> FlakeVerified: 🤖 labelops-flake-fix re-verifies
-        FlakeVerified --> FlakeFixPR: 🤖 flake-fix opens fix/quarantine PR
+    state "check_release_notes" as CRN {
+        direction LR
+        [*] --> CRN_GetRef : 👤 pull_request_target (opened/sync/reopened/labeled/unlabeled, main, release/*)
+        CRN_GetRef --> CRN_Checkout : ⚙️ actions/github-script@v3: get PR head ref + repository
+        CRN_Checkout --> CRN_CheckNotes : ⚙️ actions/checkout@v2 (PR head ref, fetch-depth: 0)
+        CRN_CheckNotes --> CRN_FindComment : ⚙️ check release notes entries, NO_RELEASE_NOTES
opt-out
+        CRN_FindComment --> CRN_CommentExists : ⚙️ find-comment (body: DO_NOT_REMOVE: release_notes_check)
+        state CRN_CommentExists <<choice>>
+        CRN_CommentExists --> CRN_CreateComment : ⚙️ comment-id == '' (no existing bot comment)
+        CRN_CommentExists --> CRN_UpdateComment : ⚙️ comment-id != '' (existing bot comment found)
+        CRN_CreateComment --> [*] : ⚙️ if: comment-id == '', createComment
+        CRN_UpdateComment --> [*] : ⚙️ if: comment-id != '', updateComment
+    }
+    state "repository_lockdown_check" as RLC {
+        direction LR
+        [*] --> RLC_CheckLockdown : 👤 pull_request_target (opened/synchronize/reopened, branches: main, release/*)
+        RLC_CheckLockdown --> RLC_FindComment : ⚙️ check vars.LOCKDOWN (exits 1 if "true")
+        RLC_FindComment --> RLC_CommentExists : ⚙️ find-comment (body: DO_NOT_REMOVE: repository_lockdown)
+        state RLC_CommentExists <<choice>>
+        RLC_CommentExists --> RLC_NewComment : ⚙️ comment-id == '' (no existing comment)
+        RLC_CommentExists --> RLC_OldComment : ⚙️ comment-id != '' (existing comment found)
+        state RLC_NewIsLocked <<choice>>
+        RLC_NewComment --> RLC_NewIsLocked : ⚙️ if: failure()? (lockdown was active)
+        RLC_NewIsLocked --> RLC_CreateComment : ⚙️ yes — lockdown active: create notice
+        RLC_NewIsLocked --> [*] : ⚙️ no — lockdown not active, no comment: nothing to do
+        state RLC_OldIsLocked <<choice>>
+        RLC_OldComment --> RLC_OldIsLocked : ⚙️ if: failure()?
(lockdown still active) + RLC_OldIsLocked --> RLC_UpdateComment : ⚙️ yes - still locked: update notice + RLC_OldIsLocked --> RLC_DeleteComment : ⚙️ no - lockdown lifted: delete notice + RLC_CreateComment --> [*] : ⚙️ actions/github-script@v7: createComment (lockdown caution block) + RLC_UpdateComment --> [*] : ⚙️ actions/github-script@v7: updateComment (lockdown caution block) + RLC_DeleteComment --> [*] : ⚙️ actions/github-script@v7: deleteComment (lockdown lifted) } - - Merged --> [*] ``` -## PR Security Scan Lifecycle +--- + +## Group C - Comment & Slash Commands ```mermaid stateDiagram-v2 direction LR - - [*] --> ScanQueue: ⏰ labelops-pr-security-scan (1h) - - state "Per-PR Classification" as ScanLoop { - ScanQueue --> ReadRules: 🤖 security-scan reads repo rules - ReadRules --> CheckDraft: 🤖 security-scan checks isDraft - - state draftcheck <<choice>> - CheckDraft --> draftcheck - draftcheck --> SkipDraft: draft PR - draftcheck --> CheckMemory: non-draft PR - - SkipDraft --> [*]: skip - - CheckMemory --> CheckMemory2: 🤖 security-scan reads state.json - - state memcheck <<choice>> - CheckMemory2 --> memcheck - memcheck --> AlreadyScanned: sha unchanged - memcheck --> ClassifyOrigin: new or updated PR - - AlreadyScanned --> [*]: skip - - state origin <<choice>> - ClassifyOrigin --> origin - origin --> NonFork: headRepository == this repo - origin --> ForkPR: headRepository != this repo - - NonFork --> Bypassed: 🤖 adds AI-Tooling-Check-Bypassed - ForkPR --> ReadDiff: 🤖 reads file list + diff - - state classify <<choice>> - ReadDiff --> classify - classify --> Clean: no categories match - classify --> Flagged: ≥1 category matches - - Clean --> ScannedClean: 🤖 adds AI-Tooling-Check-Scanned-Clean - Flagged --> Labelled: 🤖 adds ⚠️ labels + comment (if changed) + state "commands" as CMD { + direction LR + [*] --> CMD_CheckAccess : 👤 issue_comment (created) + CMD_CheckAccess --> CMD_Allowed : ⚙️ authorize_commenter job: github-script
getCollaboratorPermissionLevel + state CMD_Allowed <<choice>> + CMD_Allowed --> [*] : ⚙️ not authorized (admin/write) OR not on a PR - parsing_job if: guard false + CMD_Allowed --> CMD_ParseComment : ⚙️ allowed == 'true' AND issue.pull_request + CMD_ParseComment --> CMD_HasCmd : ⚙️ parse (/run fantomas|ilverify|xlf|test-baseline) + state CMD_HasCmd <<choice>> + CMD_HasCmd --> [*] : ⚙️ no recognized command - run-parsed-command if: guard false + CMD_HasCmd --> CMD_Checkout1 : ⚙️ command recognized + CMD_Checkout1 --> CMD_CheckoutPR1 : ⚙️ actions/checkout@v4 + CMD_CheckoutPR1 --> CMD_InstallDotnet : ⚙️ gh auth setup-git && gh pr checkout + CMD_InstallDotnet --> CMD_InstallTools : ⚙️ actions/setup-dotnet@v3 (global-json-file) + CMD_InstallTools --> CMD_IsTestBaseline : ⚙️ dotnet tool restore + state CMD_IsTestBaseline <<choice>> + CMD_IsTestBaseline --> CMD_SetupRuntime : ⚙️ command == '/run test-baseline' - actions/setup-dotnet@v4 (9.0.x) + CMD_IsTestBaseline --> CMD_RunCmd : ⚙️ other command - skip .NET 9 runtime setup + CMD_SetupRuntime --> CMD_RunCmd : ⚙️ .NET 9.0.x ready + CMD_RunCmd --> CMD_CreatePatch : ⚙️ run command (continue-on-error): fantomas/xlf/ilverify/test-baseline + CMD_CreatePatch --> CMD_UploadArtifacts : ⚙️ if: command, write run_step_outcome + hasPatch to result file + state "job boundary - cli-results artifact" as CMD_JobBoundary + CMD_UploadArtifacts --> CMD_JobBoundary : ⚙️ upload-artifact@v4 (cli-results) + CMD_JobBoundary --> CMD_RunSucceeded : ⚙️ apply-and-report job starts + state CMD_RunSucceeded <<choice>> + CMD_RunSucceeded --> [*] : ⚙️ run-parsed-command.result != 'success' - apply-and-report if: guard false + CMD_RunSucceeded --> CMD_Checkout2 : ⚙️ command != '' AND result == 'success' + CMD_Checkout2 --> CMD_CheckoutPR2 : ⚙️ actions/checkout@v4 + CMD_CheckoutPR2 --> CMD_Download : ⚙️ gh pr checkout + CMD_Download --> CMD_ReadMeta : ⚙️ actions/download-artifact@v4
(cli-results) + CMD_ReadMeta --> CMD_HasPatch : ⚙️ read run_step_outcome + hasPatch from result file + state CMD_HasPatch <<choice>> + CMD_HasPatch --> CMD_ValidatePaths : ⚙️ outcome == 'success' AND hasPatch == 'true' + CMD_HasPatch --> CMD_GenReport : ⚙️ outcome != 'success' OR hasPatch == 'false' + CMD_ValidatePaths --> CMD_ApplyPush : ⚙️ outcome==success && hasPatch: validate paths (src/ tests/ vsintegration/) + CMD_ApplyPush --> CMD_CountStats : ⚙️ outcome==success && hasPatch: patch + commit + push + CMD_CountStats --> CMD_GenReport : ⚙️ outcome==success && hasPatch: git diff stats + CMD_GenReport --> CMD_CommentPR : ⚙️ if: always, build markdown report => GITHUB_STEP_SUMMARY + pr_report.md + CMD_CommentPR --> [*] : ⚙️ if: always, gh pr comment --body-file pr_report.md + } + state "backport" as BKP { + direction LR + [*] --> BKP_TriggerGuard : 👤 issue_comment (created) + [*] --> BKP_TriggerGuard : ⏰ schedule (cron: 0 13 * * *) + state BKP_ShouldRun <<choice>> + BKP_TriggerGuard --> BKP_ShouldRun : ⚙️ if: contains(comment.body, '/backport to') OR event_name == 'schedule' + BKP_ShouldRun --> [*] : ⚙️ false - comment does not contain /backport to (and not schedule run) + BKP_ShouldRun --> BKP_Backport : ⚙️ true + BKP_Backport --> [*] : 🤖 uses: dotnet/arcade backport-base.yml@main (reusable workflow) } - - Bypassed --> [*] - ScannedClean --> [*] - Labelled --> [*]: 👤 maintainer reviews flagged areas ``` -## Infrastructure Lifecycle +> **`/run` commands** - available via PR comment by users with admin/write access: +> +> | Command | Tool | Effect | +> |---|---|---| +> | `/run fantomas` | F# code formatter | applies `dotnet fantomas .` to the repo, commits the diff and pushes to the PR branch | +> | `/run ilverify` | IL verifier | runs `pwsh tests/ILVerify/ilverify.ps1`, reports failures; no push on failure | +> | `/run xlf` | localization tool | runs `dotnet build src/Compiler /t:UpdateXlf`, regenerates `.xlf` resource
files, commits and pushes | +> | `/run test-baseline` | baseline test updater | runs `dotnet test` with `TEST_UPDATE_BSL=1` for the given filter, commits new `.bsl` files and pushes | + +--- + +## Group D - Push, Validation & Tooling ```mermaid stateDiagram-v2 direction LR + state "branch-merge" as BM { + direction LR + [*] --> BM_Merge : ⚙️ push to main + [*] --> BM_Merge : ⚙️ push to release/* + BM_Merge --> [*] : 🤖 uses: dotnet/arcade inter-branch-merge-base.yml@main + } + state "skill-validation" as SV { + direction LR + [*] --> SV_Checkout : 👤 pull_request (paths: .github/skills/**, .github/agents/**) + [*] --> SV_Checkout : ⚙️ push (branches: main, same paths) + [*] --> SV_Checkout : 👤 workflow_dispatch (inputs: none) + SV_Checkout --> SV_Download : ⚙️ actions/checkout@v4 (sparse-checkout: .github/skills, .github/agents) + SV_Download --> SV_RunCheck : ⚙️ curl download skill-validator binary (dotnet/skills nightly release) + SV_RunCheck --> [*] : ⚙️ skill-validator check --skills --agents, write GITHUB_STEP_SUMMARY + } + state "copilot-setup-steps" as CSS { + direction LR + [*] --> CSS_Checkout : 👤 workflow_dispatch (inputs: none) + CSS_Checkout --> CSS_SetupDotnet : ⚙️ actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 (v4.2.2) + CSS_SetupDotnet --> CSS_RestoreCompiler : ⚙️ actions/setup-dotnet@v4 (global-json-file: global.json) + CSS_RestoreCompiler --> CSS_RestoreTools : ⚙️ ./build.sh -c Release --verbosity quiet || true + CSS_RestoreTools --> [*] : ⚙️ dotnet tool restore + } +``` - [*] --> CheckUpdate: ⏰ aw-auto-update (24h) - - state check <<choice>> - CheckUpdate --> check - check --> UpToDate: no changes detected - check --> ChangesDetected: gh aw upgrade produced diff +--- - UpToDate --> [*]: 🤖 aw-auto-update noops +## Labels - ChangesDetected --> DedupeCheck: 🤖 checks for existing PR/session +All labels in one place: who adds, removes, or reads each.
**Cross-workflow flows** (e.g., `AI-Issue-Regression-PR` from `repo-assist` → consumed by `regression-pr-shepherd`) are visible here. - state dedup <<choice>> - DedupeCheck --> dedup - dedup --> AlreadyOpen: open PR or session exists - dedup --> Delegate: no existing PR/session +| Label | Type | Added by | Removed by | Read by | Notes | +|---|---|---|---|---|---| +| `automation` | always-applied | ASM, LFF (PR), RA (issue) | - | - | via safe-output `labels:` | +| `NO_RELEASE_NOTES` | always-applied | ASM, LFF, RA (PR) | - | check_release_notes | via safe-output `labels:` | +| `Flaky` | always-applied | LFF (PR + issue) | - | - | via safe-output `labels:` | +| `AI-Issue-Regression-PR` | always-applied | RA (PR), RPS (push-to-PR-branch) | - | RPS (selects PRs) | **cross-workflow signal RA → RPS** | +| `repo-assist` | always-applied | RA (issue) | - | - | via safe-output `labels:` | +| `AI-needs-CI-fix-input` | agent-add | LPM | - | - | escalation flag | +| `AI-Tooling-Check-Scanned-Clean` | agent-add | LPSS | - | - | per-PR scan outcome | +| `AI-Tooling-Check-Bypassed` | agent-add | LPSS | - | - | per-PR scan outcome | +| `⚠️ Affects-*` family (7) | agent-add | LPSS | - | - | Build-Infra, Compiler-Output, Bootstrap, Restore, Design-Time, Test-Tooling, Agent-Config | +| `⚠️ Suspicious-Prompting`, `⚠️ Scope-Review-Needed` | agent-add | LPSS | - | - | review-trigger flags | +| `AI-thinks-issue-fixed` | agent-add + agent-remove | RA | RPS, RA | - | **bidirectional**: RA proposes, RPS/RA retract | +| `AI-thinks-windows-only` | agent-add + agent-remove | RA | RA | - | RA self-corrects in Task 3 | +| `AI-Auto-Resolve-CI`, `AI-Auto-Resolve-Conflicts` | filter (read-only) | - (external) | - | LPM (`gh pr list --search label:...`) | **selection signal into LPM** | +| `Needs-Triage` | imperative | ATP (`apply-label` job, github-script) | - | - | classic workflow, not gh-aw | - AlreadyOpen --> [*]: 🤖 aw-auto-update noops -
Delegate --> AgentSession: 🤖 aw-auto-update creates agent-session +--- - AgentSession --> WaitReview: 🤖 Copilot Coding Agent opens PR - WaitReview --> Merged: 👤 maintainer reviews + merges - Merged --> [*] -``` +## Handover Map -## Label Dictionary - -| Label | Applied By | Read By | Meaning | -|-------|-----------|---------|---------| -| `AI-thinks-issue-fixed` | 🤖 repo-assist | 🤖 repo-assist, 🤖 regression-pr-shepherd | Issue appears fixed; needs regression test verification | -| `AI-thinks-windows-only` | 🤖 repo-assist | 🤖 repo-assist | Issue requires Windows/VS to reproduce (may be reassessed) | -| `AI-Auto-Resolve-CI` | 👤 maintainer | 🤖 labelops-pr-maintenance | Opt-in: agent should fix CI failures on this PR | -| `AI-Auto-Resolve-Conflicts` | 👤 maintainer | 🤖 labelops-pr-maintenance | Opt-in: agent should resolve merge conflicts on this PR | -| `AI-needs-CI-fix-input` | 🤖 labelops-pr-maintenance | 🤖 labelops-pr-maintenance, 👤 maintainer | CI failure requires human intervention | -| `AI-Issue-Regression-PR` | 🤖 repo-assist | 🤖 regression-pr-shepherd, 🤖 labelops-pr-maintenance (exclude) | PR is a regression test created by repo-assist | -| `Flaky` | 🤖 labelops-flake-fix | 👤 maintainer | Test identified as non-deterministic | -| `AI-Tooling-Check-Scanned-Clean` | 🤖 labelops-pr-security-scan | 👤 maintainer | Fork PR scanned, no safety concerns found | -| `AI-Tooling-Check-Bypassed` | 🤖 labelops-pr-security-scan | 👤 maintainer | Non-fork PR, scan bypassed (trusted origin) | -| `⚠️ Affects-Build-Infra` | 🤖 labelops-pr-security-scan | 👤 maintainer | PR modifies build infrastructure | -| `⚠️ Affects-Compiler-Output` | 🤖 labelops-pr-security-scan | 👤 maintainer | PR affects compiler output | -| `⚠️ Affects-Bootstrap` | 🤖 labelops-pr-security-scan | 👤 maintainer | PR affects bootstrap process | -| `⚠️ Affects-Restore` | 🤖 labelops-pr-security-scan | 👤 maintainer | PR modifies
restore/package resolution | -| `⚠️ Affects-Design-Time` | 🤖 labelops-pr-security-scan | 👤 maintainer | PR affects design-time behavior | -| `⚠️ Affects-Test-Tooling` | 🤖 labelops-pr-security-scan | 👤 maintainer | PR modifies test tooling | -| `⚠️ Affects-Agent-Config` | 🤖 labelops-pr-security-scan | 👤 maintainer | PR modifies AI agent configuration | -| `⚠️ Suspicious-Prompting` | 🤖 labelops-pr-security-scan | 👤 maintainer | PR contains prompt injection patterns | -| `⚠️ Scope-Review-Needed` | 🤖 labelops-pr-security-scan | 👤 maintainer | PR diff exceeds stated scope | -| `automation` | 🤖 aw-auto-update, 🤖 labelops-flake-fix | 👤 maintainer | PR was created by automation | -| `NO_RELEASE_NOTES` | 🤖 repo-assist, 🤖 labelops-flake-fix | ⚙️ CI | PR does not need release notes entry | -| `repo-assist` | 🤖 repo-assist | 🤖 repo-assist | Issue is managed by repo-assist (monthly summary) | +| Source Workflow | Signal | Target | Mechanism | Notes | +|---|---|---|---|---| +| `labelops-pr-maintenance.md` | Proven flake (flaky-test-detector ≥3 distinct unrelated PRs; test not introduced by current PR) | `labelops-flake-fix.md` | `dispatch-workflow: workflows: [labelops-flake-fix]` | Passes inputs; max 3/run | +| `aw-auto-update.md` | CHANGED_FILES non-empty after `gh aw upgrade + compile` | Copilot Coding Agent (CCA) | `create-agent-session` (base: main, max: 1) | CCA writes `.lock.yml` files using `COPILOT_GITHUB_TOKEN` | +| `agentic-state-machine.md` | State-machine doc changed | PR reviewer (human) | `create-pull-request` (labels: automation, NO_RELEASE_NOTES; allowed-files: .github/docs/**) | Writes `.github/docs/state-machine.md` | +| `repo-assist.md` | Regression test PR created (Task 2) | `regression-pr-shepherd.md` | Indirect via label `AI-Issue-Regression-PR` on PR | Shepherd picks up in subsequent scheduled run | +| `commands.yml` | `/run ` approved PR comment | PR branch | `git push origin HEAD:branch` (direct
write) | Requires commenter admin/write access | +| `backport.yml` | `/backport to ` PR comment | `dotnet/arcade` backport-base.yml | `uses: dotnet/arcade/.github/workflows/backport-base.yml@main` | Reusable workflow; schedule trigger only cleans old runs | +| `branch-merge.yml` | Push to `release/*` or `main` | `dotnet/arcade` inter-branch-merge-base.yml | `uses: dotnet/arcade/.github/workflows/inter-branch-merge-base.yml@main` | Config: `.config/service-branch-merge.json` | -## Handover Map +--- -| From | To | Trigger | Mechanism | -|------|----|---------|-----------| -| 🤖 repo-assist | 🤖 regression-pr-shepherd | PR created with `AI-Issue-Regression-PR` label | Label-based pickup (⏰ 4h) | -| 🤖 labelops-pr-maintenance | 🤖 labelops-flake-fix | Proven flake detected (≥3 PRs) | `dispatch-workflow` | -| 🤖 regression-pr-shepherd | 👤 maintainer | PR is healthy (CI green, no feedback) | PR ready for review | -| 🤖 regression-pr-shepherd | 👤 maintainer | Bug still exists (Category B3) | Comment + close PR + remove label | -| 🤖 labelops-pr-maintenance | 👤 maintainer | CI unfixable | `AI-needs-CI-fix-input` label + escalation comment | -| 👤 maintainer | 🤖 labelops-pr-maintenance | Adds `AI-Auto-Resolve-*` label to PR | Label-based pickup (⏰ 3h) | -| 👤 maintainer | 🤖 repo-assist | `/repo-assist ` | Slash command | -| ⏰ scheduler | 🤖 repo-assist | Every 12h | Cron schedule | -| ⏰ scheduler | 🤖 labelops-pr-maintenance | Every 3h | Cron schedule | -| ⏰ scheduler | 🤖 regression-pr-shepherd | Every 4h | Cron schedule | -| ⏰ scheduler | 🤖 labelops-pr-security-scan | Every 1h | Cron schedule | -| ⏰ scheduler | 🤖 aw-auto-update | Every 24h | Cron schedule | -| 🤖 aw-auto-update | 🤖 Copilot Coding Agent | Changes detected | `create-agent-session` safe output | -| 🤖 repo-assist | 🤖 repo-assist | Own PR has CI failure or conflicts | `push-to-pull-request-branch` (self-heal) | -| 🤖 labelops-flake-fix | 🤖
labelops-pr-maintenance | Fix PR created | Originating PR comment posted | - - + diff --git a/.github/workflows/agentic-state-machine.md b/.github/workflows/agentic-state-machine.md index 9fc491bba4b..8c655f39ca9 100644 --- a/.github/workflows/agentic-state-machine.md +++ b/.github/workflows/agentic-state-machine.md @@ -35,40 +35,526 @@ safe-outputs: # Agentic State Machine - Diagram Generator -You read all agentic workflow `.md` files in `.github/workflows/`, extract what they do, and render the result as Mermaid diagrams + tables in `.github/docs/state-machine.md`. +You are a workflow-automation documentor. You read all workflow files in `.github/workflows/`, build a structured model of their interactions, validate it adversarially, and render the result as Mermaid diagrams + tables in `.github/docs/state-machine.md`. Precision over prose. -1. Read ALL `.md` files in `.github/workflows/` except `shared/`, `docs/`, and `agentic-state-machine.md` (this file). -2. If `.github/docs/state-machine.md` exists, read it. Compare source hashes in the `` footer against current files (use `sha256sum`). If unchanged → `noop`. If changed → update incrementally, minimal diff. -3. Every transition edge must label its actor: 👤 human, 🤖 agent-name, ⚙️ CI, ⏰ scheduler. -4. Do not hardcode sections for "issues" or "PRs". Discover what lifecycle groups exist from the workflows themselves. A workflow that maintains files/branches is its own group. + +## Extraction rules +1. Read every file listed in `/tmp/workflow-manifest.txt` (built by the pre-step). Read each in FULL. **Both `.yml` AND `.md` files are workflows.** Agentic `.md` files (gh-aw) have YAML frontmatter between `---` markers defining triggers (`on:`, `schedule:`, `workflow_dispatch:`), `safe-outputs:`, `tools:`, and `labels:`. They ARE workflows and MUST be documented. **`copilot-setup-steps.yml` is a valid workflow file**: never exclude it unless it's in `EXCLUDE_FROM_DOCS`. +2. **NEVER INFER.
**NEVER HALLUCINATE.** Only document what is explicitly in source. Cite the YAML field or line. + **Workflow filenames are NOT evidence of behavior.** Only YAML content (jobs, steps, `run:`, `uses:`, `safe-outputs:`) defines what happens. Before writing any behavior into a workflow's row, point to the exact job/step/line. If you cannot cite it → remove it. + **Shell commands: describe what the code DOES, not what you think it means.** A `gh` CLI search filter like `updated:>=` is a date filter, not a fork check. A `grep -q` is a string match, not a validation gate. Read the actual command arguments before characterizing behavior. + **Anti-hallucination checklist (MANDATORY per workflow).** After drafting each workflow's section, verify EACH stated detail: (a) tool/command names: re-read the `run:` or `uses:` line; (b) branch/path filters: re-read the `on:` block; (c) guard conditions: re-read the `if:` line; (d) input names: re-read the `inputs:` block; (e) concurrency: re-read for `concurrency:` key. If ANY detail cannot be found in source → DELETE it. Common hallucinations: inventing install steps, wrong script names, wrong branch patterns, inventing concurrency blocks, wrong bot names. +3. **`labels:` = always applied. `allowed-labels:` = agent may choose.** For traditional `.yml` workflows, labels applied imperatively (e.g., `addLabels()`, `actions/labeler`) are **imperative labels**. gh-aw `labels:` under safe-outputs are NEVER imperative: they are always-applied (the engine applies them automatically). +4. **Trigger configuration ≠ inputs.** Fields like `slash_command:`, `reaction:`, `schedule: cron:` are trigger config. Only `workflow_dispatch: inputs:` defines formal inputs. + +## Modeling rules +5. **Scope of `if:`.** (a) workflow-level → entire run skipped; (b) job-level → job skipped, workflow runs; (c) step-level → step skipped, job continues. **Shell-internal branches** are NOT step-level `if:`.
+ **Message-only variables ≠ control-flow gates.** A variable that only changes message text, annotations, or exit codes does NOT change control flow. Unless it appears in an `if:` guard → do NOT model as `<<choice>>`. + - ❌ WRONG - opt-out flag as separate bypass gate: + ``` + state OptOutChoice <<choice>> + RunScript --> OptOutChoice : ⚙️ if: OPT_OUT + OptOutChoice --> Pass : ⚙️ true + OptOutChoice --> Fail : ⚙️ false + ``` + - ✅ CORRECT - ONE pass/fail `<<choice>>` with opt-out as predicate conjunct: + ``` + RunScript --> PassFail <<choice>> + PassFail --> Pass : ⚙️ check passed ∨ OPT_OUT + PassFail --> Fail : ⚙️ check failed ∧ ¬OPT_OUT + ``` + Apply to every "opt-out/suppress-exit-code" flag: fold into the pass/fail predicate, not a separate gate. + +6. **`<<choice>>`** = ONE `if:` with MUTUALLY EXCLUSIVE outcomes and ≥2 edges. Checklist: (Q1) single `if:`? (Q2) mutually exclusive? (Q3) ≥2 edges? If ANY is no → don't use `<<choice>>`. + **HARD RULE: `<<choice>>` = exactly 2 outgoing edges (true/false).** A choice with 3+ edges is ALWAYS wrong. Decompose into nested binary `<<choice>>` nodes. + - ❌ WRONG - 3-way choice (schedule/PR/manual): + ``` + state TriggerChoice <<choice>> + [*] --> TriggerChoice + TriggerChoice --> SchedulePath : ⏰ schedule + TriggerChoice --> PRPath : 👤 pull_request + TriggerChoice --> ManualPath : 👤 workflow_dispatch + ``` + - ✅ CORRECT - nested binary: + ``` + state IsScheduled <<choice>> + state IsManual <<choice>> + [*] --> IsScheduled : ⚙️ event routing + IsScheduled --> SchedulePath : ⏰ schedule + IsScheduled --> IsManual : ⚙️ not schedule + IsManual --> ManualPath : 👤 workflow_dispatch + IsManual --> PRPath : 👤 pull_request + ``` + **Anti-patterns:** Boolean `if:` with 3+ edges (ALWAYS decompose into nested binary splits). Two independent conditions → sequential `<<choice>>` nodes. Duplicate predicate labels → broken Q2.
**Two independent sequential `if:` steps (e.g., "Setup Xcode if macOS" then "Setup Java if needed") are NOT one `<<choice>>`: model them as sequential states, each with its own binary `<<choice>>`.** + **NO categorical exceptions.** Even N mutually exclusive categories MUST be decomposed into nested binary choices. 3 categories = 2 nested choices. 4 categories = 3 nested choices. This is non-negotiable. + **Edge-count audit (MANDATORY per choice):** Count outgoing edges from EVERY `<<choice>>`. If count ≠ 2 → ERROR. Fix before proceeding. count=1 → missing false-path. count≥3 → decompose into nested binary splits. + **`<<choice>>` is MANDATORY.** Zero `<<choice>>` nodes in the entire document = you missed binary branches. + **`<<fork>>`/`<<join>>` for OVERLAPPING guards.** If Q2 fails (guards are NOT mutually exclusive, so multiple branches can fire in the same run), use `<<fork>>`/`<<join>>` instead. This includes: (a) dispatch inputs where one value triggers BOTH lanes (e.g., `type=Both` runs Issues AND Pulls), (b) safe-outputs that can ALL fire in the same run (not alternatives), (c) matrix fan-out to parallel jobs, (d) **cross-workflow concurrency**: multiple independent workflows subscribing to the same event shown in a shared lifecycle diagram (they ALL fire, not one-of), (e) **independent `jobs:` with no `needs:`**: if two jobs have no dependency between them, they run in parallel → `<<fork>>`/`<<join>>`. Self-test: "can branches A AND B both execute in one run?" Yes → `<<fork>>`, not `<<choice>>`. + **`<<fork>>`/`<<join>>` audit (MANDATORY).** After drafting, find every workflow with ≥2 independent jobs (no `needs:` between them). Each MUST use `<<fork>>`/`<<join>>`. Also find every matrix strategy: each MUST use `<<fork>>`/`<<join>>` with one edge per matrix leg (NOT a single-branch degenerate fork). A `<<fork>>` with only 1 outgoing edge is ALWAYS wrong: either add the missing parallel branch or remove the fork/join entirely. Missing fork/join = ERROR. Degenerate single-branch fork = ERROR.
+ **Per-item exclusivity ≠ parallelism.** If a loop processes items one-at-a-time and EACH item takes exactly one of N exclusive actions (e.g., close XOR warn per PR), that's `<<choice>>` per item, NOT `<<fork>>`. Self-test: "within ONE iteration, can both actions fire?" No → `<<choice>>`. + - ❌ WRONG - overlapping guards as `<<choice>>`: + ``` + state DispatchType <<choice>> + Entry --> DispatchType + DispatchType --> IssuesLane : ⚙️ type ∈ {Both, Issues} + DispatchType --> PullsLane : ⚙️ type ∈ {Both, Pulls} + ``` + - ✅ CORRECT - overlapping guards as `<<fork>>`/`<<join>>`: + ``` + state DispatchFork <<fork>> + state DispatchJoin <<join>> + Entry --> DispatchFork + DispatchFork --> IssuesLane : ⚙️ type ∈ {Both, Issues} + DispatchFork --> PullsLane : ⚙️ type ∈ {Both, Pulls} + IssuesLane --> DispatchJoin + PullsLane --> DispatchJoin + ``` + **Syntax - copy this shape:** + ``` + state MyChoice <<choice>> + PrevState --> MyChoice : ⚙️ if: + MyChoice --> TrueBranch : ⚙️ true + MyChoice --> FalseBranch : ⚙️ false + ``` + +7. **Two workflows on same trigger = parallel.** Each gets its own entry arrow and skip branch. +8. **Independent triggers = parallel.** Removing A wouldn't prevent B → parallel arrows, not A→B. + **`success()`/`failure()` = SINGLE `<<choice>>`.** Not fan-out to two sibling choices. **No re-splitting** downstream on the same condition. + **`if: success() || failure()` = unconditional continuation.** Steps guarded by `success() || failure()` run regardless of prior step outcome, so they are NOT skipped on failure. Model them as sequential continuation, not as part of a pass/fail branch.
+ - ❌ WRONG - cleanup step ends at pass/fail: + ``` + state Result <<choice>> + RunCheck --> Result : ⚙️ if: exit code + Result --> Pass : ⚙️ pass + Result --> Fail : ⚙️ fail + Pass --> [*] + Fail --> [*] + ``` + - ✅ CORRECT - `if: success() || failure()` step continues after both branches: + ``` + state Result <<choice>> + RunCheck --> Result : ⚙️ if: exit code + Result --> Pass : ⚙️ pass + Result --> Fail : ⚙️ fail + Pass --> Cleanup : ⚙️ if: success() || failure() + Fail --> Cleanup : ⚙️ if: success() || failure() + Cleanup --> [*] + ``` + **Fan-out from non-choice is FORBIDDEN unless truly parallel.** If a non-`<<choice>>` state has 2+ outgoing edges with conditions, it MUST be converted to `<<choice>>`. Self-test: "if I removed one edge, would the other still fire?" No → it's a `<<choice>>`, declare it. + - ❌ WRONG (sequential masquerading as parallel fan-out): + ``` + PerItem --> DedupCheck : ⚙️ check + PerItem --> NextStep : ⚙️ continue + ``` + - ✅ CORRECT (chain): + ``` + PerItem --> DedupCheck <<choice>> + DedupCheck --> NextStep : ⚙️ skip + DedupCheck --> Action : ⚙️ proceed + Action --> NextStep + ``` + +9. **Shared guards.** Job-level guard on ALL events → annotate ONCE on entry, not per sub-state. +10. **`workflow_dispatch`** always from `[*]`. Every workflow with `workflow_dispatch` MUST have a `[*]` entry arrow in some diagram. Cross-workflow dispatch = handover annotation (`note right of`), NOT inline transition. +11. **GITHUB_TOKEN suppression.** Default token fires NO events. **PAT/custom-token exception:** inspect `engine.env`, `github-token:` inputs, step-level `GH_TOKEN` for overrides, since those DO fire events. +12. **Document ALL branches.** One `[*]` entry arrow PER filter value (never collapsed). Both true-path AND false-path for every `if:`. +13. **All event types** matter. All `types:` entries must be documented. +14. **Dual-scope** workflows (issues + PRs) appear in both lifecycles. +15. **All `if:` guards** on edges.
Fork, repo, role guards. +16. **No dangling states.** Every state → ≥1 outgoing edge or `[*]`. Scan after drawing. + **No orphan states.** Every non-`[*]` state needs ≥1 incoming edge. Unreachable = wiring error. + **Guard completeness.** Every `if:` guard = binary (true/false). A guard with only ONE outgoing edge (the true-path) is ALWAYS missing its false-path (→ next step or → `[*]`). Self-test after drawing: count outgoing edges from every guarded transition. count=1 → add the else path. This includes `needs:` job dependencies with conditional guards. +17. **Internal consistency.** Cross-references verifiable across overview, diagrams, dictionary, handover map. + **Count audit (MANDATORY - use bash).** Before emitting, run these verification commands: + (a) `grep -c '^=== FILE:' /tmp/workflow-manifest.txt`: must equal your stated workflow total. + (b) For EACH workflow with N stated steps: `grep -c '^\s*- name:' ` (.yml) or count step-level headings (.md). Must equal diagram states. The `STEP-COUNT` field in the manifest is ground truth. + (c) For label lists: enumerate source entries ONE BY ONE, then count. Must equal stated "N labels". + (d) If ANY mismatch → fix. Do NOT estimate; compute. + **Classification consistency.** A label classified as "always-applied" in the dictionary MUST be "always-applied" everywhere (overview, diagrams). Never mix classifications across sections. +18. **Citation line precision.** For gh-aw `.md` files, the YAML frontmatter between `---` markers offsets all subsequent line numbers. After writing citations for an `.md` file, spot-check 3 by re-reading the source line. If off by ≥2 → systematic offset error; recount ALL citations for that file. For safe-outputs blocks, cite the YAML config line where the key is declared, NOT the prose description section. +19.
**Safeguard inventory.** For EVERY workflow with ≥3 `if:` guards or conditions, list ALL safeguards from source (cooldown timers, staleness checks, dedup checks, rate limits, threshold gates, age filters, budget caps, exclusion lists, fail-closed defaults). Each MUST appear in the diagram. Missing safeguard = HIGH error. +20. **Behavioral completeness.** Model all safeguards, memory ops, dedup checks, cooldown guards, time-based filters. + **Multi-conjunct safeguards.** Show EVERY conjunct. + **Dedup gates BEFORE actions, NEVER AFTER.** For every push/dispatch/create action, the dedup `<<choice>>` MUST appear UPSTREAM. Self-test: trace from `[*]` to the action. Do you pass through a dedup gate? No → error. + - ❌ WRONG (dedup after action): `Classify --> FixAction --> DedupCheck <<choice>>` + - ✅ CORRECT (dedup before action): `Classify --> DedupCheck <<choice>> --> FixAction` + **Path-universality.** "every"/"all"/"always" → ALL paths. + **Completeness audit.** For each workflow, list every action (dispatch, label, push, comment, memory-write). Each must appear in diagram. +21. **Label read-only verification.** Grep ALL source files before classifying any label as read-only. +22. **No pipeline collapse.** N source steps = N diagram states. Source order is law. + **Pipeline continuation.** After branching, ALL non-exit branches MUST continue to the next pipeline step. A branch terminates early ONLY if source explicitly says "exit/stop/return." Dispatching, labeling, posting a comment, or skipping is NOT an implicit exit: the loop/pipeline continues to the next step. + **`core.setFailed()` / `process.exit(1)` is NOT terminal** if a downstream step has `if: failure()` or `if: success() || failure()`. `setFailed` sets the step outcome to `failure`, which makes `failure()` evaluate true for subsequent steps. Trace forward: if ANY later step in the same job has `if: failure()` or `if: always()` → the `setFailed` step MUST route to it, not to `[*]`.
+ - ❌ WRONG - non-exit branches stop: + ``` + state ActionChoice <<choice>> + Step2 --> ActionChoice + ActionChoice --> DispatchHelper : ⚙️ match + ActionChoice --> SkipAction : ⚙️ no match + DispatchHelper --> [*] + SkipAction --> [*] + ``` + - ✅ CORRECT - non-exit branches continue: + ``` + state ActionChoice <<choice>> + Step2 --> ActionChoice + ActionChoice --> DispatchHelper : ⚙️ match + ActionChoice --> SkipAction : ⚙️ no match + DispatchHelper --> Step3 + SkipAction --> Step3 + Step3 --> [*] + ``` + **Loop-scope fidelity.** Global pre-loop steps outside the loop. Per-item steps inside. Never swap. + **Step-count audit.** Count source steps → count diagram states. Mismatch = missing or collapsed step. + +23. **Correctness over completeness.** It is ALWAYS better to exclude a workflow (with an explicit `⚠️ Excluded - too complex for accurate automated documentation`) than to document it incorrectly. If you cannot verify every step, action, and safeguard for a workflow from source → exclude it. An omitted workflow is a zero-error workflow. A partially-documented workflow with missing safeguards = HIGH errors. + +24. **Complex workflow deep-dive (STEP-COUNT > 30 or > 5 jobs).** For any workflow exceeding this threshold: + (a) Run bash: `grep -n '^\s*- name:' ` to get the COMPLETE ordered step list with line numbers. Store in `/tmp/-steps.txt`. + (b) Run bash: `grep -n 'if:' ` to get ALL guards. Store in `/tmp/-guards.txt`. + (c) Build a **per-job checklist** from these extractions: `JOB → [step1 (Lnn), step2 (Lnn), ...]` with guards annotated. + (d) Model the workflow from this checklist, NOT from memory of reading the file. Every checklist entry MUST appear as a diagram state. + (e) After drafting, diff the checklist against diagram states. Missing step → add it or exclude the entire workflow per Rule 23. + (f) For each step with an `if:` guard: the guard MUST appear on an edge or as a `<<choice>>`. Missing guard → add it or exclude the workflow.
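The extraction in Rule 24 (a), (b) and (e) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not part of the prompt's tooling: the function names `extract_checklist` and `missing_steps` are hypothetical, and the regular expressions only approximate the two `grep -n` patterns quoted above.

```python
import re

# Assumption: these patterns approximate `grep -n '^\s*- name:'` and `grep -n 'if:'`.
STEP_RE = re.compile(r"^\s*- name:\s*(.*)$")
GUARD_RE = re.compile(r"if:\s*(.*)$")


def extract_checklist(yaml_text):
    """Return (steps, guards) as lists of (line_number, text), 1-indexed like grep -n."""
    steps, guards = [], []
    for lineno, line in enumerate(yaml_text.splitlines(), start=1):
        step = STEP_RE.match(line)
        if step:
            steps.append((lineno, step.group(1).strip()))
        guard = GUARD_RE.search(line)
        if guard:
            guards.append((lineno, guard.group(1).strip()))
    return steps, guards


def missing_steps(steps, diagram_labels):
    """Hypothetical helper for (e): step names that have no matching diagram state."""
    return [name for _, name in steps if name not in diagram_labels]
```

Because the match is purely textual, a job-level `if:` and a step-level `if:` both land in the guard list; telling them apart (Rule 5) still requires reading the indentation or the surrounding keys in source.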
+
+25. **All trigger entry paths.** For EVERY trigger declared in the `on:` block, the diagram MUST have a corresponding entry path from `[*]`. If `on: [pull_request_target, schedule, workflow_dispatch]` → 3 distinct `[*] -->` entries (one per trigger type). Missing trigger entry = MAJOR.
+    **Dead-trigger exception.** If ALL jobs are gated off for a specific trigger (e.g., every job has `if: github.event_name != 'pull_request_target'`), that trigger has NO effective entry path: either omit it entirely or annotate as `⚠️ trigger fires but all jobs gated off`. Never model transitions from a dead trigger into job/step states.
+    **Event subtypes.** `types: [opened, synchronize, labeled]`: if all subtypes route identically, a single entry with combined annotation is acceptable. If any subtype routes differently (different guards downstream) → separate entries.
+    If you cannot model a trigger path accurately → exclude the entire workflow per Rule 23.
+
+26. **Job-level guards on diagrams.** Every `jobs..if:` guard MUST appear in the diagram, either as a `<<choice>>` node gating the job's states, or as a guard label on the entry edge. Job-level guards determine whether ANY step in the job runs. Missing job-level guard = HIGH error (allows unreachable transitions).
+    **Common job-level guards (MUST document):** `github.repository == 'dotnet/'`, `github.repository_owner == 'dotnet'`, `!github.event.repository.fork`, actor/bot checks (`github.actor == 'dotnet-maestro[bot]'`). These prevent fork/external execution and MUST appear as entry guards.
+    **`needs:` = sequential dependency.** `jobs.B.needs: [A]` means B waits for A. Model as `A --> B`, NOT as parallel `<<fork>>` branches. `<<fork>>` implies concurrent start; `needs:` is the opposite.
+
+28.
**Actor prefixes are MANDATORY on EVERY edge, ZERO EXCEPTIONS.** 👤 = human-initiated (manual dispatch, PR open, issue label, slash command), 🤖 = bot/agent action, ⚙️ = workflow engine (job conditions, step logic, push events), ⏰ = cron/schedule. An edge without a prefix is ALWAYS wrong. **This is the #1 most common error.** After drafting EVERY diagram, scan EVERY `-->` line. If the label after `:` does not start with one of 👤🤖⚙️⏰ → add it NOW. Entry arrows from `[*]` MUST also have prefixes (👤 for PR/issue/comment triggers, ⏰ for schedule, 👤 for workflow_dispatch).
+
+29. **Workflow_dispatch inputs MUST be enumerated, OR explicitly stated as "none".** For every workflow with `workflow_dispatch:`, check for an `inputs:` block. If `inputs:` exists → list ALL inputs by name. If `workflow_dispatch:` has NO `inputs:` block (bare dispatch) → state "inputs: none". **NEVER INVENT INPUTS.** If you cannot find `inputs:` with named keys in the YAML → the workflow has NO inputs. Common hallucination: inventing plausible input names (max_prs, session_name, choice, skip_commit, etc.) for bare workflow_dispatch. This is a HIGH error.
+
+30. **Reusable-workflow calls are NOT inline edges.** When a workflow calls another via `uses: org/repo/.github/workflows/X.yml@ref`, model it as a handover annotation (`note right of StateX: Delegates to org/repo X.yml`), NOT as inline states/edges. The called workflow's internal states belong to the OTHER repository/workflow. Only the call and its outputs are local.
+
+31. **gh-aw engine internals are INVISIBLE.** The gh-aw runtime has internal phases (pre-activation, activation, detection, conclusion, safe_outputs evaluation, aw_context dispatch). These are engine implementation details and NEVER appear as YAML keys in the workflow source `.md` file.
**NEVER document them.** Only document what is explicitly in the YAML frontmatter: `on:`, `schedule:`, `workflow_dispatch:`, `slash_command:`, `safe-outputs:`, `tools:`, `labels:`, `roles:`. If you find yourself writing "pre_activation", "activation gate", "detection phase", "conclusion step", or "aw_context" → STOP and DELETE. These are hallucinations.
+
+32. **Trigger existence verification.** Before documenting ANY trigger for a workflow, re-read the `on:` block (`.yml`) or YAML frontmatter (`.md`). The trigger MUST appear as an explicit key. Do NOT add `workflow_dispatch` to a workflow unless you can cite the exact `workflow_dispatch:` line. Invented triggers = HIGH error.
+
+33. **Source scope completeness.** Scan ALL files under `.github/workflows/` including subdirectories (e.g., `shared/`). README files (`*.README.md`, `README.md`) in subdirectories are documentation, not workflows: note them in the manifest but do NOT model them as workflow state machines. However, shared importable files (`shared/*.md` that are imported by workflows) MUST be documented as part of the importing workflow's behavior.
+
+27. **Text/count consistency.** When prose says "N steps", "N guards", "N labels", or "N nodes", the number MUST match the diagram AND the source. After writing any count in prose, immediately verify: (a) count diagram states for that workflow, (b) grep source steps. All three numbers must agree. Mismatch between prose and diagram (even if diagram is correct) = HIGH.
+    **Safeguard inventory counts.** If a safeguard section header says "N `if:` guards", count the bullets below it. They MUST match. Same for "N steps/nodes" in overview rows.
+    **Unnamed steps.** Steps without `- name:` still exist. When counting steps, use `grep -c '^\s*- ' ` (all list items under `steps:`), NOT just `grep -c '^\s*- name:'`. Unnamed steps that perform actions (checkout, setup, etc.) MUST be counted and modeled.
+
+
+34.
**dotnet/issue-labeler workflow patterns (labeler-*.yml).** These shared workflows appear across many dotnet repos and have CONSISTENT patterns that MUST be modeled correctly:
+    (a) **`cache_key` input is REQUIRED with default `ACTIVE`**: never document it as optional or omit it. `cache_key_suffix` in labeler-train may have default `staged`.
+    (b) **labeler-train.yml has TWO parallel pipelines** (issues lane + pulls lane). Jobs are gated by `inputs.type` (`Issues`/`Pull Requests`/`Both`). When `type=Both`, BOTH lanes fire → use `<<fork>>`/`<<join>>`. The dispatch `type` input is NOT three separate triggers: it's ONE `workflow_dispatch` with ONE `type` input.
+    (c) **labeler-promote.yml has TWO parallel root jobs** (`promote-issues`, `promote-pulls`) with boolean inputs. When both are true → `<<fork>>`/`<<join>>`. When only one → single lane.
+    (d) **labeler-cache-retention.yml** often uses matrix strategy for issues/pulls → parallel → `<<fork>>`/`<<join>>`.
+    (e) **labeler-predict-*.yml** jobs are gated by org ownership (`github.repository_owner == 'dotnet'`) on auto triggers. Prediction only happens on cache HIT, so model the cache-miss skip path.
+    (f) **Never invent labeler inputs.** Read the `inputs:` block literally. Common real inputs: `cache_key`, `cache_key_suffix`, `limit`, `page_size`, `page_limit`, `issues`/`pulls` (booleans). Do NOT invent: `max_prs`, `session_name`, `choice`, `skip_commit`.
+    (g) **`needs:` chains in labeler-train** = sequential dependency, NOT parallel. `download-*` → `train-*` → `test-*` per lane. Model as chain, not fork.
+
+35. **backport.yml pattern.** The trigger is `issue_comment` with `types: [created]` (and optionally `schedule`). The `/backport to ` text match is a **JOB-LEVEL `if:` guard** (`contains(github.event.comment.body, '/backport to')`), NOT a trigger qualifier. Similarly, `github.event.issue.pull_request` is a job guard checking the comment is on a PR.
Model: `[*] --> CommentGuard <<choice>>` with the body-match + PR-check as the guard predicate.
+
+36. **Common repo-owner guards.** Many dotnet repos gate jobs with `github.repository == 'dotnet/'` or `github.repository_owner == 'dotnet'` or `!github.event.repository.fork`. These are JOB-LEVEL guards that prevent execution in forks. They MUST appear as `<<choice>>` gates or entry-edge labels. Missing repo guard = HIGH.
+
+37. **Push/PR events use correct actor prefixes.** A `push` event is triggered by the git engine: use ⚙️. A `pull_request` or `issues` event opened by a human uses 👤. A `pull_request` event created by a bot (like dependabot) uses 🤖 ONLY if the bot is explicitly named. Default: `pull_request.opened` = 👤, `push` = ⚙️, `schedule` = ⏰.
+
+38. **`always()` in guard expressions MUST be preserved.** When a job `if:` uses `always() && `, the `always()` modifier is semantically significant: it means the job evaluates even when predecessors fail/are skipped. Document the FULL expression including `always()`. Dropping `always()` changes the semantics and is a HIGH error.
+
+39. **gh-aw `safe-outputs:`: signature, not enumeration.** For each gh-aw `.md` workflow, document its safe-output **signature**: which action verbs it can emit (`create-pull-request`, `add-comment`, `push-to-pull-request-branch`, `add-labels`, etc.) and the distinguishing config per verb. **Do NOT exhaustively list every leaf key.** Universal defaults are suppressed: `target: "*"`, `noop.report-as-issue: false`, `draft: false`. Per-workflow blocks list only OVERRIDES + behaviorally distinguishing fields: `max`, `title-prefix`, `labels`/`allowed`, `allowed-files`, `protected-files`, `reviewers`, `auto-merge`, `hide-older-comments` (when true), `base`.
+    **Format: PREFER PER-WORKFLOW MINI-TABLES** with columns `| Workflow | Output | Max | Key Constraints |`. Tables scan faster than run-on prose for any workflow with ≥3 actions or any action with ≥3 distinguishing fields.
Multi-reviewer feedback (Sonnet + GPT-5.4 + Gemini, average 2.67/5 on first pass) ranked run-on safe-output prose as the #1 readability failure mode. Example:
+    ```
+    | Workflow | Output | Max | Key Constraints |
+    |---|---|---|---|
+    | `labelops-flake-fix.md` | `create-pull-request` | 1 | title `[LabelOps Flake] `; labels `automation, Flaky, NO_RELEASE_NOTES`; protected-files fallback-to-issue |
+    | `labelops-flake-fix.md` | `create-issue` | 1 | title `[LabelOps Flake] `; labels `Flaky, automation` |
+    | `labelops-flake-fix.md` | `add-comment` | 1 | — |
+    ```
+    Group all workflows for a Group (A1, A2, A3) into ONE shared table. Paragraph form is acceptable ONLY for trivial workflows (≤2 actions, each with ≤1 constraint). Missing action verb = HIGH. Missing override field (non-default value) = MED. Exhaustive enumeration of defaults = readability MAJOR (Rule 42).
+
+40. **NEVER reference internal rules or extraction artifacts in output.** The generated doc must be self-contained. Do NOT write "per Rule 23", "excluded per Rule N", or reference `/tmp/*.txt` extraction files. If excluding a workflow, write `⚠️ Excluded — too complex for accurate automated documentation` without referencing rule numbers.
+
+41. **Mermaid edge-label sanitization: RENDER OR DIE.** Every Mermaid block in the output MUST parse cleanly under `mermaid.parse()`. The `stateDiagram-v2` lexer inside `state X { ... }` composite blocks has known fragilities that silently break rendering. NEVER emit the following characters or patterns inside an edge label (the text after ` : ` on any `A --> B : ...` line):
+    - **Semicolon `;`**: the lexer treats `;` as a statement separator inside composite blocks. If ANY character after `;` resembles a new identifier (especially a hyphenated identifier like `allowed-files`, `fetch-depth`, `AI-thinks-issue-fixed`), the lexer aborts with `Lexical error … Unrecognized text`.
**Replace `;` with `,` always.** When listing config entries, comma-separate: `(labels: a, b, c, allowed-files: docs/**)`, never `(labels: a, b, c; allowed-files: docs/**)`.
+    - **HTML control characters `<` `>` `&`** at top level of labels: although currently tolerated by stateDiagram-v2, they are HTML-rendered downstream and may break in browser viewports. Prefer Unicode replacements: `<` → `≤` or `lt`; `>` → `≥` or `gt`; `&` → `and` or `+`. Only inside backticked code spans are HTML chars safe.
+    - **Unbalanced quotes/brackets/parens** in labels: every `(` needs a matching `)`, every `"` needs a closing `"`, every `[` needs `]`. Unbalanced delimiters break diagram rendering even when individual chars are tolerated.
+    - **Backslash `\`**: escape sequences are interpreted; avoid entirely.
+    - **Triple-period `...` immediately followed by a hyphenated identifier**: same lexer class as the `;` issue.
+    **Post-draft MANDATORY check:** after emitting each Mermaid block, scan every line matching `--> .* : ` and verify no `;` appears in the label text. If found → replace with `,`. This is non-negotiable: a doc with one un-rendering Mermaid block fails Phase 3.5 with CRIT severity (verifier check (p)).
+
+42. **Compaction is mandatory: the doc must be readable on a single screen.** Tables and lists are for scanning, not enumeration.
+    - **Suppress universal defaults.** State once at section top: "gh-aw safe-output defaults (suppressed below): `target: \"*\"`, `noop.report-as-issue: false`, `draft: false`." Per-workflow blocks then list only deviations.
+    - **Group identical-pattern entries.** Label families (`⚠️ Affects-Build-Infra`, `⚠️ Affects-Compiler-Output`, …) → ONE bullet: `⚠️ Affects-* family (7 labels: Build-Infra, …)`. Same applies to any 3+ rows with identical Classification + Applied-By + Action columns.
+    - **Per-workflow signature blocks, not per-leaf-key tables.** For safe-outputs, emit one paragraph per workflow (Rule 39).
Total safe-output section lines for the whole doc MUST be ≤30 (not 90+).
+    - **Label index as ONE flat table, not semantic groups.** Columns: `Label | Type | Added by | Removed by | Read by | Notes`. The flat shape (a) shows producer→consumer flows on one row (e.g., `AI-Issue-Regression-PR` added by RA, read by RPS, a visible cross-workflow signal), (b) handles labels that are both added and removed (e.g., the bidirectional `AI-thinks-issue-fixed`) without splitting across sections, (c) scans faster for lookup ("where does X come from? who removes it?"). The earlier 5-group format was rejected by 2 of 3 readability reviewers as forcing taxonomy-first comprehension. Type values: `always-applied`, `agent-add`, `agent-remove`, `agent-add + agent-remove` (bidirectional), `filter (read-only)`, `imperative`. Group same-prefix label families (e.g., `⚠️ Affects-*`) into ONE row with the family expansion in Notes. Total Label section MUST be ≤30 lines.
+    - **Overview table ≤5 columns**: `# | Workflow | Trigger | Inputs | Primary Actions`. Drop "Type" (metadata). Inline `concurrency` in Inputs cell only when present.
+    - **Hard limits:** any single table > 25 rows = MAJOR. Doc total lines > 600 = MAJOR. Total `^|` pipe-row count > 80 = MAJOR.
+
+43. **Edge-label brevity: the diagram is a picture, not a config dump.**
+    - **≤80 chars per edge label.** If the source config needs more, split into intermediate states OR move the detail to the per-workflow signature block (Rule 39).
+    - **Behavior verb + brief object only.** `🤖 create-pull-request` ✓. `🤖 create-pull-request (labels: automation, NO_RELEASE_NOTES, allowed-files: .github/docs/**, protected-files: allowed)` ✗: move config to the sig block.
+    - **Permitted in edge labels:** action verbs (e.g., `create-pull-request`, `add-labels`, `push-to-pull-request-branch`), short object hints (≤25 chars: `(fix)`, `(skip)`, `(escalate)`), guard predicates from source (e.g., `if: success() || failure()`), `src Lnn` citations ONLY when proving a non-obvious safeguard exists (e.g., dedup gates).
+    - **Forbidden in edge labels:** full safe-output config blocks `(labels: …, max: …, protected-files: …)`; long shell commands (`gh pr list --search label:"…", drop drafts, forks, …, max 3 (seed=GITHUB_RUN_ID)` → `gh pr list (label X, ≤3)`); multi-clause descriptions joined by `,`/`and`.
+    - **Post-draft scan:** `grep -oE ' : .{80,}$' .github/docs/state-machine.md` MUST return 0 lines. Any line > 80 chars after the ` : ` separator = MED.
+
+44. **Glossary mandatory: define every domain term at first use OR in a top-of-doc glossary.** The doc must be readable by a first-time engineer who has never seen this repo. The following term classes MUST be defined:
+    - **Project-specific tool names** that are not standard GitHub CLI / Actions vocabulary (e.g., `gh-aw`, `flaky-test-detector`, `skill-validator`, repo-specific scripts).
+    - **Custom frameworks and runtime concepts** (e.g., `safe-outputs`, `state-store branch`, `noop`, `report-as-issue`).
+    - **Acronyms** on first use (e.g., `CCA = Copilot Coding Agent`, `BSL = Baseline`, `CCS = Copilot Code Suggestion`, `LPM/LFF/LPSS/RPS/RA` if used as state-prefix conventions). Spell out on first use AND in the glossary.
+    - **Project-specific taxonomies** (e.g., regression-pr-shepherd's `Cat A/B/C`, `B0–B4` subtypes; labelops's `has_ci/has_conflicts/ci_blocked`).
+    - **Diagram convention key**: the actor-prefix emoji legend (⏰ schedule, 👤 human, ⚙️ workflow engine, 🤖 agent/bot) MUST appear as a legend block. Same for `<<choice>>` / `<<fork>>` / `<<join>>` if used.
+    Place the glossary IMMEDIATELY after the title and intro paragraph, BEFORE the Overview table.
A first-time reader rated the doc 2/5 on readability, citing exactly these gaps. Missing glossary entry for a term used 3+ times = MAJOR. Missing emoji legend = MAJOR.
+
+45. **Self-contained: never use source-file pointers as documentation.** Any phrase like `"(see file.md L100–110)"`, `"per source line N"`, `"refer to "`, or `"as defined in "` in PLACE of actual content is a documentation failure. Inline the content. Citations `(src Lnn)` are permitted ONLY as provenance markers AFTER the documented content, never AS the content. Example:
+    - ❌ WRONG: `RA_T2_SkipCheck --> RA_TaskFinal : ⚙️ check skip conditions (repo-assist.md L296–306)`
+    - ✅ CORRECT: `RA_T2_SkipCheck --> RA_TaskFinal : ⚙️ check 6 skip conditions` + an inline `> **Skip conditions**: 1. closed; 2. existing PR; 3. existing coverage; 4. test-link comment; 5. untestable comment; 6. human coverage comment.` callout below the diagram.
+    Any source-pointer-as-content = MAJOR. Inlined skip conditions, taxonomy enumerations, and predicate lists belong in the doc itself.
+
+46. **Orientation paragraph mandatory at top.** Before any table, diagram, or section, the doc MUST open with 1–3 sentences answering: (a) **what is this doc** (a map / catalogue / spec of which artifacts?), (b) **who reads it** (new engineer onboarding? PR reviewer? auditor?), and (c) **how to use it** (read top-down? jump to glossary? cross-reference with Handover Map?). The current cryptic generator-version stamp (`> **15 workflows documented.** ... FULL_REWRITE (generator d4fe5640...)`) is NOT orientation; it is metadata. Three independent reviewers flagged the missing intro as a P0 issue. Missing intro paragraph = MAJOR.
+
+47. **Mermaid grouping: workflows with cross-dependencies STAY in the same block.** Reviewers may complain that multi-workflow Mermaid blocks are "too dense" and propose one-workflow-per-block.
**Resist this for any group whose workflows have cross-workflow signals**: shared labels (producer/consumer), shared state-store branches, dispatch-workflow handovers, indirect-via-label handoffs. Splitting these into separate diagrams hides the dependency graph: the reader sees independent islands and has to reconstruct the cross-edges by mental cross-reference with the Labels table and Handover Map. Keep them grouped so the visual proximity itself communicates "these interact." Acceptable to split groups whose workflows are truly independent (e.g., Group D: `branch-merge` / `skill-validation` / `copilot-setup-steps` share nothing). Per-workflow splits that erase visible cross-edges = MAJOR.
+
+
+
+# Phase 0: Deterministic extraction + evolution mode detection.
+EXCLUDE_LIST="${EXCLUDE_FROM_DOCS:-}"
+MANIFEST="/tmp/workflow-manifest.txt"
+SELF_FILE=".github/workflows/agentic-state-machine.md"
+DOC_FILE=".github/docs/state-machine.md"
+: > "$MANIFEST"
+
+# --- Evolution mode detection ---
+SELF_SHA="$(shasum -a 256 "$SELF_FILE" 2>/dev/null | cut -c1-16 || echo "unknown")"
+EXISTING_GEN_SHA="$(grep -o 'generator-version: [a-f0-9]*' "$DOC_FILE" 2>/dev/null | awk '{print $2}' || echo "none")"
+SOURCE_SHAS=""
+
+echo "=== GENERATOR VERSION: $SELF_SHA ===" >> "$MANIFEST"
+if [ "$SELF_SHA" = "$EXISTING_GEN_SHA" ]; then
+  echo "=== MODE: INCREMENTAL (same generator version) ===" >> "$MANIFEST"
+  EVOLUTION_MODE="incremental"
+else
+  echo "=== MODE: FULL_REWRITE (generator version changed: $EXISTING_GEN_SHA → $SELF_SHA) ===" >> "$MANIFEST"
+  EVOLUTION_MODE="full_rewrite"
+fi
+
+for f in .github/workflows/*.md .github/workflows/*.yml; do
+  [ -f "$f" ] || continue
+  base="$(basename "$f")"
+  case "$base" in *.lock.yml) continue ;; esac
+  echo ",$EXCLUDE_LIST," | grep -qF ",$base," && continue
+
+  sha="$(shasum -a 256 "$f" | cut -c1-8)"
+  echo "=== FILE: $base SHA: $sha ===" >> "$MANIFEST"
+
+  echo "--- TRIGGERS ---" >> "$MANIFEST"
+  awk '
+    /^on:|^"on":|^'\''on'\'':/
{ capture=1; next }
+    capture && /^[a-zA-Z]/ && !/^ / { capture=0 }
+    capture { print NR": "$0 }
+  ' "$f" | head -40 >> "$MANIFEST"
+  grep -n 'workflow_dispatch\|schedule\|pull_request\|issues\|push\|issue_comment\|types:\|branches:\|paths:\|cron:' "$f" | head -20 >> "$MANIFEST"
+
+  echo "--- GUARDS (if:) ---" >> "$MANIFEST"
+  grep -n ' if:' "$f" | head -20 >> "$MANIFEST"
+
+  echo "--- ROLES ---" >> "$MANIFEST"
+  grep -n 'roles:' "$f" | head -5 >> "$MANIFEST"
+
+  echo "--- LABELS ---" >> "$MANIFEST"
+  grep -n 'labels:\|allowed-labels:' "$f" | head -20 >> "$MANIFEST"
+
+  echo "--- INPUTS ---" >> "$MANIFEST"
+  awk '/workflow_dispatch:/,/^[^ ]/' "$f" | grep -n '^\s\+\w' | head -40 >> "$MANIFEST"
+
+  echo "--- REUSABLE ---" >> "$MANIFEST"
+  grep -n 'uses:' "$f" | grep -v 'actions/' | head -10 >> "$MANIFEST"
+
+  echo "--- STEP-COUNT ---" >> "$MANIFEST"
+  steps=$(grep -c '^\s*- name:' "$f" 2>/dev/null || echo "0")
+  all_steps=$(awk '/^\s+steps:/{s=1;next} s && /^\s*- /{c++} s && /^[^ ]/{s=0} END{print c+0}' "$f" 2>/dev/null)
+  jobs=$(awk '/^jobs:/{f=1;next} f && /^ [a-z_-]+:/{c++} f && /^[^ ]/ && !/^ /{f=0} END{print c+0}' "$f" 2>/dev/null)
+  echo "named_steps=$steps all_steps=$all_steps jobs=$jobs" >> "$MANIFEST"
+  if [ "$steps" -gt 30 ] || [ "$jobs" -gt 5 ]; then
+    echo "COMPLEX=true" >> "$MANIFEST"
+  fi
+
+  echo "" >> "$MANIFEST"
+done
+
+echo "=== Manifest: $(grep -c '^=== FILE:' "$MANIFEST") workflow files ==="
+
+# --- Noop detection (incremental mode only) ---
+if [ "$EVOLUTION_MODE" = "incremental" ]; then
+  ALL_SOURCE_SHAS="$(grep '^=== FILE:' "$MANIFEST" | awk '{print $4}' | sort | tr '\n' ',')"
+  EXISTING_SOURCE_SHAS="$(grep -o 'source-shas: [a-f0-9,]*' "$DOC_FILE" 2>/dev/null | awk '{print $2}' || echo "none")"
+  if [ "$ALL_SOURCE_SHAS" = "$EXISTING_SOURCE_SHAS" ]; then
+    echo "=== NOOP: all source SHAs unchanged ==="
+    echo "=== NOOP ===" >> "$MANIFEST"
+  fi
+fi
+
+
-1. `ls .github/workflows/*.md`: list source files. Read each.
Compute `sha256sum` for fingerprint.
-2. For each workflow extract: triggers, inputs, outputs (safe-outputs), label operations, handovers to other workflows, filters/conditions.
-3. Group workflows by what they act on. Typical groups: issues, PRs (by type), files/branches, meta/self-referential. Let the data decide; do not force groups.
-4. Write `.github/docs/state-machine.md` with:
- **Workflow overview table**: one row per workflow: trigger, reads, writes, key labels.
+## Phase 0.5: Evolution mode routing
+0. Read the manifest header:
+   - If `=== NOOP ===` → call `noop` safe-output and stop. Nothing to do.
+   - If `=== MODE: FULL_REWRITE ===` → **ignore any existing `state-machine.md`**. Generate everything from scratch. The generator methodology changed, so old output may use different modeling conventions, rules, or diagram patterns that are now obsolete.
+   - If `=== MODE: INCREMENTAL ===` → read the existing `state-machine.md`. Only regenerate sections for workflows whose SHA changed. Preserve unchanged sections verbatim.
+
+## Phase 1: Build structured model
+1. Read `/tmp/workflow-manifest.txt`. For gh-aw `.md` files, the manifest is authoritative. For `.yml` files, the manifest is an INDEX: cross-check with full source.
+2. Read full source of each file for semantics.
+3. Build per-workflow model: triggers, guards, inputs, writes+token, labels, downstream.
+4. Cross-check: verify each trigger type exists in the manifest before documenting it.
+
+## Phase 1.5: Complex workflow extraction (Rule 24)
+For every workflow where the manifest says `COMPLEX=true`:
+5. Run: `grep -n '^\s*- name:' > /tmp/-steps.txt`
+6. Run: `grep -n 'if:' > /tmp/-guards.txt`
+7. Build per-job checklist: read the extraction files, group steps under their job, annotate each with its guards.
+8. This checklist is the SOLE modeling input for complex workflows.
If a step/guard appears in the checklist but you cannot accurately model it → exclude the ENTIRE workflow per Rule 23.
+
+## Phase 2: Draft
+5. Draft MUST contain ≥1 `<>` node. Produce `.github/docs/state-machine.md` with:
+   - **Overview table**: one row per workflow
+   - **Mermaid `stateDiagram-v2` per lifecycle group**: `direction LR`
+   - **Label dictionary**: always/agent-chosen/imperative, citing line numbers
+   - **Handover map**: token-aware
+   - **Footer**: ``
+
+## Phase 3: Adversarial self-review
+6. Re-read manifest + source. For every workflow verify: triggers, guards, inputs, labels match.
+7. Structural audits (run ALL, fix before proceeding):
+   - `<<choice>>` edge-count (boolean=2)
+   - Inline `<<choice>>` syntax (ONLY in `state X <<choice>>` declarations, NEVER on edges)
+   - Duplicate state IDs (each declared exactly once per diagram)
+   - Duplicate edges (no same source,target pair)
+   - Branch-filter entry-arrow count (N source values = N arrows)
+   - `<>` under-use check
+   - Cross-workflow dispatch scan (no cross-workflow edges)
+   - Fan-out audit (non-choice with 2+ edges → verify truly parallel)
+
+8. Behavioral audits:
+   - **Dedup/skip audit:** For EVERY `dispatch-workflow`, `push`, `create-issue`, `create-pull-request` action in source → trace backward in diagram. A `<<choice>>` gate MUST exist before it. Missing gate = error.
+   - **Safeguard audit:** For EVERY workflow with ≥3 steps, list ALL guards in source (cooldown timers, staleness checks, "recent commit" checks, rate limits). Each MUST appear in diagram.
+   - **Branch completeness:** Every `<<choice>>` → both arms drawn. "Else is uninteresting" is never valid.
+   - **Pipeline continuation:** Every `→ [*]` in a multi-step pipeline → cite the source line that says "stop." Can't cite → continue to next step.
+   - **Loop audit:** Every for-each/per-item loop → show iteration edge, show cap if source mentions one, show one-at-a-time if source requires sequential.
+
+9.
Cross-section consistency:
+   - Labels: diagrams ↔ dictionary ↔ overview (all must agree)
+   - Handover: producer/consumer matches dictionary writer/reader
+   - Dictionary → Overview cross-check (every writer/reader label in correct overview column)
+   - Handover → Dictionary cross-check (task numbers match)
- **One Mermaid `stateDiagram-v2` per lifecycle group**: `direction LR`, composite states for sub-types within a group, `<<choice>>` for decision points. Max ~15 states per diagram; split if larger. Include ⚙️ CI wherever a workflow reacts to check results.
+10. Fix all errors. Then run final passes: sequential-pipeline, dangling-state, orphan-state. All must be clean.
- **Label dictionary**: every label: who applies, who reads, meaning.
+## Phase 3.5: Subagent self-verification (MANDATORY)
+11. **Write draft** to `.github/docs/state-machine.md`.
+12. **Launch a verification subagent** (task tool, agent_type: `general-purpose`) with this prompt:
+    > You are a strict technical verifier. Read `.github/docs/state-machine.md` (the draft documentation) and ALL workflow files listed in `/tmp/workflow-manifest.txt`.
+    > Verify these checks and report ONLY failures:
+    > (a) **File count**: `grep -c '^=== FILE:' /tmp/workflow-manifest.txt` vs stated total in doc.
+    > (b) **Per-workflow step count**: for each .yml, `grep -c '^\s*- name:'` vs diagram states for that workflow. For .md, count step-level content blocks.
+    > (c) **Label count**: for each label list claiming "N labels", enumerate actual source entries and compare.
+    > (d) **Safeguard completeness**: for each workflow with ≥3 guards, list ALL source `if:` conditions, thresholds, timers, caps, age filters, re-occurrence windows, budget limits, exclusion lists, fail-closed defaults. Verify EACH appears in the diagram or safeguard bullets.
+    > (e) **Citation spot-check**: pick 10 random `L` citations, read the actual source line, verify content matches (±1 line tolerance).
Offset ≥2 on ≥3 citations from same file = systematic error → HIGH.
+    > (f) **copilot-setup-steps.yml**: if it exists in the repo, verify it has a section in the doc.
+    > (g) **Diagram wiring**: for each mermaid block, verify: (1) every `<<choice>>` has ≥2 outgoing edges; (2) boolean choices have EXACTLY 2; (3) no orphan states (every non-[*] state has ≥1 incoming edge); (4) no dangling states (every state has ≥1 outgoing edge or → [*]); (5) if a `<<choice>>` has 3+ edges, verify the conditions are truly categorical/mutually exclusive, because independent sequential `if:` steps are NOT `<<choice>>`; (6) **fork/choice audit**: for every `<<choice>>`, verify guards are mutually exclusive. If branches can BOTH fire in one run (overlapping guards, parallel safe-outputs, dispatch type=Both) → must be `<<fork>>`/`<<join>>`, not `<<choice>>`.
+    > (h) **Count audit**: for every stated number in the doc ("N nodes", "N labels", "N workflows"), count the actual items. Mismatch = finding.
+    > (i) **Complex workflow checklist**: for any workflow with COMPLEX=true in manifest, read `/tmp/-steps.txt` and verify every step appears as a diagram state. Missing step = HIGH.
+    > (j) **Trigger entry audit**: for each workflow, count distinct trigger types in the `on:` block (pull_request_target, schedule, workflow_dispatch, issues, push, etc.), then count `[*] -->` entry arrows in the diagram for that workflow. If trigger count > entry arrow count → MAJOR (missing trigger path).
+    > (k) **Guard else-path audit**: for every `<<choice>>` or guarded transition with exactly 1 outgoing edge, verify the else/false path exists. Missing else = HIGH.
+    > (l) **Job-level guard audit**: for every `jobs..if:` in source, verify the guard appears in the diagram (as `<<choice>>` or edge label). Missing job-level guard = HIGH.
+    > (m) **Phantom state audit**: for every state referenced in a transition, verify it is declared (has `state X` or appears as a composite). Undefined state = HIGH.
+    > (n) **Dead-trigger audit**: for each trigger in `on:`, check if ALL jobs gate it off (e.g., `if: event_name != 'X'`). If so, verify NO diagram entry path exists for that trigger. Modeled dead trigger = HIGH.
+    > (o) **Prose-count audit**: scan the doc for every phrase matching "N guards/steps/nodes/labels/workflows" (any number). For each: count the actual items listed below or drawn in the diagram. If stated count ≠ actual count → MAJOR. Pay special attention to safeguard inventory headers ("N `if:` guards") and overview row step counts.
+    > (p) **Mermaid renderability (CRIT)**: every ```` ```mermaid ``` ```` block in the doc MUST parse cleanly. Run this exact bash:
+    > ```bash
+    > # Phase 3.5 (p): Mermaid syntactic renderability check
+    > if ! command -v node >/dev/null; then echo "ERROR: node required for check (p)"; exit 1; fi
+    > mkdir -p /tmp/mermaid-check && cd /tmp/mermaid-check
+    > test -d node_modules || { npm init -y >/dev/null 2>&1; npm install --silent mermaid jsdom 2>&1 | tail -3; }
+    > cat > check.mjs <<'JS'
+    > import { JSDOM } from 'jsdom';
+    > const dom = new JSDOM('');
+    > global.document = dom.window.document; global.window = dom.window;
+    > const m = (await import('mermaid')).default;
+    > m.initialize({ startOnLoad: false });
+    > const fs = await import('fs');
+    > const doc = fs.readFileSync(process.argv[2], 'utf8');
+    > const re = /```mermaid\n([\s\S]*?)```/g;
+    > let i = 0, mm, fails = 0;
+    > while ((mm = re.exec(doc)) !== null) {
+    >   i++;
+    >   try { await m.parse(mm[1]); }
+    >   catch (e) { fails++; console.log('CRIT\tmermaid-block-' + i + '\t' + String(e.message).replace(/\n/g,' | ').slice(0,200)); }
+    > }
+    > if (fails === 0) console.log('CLEAN');
+    > JS
+    > node check.mjs .github/docs/state-machine.md
+    > ```
+    > ANY parse failure = CRIT (the diagram cannot render in GitHub or browsers). Root-cause class: `;` inside edge labels in composite states triggers `Lexical error` when followed by hyphenated identifiers (see Rule 41).
Fix by replacing `;` with `,` in every edge label. + > (q) **Readability metrics - MAJOR if violated** (Rules 42, 43). Run: + > ```bash + > LINES=$(wc -l < .github/docs/state-machine.md) + > PIPES=$(grep -c '^|' .github/docs/state-machine.md) + > LONGEDGES=$(grep -oE ' : .{80,}$' .github/docs/state-machine.md | wc -l | tr -d ' ') + > MAXTABLE=$(awk 'BEGIN{m=0;c=0} /^\|/{c++; if(c>m)m=c} !/^\|/{c=0} END{print m}' .github/docs/state-machine.md) + > echo "LINES=$LINES PIPES=$PIPES LONGEDGES=$LONGEDGES MAXTABLE=$MAXTABLE" + > FAIL=0 + > [ "$LINES" -gt 600 ] && { echo "MAJOR\treadability\tdoc lines $LINES > 600 (Rule 42)"; FAIL=1; } + > [ "$PIPES" -gt 80 ] && { echo "MAJOR\treadability\tpipe-row count $PIPES > 80 (Rule 42)"; FAIL=1; } + > [ "$MAXTABLE" -gt 25 ] && { echo "MAJOR\treadability\tlargest contiguous table $MAXTABLE rows > 25 (Rule 42)"; FAIL=1; } + > [ "$LONGEDGES" -gt 0 ] && { echo "MED\treadability\t$LONGEDGES edge labels > 80 chars (Rule 43)"; FAIL=1; } + > [ "$FAIL" -eq 0 ] && echo "CLEAN (readability)" + > ``` + > These limits encode user-validated screen-readability bounds. Any violation = budget breach (verifier failure), even when the content is technically complete. Compaction (Rule 42) and edge-label brevity (Rule 43) trump exhaustive enumeration (Rule 39 sig form is correct). + > Output format: one line per failure - `SEVERITY<TAB>WORKFLOW<TAB>FINDING` or `CLEAN` if all pass. +13. **Fix all findings** from the subagent. If ≥3 findings, re-run the subagent after fixes. Repeat until CLEAN or ≤2 MINOR. - **Handover map** - agent↔agent, human↔agent, scheduler→agent. One table. +## Phase 4: Mermaid sanitization (MANDATORY post-process) +13a.
Before emitting, run a final deterministic sanitization pass to strip lethal characters from every Mermaid edge label: +```bash +python3 - <<'PY' +import re +path = '.github/docs/state-machine.md' +src = open(path).read() +def sanitize_block(match): + out = [] + for line in match.group(0).split('\n'): + m = re.match(r'^(\s*(?:\[\*\]|\S+) --> \S+ : )(.*)$', line) + if m: + label = m.group(2).replace(';', ',') + out.append(m.group(1) + label) + else: + out.append(line) + return '\n'.join(out) +out = re.sub(r'```mermaid\n[\s\S]*?```', sanitize_block, src) +open(path, 'w').write(out) +PY +``` +This is belt-and-suspenders: even if the model emits `;` somewhere, this step rewrites it. It MUST run before Phase 5 emits the final file. Re-run check (p) after sanitization to confirm zero parse failures. - **Footer**: `` +## Phase 5: Emit +14. Write final `.github/docs/state-machine.md` (overwriting the draft from Phase 3.5). +15. Open PR via `create-pull-request`. -5. Open PR via `create-pull-request`. -- `stateDiagram-v2`, `direction LR` for wide screen layout. -- Composite states for sub-types: `state "Regression PRs" as RegPR { ... }` -- Cross-composite transitions go OUTSIDE the composite blocks (Mermaid limitation). -- `<<choice>>` for decision points, notes for context. -- Actor prefixes on every edge: `🤖 (⏰ 12h)`, `⚙️ CI passes`, `👤 maintainer merges`. -- No placeholder/fake names in examples - agent discovers real names from source files. +- `stateDiagram-v2`, `direction LR`. +- Composite states for sub-types. **Composite entry rule:** if an external edge targets a composite, add `[*] --> FirstInnerState` inside the composite so Mermaid knows which inner state to enter. **Composite exit rule:** the inner `[*]` only terminates the nested substate - if the composite itself needs to continue to a downstream state, add an EXPLICIT outer transition `CompositeState --> NextState`. Missing outer exit = dangling composite = ERROR. Cross-composite transitions outside.
+- `<<choice>>` ONLY after Q1/Q2/Q3. Verify you haven't UNDER-used it. `<<fork>>`/`<<join>>` when guards overlap (Rule 6). +- **`<<choice>>` is DECLARATION ONLY.** Never `A --> B <<choice>>`. Always `state B <<choice>>` on its own line. +- **Unique state IDs.** Each ID declared once per diagram (not both simple and composite). +- Actor prefixes on EVERY edge (Rule 28): 👤 human, 🤖 agent-name, ⚙️ workflow, ⏰ cron. **ZERO EXCEPTIONS.** Every `-->` line MUST have one of these emoji prefixes in its label. `[*] --> State` entries MUST have a prefix indicating what triggers it (👤 for manual/PR/issue events, ⏰ for cron, ⚙️ for push/workflow_call). **Post-draft MANDATORY fix:** scan every `-->` line in every diagram. If ANY edge lacks a prefix → add it before emitting. This is the #1 most common error. +- Every conditional: BOTH branches with guard scope. +- **Terminal states → `[*]`.** Post-draft: verify every leaf state has `→ [*]`. Missing terminal exits = ERROR. Scan after drawing: every state that has no outgoing edge MUST have `--> [*]`. **NEVER use custom sink states** (like `Done`, `End`, `Finished`, `*_End`). The ONLY valid terminal is `[*]`. Custom sinks = ERROR. +- **One entry arrow per filter value.** `branches: [main, release/*]` = TWO arrows. +- Cross-workflow dispatches: handover annotation, not inline transition. +- **Edge-label safety (Rule 41):** NEVER `;` in labels - use `,`. Avoid `<` `>` `&` outside backticks - use `≤` `≥` `and`. Match all `(`/`)`, `"`/`"`, `[`/`]`. NEVER `\`.
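
The two edge rules above (an actor prefix on every `-->` line, and no `;` in any label) lend themselves to a mechanical post-draft scan. The following is a minimal sketch in Python, not part of the workflow itself: the names `lint_edges`, `EDGE_RE`, and `ACTOR_PREFIXES` are hypothetical, and the regex assumes one transition per line in the `Source --> Target : label` form.

```python
import re

# Actor prefixes from the edge rules: human, agent, workflow, cron.
# Written as escapes so the gear matches with or without a variation selector.
ACTOR_PREFIXES = ("\U0001F464", "\U0001F916", "\u2699", "\u23F0")

# Hypothetical pattern: `Source --> Target` with an optional ` : label` tail.
EDGE_RE = re.compile(r"^\s*(\[\*\]|\S+)\s+-->\s+(\[\*\]|\S+?)(?:\s*:\s*(.*))?$")


def lint_edges(diagram: str) -> list[str]:
    """Return one finding per edge that breaks the prefix or `;` rule."""
    findings = []
    for lineno, line in enumerate(diagram.splitlines(), start=1):
        m = EDGE_RE.match(line)
        if not m:
            continue  # not a transition line (declaration, note, brace, ...)
        label = (m.group(3) or "").strip()
        if not label.startswith(ACTOR_PREFIXES):
            findings.append(f"line {lineno}: edge lacks actor prefix")
        if ";" in label:
            findings.append(f"line {lineno}: ';' in edge label")
    return findings
```

Calling `lint_edges` on the body of each Mermaid block and treating a non-empty result as a finding would flag unlabeled edges such as `Idle --> Run` as well as labels that still contain `;`. It only reports problems and leaves the rewriting to the sanitization pass.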