Improve agentic state-machine generator prompt #19901
Open
T-Gro wants to merge 9 commits into
…docs/ output
The agentic-state-machine workflow has been failing since PR #19721 moved the output to .github/docs/state-machine.md. gh-aw treats files under .github/ as protected (agent instruction files, security config). The allowed-files config controls which files can be modified but does not override the built-in protected-files blocking. Adding protected-files: allowed explicitly opts in, which is safe because allowed-files already restricts writes to .github/docs/** only.
Fixes #19739
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Evolve the 74-line freeform prompt to a 412-line structured extraction pipeline with multi-phase self-verification. The agent now extracts structured IR per workflow before rendering diagrams, with two self-verification passes (structural + safeguard).
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Run the new generator prompt end-to-end and verify the output converges under five parallel adversarial verifiers (triggers, diagram wiring, behavior/safe-outputs, labels/citations, counts/consistency) to 0 CRIT / 0 HIGH / 0 MED / 0 technically-incorrect findings. The doc now enumerates all 15 workflows (including copilot-setup-steps), adds dedup choice gates per Rule 20, exhaustively enumerates every gh-aw safe-outputs leaf key per Rule 39, and uses correct actor prefixes on every edge.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Mermaid stateDiagram-v2's lexer treats `;` as a statement separator
inside `state X { ... }` composite blocks. When followed by a token
containing a hyphen (e.g., `allowed-files`, `fetch-depth`,
`AI-thinks-issue-fixed`), the lexer aborts with "Lexical error:
Unrecognized text", which prevents the diagram from rendering in GitHub
or any browser viewport.
5 of 6 diagrams in the generated doc were failing to render. None of
the existing 40 generator rules covered Mermaid syntactic safety, and
none of the 15 Phase 3.5 verifier checks parsed the output.
Generator changes:
- Add Rule 41 (Mermaid edge-label sanitization): forbid `;` and HTML
control chars in labels; require balanced delimiters; explain the
lexer interaction with hyphenated identifiers.
- Add Phase 3.5 verifier check (p): parse every Mermaid block with
`mermaid.parse()` via jsdom; any parse failure is CRIT.
- Add Phase 4 deterministic sanitization post-process (Python) that
rewrites `;` to `,` in every edge label before emit; runs as
belt-and-suspenders even if the model regresses.
- Add a one-line edge-label safety summary to <diagram-guidelines>.
Doc changes:
- Apply the Phase 4 sanitization to the existing state-machine.md
(40 line touches, no semantic change beyond `;` -> `,`).
- Verified: all 6 Mermaid blocks now parse cleanly under mermaid 10.x.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
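The Phase 4 sanitization step described in the commit above could be sketched roughly as follows. This is a minimal illustration, not the script from the PR: the function name `sanitize_edge_labels` and the regular expression are assumptions, and it only handles plain `A --> B : label` transitions inside fenced `mermaid` blocks.

```python
import re

# Hypothetical pattern for a stateDiagram-v2 transition with a label:
#   <source> --> <target> : <label>
# Group 1 is everything up to and including the colon, group 2 is the label.
EDGE_RE = re.compile(r"^(\s*\S+\s*-->\s*[^\s:]+\s*:)(.*)$")


def sanitize_edge_labels(markdown: str) -> str:
    """Rewrite ';' to ',' in edge labels inside ```mermaid fences only."""
    out = []
    in_mermaid = False
    for line in markdown.split("\n"):
        stripped = line.strip()
        if not in_mermaid and stripped.startswith("```mermaid"):
            in_mermaid = True
        elif in_mermaid and stripped == "```":
            in_mermaid = False
        elif in_mermaid:
            match = EDGE_RE.match(line)
            if match:
                # Only the label part is touched; state names stay as-is.
                line = match.group(1) + match.group(2).replace(";", ",")
        out.append(line)
    return "\n".join(out)
```

Restricting the rewrite to labels inside Mermaid fences leaves semicolons in the surrounding prose and tables untouched, which matches the "no semantic change beyond `;` -> `,`" claim.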
The previous generator optimized for verifier completeness (every leaf key documented) and produced an unreadable wall of tables: 90 rows of safe-output keys (mostly defaults like "target: '*'" repeated 13 times), 24-row label dictionary with 9 near-identical rows for the "Affects-*" family, 7-column overview that scrolled horizontally, and diagram edge labels dumping full config inline.
Doc changes:
- Overview: 7 cols -> 5 cols (drop Type and Concurrency; inline serialization in Inputs cell when present).
- Safe-outputs: 3 tables totaling 90 rows -> 9 per-workflow signature paragraphs. Universal defaults (target '*', noop.report-as-issue false, draft false) stated once at top, suppressed below.
- Label Dictionary: 24-row 5-col table -> 5 semantic groups (always-applied, agent-chosen add, agent-chosen remove, trigger filters, imperative). "Affects-*" family collapsed to one bullet.
- Diagram edge labels: shortened from full-config dumps to behavior verbs + brief object hints (all now <80 chars).
Generator changes:
- Rule 39 rewritten: signature-level documentation, not exhaustive enumeration. Signature form: one paragraph per workflow listing action verbs with override config; defaults suppressed.
- Rule 42 added (Compaction): hard limits on doc lines (<=600), pipe rows (<=80), per-table rows (<=25). Mandate semantic grouping for label dictionaries and per-workflow signature blocks.
- Rule 43 added (Edge-label brevity): <=80 chars per label, behavior verb + brief object only; full config goes to signature blocks. Post-draft grep verifies 0 lines exceed.
- Phase 3.5 verifier check (q) added: deterministic bash readability metrics (LINES/PIPES/LONGEDGES/MAXTABLE) with explicit thresholds.
Metrics:
- Doc: 603 -> 501 lines (-17%); 143 -> 27 pipe rows (-81%); max table 30 -> 17 rows; longest edge label was 250+ chars -> all <80.
- All 6 Mermaid blocks still render (zero regression on Rule 41 fix).
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
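The readability metrics named in check (q) above can be illustrated with a short sketch. The commit describes the real check as a bash script; the Python below is only an approximation, and the counting conventions are assumptions: a "pipe row" is any line starting with `|`, a "table" is a run of consecutive pipe rows (header and separator included), and an edge label is the text after the colon on a `A --> B : label` line. The names `Metrics`, `measure`, and `violations` are hypothetical.

```python
import re
from dataclasses import dataclass

# Hypothetical pattern: captures the label text of a labelled transition.
EDGE_RE = re.compile(r"^\s*\S+\s*-->\s*[^\s:]+\s*:\s*(.*)$")


@dataclass
class Metrics:
    lines: int       # LINES: total lines in the doc
    pipes: int       # PIPES: total table rows (lines starting with '|')
    long_edges: int  # LONGEDGES: edge labels longer than the label limit
    max_table: int   # MAXTABLE: longest run of consecutive pipe rows


def measure(markdown: str, max_label: int = 80) -> Metrics:
    rows = markdown.split("\n")
    pipes = long_edges = max_table = run = 0
    for row in rows:
        if row.lstrip().startswith("|"):
            pipes += 1
            run += 1
            max_table = max(max_table, run)
        else:
            run = 0  # a non-table line ends the current table
        match = EDGE_RE.match(row)
        if match and len(match.group(1)) > max_label:
            long_edges += 1
    return Metrics(len(rows), pipes, long_edges, max_table)


def violations(m: Metrics, max_lines: int = 600, max_pipes: int = 80,
               max_table: int = 25) -> list[str]:
    """Return one message per exceeded threshold (limits from Rule 42/43)."""
    found = []
    if m.lines > max_lines:
        found.append(f"LINES={m.lines} (limit {max_lines})")
    if m.pipes > max_pipes:
        found.append(f"PIPES={m.pipes} (limit {max_pipes})")
    if m.max_table > max_table:
        found.append(f"MAXTABLE={m.max_table} (limit {max_table})")
    if m.long_edges > 0:
        found.append(f"LONGEDGES={m.long_edges} (limit 0)")
    return found
```

For simplicity this sketch scans the whole document rather than only Mermaid blocks, so a prose line that happens to look like a transition would also be counted.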
Three independent rubber-duck reviewers (Sonnet, GPT-5.4, Gemini) read the doc fresh and averaged 2.3/5 on readability. Convergent complaint across all three: domain jargon used without definition (gh-aw, safe-outputs, CCA, flaky-test-detector, Cat A/B/C, B0-B4), no orientation paragraph, and one diagram edge that pointed at source-file line numbers in place of actual content.
Doc changes:
- Add 'What this doc is' intro paragraph above the Overview.
- Add a Glossary section defining the 8 domain terms that all three reviewers flagged as undefined.
- Add a Legend table for the actor-prefix emoji convention (clock/person/gear/robot) and the choice/fork/join pseudo-states.
- Inline the 6 repo-assist Task 2 skip conditions where the diagram previously linked to repo-assist.md L296-306. Source-pointer was not documentation.
- Add a 'task ordering' callout immediately after the repo-assist diagram explaining why Task 1 -> Task 3 -> Task 2 -> Task FINAL is non-sequential by design.
- Convert the repo-assist safe-output signature from a 9-item single-line comma soup into a 10-line bulleted action list.
- Make the commands.yml two-job artifact boundary visible by inserting a CMD_JobBoundary intermediate state.
- Shorten one pre-existing 83-char edge label to satisfy Rule 43.
Generator changes (so this stays fixed in future regenerations):
- Rule 44 (Glossary mandatory): every domain term used must be defined at first use or in a top-of-doc glossary. Enumerates the term classes (project-specific tools, custom frameworks, acronyms, taxonomies, diagram convention key).
- Rule 45 (Self-contained): source-file pointers like '(see file.md L100-110)' are documentation failures; inline the content and use citations only as provenance markers.
- Rule 46 (Orientation paragraph mandatory): doc must open with 1-3 sentences answering what/who/how before any table or diagram. Generator-version stamps are metadata, not orientation.
Metrics: 501 -> 547 lines (intro+glossary cost ~46 lines); 6/6 Mermaid blocks still render; 0 edge labels over 80 chars.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
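A Rule 45 violation of the kind described above (a pointer such as 'repo-assist.md L296-306' standing in for content) is mechanically detectable. The following is a hypothetical sketch, not a check from the PR; the pattern and the function name `find_source_pointers` are assumptions, and it only recognizes the `<file>.<ext> L<start>-<end>` shape quoted in the commit message.

```python
import re

# Hypothetical pattern: a file name with an extension, followed by a line
# reference such as "L100" or "L296-306".
POINTER_RE = re.compile(r"[\w./-]+\.\w+\s+L\d+(?:-L?\d+)?")


def find_source_pointers(text: str) -> list[tuple[int, str]]:
    """Return (1-based line number, matched pointer) for every hit."""
    hits = []
    for number, line in enumerate(text.split("\n"), start=1):
        for match in POINTER_RE.finditer(line):
            hits.append((number, match.group(0)))
    return hits
```

A generator run could treat any non-empty result as a prompt to inline the referenced content, keeping citations only as provenance markers.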
…nd glossary
Second 3-reviewer rubber-duck cycle: 0 source-file consults (glossary works), but average plateaued at 2.67/5 with a fresh layer of gaps the prior pass didn't address. This commit closes those:
Doc:
- Safe-output sections converted from run-on prose (Sonnet's #1 readability failure: 'YAML serialized into prose') to per-group mini-tables with columns: Workflow | Output | Max | Key Constraints.
- New '/run commands' callout table after Group C diagram (4 rows: fantomas, ilverify, xlf, test-baseline). Sonnet and GPT both flagged these as the user-facing value of commands.yml but undefined.
- Glossary expanded from 9 to 15 entries: .lock.yml, dotnet/skills, BSL (baseline), FCS (F# Compiler Service), the LPM internal flags (12h stuck guard / ci_blocked / has_ci / has_conflicts), and the two repo-specific magic constants (milestone 29, 2026-05-12 cutoff).
- Removed 4 residual '(src Lnn)' provenance markers from edge labels (footer SHAs already pin source).
- Replaced 'etc.' in RA_CmdOutputs edge with explicit '9 safe-outputs (see Safe-outputs below)'.
Generator:
- Rule 39 amended: prefer per-workflow mini-tables for safe-outputs; paragraph form acceptable only for trivial workflows (<=2 actions, <=1 constraint each). Tables won this round of reviewer feedback.
Metrics: 547 -> 580 lines (<600 limit); 35 -> 69 pipe rows (<80 limit); all edge labels still <=80 chars; 6/6 Mermaid blocks render.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
User taste-calls based on three rubber-duck reviewer cycles:
1. Flatten the 5-group Labels section into ONE table (GPT preference, beats Sonnet's grouping preference); this makes cross-workflow label flows visible on a single row.
2. Keep mermaid workflow groupings (overrules Gemini's split-per-workflow recommendation); the visual proximity communicates the hidden dependencies (shared labels, dispatch handovers, indirect signals).
3. Trim, don't grow.
Doc:
- Labels: 5 bulleted groups -> single 14-row table with columns Label | Type | Added by | Removed by | Read by | Notes. Producer/consumer flows now on one row (AI-Issue-Regression-PR added by RA, read by RPS; AI-Auto-Resolve-* read by LPM; AI-thinks-issue-fixed bidirectional RA<->RPS/RA).
- Overview: drop unused # column; unify Inputs sentinel.
- Handover Map: drop the spurious intra-workflow row (RA task signaling is in the task-ordering blockquote, not a cross-workflow handover).
- Glossary: split the 4-concept overloaded bullet (has_ci, has_conflicts, ci_blocked, 12h stuck guard) into 4 short bullets.
- Group intros dropped where they just restated the diagram; noop rows collapsed to a per-group preamble.
Generator:
- Rule 42 flipped: prefer flat Labels table over semantic groups (with rationale and column shape). The earlier 5-group rule was rejected by 2 of 3 readability reviewers.
- Rule 47 added: mermaid workflow groupings stay intact for any group whose workflows share cross-dependencies (labels, state branches, dispatch). Splitting them erases the visible dependency graph. Per-workflow split that loses cross-edges = MAJOR.
Metrics: 580 -> 560 lines (-3.4%); 69 -> 78 pipe rows (still under 80 limit); 0 edge labels over 80 chars; 6/6 Mermaid blocks render; max table 17 rows.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The generator now extracts structured IR per workflow before rendering diagrams, with two self-verification passes (structural + safeguard). This fixes incorrect guard expressions, wrong lifecycle ordering, and missing safeguards in the generated docs.