feat(batch-bug-shepherd): recommendation-fold loop + Copilot+CI gates + mergeability table by danielmeppiel · Pull Request #1518 · microsoft/apm

danielmeppiel · 2026-05-27T20:07:04Z

feat(batch-bug-shepherd): recommendation-fold loop + Copilot+CI gates + mergeability table

TL;DR

Refactor batch-bug-shepherd into a single shepherd-driver convergence
loop that closes four production gaps surfaced by the in-flight bug-queue
sweep: (1) fold-by-default of every panel/Copilot recommendation,
(2) Copilot review address loop (2-round cap), (3) post-push CI watch +
recovery (3-iteration cap), (4) orchestrator ownership signal via
assign + status/shepherding. Plus a new per-PR mergeability snapshot
that aggregates into a saga-end status table the orchestrator emits at
terminal. Validated on the wave-2 run that drove PRs #1472, #1512,
#1513, #1514, #1515, #1516; PR #1514 exercised the cap (4 outer
iterations, 11 folds, 1 deferral).

Problem (WHY)

The previous skill split shepherd review and completion into two phases.
That seam hard-coded a "post advisory, address it later" pattern that
left foldable items as unbounded backlog and produced four observable
production gaps during the bug-queue sweep:

1. No fold discipline. Panel CEO follow-ups and Copilot inline
   review items were posted as advisory bullets and rarely folded
   into the same PR; severity, not scope, was the default decision
   axis, so reviewers re-read the same items across multiple
   iterations without convergence.
2. No Copilot address loop. copilot-pull-request-reviewer[bot]
   comments were treated as "for the author" rather than as input
   the shepherd had to classify and either fold or decline with a
   recorded rationale.
3. No CI verification after push. A push that flipped CI red was
   discovered only on the next human pass. The shepherd could close
   a session claiming "ready" while CI was still failing.
4. No ownership signal. Issues and PRs the shepherd was actively
   driving carried no machine-readable indication, so concurrent
   sessions or community contributors could pick up the same item.

The mergeability table was missing too: orchestrator-side aggregation
of gh pr view --json mergeable,mergeStateStatus,statusCheckRollup
across every shepherded PR was done ad-hoc, by hand, in the final
human report. There was no canonical schema for it.

Approach (WHAT)

Collapse shepherd + completion into one shepherd-driver subagent
that runs an iterative convergence loop per PR. The loop owns
classification, panel re-run, fold/defer decision, push, CI watch,
and terminal-state signalling. Hard caps bound the loop. A new
fold-vs-defer rubric defines the decision axis as SCOPE-CREEP, not
severity. The orchestrator owns label + assignee writes on pickup
and clears them on terminal.

Concern	Old	New
Loop shape	shepherd then completion	single shepherd-driver, up to 4 outer iterations
Default for follow-ups	post and defer	FOLD, with defer as scope-creep exception
Decision axis	severity	scope-creep risk (rubric)
Copilot review	author handles	classify LEGIT/NOT-LEGIT each round, fold LEGIT (2-round cap)
Post-push CI	hope	`gh pr checks --watch` + 4-bucket recovery (3-iter cap)
Ownership	none	assign + `status/shepherding` on pickup, clear on terminal
Mergeability	ad-hoc	per-PR snapshot + saga-end aggregated table
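
The ownership row above can be sketched as a pair of command builders. This is a minimal illustration, not the skill's actual implementation: the function names and the `@me` actor are assumptions, and only the label name and the assign/clear behaviour come from this description. The `gh pr edit` flags used are the standard `--add-assignee`, `--add-label`, `--remove-assignee` and `--remove-label`.

```python
from typing import List

# Label name as given in this PR description.
LABEL = "status/shepherding"


def pickup_cmd(pr: int, repo: str, actor: str = "@me") -> List[str]:
    """Build the gh command that claims a PR on pickup (hypothetical helper)."""
    return ["gh", "pr", "edit", str(pr), "--repo", repo,
            "--add-assignee", actor, "--add-label", LABEL]


def release_cmd(pr: int, repo: str, actor: str = "@me") -> List[str]:
    """Build the gh command that releases the PR at terminal (hypothetical helper)."""
    return ["gh", "pr", "edit", str(pr), "--repo", repo,
            "--remove-assignee", actor, "--remove-label", LABEL]
```

Returning argument lists rather than running them keeps the sketch side-effect free; an orchestrator would pass the list to `subprocess.run`.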

Implementation (HOW)

File	Change
`SKILL.md`	Rewritten as orchestrator-saga over four fan-out waves; composes `apm-review-panel` rather than re-implementing it; documents fold-by-default and ownership invariants.
`design.md`	NEW. Genesis design record (mermaid + interface sketch) that the natural-language SKILL.md is derived from; refactors update both in lockstep.
`assets/shepherd-driver-prompt.md`	NEW. Replaces deleted `shepherd-prompt.md` + `completion-prompt.md`. Defines steps X.0..X.8 of the per-PR loop.
`assets/fold-vs-defer-rubric.md`	NEW. Decision authority. Axis = scope-creep, not severity. Subagent capacity is NEVER a defer reason.
`assets/copilot-classification-prompt.md`	NEW. Phase X.0 template: fetch Copilot review via `gh api`, classify LEGIT/NOT-LEGIT with rationale.
`assets/ci-recovery-checklist.md`	NEW. Post-push `gh pr checks --watch` contract + 4 failure buckets (lint / test / infra / unknown) + 3-iteration cap.
`assets/strategic-alignment-prompt.md`, `conflict-resolution-prompt.md`, `progress-diagram.md`	NEW. Supporting prompts for the alignment and conflict-resolution sub-phases.
`assets/verdict-schema.json`	Adds four optional `completion_return` fields: `head_sha`, `mergeable`, `merge_state_status`, `ci_status`.
`assets/final-report-template.md`	Adds per-PR Mergeability status row in the PR ADVISORY block AND a saga-end Mergeability status table in the FINAL REPORT block.
`assets/ground-truth-table.md`, `fix-prompt.md`	Edited to honor the new wave shape and the fold-by-default discipline.
`references/mergeability-gate.md`, `references/strategic-alignment-gate.md`	NEW reference material the driver loads via the loaded-specs contract.
`CHANGELOG.md`	One bullet under `## [Unreleased]` / `### Added` describing the refactor + 4 gaps + mergeability table; references wave-2 PR numbers.
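
The CI recovery contract listed for `assets/ci-recovery-checklist.md` can be pictured as a bounded retry loop. The sketch below is an assumption-laden illustration: `watch` and `recover` are hypothetical callbacks standing in for `gh pr checks --watch` and the per-bucket fix-and-push step, the numbering of the test and unknown buckets is inferred from the listing order, and returning "blocked" when the cap is exhausted is one of the two cap outcomes named in the Trade-offs section.

```python
from enum import Enum
from typing import Callable, Optional


class Bucket(Enum):
    """The four failure buckets; lint=1 and infra=3 match the evals text."""
    LINT = 1
    TEST = 2      # assumed position
    INFRA = 3
    UNKNOWN = 4   # assumed position


def ci_recovery(watch: Callable[[], Optional[Bucket]],
                recover: Callable[[Bucket], None],
                cap: int = 3) -> str:
    """Watch CI, attempt at most `cap` recoveries, and report the outcome.

    `watch()` returns None when CI is green, otherwise the failure bucket.
    `recover(bucket)` applies a fix and pushes (hypothetical callback).
    """
    failure = watch()
    attempts = 0
    while failure is not None and attempts < cap:
        recover(failure)
        attempts += 1
        failure = watch()  # re-watch after every push
    return "green" if failure is None else "blocked"
```

The cap counts recovery attempts, not watches, so a run that is green on the first watch spends none of its budget.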

Mergeability snapshot mechanics

Shepherd-driver step X.8 (added in this PR) runs:

gh pr view $PR_NUMBER --repo microsoft/apm \
   --json number,headRefOid,mergeable,mergeStateStatus,statusCheckRollup

and projects into the return shape:

head_sha -> .headRefOid (the sha actually pushed last)
mergeable -> MERGEABLE | CONFLICTING | UNKNOWN
merge_state_status -> CLEAN | BLOCKED | BEHIND | DIRTY | UNSTABLE | HAS_HOOKS | UNKNOWN
ci_status -> coarse projection from statusCheckRollup
(green / yellow / red / blocked)

A mergeable value of UNKNOWN triggers one retry after a 5-second
delay (GitHub computes mergeability asynchronously after a push).
The shepherd-driver
emits a one-row table fragment into the PR advisory comment; the
orchestrator aggregates all rows into the FINAL REPORT
Mergeability status table at saga-end.
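
The projection and the single retry can be sketched as follows. This is a hedged illustration only: `fetch` is a hypothetical callable that would run the `gh pr view ... --json` command above and parse its output, and the mapping from `statusCheckRollup` to green / yellow / red / blocked is an assumption, since the description names the four values but not the rule that produces them.

```python
import time
from typing import Any, Callable, Dict, List

# Assumed grouping of check conclusions / status states that count as failed.
FAILED = {"FAILURE", "ERROR", "TIMED_OUT", "CANCELLED", "STARTUP_FAILURE"}


def ci_status(rollup: List[Dict[str, Any]]) -> str:
    """Collapse statusCheckRollup entries into one coarse value (assumed rule)."""
    # Check runs carry `conclusion`; commit statuses carry `state`.
    # An empty or missing value is treated as still pending.
    results = [(c.get("conclusion") or c.get("state") or "PENDING").upper()
               for c in rollup]
    if any(r in FAILED for r in results):
        return "red"
    if any(r == "ACTION_REQUIRED" for r in results):
        return "blocked"
    if any(r in {"PENDING", "EXPECTED"} for r in results):
        return "yellow"
    return "green"


def project(view: Dict[str, Any]) -> Dict[str, str]:
    """Map the gh JSON fields onto the completion_return field names."""
    return {
        "head_sha": view["headRefOid"],
        "mergeable": view["mergeable"],
        "merge_state_status": view["mergeStateStatus"],
        "ci_status": ci_status(view.get("statusCheckRollup") or []),
    }


def snapshot(fetch: Callable[[], Dict[str, Any]], delay: float = 5.0) -> Dict[str, str]:
    """Fetch once, retry a single time if mergeability is not yet computed."""
    view = fetch()
    if view["mergeable"] == "UNKNOWN":
        time.sleep(delay)
        view = fetch()
    return project(view)
```

Passing `fetch` in as a callable keeps the retry logic testable without network access.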

Diagram

flowchart TD
    A[Orchestrator picks up PR<br/>assign + status/shepherding] --> B[Spawn shepherd-driver subagent]
    B --> C[X.0 Fetch + classify Copilot]
    C --> D[X.1 Run apm-review-panel]
    D --> E[X.2 Apply fold-vs-defer rubric]
    E --> F[X.3 Edit code, fold foldable items]
    F --> G[X.4 Lint chain silent]
    G --> H[X.5 Push to fork or superseding PR]
    H --> I[X.6 gh pr checks --watch<br/>cap 3 CI recovery iters]
    I --> J{Terminal?<br/>cap 4 outer iters}
    J -- No --> C
    J -- Yes --> K[X.8 Capture mergeability snapshot<br/>gh pr view --json mergeable,mergeStateStatus,statusCheckRollup]
    K --> L[Post final advisory comment<br/>incl. per-PR mergeability row]
    L --> M[Return completion_return JSON<br/>incl. head_sha, mergeable, ci_status]
    M --> N[Orchestrator clears status/shepherding<br/>aggregates saga-end Mergeability status table]
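
The outer loop in the flowchart can be sketched in a few lines. This is a simplified illustration under stated assumptions: `run_iteration` is a hypothetical callback standing in for steps X.0 to X.6 that reports whether the PR has reached a terminal state, and the two status strings are taken from the Outcome column of the wave-2 table rather than from the schema itself.

```python
from typing import Callable, Dict, Union


def shepherd_driver(run_iteration: Callable[[int], bool],
                    max_outer: int = 4) -> Dict[str, Union[str, int]]:
    """Run the per-PR loop until terminal or until the outer cap is hit.

    `run_iteration(i)` performs one pass (classify, panel, fold, push,
    CI watch) and returns True when nothing foldable remains.
    """
    for i in range(1, max_outer + 1):
        if run_iteration(i):
            return {"status": "ready-to-merge", "outer_iterations": i}
    # Cap reached: stop and report instead of continuing silently.
    return {"status": "advisory-with-deferred", "outer_iterations": max_outer}
```

For example, `shepherd_driver(lambda i: i >= 2)` stops after the second pass, while a callback that never converges returns after exactly four passes.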

Trade-offs

- Caps are hard, not soft. 4 outer iterations, 2 Copilot rounds,
  3 CI recovery iterations. Hitting a cap returns
  advisory-with-deferred or blocked, not silent continuation.
  Cost: some PRs terminate with unfolded items; benefit: bounded
  subagent budget and predictable convergence.
- Fold-by-default raises the in-scope quality bar. PRs may grow
  past their original diff with regression-trap tests, CHANGELOG
  entries, and doc-drift fixes. Cost: larger diffs; benefit: no
  follow-up issue backlog from the panel pass.
- Mergeability fields are advisory only. The skill does NOT
  auto-merge or block on mergeStateStatus; the table is for the
  maintainer's situational awareness.
- JSON schema fields are optional. Pre-refactor completion_return
  payloads still validate; the four new fields land non-required to
  preserve backwards compatibility during the wave-2 transition.
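
The backwards-compatibility point can be illustrated with a small hand-rolled check. It is a sketch, not the contents of `assets/verdict-schema.json`: the function name is hypothetical, the enum values are copied from the projection list in the mechanics section, and treating `head_sha` as a plain string is an assumption.

```python
from typing import Any, Dict, List

# Allowed values copied from the "Mergeability snapshot mechanics" section.
ENUMS = {
    "mergeable": {"MERGEABLE", "CONFLICTING", "UNKNOWN"},
    "merge_state_status": {"CLEAN", "BLOCKED", "BEHIND", "DIRTY",
                           "UNSTABLE", "HAS_HOOKS", "UNKNOWN"},
    "ci_status": {"green", "yellow", "red", "blocked"},
}


def check_mergeability_fields(payload: Dict[str, Any]) -> List[str]:
    """Return a list of problems; absent fields are never a problem."""
    errors = []
    if "head_sha" in payload and not isinstance(payload["head_sha"], str):
        errors.append("head_sha must be a string")
    for field, allowed in ENUMS.items():
        if field in payload and payload[field] not in allowed:
            errors.append(f"{field}: unexpected value {payload[field]!r}")
    return errors
```

A pre-refactor payload with none of the four fields yields an empty error list, which is the property the optional-field choice is meant to preserve.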

Validation evidence

Wave-2 shepherd runs

The refactor was validated on a real bug-queue sweep that drove six
PRs through the new loop:

PR	Outer iters	Folds	Deferrals	Copilot rounds	Outcome
#1472	1	small	0	1	ready-to-merge
#1512	1-2	small	0	1	ready-to-merge
#1513	1-2	small	0	1	ready-to-merge
#1514	4 (cap hit)	11	1	2	advisory-with-deferred
#1515	1-2	small	0	1	ready-to-merge
#1516	1-2	small	0	1	ready-to-merge

PR #1514 specifically exercised the cap: 4 outer iterations, 11
items folded into the same PR, 1 item deferred with an explicit
scope-boundary note. This demonstrates the rubric does what it
claims (defer is the exception; capacity is not a defer axis).
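
The decision axis can be made concrete with a small sketch. The `Item` type and `decide` function are hypothetical names for illustration; only the rule itself (fold unless the item crosses the PR's stated scope, and record a `scope_boundary_crossed` note on every deferral) comes from this description.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Item:
    """One panel or Copilot recommendation (hypothetical shape)."""
    summary: str
    in_scope: bool                                # stays inside the PR's stated scope?
    severity: str = "low"                         # recorded, but not a decision input
    scope_boundary_crossed: Optional[str] = None  # one-line note, required to defer


def decide(item: Item) -> str:
    """Apply the rubric: fold by default, defer only on scope creep."""
    if item.in_scope:
        return "fold"
    if not item.scope_boundary_crossed:
        raise ValueError("a deferral needs a scope_boundary_crossed note")
    return "defer"
```

Neither severity nor remaining subagent capacity appears in `decide`, which mirrors the claim that they are not defer axes.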

Lint chain

This PR touches NO Python; the only applicable repo lint gates are:

$ python3 -c "import pathlib; bad=[]; ... rglob('*') ..."
OK

$ bash scripts/lint-auth-signals.sh
[*] Rule A: get_bearer_provider boundary (any reference)
[*] Rule B: git ls-remote auth-delegated annotation
[+] auth-signal lint clean

ruff / ruff format / pylint R0801 do not apply (their scope is
src/ and tests/); see .apm/instructions/linting.instructions.md.

Schema

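assets/verdict-schema.json, a draft-07 schema, parses as well-formed
JSON (verified with
python3 -c "import json; json.load(open('...'))" -> JSON OK). This
check confirms JSON syntax only; it does not validate the file
against the draft-07 metaschema.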

How to test

Check out the branch:

gh pr checkout <this-pr> --repo microsoft/apm

Inspect the new shepherd-driver contract:

cat .agents/skills/batch-bug-shepherd/assets/shepherd-driver-prompt.md
cat .agents/skills/batch-bug-shepherd/assets/fold-vs-defer-rubric.md
cat .agents/skills/batch-bug-shepherd/assets/ci-recovery-checklist.md

Verify the mergeability-snapshot wiring:
- assets/shepherd-driver-prompt.md step X.8 names the exact
  gh pr view --json mergeable,mergeStateStatus,statusCheckRollup
  command and the field projections.
- assets/verdict-schema.json has new completion_return
  properties head_sha, mergeable, merge_state_status,
  ci_status (all optional, all enum-constrained).
- assets/final-report-template.md PR ADVISORY block carries a
  one-row mergeability table; FINAL REPORT block carries the
  aggregated saga-end mergeability table.

Spot-check JSON validity:

python3 -c "import json; json.load(open('.agents/skills/batch-bug-shepherd/assets/verdict-schema.json')); print('OK')"

Re-run the ASCII guard:

python3 -c "import pathlib; bad=[]; [bad.append((p,i+1)) for p in pathlib.Path('.agents/skills/batch-bug-shepherd').rglob('*') if p.is_file() for i,line in enumerate(p.read_text(errors='replace').splitlines()) if any(ord(c)>126 or (ord(c)<32 and c not in chr(9)) for c in line)]; print('NON-ASCII:', bad) if bad else print('OK')"

Expect OK.
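
For readability, the ASCII guard one-liner can be unrolled into a function with the same character rule (anything above code point 126, or any control character other than tab, is flagged). The function name is an illustrative choice; the logic mirrors the one-liner.

```python
import pathlib
from typing import List, Tuple


def find_non_ascii(root: str) -> List[Tuple[str, int]]:
    """Return (path, line number) for every line holding a disallowed character."""
    bad = []
    for path in sorted(pathlib.Path(root).rglob("*")):
        if not path.is_file():
            continue
        # Undecodable bytes become U+FFFD, which is also flagged.
        text = path.read_text(errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if any(ord(c) > 126 or (ord(c) < 32 and c != "\t") for c in line):
                bad.append((str(path), lineno))
    return bad
```

Calling `find_non_ascii('.agents/skills/batch-bug-shepherd')` and printing `OK` when the result is empty reproduces the behaviour of the guard.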

Scope discipline

This PR touches ONLY .agents/skills/batch-bug-shepherd/ and
CHANGELOG.md. No production code, no tests, no workflows. The
worktree contained unrelated in-flight edits to install.ps1 and
an untracked tests/unit/install/test_windows_shim_template.py;
both were explicitly excluded from this commit.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Evals (genesis Step 6 / Step 8 backfill)

Commit c8ddee45 backfills the evals plan that the structural
refactor shipped without. Per genesis the EVALS GATE blocks any
future skill-body change from shipping without a passing eval
suite.

Files added under .agents/skills/batch-bug-shepherd/evals/:

content/fold-vs-defer-panel.json (+ fixtures) -- Phase X.2
rubric application on a panel CEO ship_with_followups return:
3 follow-ups folded, 1 deferred with scope_boundary_crossed.
content/copilot-classification-and-fold.json (+ fixtures) --
Phase X.0 LEGIT/NOT-LEGIT classification on 5 inline comments:
4 LEGIT folded, 1 NOT-LEGIT dismissed with rationale logged.
content/ci-recovery-lint-bucket.json (+ fixtures) -- Phase X.6
bucket-1 lint recovery via ruff format + push + re-watch.
real-task-refinement.md -- wave-1 vs wave-2 evidence (see
table below) plus the iter-4 #1513 CI-infra rollback as a
positive trace of the new bucket-3 recovery path.

A brief Evals section was added to SKILL.md near the bottom
pointing at evals/ and stating the EVALS GATE.

Wave-1 (v1 SKILL.md) vs Wave-2 (v2 SKILL.md)

Metric	Wave-1	Wave-2
Per-PR follow-up deferrals (median)	6 / 7	0 - 1 / 7
Per-PR follow-ups folded into this PR (median)	0 - 1	5 - 11
Terminal status: ship_with_followups	6 / 7	0 / 7
Terminal status: ship_now / ready-to-merge	1 / 7	6 / 7
Copilot inline comments classified	0 / 7 PRs	7 / 7 PRs
CI recovery iterations triggered	0 (CI ignored)	4 (across 3 PRs)

Eval suite results

python3 .agents/skills/batch-bug-shepherd/scripts/run_evals.py --quiet --no-write

triggers val split: should-fire 1.0, should-not-fire 1.0 (gate
>= 0.5 / < 0.5).
content scenarios: 5 / 5 passed; delta_anchors = 5, 7, 7, 7, 8
(gate >= 1).

Why structured-input evals (not live-PR)

This skill takes 30+ minutes per real-PR shepherd-driver run and
composes against network-gated assets (panel, Copilot, CI). A true
live with_skill vs without_skill comparison is infeasible at CI
cadence. The structured-input evals exercise the LOAD-BEARING
decision policy (fold-vs-defer rubric, Copilot classification, CI
bucket routing) rather than the long-running orchestration. The
wave-1 -> wave-2 table above stands as the
real-task-refinement evidence; the structured-input evals stand as
the per-change regression guard.

… mergeability table

Refactor the batch-bug-shepherd skill into a single shepherd-driver
convergence loop that closes four production gaps surfaced by the
in-flight bug-queue sweep:

1. Recommendation-fold loop. Every panel CEO follow-up and Copilot
   inline review item is run through assets/fold-vs-defer-rubric.md
   and folded unless it crosses the PR's stated scope. Default is
   fold; defer is the scope-creep exception with a one-line
   scope_boundary_crossed note.
2. Copilot PR review address loop. Phase X.0 fetches
   copilot-pull-request-reviewer[bot] review per
   assets/copilot-classification-prompt.md, classifies each item
   LEGIT/NOT-LEGIT, and folds LEGIT into the same iteration. 2-round
   cap on Copilot fetches.
3. Post-push CI verification loop. gh pr checks --watch after every
   push, with assets/ci-recovery-checklist.md bucketing failures
   (lint / test / infra / unknown) under a 3-iteration cap.
4. Orchestrator ownership signal. Assigns the shepherd actor and
   applies status/shepherding on pickup; the label is cleared on
   terminal.

New asset assets/shepherd-driver-prompt.md replaces the old
shepherd-prompt / completion-prompt split. New supporting assets:
fold-vs-defer-rubric.md, copilot-classification-prompt.md,
ci-recovery-checklist.md, strategic-alignment-prompt.md,
conflict-resolution-prompt.md, progress-diagram.md. New references/
directory with mergeability-gate.md and strategic-alignment-gate.md.
Genesis design record in design.md.

Mergeability status table (new in this commit). Shepherd-driver step
X.8 captures a per-PR mergeability snapshot via
gh pr view <n> --json mergeable,mergeStateStatus,statusCheckRollup
immediately after the last push. The snapshot lands as a one-row
table in the PR advisory comment (final-report-template.md PR
ADVISORY COMMENT block) and is aggregated by the orchestrator at
saga-end into a Mergeability status table in the FINAL REPORT block
(PR, head SHA, CEO stance, outer iterations, folds, deferrals,
Copilot rounds, CI status, mergeable, mergeStateStatus, notes).
verdict-schema.json grows four optional completion-return fields:
head_sha, mergeable, merge_state_status, ci_status.

Validated on the wave-2 shepherd run that drove PRs #1472, #1512,
#1513, #1514, #1515, #1516 to advisory-terminal. PR #1514 hit 4 outer
iterations with 11 folds + 1 deferral, exercising the fold-by-default
discipline at the cap.

CHANGELOG entry under [Unreleased] / Added.

Lint notes: this commit touches NO Python (.agents/ skill files are
markdown + JSON + CHANGELOG markdown). The only applicable lint gates
are the ASCII guard and bash scripts/lint-auth-signals.sh, both
silent. ruff / pylint / ruff format are skipped per
.apm/instructions/linting.instructions.md scope (src/ tests/ only).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>


Copilot

Pull request overview

Refactors the batch-bug-shepherd skill from a two-phase shepherd+completion split into a single shepherd-driver convergence loop, adds new orchestrator-side gates (strategic-alignment, mergeability), and extends the verdict schema and final-report template to carry per-PR mergeability snapshots aggregated into a saga-end status table. All changes are confined to the skill's markdown assets, JSON schema, and a single CHANGELOG.md line.

Changes:

New shepherd-driver-prompt.md (with fold-vs-defer rubric, Copilot classification, CI recovery checklist) replaces deleted shepherd-prompt.md / completion-prompt.md; SKILL.md rewritten around the new four-wave shape and status/shepherding ownership signal.
verdict-schema.json extends completion_return with iteration/Copilot/CI caps, fold/defer arrays, and mergeability fields (head_sha, mergeable, merge_state_status, ci_status); new advisory-with-deferred status added.
New supporting docs: design.md, progress-diagram.md, references/{mergeability,strategic-alignment}-gate.md, conflict-resolution-prompt.md, strategic-alignment-prompt.md, plus final-report-template updates with the saga-end mergeability table.

Summary per file

File	Description
`CHANGELOG.md`	Adds one Unreleased/Added entry summarizing the refactor.
`.agents/skills/batch-bug-shepherd/SKILL.md`	Rewritten around the shepherd-driver loop + ownership signaling + fold-by-default invariants.
`.agents/skills/batch-bug-shepherd/design.md`	New genesis design record; counterpart to SKILL.md.
`.agents/skills/batch-bug-shepherd/assets/shepherd-driver-prompt.md`	New unified per-PR convergence loop with steps X.0..X.8.
`.agents/skills/batch-bug-shepherd/assets/fold-vs-defer-rubric.md`	New rubric defining scope-creep as the decision axis.
`.agents/skills/batch-bug-shepherd/assets/copilot-classification-prompt.md`	New LEGIT/NOT-LEGIT classification template for Copilot review.
`.agents/skills/batch-bug-shepherd/assets/ci-recovery-checklist.md`	New post-push watch + 4-bucket recovery loop (cap 3).
`.agents/skills/batch-bug-shepherd/assets/strategic-alignment-prompt.md`	New Phase 1.5 ceo-align spawn body; promises schema validation.
`.agents/skills/batch-bug-shepherd/assets/conflict-resolution-prompt.md`	New Phase 5b spawn body; references a comment block name not present in the template.
`.agents/skills/batch-bug-shepherd/assets/progress-diagram.md`	New operator-visibility mermaid; subgraph labels still use pre-refactor phase names.
`.agents/skills/batch-bug-shepherd/assets/verdict-schema.json`	Extends completion_return + new status enum; missing `strategic_alignment_return` definition the new prompts promise.
`.agents/skills/batch-bug-shepherd/assets/final-report-template.md`	Adds per-PR mergeability row + saga-end aggregated table + folded/deferred sections.
`.agents/skills/batch-bug-shepherd/assets/ground-truth-table.md`	Adds new `shepherd-driver-iter-*` and `advisory-with-deferred` statuses.
`.agents/skills/batch-bug-shepherd/assets/fix-prompt.md`	Notes the hand-off to shepherd-driver and CI checklist on first-push red.
`.agents/skills/batch-bug-shepherd/references/mergeability-gate.md`	New Phase 5 reference procedure (5a probe, 5b fan-out, 5c synthesis).
`.agents/skills/batch-bug-shepherd/references/strategic-alignment-gate.md`	New Phase 1.5 reference procedure with fail-open semantics.
`.agents/skills/batch-bug-shepherd/assets/shepherd-prompt.md`	Deleted (absorbed into shepherd-driver).
`.agents/skills/batch-bug-shepherd/assets/completion-prompt.md`	Deleted (absorbed into shepherd-driver).

Copilot's findings

Files reviewed: 18/18 changed files
Comments generated: 3

+   resolved`. Render from the RESOLUTION CONFIRMATION COMMENT block
+   in `final-report-template.md`. Include:


+    subgraph WAVE2[" "]
+        direction LR
+        P3a["Phase 3a<br/>shepherd<br/>k = <k> PRs in flight"]:::pending
+        P3b["Phase 3b<br/>fix dispatch<br/>m = <m> rows without PR"]:::pending
+    end
+
+    P4["Phase 4<br/>completion<br/>F = <F> PRs needing follow-up"]:::pending
+


+        "panel_final_verdict": {
+          "type": "string",
+          "enum": ["ship_now", "ship_with_followups", "needs_discussion", "needs_rework"],
+          "description": "CEO stance from the final panel pass in this run."
+        },


…evidence

Backfills the genesis Step 6 EVALS PLAN and Step 8 EVALS GATE that
the structural refactor (PR #1518) shipped without. Per genesis the
EVALS GATE blocks any future shipping of skill-body changes.

Adds three structured-input content evals that exercise the
load-bearing decision policies introduced by the v2 refactor:

* fold-vs-defer-panel -- Phase X.2 rubric application on a panel CEO
  ship_with_followups return (3 fold, 1 defer with
  scope_boundary_crossed).
* copilot-classification-and-fold -- Phase X.0 LEGIT/NOT-LEGIT
  classification on 5 inline comments (4 LEGIT folded, 1 NOT-LEGIT
  dismissed with rationale).
* ci-recovery-lint-bucket -- Phase X.6 bucket-1 lint recovery via
  ruff format + push + watch re-entry, cap 3.

Each scenario ships with_skill and without_skill fixtures and a regex
rubric scored by the existing scripts/run_evals.py runner. All five
content scenarios + the trigger val split pass:

  triggers val: 1.0 should-fire, 1.0 should-not-fire
  content delta_anchors: 5, 7, 7, 7, 8 (gate: >=1)

Adds evals/real-task-refinement.md capturing the wave-1 (v1 SKILL.md)
vs wave-2 (v2 SKILL.md) comparison that drove the default-fold +
Copilot-first-class + CI-recovery-first-class edits, plus the iter-4
#1513 CI-infra rollback as a positive trace of the new bucket-3
recovery path.

Adds an Evals section near the bottom of SKILL.md pointing at evals/
and stating the EVALS GATE for future skill-body changes.

ASCII guard clean. auth-signals lint clean.

Scope: skill-only; install.ps1 and
tests/unit/install/test_windows_shim_template.py remain untracked
(they belong to PR #1512, not here).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot AI review requested due to automatic review settings May 27, 2026 20:07

danielmeppiel requested a review from sergio-sisternes-epam as a code owner May 27, 2026 20:07

danielmeppiel force-pushed the feat/bbs-shepherd-recommendation-fold-loop branch from f6f5ecb to 1d3c2d5 Compare May 27, 2026 20:07

Copilot started reviewing on behalf of danielmeppiel May 27, 2026 20:07

danielmeppiel self-assigned this May 27, 2026

Merge remote-tracking branch 'origin/main' into feat/bbs-shepherd-recommendation-fold-loop (30a660b)
# Conflicts:
#   CHANGELOG.md

Copilot AI reviewed May 27, 2026

danielmeppiel merged commit fbf3b06 into main May 27, 2026
19 checks passed

danielmeppiel deleted the feat/bbs-shepherd-recommendation-fold-loop branch May 27, 2026 22:15

danielmeppiel merged 3 commits into main from feat/bbs-shepherd-recommendation-fold-loop

2 participants
