Add dedicated outcome evaluators for `create_pull_request` and `push_to_pull_request_branch` by Copilot · Pull Request #35100 · github/gh-aw

Copilot · 2026-05-27T02:32:42Z

This change replaces generic/fallback outcome checks for the two highest-signal safe-output types with type-specific evaluators. The new logic classifies PR creation and PR-branch pushes using post-action GitHub state, review signals, staleness, and commit-retention evidence instead of existence-style acceptance.

create_pull_request: typed outcome classification
- Added a dedicated evaluator path for create_pull_request.
- Classifies:
  - accepted (strong): merged PR
  - accepted (medium): approved PR with no requested changes
  - rejected (strong/medium): closed without merge, with stronger signal when closing labels/comments indicate explicit rejection
  - pending: open PR with no reviews, or open with only CHANGES_REQUESTED
  - ignored: stale open PR (configurable stale threshold)
  - unknown: mixed/ambiguous states or API degradation (including review API failure)
push_to_pull_request_branch: commit-retention based evaluation
- Added a dedicated evaluator path for push_to_pull_request_branch.
- Uses persisted commit metadata (commit_sha, before-head variants) plus branch history comparison to distinguish:
  - retained pushed commits
  - force-pushed/reset-away commits
  - closed-without-merge outcomes
  - pending review states for open PRs
  - ignored for stale open PRs with no review on pushed commits
  - unknown fallback states (including review API failure)
Evaluator architecture and maintainability
- evaluateItem now dispatches by safe-output type to dedicated evaluator functions.
- Introduced API-function injection in evaluation calls for deterministic unit tests (no mutable global test hook).
- Tightened closing-comment keyword matching to reduce false-positive strong rejections.
- Improved SHA matching to correctly handle short-vs-full SHA forms in retention/head comparisons.
- Excluded ambiguous head_sha from pushed-commit extraction to avoid misclassifying retained pushes.
- Added inline semantics comments for compare-status interpretation.
Focused test coverage
- Added actions/setup/js/evaluate_outcomes.test.cjs with branch-complete coverage for both evaluators:
  - accepted/rejected/pending/ignored/unknown classification paths
  - merge/close/review/stale cases
  - pushed-commit retained vs reverted/force-pushed cases
  - mixed review-state handling, review API failure handling, no-commit-SHA fallback, short/full SHA matching, and head_sha ambiguity guard

// Type-dispatch in evaluator entrypoint
if (type === "create_pull_request") {
  return evaluateCreatePullRequestOutcome(item, itemRepo, out, ghAPIFn);
}
if (type === "push_to_pull_request_branch") {
  return evaluatePushToPullRequestBranchOutcome(item, itemRepo, out, ghAPIFn);
}

Co-authored-by: gh-aw-bot <259018956+gh-aw-bot@users.noreply.github.com>

pelikhan · 2026-05-27T11:38:45Z

@copilot resolve the merge conflicts in this pull request

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Copilot · 2026-05-27T11:55:52Z

@copilot resolve the merge conflicts in this pull request

Conflicts are resolved in merge commit cc28ec3.

github-actions · 2026-05-27T12:08:28Z

🧪 Test Quality Sentinel completed test quality analysis.

github-actions · 2026-05-27T12:08:29Z

✅ PR Code Quality Reviewer completed the code quality review.

github-actions · 2026-05-27T12:08:29Z

✅ Design Decision Gate 🏗️ completed the design decision gate check.

No ADR enforcement needed: PR #35100 does not have the 'implementation' label and has 0 new lines of code in default business logic directories (well below the 100-line threshold).

github-actions · 2026-05-27T12:08:30Z

🧠 Matt Pocock Skills Reviewer has completed the skills-based review. ✅

Copilot

Pull request overview

This PR introduces type-specific outcome evaluators for the two highest-signal safe-output item types (create_pull_request and push_to_pull_request_branch) so outcome classification is based on post-action GitHub state (merge/close/reviews/staleness/commit retention) rather than generic existence-style checks.

Changes:

Added evaluator dispatch in evaluateItem() to route create_pull_request and push_to_pull_request_branch items to dedicated evaluators.
Implemented PR-state/review-based outcome logic for create_pull_request and commit-retention-based outcome logic for push_to_pull_request_branch.
Added focused Vitest coverage for the new evaluator branches and introduced API-function injection for deterministic tests.

Show a summary per file

File	Description
actions/setup/js/evaluate_outcomes.cjs	Adds dedicated evaluators + dispatch and helper utilities for PR-number/commit retention/closing-signal evaluation.
actions/setup/js/evaluate_outcomes.test.cjs	Adds unit tests covering the new evaluator classification paths with a mockable GitHub API function.

Copilot's findings

Tip

Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Files reviewed: 2/2 changed files
Comments generated: 6

+
+  if (data.merged === true) {
+    out.result = "accepted";
+    out.detail = "merged (strong)";


+
+  if (data.merged === true) {
+    out.result = "accepted";
+    out.detail = pushedIncluded ? "merged with pushed commit retained (strong)" : "merged";


+  const pushedStillHead = currentHead ? pushedShas.some(sha => sha === currentHead) : false;
+  const pushedIncluded =
+    currentHead && pushedShas.length > 0
+      ? pushedShas.some(sha => {
+          const inHistory = isCommitInBranchHistory(itemRepo, sha, currentHead, ghAPIFn);
+          return inHistory === true;
+        })
+      : false;


+
+  // A strong rejection requires before-head metadata from execution time so we
+  // can distinguish "commit not retained" from "insufficient history context".
+  if (pushedShas.length > 0 && !pushedIncluded && beforeHead) {


+ */
+function resolvePRNumber(item) {
+  if (typeof item.number === "number" && item.number > 0) return item.number;
+  const candidates = [item.pull_request_number, item.pr_number, item.item_number];


+
+    const result = evaluateItem({ type: "create_pull_request", repo: "owner/repo", url: "https://github.com/owner/repo/pull/12", timestamp: "2026-05-20T00:00:00Z" }, "owner/repo", { ghAPI });
+    expect(result.result).toBe("accepted");
+    expect(result.detail).toContain("strong");


pelikhan · 2026-05-27T12:13:24Z

@copilot fix https://github.com/github/gh-aw/actions/runs/26509978987/job/78071970079?pr=35100
review all comments

github-actions · 2026-05-27T12:17:27Z

🧪 Test Quality Sentinel Report

✅ Test Quality Score: 100/100 — Excellent

Analyzed 13 test(s): 13 design, 0 implementation, 0 guideline violation(s).

📊 Metrics & Test Classification (13 tests analyzed)

Metric	Value
New/modified tests analyzed	13
✅ Design tests (behavioral contracts)	13 (100%)
⚠️ Implementation tests (low value)	0 (0%)
Tests with error/edge cases	13 (100%)
Duplicate test clusters	0
Test inflation detected	No (test +251 lines vs production +333 lines, ratio 0.75)
🚨 Coding-guideline violations	0

Test Classification Details

Test	File	Classification	Issues Detected
`classifies merged PR as accepted (strong)`	`evaluate_outcomes.test.cjs`	✅ Design	None
`classifies approved open PR as accepted (medium)`	`evaluate_outcomes.test.cjs`	✅ Design	None
`classifies closed-with-label PR as rejected (strong)`	`evaluate_outcomes.test.cjs`	✅ Design	None
`classifies closed-without-merge PR as rejected`	`evaluate_outcomes.test.cjs`	✅ Design	None
`classifies open PR with no reviews as pending`	`evaluate_outcomes.test.cjs`	✅ Design	None
`classifies stale open PR with no reviews as ignored`	`evaluate_outcomes.test.cjs`	✅ Design	None
`classifies open PR with mixed review state as unknown`	`evaluate_outcomes.test.cjs`	✅ Design	None
`classifies merged PR with pushed commit included as accepted (strong)`	`evaluate_outcomes.test.cjs`	✅ Design	None
`classifies open PR with pushed commit still at HEAD as accepted (medium)`	`evaluate_outcomes.test.cjs`	✅ Design	None
`classifies force-pushed-away commits as rejected (strong)`	`evaluate_outcomes.test.cjs`	✅ Design	None
`classifies closed-without-merge PR as rejected (medium)`	`evaluate_outcomes.test.cjs`	✅ Design	None
`classifies open PR with no reviews on pushed commits as pending`	`evaluate_outcomes.test.cjs`	✅ Design	None
`falls back to unknown when pushed commits are reviewed but not head/merged`	`evaluate_outcomes.test.cjs`	✅ Design	None

Language Support

Tests analyzed:

🐹 Go (*_test.go): 0 tests
🟨 JavaScript (*.test.cjs): 13 tests (vitest)

Verdict

✅ Check passed. 0% of new tests are implementation tests (threshold: 30%). All 13 tests verify observable behavioral contracts of evaluateItem() across the full outcome state machine (accepted/rejected/pending/ignored/unknown). The mockAPI helper correctly targets only external GitHub API calls — a legitimate mock boundary. Stale-detection tests appropriately use vi.useFakeTimers(). Test-to-production line ratio is 0.75 (well under the 2:1 inflation threshold).

📖 Understanding Test Classifications

Design Tests (High Value) verify what the system does:

Assert on observable outputs, return values, or state changes
Cover error paths and boundary conditions
Would catch a behavioral regression if deleted
Remain valid even after internal refactoring

Implementation Tests (Low Value) verify how the system does it:

Assert on internal function calls (mocking internals)
Only test the happy path with typical inputs
Break during legitimate refactoring even when behavior is correct
Give false assurance: they pass even when the system is wrong

Goal: Shift toward tests that describe the system's behavioral contract — the promises it makes to its users and collaborators.

🧪 Test quality analysis by Test Quality Sentinel · sonnet46 1.2M · ◷

github-actions

✅ Test Quality Sentinel: 100/100. Test quality is excellent — 0% of new tests are implementation tests (threshold: 30%).

github-actions

Skills-Based Review 🧠

Applied /tdd, /zoom-out, and /grill-with-docs — no blocking correctness bugs found, but several targeted improvements worth addressing.

📋 Key Themes & Highlights

Key Themes

CLOSING_COMMENT_KEYWORDS over-breadth: "won't" and "closing" as bare substrings will produce false-positive rejected classifications on perfectly ordinary review comments. Worth tightening before this evaluator is in production.
Unnecessary reviews API call for closed PRs: evaluateCreatePullRequestOutcome fetches reviews unconditionally for all non-merged PRs, but only uses the result in the open branch. Easy fix: move the fetch inside the open guard.
Test precision: The "mixed review state" test actually only tests CHANGES_REQUESTED alone — the true APPROVED+CHANGES_REQUESTED mixed case is untested and has different semantics worth pinning.
Architectural asymmetry: push_to_pull_request_branch has no ignored/stale path while create_pull_request does. Either add it or document why pushes are intentionally exempt from stale degradation.
Missing degraded-metadata test: Push items without any commit SHA field are an important real-world failure mode (the agent failed to record metadata) and the code path is uncovered.

Positive Highlights

✅ API-injection pattern (options.ghAPI) is clean and makes the new evaluators deterministically testable without mutable global hooks — great design decision.
✅ resolvePRNumber covers multiple field-name conventions (number, pull_request_number, pr_number, URL fallback) — robust against different safe-output shapes.
✅ isCommitInBranchHistory semantics are correct and the inline comments explaining base...head compare status are helpful.
✅ Both evaluators are exported, making them independently callable in integration tests if needed.

🧠 Reviewed using Matt Pocock's skills by Matt Pocock Skills Reviewer · sonnet46 1.9M

github-actions · 2026-05-27T12:18:10Z

@@ -41,6 +41,8 @@ const SUMMARY_PATH = "/tmp/gh-aw/outcome-summary.json";
 // Noop types that are tracked but not counted as actionable
 // ---------------------------------------------------------------------------
 const NOOP_TYPES = new Set(["noop", "missing_tool", "missing_data", "report_incomplete"]);


[/grill-with-docs] CLOSING_COMMENT_KEYWORDS contains two over-broad entries — "won't" and "closing" — that will produce false-positive rejections on ordinary review comments.

💡 Why this matters and suggested tightening

Substring matching means:

"won't" matches "This won't break anything", "I won't be available to review", etc.

"closing" matches "closing price", "This is a closing brace", and even common polite endings like "We're closing in on a solution".

Both entries are much looser than the corresponding entries in CLOSING_LABEL_KEYWORDS, which at least requires full label-name matches.

Consider replacing them with more specific patterns:

const CLOSING_COMMENT_KEYWORDS = [ "not planned", "won't fix", "wontfix", "duplicate", "invalid", "declined", "rejected", "closing as", // "closing as won't fix", "closing as duplicate" "closing this", // "closing this PR" ];

Or switch to whole-phrase matching rather than substring inclusion.

github-actions · 2026-05-27T12:18:11Z

+    if (out.review_comments === 0 && out.comments === 0) {
+      out.zero_touch = true;
+    }
+    return out;


[/tdd] The reviews API call is made unconditionally for all non-merged PRs, but the result is only consumed in the open branch — so every closed (non-merged) PR evaluation incurs a wasted network round-trip.

💡 Suggested fix

Move the reviews fetch inside the open state block:

if (data.state === "closed") { const closingSignal = hasClosingSignal(itemRepo, num, data, ghAPIFn); out.result = "rejected"; out.detail = closingSignal ? "closed without merge (strong)" : "closed without merge"; if (data.created_at && data.closed_at) { out.resolution_sec = secondsBetween(data.created_at, data.closed_at); } return out; } // Only reached for open PRs — fetch reviews here const reviewsRaw = ghAPIFn(`repos/${itemRepo}/pulls/${num}/reviews`); ...

This also makes the data-dependency structure clearer to future readers.

github-actions · 2026-05-27T12:18:11Z

+
+  if (data.merged === true) {
+    out.result = "accepted";
+    out.detail = pushedIncluded ? "merged with pushed commit retained (strong)" : "merged";


[/zoom-out] push_to_pull_request_branch never classifies as ignored/stale, while create_pull_request does. This asymmetry means a push whose PR sits untouched for months is classified pending indefinitely instead of eventually degrading to ignored.

💡 Suggestion

If the asymmetry is intentional (e.g. pushes are considered actionable regardless of staleness), document it with a brief comment here so future maintainers don't wonder if it was accidentally omitted.

If it should mirror create_pull_request, add the same stale-threshold check before returning pending:

if (!hasReviewOnPushedCommit) { setPendingAge(out, timestamp); if (isStalePending(out.pending_age_sec)) { out.result = "ignored"; out.detail = "open and stale with no review on pushed commits"; } else { out.result = "pending"; out.detail = "open with no review on pushed commits"; } return out; }

github-actions · 2026-05-27T12:18:11Z

+  it("classifies closed-without-merge PR as rejected (medium)", () => {
+    const ghAPI = mockAPI({
+      "repos/owner/repo/pulls/24": { state: "closed", merged: false, head: { sha: "bbbbbbb" } },
+      "repos/owner/repo/compare/aaaaaaa...bbbbbbb": { status: "ahead" },


[/tdd] The test is named "classifies open PR with mixed review state as unknown" but the mock only has a single CHANGES_REQUESTED review — that's not a truly mixed state. A real mixed state (APPROVED + CHANGES_REQUESTED simultaneously) is untested.

💡 Suggested additional test case

it("classifies open PR with both APPROVED and CHANGES_REQUESTED reviews as unknown", () => { const ghAPI = mockAPI({ "repos/owner/repo/pulls/19": { state: "open", merged: false }, "repos/owner/repo/pulls/19/reviews": [ { state: "APPROVED" }, { state: "CHANGES_REQUESTED" }, ], }); const result = evaluateItem( { type: "create_pull_request", repo: "owner/repo", url: "https://github.com/owner/repo/pull/19", timestamp: "2026-05-20T00:00:00Z" }, "owner/repo", { ghAPI } ); expect(result.result).toBe("unknown"); expect(result.detail).toContain("mixed"); });

Also consider renaming the existing test to "classifies open PR with only CHANGES_REQUESTED as unknown" to reflect what it actually tests.

github-actions · 2026-05-27T12:18:11Z

+    );
+    expect(result.result).toBe("unknown");
+  });
+});


[/tdd] Missing test: push_to_pull_request_branch item with no commit SHA fields at all. In that case extractPushedCommitSHAs returns [], pushedStillHead is false, pushedIncluded is false, and the force-push rejection guard (pushedShas.length > 0) skips, landing in the reviews path. This whole degraded-metadata code path has no coverage.

💡 Suggested test

it("falls back to pending when push item has no commit SHA", () => { const ghAPI = mockAPI({ "repos/owner/repo/pulls/27": { state: "open", merged: false, head: { sha: "bbbbbbb" } }, "repos/owner/repo/pulls/27/reviews": [], }); const result = evaluateItem( { type: "push_to_pull_request_branch", repo: "owner/repo", pull_request_number: 27, // no commit_sha, no before_head_sha timestamp: "2026-05-20T00:00:00Z", }, "owner/repo", { ghAPI } ); expect(result.result).toBe("pending"); });

github-actions

Three blocking issues found

🔎 Findings summary

1. SHA length mismatch silently degrades `pushedStillHead` in production (high)

normalizeCommitSHA does not pad short SHAs to 40 chars. GitHub API always returns full 40-char SHAs for head.sha; workflow items often store 7-char short SHAs. The === comparison in pushedStillHead is always false in production, silently bypassing the fast path. Tests don't catch this because all test SHAs are uniformly 7 chars.

2. `"closing"` keyword in `CLOSING_COMMENT_KEYWORDS` causes false-positive strong rejections (medium)

Any comment body containing the word "closing" (e.g. auto-generated summaries, PR descriptions with "closing remarks", words like "disclosing") will trigger a "strong" rejection classification. Replace with more specific phrases like "closing as" or "closed as".

3. Open PR with only `CHANGES_REQUESTED` classified as `unknown` instead of `pending` (medium)

The open branch in evaluateCreatePullRequestOutcome only handles approved+no-CR and no-reviews. A PR with only CHANGES_REQUESTED falls to unknown, discarding the reviewer's explicit signal.

4. `isCommitInBranchHistory` `behind` semantics and misleading comment (low)

The inline comment conflates behind and diverged. For behind, HEAD is behind the pushed commit (branch was reset), not the same as "commit not retained" from a rebase. The false return is a safe-default but the comment needs clarification.

🔎 Code quality review by PR Code Quality Reviewer · sonnet46 3.1M

github-actions · 2026-05-27T12:18:49Z

 // Noop types that are tracked but not counted as actionable
 // ---------------------------------------------------------------------------
 const NOOP_TYPES = new Set(["noop", "missing_tool", "missing_data", "report_incomplete"]);
+const CLOSING_LABEL_KEYWORDS = ["not planned", "not_planned", "wontfix", "won't fix", "duplicate", "invalid", "declined", "rejected"];


Overly broad "closing" keyword causes false positives in rejection classification: any comment body containing the word "closing" (e.g., "closing remarks", "disclosing", "I am closing this PR for now to revisit later") will count as a closing signal and upgrade a rejection to "strong".

💡 Suggested fix

Replace the vague "closing" keyword with a more intentional phrase:

const CLOSING_COMMENT_KEYWORDS = [ "not planned", "wontfix", "duplicate", "invalid", "declin", "reject", "closing as", "closed as", // ← instead of bare "closing" ];

A standalone "closing" substring is far too permissive. Any English sentence that uses the word will trip it, including auto-generated bot summaries or release notes that happen to contain the word.

github-actions · 2026-05-27T12:18:49Z

+  if (data.state === "closed") {
+    out.result = "rejected";
+    out.detail = "closed without merge";
+    if (data.created_at && data.closed_at) {


SHA length mismatch silently breaks pushedStillHead in production: normalizeCommitSHA keeps the original length (7–40 hex chars) — it does not pad short SHAs to 40 chars. In production, data.head.sha from the GitHub API is always the full 40-char SHA, while the workflow item's commit_sha is often a 7-char short SHA stored at push time. The === comparison therefore always returns false and pushedStillHead is always false, silently falling through to the more expensive compare-API path (which may still resolve correctly via GitHub's server-side short-SHA expansion, but only by accident).

💡 Suggested fix

Normalize all SHAs to a common prefix length before comparison, or use startsWith / includes to handle short-vs-full mismatches:

function shaMatches(a, b) { if (!a || !b) return false; const short = a.length < b.length ? a : b; const long = a.length < b.length ? b : a; // short SHAs are at least 7 chars; treat as matching if long starts with short return short.length >= 7 && long.startsWith(short); }

Then replace:

const pushedStillHead = currentHead ? pushedShas.some(sha => sha === currentHead) : false;

with:

const pushedStillHead = currentHead ? pushedShas.some(sha => shaMatches(sha, currentHead)) : false;

The same fix applies to the commitSHA === branchHeadSHA early return in isCommitInBranchHistory.

Note: the test suite avoids this entirely because all test SHAs are uniformly 7 chars ("aaaaaaa", "bbbbbbb"), so the regression is invisible in tests.

github-actions · 2026-05-27T12:18:49Z

+  const compareData = ghAPIFn(`repos/${repo}/compare/${commitSHA}...${branchHeadSHA}`);
+  if (!compareData || typeof compareData.status !== "string") return null;
+  const status = compareData.status.toLowerCase();
+  // compare base...head semantics:


isCommitInBranchHistory compare-endpoint direction comment is misleading and the semantics are subtly inverted for behind: the comment says "behind/diverged => base commit is not retained" which is accurate for diverged but for behind it means the branch HEAD is behind the pushed commit — i.e., the pushed commit is actually ahead of (newer than) the current HEAD, which is a different situation from "not retained". A rebased branch where the pushed commit was dropped could appear as either diverged or behind depending on the rebase shape, making false an under-precise return for behind.

💡 Details

GitHub compare base...head (where base = pushed commit, head = current branch HEAD):

identical — pushed commit is the current head ✓

ahead — HEAD is ahead of pushed commit; pushed commit is an ancestor of HEAD ✓

behind — HEAD is behind the pushed commit; the pushed commit is newer/further than HEAD — this doesn't mean the commit was "dropped", it means the branch hasn't caught up yet. Could happen if HEAD was reset to before the push.

diverged — branches have diverged; pushed commit is definitively not in HEAD ancestry ✓

For the behind case, returning false ("not in history") is probably the correct safe-default, but the inline comment says "not retained on the evaluated branch tip" which skips this nuance. At minimum, treat behind and diverged distinctly in the comment, or consider returning null for behind to avoid a false-negative rejection classification.

github-actions · 2026-05-27T12:18:49Z

+  if (data.state === "open") {
+    if (hasApproved && !hasChangesRequested) {
+      out.result = "accepted";
+      out.detail = "approved without requested changes";


An open PR with only CHANGES_REQUESTED reviews falls through to unknown, losing signal: the evaluation for open PRs checks hasApproved && !hasChangesRequested (accepted) and reviews.length === 0 (pending/ignored). Any other review state — including a PR that has only CHANGES_REQUESTED — hits the final unknown branch. A PR where the reviewer explicitly asked for changes is actionable signal (more like rejected-pending-fix than unknown), and classifying it as unknown discards that.

💡 Suggested fix

if (hasChangesRequested && !hasApproved) { out.result = "pending"; out.detail = "open with changes requested"; setPendingAge(out, timestamp); return out; } // existing unknown fallback below out.result = "unknown"; out.detail = "open with mixed review state";

This keeps the semantic accurate: the output was not accepted, but it is still alive and actionable, just waiting on a fix cycle.

github-actions · 2026-05-27T12:18:49Z

+  }
+
+  const reviewsRaw = ghAPIFn(`repos/${itemRepo}/pulls/${num}/reviews`);
+  const reviews = Array.isArray(reviewsRaw) ? reviewsRaw : [];


Reviews API null return silently coerces to [], causing wrong pending/ignored classification on API failure: Array.isArray(null) is false, so reviewsRaw null → reviews = [] → reviews.length === 0 → the PR is classified as pending or ignored instead of surfacing the API error. A transient reviews API failure permanently misfires the outcome.

💡 Suggested fix

const reviewsRaw = ghAPIFn(`repos/${itemRepo}/pulls/${num}/reviews`); if (reviewsRaw === null) { out.result = "unknown"; out.detail = "reviews api error"; setPendingAge(out, timestamp); return out; } const reviews = Array.isArray(reviewsRaw) ? reviewsRaw : [];

Same pattern applies in evaluatePushToPullRequestBranchOutcome (the reviews fetch near the end of that function has the same null-coercion bug).

github-actions · 2026-05-27T12:18:49Z

+
+  // A strong rejection requires before-head metadata from execution time so we
+  // can distinguish "commit not retained" from "insufficient history context".
+  if (pushedShas.length > 0 && !pushedIncluded && beforeHead) {


Compare API failure (null return) produces a false rejected / force-pushed outcome: isCommitInBranchHistory returns null when the API call fails or the status is unrecognised. In pushedIncluded, null === true is false, so pushedIncluded = false. When beforeHead is also set, the condition pushedShas.length > 0 && !pushedIncluded && beforeHead evaluates to true and the item is classified as rejected (force-pushed away) — a hard negative outcome caused entirely by a transient API error.

💡 Suggested fix

Track whether history lookup was indeterminate, and gate the rejection on an explicit false rather than "not true":

let pushedIncluded = false; let historyCheckIndeterminate = false; if (currentHead && pushedShas.length > 0) { for (const sha of pushedShas) { const inHistory = isCommitInBranchHistory(itemRepo, sha, currentHead, ghAPIFn); if (inHistory === true) { pushedIncluded = true; break; } if (inHistory === null) { historyCheckIndeterminate = true; } } } // later: if (pushedShas.length > 0 && !pushedIncluded && beforeHead) { if (historyCheckIndeterminate) { out.result = "unknown"; out.detail = "commit history check indeterminate (api error)"; setPendingAge(out, timestamp); return out; } out.result = "rejected"; out.detail = "pushed commits were force-pushed away or branch reset"; return out; }

github-actions · 2026-05-27T12:18:49Z

+function extractPushedCommitSHAs(item) {
+  /** @type {string[]} */
+  const shas = [];
+  const candidates = [item.commit_sha, item.pushed_commit_sha, item.head_sha, item?.metadata?.commit_sha, item?.metadata?.pushed_commit_sha];


item.head_sha is ambiguous and should not be used as a pushed-commit SHA: head_sha typically denotes the current branch tip at observation time, not a commit that was actually pushed in this operation. Including it in extractPushedCommitSHAs means pushedStillHead will match trivially whenever item.head_sha equals the current data.head.sha — even if the actual pushed commits were completely replaced by a force-push that left the branch tip at the same SHA by coincidence.

💡 Suggested fix

Remove item.head_sha from the scalar candidates:

const candidates = [ item.commit_sha, item.pushed_commit_sha, // item.head_sha ← removed: ambiguous, not a pushed-commit identifier item?.metadata?.commit_sha, item?.metadata?.pushed_commit_sha, ];

If a caller genuinely stores the pushed commit SHA in head_sha, they should rename the field to commit_sha or pushed_commit_sha to be unambiguous.

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Copilot · 2026-05-27T12:28:20Z

@copilot fix https://github.com/github/gh-aw/actions/runs/26509978987/job/78071970079?pr=35100
review all comments

Addressed and pushed in commits 3422c05 and fe07a0b.

I reviewed and fixed the evaluator review threads:

merged detail normalization now preserves strong accepted evidence
commit-retention unknown vs definitive-false is handled distinctly before force-push rejection
force-push rejection now requires definitive non-retention evidence
PR number aliases now include pr and pull_number
tests expanded for normalized strength, unknown-retention behavior, and alias resolution

I also reproduced the linked CI failure set locally; the previously timed-out safe_outputs_handlers tests pass in focused runs.

done

pelikhan · 2026-05-27T12:30:49Z

@copilot review all pull request review comments from the github-actions user

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Copilot · 2026-05-27T12:46:49Z

@copilot review all pull request review comments from the github-actions user

Reviewed all github-actions review comments and addressed the actionable ones in commits f4c432e, 9db9231, and 62ee25f.

Initial plan

79f6523

Copilot AI assigned Copilot and gh-aw-bot May 27, 2026

Copilot started work on behalf of gh-aw-bot May 27, 2026 02:32 View session

Copilot AI linked an issue May 27, 2026 that may be closed by this pull request

[plan] Implement dedicated outcome evaluators for create_pull_request and push_to_pull_request_branch #35037

Open

7 tasks

Copilot AI and others added 4 commits May 27, 2026 02:46

Add dedicated PR and push outcome evaluators

8ff5141

Co-authored-by: gh-aw-bot <259018956+gh-aw-bot@users.noreply.github.com>

Format evaluator helper arrays

9aa84f8

Co-authored-by: gh-aw-bot <259018956+gh-aw-bot@users.noreply.github.com>

Inject API client into outcome evaluator tests

c253b0a

Co-authored-by: gh-aw-bot <259018956+gh-aw-bot@users.noreply.github.com>

Add missing evaluator JSDoc parameter types

2a3a0c3

Co-authored-by: gh-aw-bot <259018956+gh-aw-bot@users.noreply.github.com>

Copilot AI changed the title ~~[WIP] Implement dedicated outcome evaluators for create_pull_request and push_to_pull_request_branch~~ Add dedicated outcome evaluators for create_pull_request and push_to_pull_request_branch May 27, 2026

Copilot finished work on behalf of gh-aw-bot May 27, 2026 03:00

Copilot AI requested a review from gh-aw-bot May 27, 2026 03:00

Copilot started work on behalf of pelikhan May 27, 2026 11:38 View session

Merge origin/main and resolve evaluator conflicts

cc28ec3

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Copilot finished work on behalf of pelikhan May 27, 2026 11:56

Copilot AI requested a review from pelikhan May 27, 2026 11:56

pelikhan marked this pull request as ready for review May 27, 2026 12:04

Copilot AI review requested due to automatic review settings May 27, 2026 12:04

Copilot started reviewing on behalf of pelikhan May 27, 2026 12:04 View session

Copilot AI reviewed May 27, 2026

View reviewed changes

Copilot started work on behalf of pelikhan May 27, 2026 12:13 View session

github-actions Bot mentioned this pull request May 27, 2026

[aw] No-Op Runs #34919

Open

github-actions Bot approved these changes May 27, 2026

View reviewed changes

github-actions Bot reviewed May 27, 2026

View reviewed changes

github-actions Bot previously requested changes May 27, 2026

View reviewed changes

Copilot AI and others added 2 commits May 27, 2026 12:26

Fix evaluator review feedback and add edge-case tests

3422c05

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Clarify commit-retention variable names

fe07a0b

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Copilot finished work on behalf of pelikhan May 27, 2026 12:28

Merge branch 'main' into copilot/implement-outcome-evaluators

85dc98e

Copilot started work on behalf of pelikhan May 27, 2026 12:31 View session

Copilot AI and others added 3 commits May 27, 2026 12:40

Address github-actions evaluator review feedback

f4c432e

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Tighten closing keywords and document SHA matching

9db9231

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Document ambiguous head SHA exclusion

62ee25f

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Copilot finished work on behalf of pelikhan May 27, 2026 12:48

Merge branch 'main' into copilot/implement-outcome-evaluators

f9c2826

Conversation

Copilot AI commented May 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

pelikhan commented May 27, 2026

Uh oh!

Copilot AI commented May 27, 2026

Uh oh!

github-actions Bot commented May 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions Bot commented May 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions Bot commented May 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions Bot commented May 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Copilot's findings

Uh oh!

pelikhan commented May 27, 2026

Uh oh!

github-actions Bot commented May 27, 2026

🧪 Test Quality Sentinel Report

Test Classification Details

Language Support

Verdict

Uh oh!

github-actions Bot left a comment

Choose a reason for hiding this comment

Uh oh!

github-actions Bot left a comment

Choose a reason for hiding this comment

Skills-Based Review 🧠

Key Themes

Positive Highlights

Uh oh!

github-actions Bot May 27, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot May 27, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot May 27, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot May 27, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot May 27, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot left a comment

Choose a reason for hiding this comment

Three blocking issues found

1. SHA length mismatch silently degrades pushedStillHead in production (high)

2. "closing" keyword in CLOSING_COMMENT_KEYWORDS causes false-positive strong rejections (medium)

3. Open PR with only CHANGES_REQUESTED classified as unknown instead of pending (medium)

4. isCommitInBranchHistory behind semantics and misleading comment (low)

Uh oh!

github-actions Bot May 27, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot May 27, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot May 27, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot May 27, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot May 27, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI commented May 27, 2026 •

edited

Loading

github-actions Bot commented May 27, 2026 •

edited

Loading

github-actions Bot commented May 27, 2026 •

edited

Loading

github-actions Bot commented May 27, 2026 •

edited

Loading

github-actions Bot commented May 27, 2026 •

edited

Loading

1. SHA length mismatch silently degrades `pushedStillHead` in production (high)

2. `"closing"` keyword in `CLOSING_COMMENT_KEYWORDS` causes false-positive strong rejections (medium)

3. Open PR with only `CHANGES_REQUESTED` classified as `unknown` instead of `pending` (medium)

4. `isCommitInBranchHistory` `behind` semantics and misleading comment (low)