Normalize safe output outcome evaluation and demote existence-only fallback by Copilot · Pull Request #35040 · github/gh-aw

Copilot · 2026-05-26T20:21:25Z

This change introduces a shared, normalized outcome model for safe output evaluation and changes the existence-only fallback to report an unknown outcome backed by weak evidence, instead of an accepted one. It also propagates the normalized fields through JSONL and telemetry output, and adds summary buckets so dashboards can distinguish strong outcomes from fallback-only signals.

Shared outcome model
- Adds OutcomeStatus with: accepted, rejected, pending, ignored, skipped, unknown
- Adds EvidenceStrength with: strong, medium, weak
- Adds shared OutcomeEvaluation fields:
  - outcome_status
  - evidence_strength
  - signal
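A minimal JavaScript sketch of the shared model, in the style of the actions/setup/js helpers. The enum values and the three field names come from the list above. The frozen-object encoding, the `makeOutcomeEvaluation` helper, and the `pr_merged` signal are hypothetical and used only for illustration.

```javascript
// Hypothetical sketch of the shared outcome model. Values and field names
// come from the PR description; the helper and the encoding are assumptions.
const OutcomeStatus = Object.freeze({
  ACCEPTED: "accepted",
  REJECTED: "rejected",
  PENDING: "pending",
  IGNORED: "ignored",
  SKIPPED: "skipped",
  UNKNOWN: "unknown",
});

const EvidenceStrength = Object.freeze({
  STRONG: "strong",
  MEDIUM: "medium",
  WEAK: "weak",
});

const STATUS_VALUES = new Set(Object.values(OutcomeStatus));
const STRENGTH_VALUES = new Set(Object.values(EvidenceStrength));

// Builds a record with the three shared fields, rejecting unknown enum values.
function makeOutcomeEvaluation(status, strength, signal) {
  if (!STATUS_VALUES.has(status)) {
    throw new RangeError(`invalid outcome_status: ${status}`);
  }
  if (!STRENGTH_VALUES.has(strength)) {
    throw new RangeError(`invalid evidence_strength: ${strength}`);
  }
  return { outcome_status: status, evidence_strength: strength, signal };
}

// "pr_merged" is a made-up signal name used only for illustration.
const ev = makeOutcomeEvaluation(OutcomeStatus.ACCEPTED, EvidenceStrength.STRONG, "pr_merged");
console.log(JSON.stringify(ev));
// {"outcome_status":"accepted","evidence_strength":"strong","signal":"pr_merged"}
```

Validating at construction time keeps typos such as "acepted" from silently landing in a dashboard bucket.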
Fallback semantics
- Changes the generic existence-only evaluator from implicit acceptance to:
  - outcome_status: unknown
  - evidence_strength: weak
  - signal: target_exists_only
- Prevents existence-only fallbacks from inflating accepted metrics
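The demotion can be sketched as follows, assuming the evaluator receives a single "does the target still exist" flag. The function name and its input are hypothetical; the returned values match the fallback semantics and the example payload in this PR. The PR does not describe what happens when the target is missing, so this sketch simply returns `null` for that case.

```javascript
// Hypothetical sketch of the generic existence-only fallback after this PR.
// Previously an existing target was implicitly reported as accepted; now the
// mere existence of the target is only weak evidence of an unknown outcome.
function evaluateExistenceOnly(targetExists) {
  if (!targetExists) {
    // Assumption: the PR text does not describe this branch, so the sketch
    // returns null and leaves the decision to the caller.
    return null;
  }
  return {
    result: "unknown", // legacy field, kept for compatibility
    outcome_status: "unknown",
    evidence_strength: "weak",
    signal: "target_exists_only",
    detail: "object still exists",
  };
}

const fallback = evaluateExistenceOnly(true);
console.log(fallback.outcome_status, fallback.signal); // unknown target_exists_only
```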
Reporting and telemetry
- Emits normalized fields in outcome JSONL
- Carries normalized fields into OTLP outcome spans
- Keeps the legacy result field alongside the normalized fields for backward compatibility where needed
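A sketch of JSONL emission that writes the legacy `result` next to the normalized fields and surfaces write failures, assuming the writer is injected as a function. The helper names and the error message are hypothetical; in practice `write` might wrap `fs.appendFileSync`.

```javascript
// Hypothetical sketch of outcome JSONL emission: one JSON object per line,
// with the legacy `result` field written next to the normalized fields.
function toJsonlRecord(outcome) {
  return {
    result: outcome.result, // legacy field for existing consumers
    outcome_status: outcome.outcome_status,
    evidence_strength: outcome.evidence_strength,
    signal: outcome.signal,
    detail: outcome.detail,
  };
}

// `write` is injected (for example, a wrapper around fs.appendFileSync) so a
// failed write is reported to the caller instead of being silently dropped.
function emitOutcomesJsonl(outcomes, write) {
  const body = outcomes.map((o) => JSON.stringify(toJsonlRecord(o)) + "\n").join("");
  try {
    write(body);
  } catch (err) {
    throw new Error(`failed to write outcome JSONL: ${err.message}`);
  }
  return body;
}

const chunks = [];
const written = emitOutcomesJsonl(
  [
    {
      result: "unknown",
      outcome_status: "unknown",
      evidence_strength: "weak",
      signal: "target_exists_only",
      detail: "object still exists",
    },
  ],
  (text) => chunks.push(text),
);
console.log(written.trimEnd());
```

`JSON.stringify` drops `undefined` values, so records that lack a normalized field simply omit it rather than emitting `null`.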
Summary / dashboard fields
- Adds evidence-strength acceptance buckets:
  - accepted_strong
  - accepted_medium
  - accepted_weak
- Adds fallback tracking:
  - fallback_exists_only_count
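The bucketing can be sketched as a single pass over normalized outcomes. The bucket names come from the list above; the overall `accepted` total, the input shape, and the example signal names are assumptions for illustration.

```javascript
// Hypothetical sketch of summary aggregation over normalized outcomes.
function summarizeOutcomes(outcomes) {
  const summary = {
    accepted: 0, // assumed overall total; not named in the PR
    accepted_strong: 0,
    accepted_medium: 0,
    accepted_weak: 0,
    fallback_exists_only_count: 0,
  };
  for (const o of outcomes) {
    if (o.signal === "target_exists_only") {
      // Fallback-only evidence is tracked separately and never counted as accepted.
      summary.fallback_exists_only_count++;
      continue;
    }
    if (o.outcome_status !== "accepted") continue;
    summary.accepted++;
    const bucket = `accepted_${o.evidence_strength}`;
    if (bucket in summary) summary[bucket]++;
  }
  return summary;
}

// Signal names other than target_exists_only are made up for this example.
const summary = summarizeOutcomes([
  { outcome_status: "accepted", evidence_strength: "strong", signal: "pr_merged" },
  { outcome_status: "accepted", evidence_strength: "medium", signal: "comment_kept" },
  { outcome_status: "unknown", evidence_strength: "weak", signal: "target_exists_only" },
  { outcome_status: "rejected", evidence_strength: "strong", signal: "pr_closed" },
]);
console.log(summary);
```

Here the fallback record lands only in `fallback_exists_only_count`, so the accepted buckets reflect two outcomes, not three.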
Coverage
- Adds focused Go tests for:
  - normalized outcome derivation
  - existence-only fallback behavior
  - exclusion of fallback-only outcomes from accepted counts in summaries
  - JSONL emission of normalized fields
- Adds focused JS tests for normalization and OTLP span export fields

Example of the normalized payload shape:

```json
{
  "result": "unknown",
  "outcome_status": "unknown",
  "evidence_strength": "weak",
  "signal": "target_exists_only",
  "detail": "object still exists"
}
```

pr-sous-chef: updated branch on https://github.com/github/gh-aw/actions/runs/26481536311

Generated by 👨‍🍳 PR Sous Chef · gpt54 17.2M · ◷

pr-sous-chef: branch updated during run https://github.com/github/gh-aw/actions/runs/26484943570

Generated by 👨‍🍳 PR Sous Chef · gpt54 7.1M · ◷

Co-authored-by: mnkiefer <8320933+mnkiefer@users.noreply.github.com>

Copilot

Pull request overview

This PR introduces a shared, normalized outcome model for safe output evaluation (status + evidence strength + signal), changes the existence-only fallback to report weak/unknown instead of accepted, and propagates the normalized fields through JSONL/telemetry outputs so dashboards can distinguish strong outcomes from fallback-only signals.

Changes:

- Add normalized outcome fields (outcome_status, evidence_strength, signal) and derive them from legacy result/detail when needed.
- Demote existence-only fallback from accepted to weak/unknown and track it explicitly in summaries.
- Emit normalized fields in JSONL and include them in OTLP spans/summary attributes, with tests covering the new behavior.
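Deriving normalized fields from a legacy record might look like the sketch below. It is an assumption-laden illustration, not the PR's helper: explicit normalized fields win, otherwise the status is taken from `result`, and the defaults (`weak` strength, `legacy_result` signal) are hypothetical choices that keep derived records from looking stronger than their evidence. How the PR uses `detail` is not shown here.

```javascript
// Hypothetical normalizer: explicit normalized fields win; otherwise they are
// derived from the legacy `result`. The defaults ("weak", "legacy_result")
// are assumptions chosen so derived records never overstate their evidence.
const KNOWN_STATUSES = new Set(["accepted", "rejected", "pending", "ignored", "skipped", "unknown"]);
const KNOWN_STRENGTHS = new Set(["strong", "medium", "weak"]);

function normalizeOutcome(raw) {
  const legacy = typeof raw.result === "string" ? raw.result.trim().toLowerCase() : "";
  let status = "unknown";
  if (KNOWN_STATUSES.has(raw.outcome_status)) {
    status = raw.outcome_status;
  } else if (KNOWN_STATUSES.has(legacy)) {
    status = legacy;
  }
  const strength = KNOWN_STRENGTHS.has(raw.evidence_strength) ? raw.evidence_strength : "weak";
  return {
    ...raw,
    result: raw.result ?? status, // keep the legacy field populated
    outcome_status: status,
    evidence_strength: strength,
    signal: raw.signal || "legacy_result",
  };
}

const fromLegacy = normalizeOutcome({ result: "accepted", detail: "merged" });
const explicit = normalizeOutcome({
  result: "accepted",
  outcome_status: "unknown",
  evidence_strength: "weak",
  signal: "target_exists_only",
});
console.log(fromLegacy.outcome_status, fromLegacy.evidence_strength, explicit.outcome_status);
// accepted weak unknown
```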


| File | Description |
| --- | --- |
| pkg/cli/outcome_evaluation.go | Adds normalized outcome model types and normalization logic. |
| pkg/cli/outcome_eval.go | Embeds normalized fields in reports; updates summary aggregation to use normalized status/strength and adds new buckets. |
| pkg/cli/outcome_eval_test.go | Adds Go tests for normalization, existence-only fallback behavior, summary bucketing, and JSONL emission. |
| pkg/cli/outcome_eval_jsonl.go | Emits normalized fields in JSONL and adds write error handling. |
| pkg/cli/outcome_eval_generic.go | Changes generic existence-only fallback to weak/unknown and makes GH API getter injectable for tests. |
| actions/setup/js/evaluate_outcomes.cjs | Adds JS normalization helper and propagates normalized fields into JSONL + fleet summary counters. |
| actions/setup/js/evaluate_outcomes.test.cjs | Adds JS tests for normalization behavior. |
| actions/setup/js/emit_outcome_spans.cjs | Adds normalized span attributes + status mapping and summary attributes for new buckets. |
| actions/setup/js/emit_outcome_spans.test.cjs | Updates span export tests for normalized attributes and new summary fields. |
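For the span side, a sketch of per-outcome attributes and a status mapping, using the OTLP/JSON attribute encoding (`{ key, value: { stringValue } }`) and the OTLP `Status.code` integers (0 unset, 1 ok, 2 error). The attribute key names, the span name, and the rule that only strong or medium acceptance maps to OK are assumptions, not the PR's exact choices.

```javascript
// Hypothetical sketch of outcome span attributes and status mapping.
// OTLP Status.code values: 0 = UNSET, 1 = OK, 2 = ERROR.
const SpanStatusCode = Object.freeze({ UNSET: 0, OK: 1, ERROR: 2 });

// Assumed mapping: rejected -> ERROR, accepted with non-weak evidence -> OK,
// everything else (unknown, pending, weak acceptance, ...) stays UNSET.
function outcomeSpanStatus(o) {
  if (o.outcome_status === "rejected") return SpanStatusCode.ERROR;
  if (o.outcome_status === "accepted" && o.evidence_strength !== "weak") return SpanStatusCode.OK;
  return SpanStatusCode.UNSET;
}

// Attribute keys are made up; only non-empty string fields are exported.
function outcomeSpanAttributes(o) {
  const fields = {
    "outcome.result": o.result,
    "outcome.status": o.outcome_status,
    "outcome.evidence_strength": o.evidence_strength,
    "outcome.signal": o.signal,
  };
  return Object.entries(fields)
    .filter(([, v]) => typeof v === "string" && v !== "")
    .map(([key, v]) => ({ key, value: { stringValue: v } }));
}

const fallbackOutcome = {
  result: "unknown",
  outcome_status: "unknown",
  evidence_strength: "weak",
  signal: "target_exists_only",
};
const span = {
  name: "outcome", // hypothetical span name
  attributes: outcomeSpanAttributes(fallbackOutcome),
  status: { code: outcomeSpanStatus(fallbackOutcome) },
};
console.log(JSON.stringify(span));
```

Leaving fallback-only spans UNSET keeps them from reading as successes in trace views while still exporting the `target_exists_only` signal for filtering.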

Copilot's findings


Files reviewed: 9/9 changed files
Comments generated: 2

github-actions · 2026-05-26T22:43:34Z

Please resolve the unstable merge/check state so Copilot can keep this moving.

pr-sous-chef: https://github.com/github/gh-aw/actions/runs/26478924995

Generated by 👨‍🍳 PR Sous Chef · gpt54 33.6M · ◷

github-actions · 2026-05-27T01:29:07Z

@copilot review all comments and address the unresolved review feedback.

Generated by 👨‍🍳 PR Sous Chef · gpt54 7.1M · ◷

pelikhan · 2026-05-27T11:11:13Z

@copilot resolve the merge conflicts in this pull request


Copilot · 2026-05-27T11:20:36Z

> @copilot resolve the merge conflicts in this pull request

Resolved. The only conflict was in pkg/cli/outcome_eval.go, in the OutcomeSummary struct, where the new bucket fields from this branch conflicted with main's version. Kept all new fields (AcceptedStrong, AcceptedMedium, AcceptedWeak, FallbackExistsOnlyCount, CostPerAcceptedOutcome) and also fixed a stray extra argument, left over from development, in a test call. All CLI tests pass.

Initial plan

a4179c3

Copilot AI assigned Copilot and mnkiefer May 26, 2026

Copilot started work on behalf of mnkiefer May 26, 2026 20:21

Copilot AI linked an issue May 26, 2026 that may be closed by this pull request

[plan] Define shared outcome model and fallback semantics for safe output evaluation #35034

Closed

7 tasks

Copilot AI and others added 4 commits May 26, 2026 20:31

feat: normalize outcome evaluation fields

6930925

Co-authored-by: mnkiefer <8320933+mnkiefer@users.noreply.github.com>

fix: handle outcome JSONL write errors

7918e7e

Co-authored-by: mnkiefer <8320933+mnkiefer@users.noreply.github.com>

test: cover outcome normalization helpers

74cff65

Co-authored-by: mnkiefer <8320933+mnkiefer@users.noreply.github.com>

chore: polish outcome normalization validation

e217e0c

Co-authored-by: mnkiefer <8320933+mnkiefer@users.noreply.github.com>

Copilot AI changed the title from "[WIP] Implement normalized outcome model and fallback semantics" to "Normalize safe output outcome evaluation and demote existence-only fallback" May 26, 2026

Copilot finished work on behalf of mnkiefer May 26, 2026 20:45

Copilot AI requested a review from mnkiefer May 26, 2026 20:45

mnkiefer approved these changes May 26, 2026


mnkiefer marked this pull request as ready for review May 26, 2026 21:15

Copilot AI review requested due to automatic review settings May 26, 2026 21:15

Copilot started reviewing on behalf of mnkiefer May 26, 2026 21:15

Copilot AI reviewed May 26, 2026


Comment thread actions/setup/js/emit_outcome_spans.cjs

Comment thread actions/setup/js/evaluate_outcomes.cjs

Copilot started work on behalf of pelikhan May 27, 2026 11:11

Copilot AI and others added 2 commits May 27, 2026 11:19

Merge remote-tracking branch 'origin/main' into copilot/define-outcom…

995411d

…e-model # Conflicts: # pkg/cli/outcome_eval.go Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

fix: resolve merge conflict with main and fix test argument mismatch

5e19085

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Copilot finished work on behalf of pelikhan May 27, 2026 11:20

Copilot AI requested a review from pelikhan May 27, 2026 11:20

pelikhan reviewed May 27, 2026


Comment thread actions/setup/js/emit_outcome_spans.cjs

Copilot started work on behalf of pelikhan May 27, 2026 11:24

Copilot finished work on behalf of pelikhan May 27, 2026 11:31

Copilot AI requested a review from pelikhan May 27, 2026 11:31

pelikhan merged commit 737ae5a into main May 27, 2026
33 checks passed

pelikhan deleted the copilot/define-outcome-model branch May 27, 2026 11:37

pelikhan merged 7 commits into main from copilot/define-outcome-model
