Normalize safe output outcome evaluation and demote existence-only fallback #35040
Co-authored-by: mnkiefer <8320933+mnkiefer@users.noreply.github.com>
Pull request overview
This PR introduces a shared, normalized outcome model for safe output evaluation (status + evidence strength + signal), changes the existence-only fallback to report weak/unknown instead of accepted, and propagates the normalized fields through JSONL/telemetry outputs so dashboards can distinguish strong outcomes from fallback-only signals.
Changes:
- Add normalized outcome fields (`outcome_status`, `evidence_strength`, `signal`) and derive them from legacy `result`/`detail` when needed.
- Demote existence-only fallback from accepted to weak/unknown and track it explicitly in summaries.
- Emit normalized fields in JSONL and include them in OTLP spans/summary attributes, with tests covering the new behavior.
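The derivation from legacy fields could look roughly like the Go sketch below. It is an illustration only, not the code in `pkg/cli/outcome_evaluation.go`: the `Normalized` struct, the `normalizeLegacy` helper, its `existsOnly` flag, and the `legacy_result` / `unrecognized_result` signal names are all hypothetical, and treating a recognized legacy result as strong evidence is an assumption. Only the status values, strength values, and the `target_exists_only` signal come from this PR.

```go
package main

import (
	"fmt"
	"strings"
)

// Value sets as listed in the PR description.
type OutcomeStatus string
type EvidenceStrength string

const (
	StatusAccepted OutcomeStatus = "accepted"
	StatusRejected OutcomeStatus = "rejected"
	StatusPending  OutcomeStatus = "pending"
	StatusIgnored  OutcomeStatus = "ignored"
	StatusSkipped  OutcomeStatus = "skipped"
	StatusUnknown  OutcomeStatus = "unknown"

	StrengthStrong EvidenceStrength = "strong"
	StrengthMedium EvidenceStrength = "medium"
	StrengthWeak   EvidenceStrength = "weak"
)

// Normalized is a hypothetical container for the three normalized fields.
type Normalized struct {
	Status   OutcomeStatus
	Strength EvidenceStrength
	Signal   string
}

// normalizeLegacy is a hypothetical helper that maps a legacy result string
// to normalized fields. existsOnly marks the existence-only fallback path.
func normalizeLegacy(result string, existsOnly bool) Normalized {
	if existsOnly {
		// From the PR: existence-only fallback is weak/unknown, never accepted.
		return Normalized{StatusUnknown, StrengthWeak, "target_exists_only"}
	}
	s := OutcomeStatus(strings.ToLower(strings.TrimSpace(result)))
	switch s {
	case StatusAccepted, StatusRejected, StatusPending, StatusIgnored, StatusSkipped:
		// Assumption: a recognized legacy result counts as strong evidence.
		return Normalized{s, StrengthStrong, "legacy_result"}
	default:
		// Anything unrecognized falls back to weak/unknown.
		return Normalized{StatusUnknown, StrengthWeak, "unrecognized_result"}
	}
}

func main() {
	fmt.Printf("%+v\n", normalizeLegacy("accepted", false))
	fmt.Printf("%+v\n", normalizeLegacy("accepted", true))
}
```

Checking the fallback flag first means a legacy `accepted` result can never leak through on the existence-only path.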
Show a summary per file
| File | Description |
|---|---|
| pkg/cli/outcome_evaluation.go | Adds normalized outcome model types and normalization logic. |
| pkg/cli/outcome_eval.go | Embeds normalized fields in reports; updates summary aggregation to use normalized status/strength and adds new buckets. |
| pkg/cli/outcome_eval_test.go | Adds Go tests for normalization, existence-only fallback behavior, summary bucketing, and JSONL emission. |
| pkg/cli/outcome_eval_jsonl.go | Emits normalized fields in JSONL and adds write error handling. |
| pkg/cli/outcome_eval_generic.go | Changes generic existence-only fallback to weak/unknown and makes GH API getter injectable for tests. |
| actions/setup/js/evaluate_outcomes.cjs | Adds JS normalization helper and propagates normalized fields into JSONL + fleet summary counters. |
| actions/setup/js/evaluate_outcomes.test.cjs | Adds JS tests for normalization behavior. |
| actions/setup/js/emit_outcome_spans.cjs | Adds normalized span attributes + status mapping and summary attributes for new buckets. |
| actions/setup/js/emit_outcome_spans.test.cjs | Updates span export tests for normalized attributes and new summary fields. |
Copilot's findings
- Files reviewed: 9/9 changed files
- Comments generated: 2
Please resolve the unstable merge/check state so Copilot can keep this moving. pr-sous-chef: https://github.com/github/gh-aw/actions/runs/26478924995
@copilot resolve the merge conflicts in this pull request
…e-model # Conflicts: # pkg/cli/outcome_eval.go Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Resolved. The only conflict was in |
This change introduces a shared normalized outcome model for safe output evaluation and updates the existence-only fallback to report weak/unknown evidence instead of accepted. It also propagates the normalized fields through JSONL/telemetry output and adds summary buckets so dashboards can distinguish strong outcomes from fallback-only signals.
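As a rough illustration of the summary buckets, the Go sketch below counts accepted outcomes by evidence strength and tracks existence-only fallbacks separately. The `Evaluation` and `Summary` types and the `summarize` function are hypothetical and are not taken from `pkg/cli/outcome_eval.go`; only the JSON key names and string values mirror the fields named in this description.

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
)

// Evaluation is a hypothetical record carrying the normalized fields.
type Evaluation struct {
	OutcomeStatus    string `json:"outcome_status"`
	EvidenceStrength string `json:"evidence_strength"`
	Signal           string `json:"signal"`
}

// Summary is a hypothetical aggregate; the JSON keys match the
// summary/dashboard fields named in the PR description.
type Summary struct {
	AcceptedStrong          int `json:"accepted_strong"`
	AcceptedMedium          int `json:"accepted_medium"`
	AcceptedWeak            int `json:"accepted_weak"`
	FallbackExistsOnlyCount int `json:"fallback_exists_only_count"`
}

// summarize buckets accepted outcomes by evidence strength and counts
// existence-only fallbacks on their own, so they never inflate "accepted".
func summarize(evals []Evaluation) Summary {
	var s Summary
	for _, e := range evals {
		if e.Signal == "target_exists_only" {
			s.FallbackExistsOnlyCount++
		}
		if e.OutcomeStatus != "accepted" {
			continue
		}
		switch e.EvidenceStrength {
		case "strong":
			s.AcceptedStrong++
		case "medium":
			s.AcceptedMedium++
		case "weak":
			s.AcceptedWeak++
		}
	}
	return s
}

func main() {
	evals := []Evaluation{
		{"accepted", "strong", "example_signal"}, // example_signal is a made-up name
		{"accepted", "weak", "example_signal"},
		{"unknown", "weak", "target_exists_only"},
	}
	out, err := json.Marshal(summarize(evals))
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(string(out))
}
```

Keeping the fallback counter independent of the accepted buckets is what lets a dashboard show how much of the apparent success rate rests on existence checks alone.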
Shared outcome model
- `OutcomeStatus` with: `accepted`, `rejected`, `pending`, `ignored`, `skipped`, `unknown`
- `EvidenceStrength` with: `strong`, `medium`, `weak`
- `OutcomeEvaluation` fields:
  - `outcome_status`
  - `evidence_strength`
  - `signal`

Fallback semantics
- `outcome_status: unknown`
- `evidence_strength: weak`
- `signal: target_exists_only`

Reporting and telemetry
- `result` alongside normalized fields for compatibility where needed

Summary / dashboard fields
- `accepted_strong`
- `accepted_medium`
- `accepted_weak`
- `fallback_exists_only_count`

Coverage
Example of the normalized payload shape:
```json
{
  "result": "unknown",
  "outcome_status": "unknown",
  "evidence_strength": "weak",
  "signal": "target_exists_only",
  "detail": "object still exists"
}
```

pr-sous-chef: updated branch on https://github.com/github/gh-aw/actions/runs/26481536311
pr-sous-chef: branch updated during run https://github.com/github/gh-aw/actions/runs/26484943570