Normalize safe output outcome evaluation and demote existence-only fallback #35040

Merged
pelikhan merged 7 commits into main from copilot/define-outcome-model
May 27, 2026
Conversation

Contributor

Copilot AI commented May 26, 2026

This change introduces a shared normalized outcome model for safe output evaluation and updates the existence-only fallback to report an unknown status with weak evidence instead of accepted. It also propagates the normalized fields through JSONL/telemetry output and adds summary buckets so dashboards can distinguish strong outcomes from fallback-only signals.

  • Shared outcome model

    • Adds OutcomeStatus with: accepted, rejected, pending, ignored, skipped, unknown
    • Adds EvidenceStrength with: strong, medium, weak
    • Adds shared OutcomeEvaluation fields:
      • outcome_status
      • evidence_strength
      • signal
  • Fallback semantics

    • Changes the generic existence-only evaluator from implicit acceptance to:
      • outcome_status: unknown
      • evidence_strength: weak
      • signal: target_exists_only
    • Prevents existence-only fallbacks from inflating accepted metrics
  • Reporting and telemetry

    • Emits normalized fields in outcome JSONL
    • Carries normalized fields into OTLP outcome spans
    • Keeps legacy result alongside normalized fields for compatibility where needed
  • Summary / dashboard fields

    • Adds evidence-strength acceptance buckets:
      • accepted_strong
      • accepted_medium
      • accepted_weak
    • Adds fallback tracking:
      • fallback_exists_only_count
  • Coverage

    • Adds focused Go tests for:
      • normalized outcome derivation
      • existence-only fallback behavior
      • summary exclusion of fallback-only outcomes from accepted counts
      • JSONL emission of normalized fields
    • Adds focused JS tests for normalization and OTLP span export fields

Example of the normalized payload shape:

{
  "result": "unknown",
  "outcome_status": "unknown",
  "evidence_strength": "weak",
  "signal": "target_exists_only",
  "detail": "object still exists"
}

pr-sous-chef: updated branch on https://github.com/github/gh-aw/actions/runs/26481536311

Generated by 👨‍🍳 PR Sous Chef · gpt54 17.2M ·


pr-sous-chef: branch updated during run https://github.com/github/gh-aw/actions/runs/26484943570

Generated by 👨‍🍳 PR Sous Chef · gpt54 7.1M ·

Copilot AI and others added 4 commits May 26, 2026 20:31
Co-authored-by: mnkiefer <8320933+mnkiefer@users.noreply.github.com>
Co-authored-by: mnkiefer <8320933+mnkiefer@users.noreply.github.com>
Co-authored-by: mnkiefer <8320933+mnkiefer@users.noreply.github.com>
Co-authored-by: mnkiefer <8320933+mnkiefer@users.noreply.github.com>
Copilot AI changed the title [WIP] Implement normalized outcome model and fallback semantics Normalize safe output outcome evaluation and demote existence-only fallback May 26, 2026
Copilot AI requested a review from mnkiefer May 26, 2026 20:45
@mnkiefer mnkiefer marked this pull request as ready for review May 26, 2026 21:15
Copilot AI review requested due to automatic review settings May 26, 2026 21:15
Contributor

Copilot AI left a comment

Pull request overview

This PR introduces a shared, normalized outcome model for safe output evaluation (status + evidence strength + signal), changes the existence-only fallback to report weak/unknown instead of accepted, and propagates the normalized fields through JSONL/telemetry outputs so dashboards can distinguish strong outcomes from fallback-only signals.

Changes:

  • Add normalized outcome fields (outcome_status, evidence_strength, signal) and derive them from legacy result/detail when needed.
  • Demote existence-only fallback from accepted to weak/unknown and track it explicitly in summaries.
  • Emit normalized fields in JSONL and include them in OTLP spans/summary attributes, with tests covering the new behavior.
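
The derivation step in the first bullet could look roughly like the following Go sketch. It is not the PR's implementation: the `Normalized` type, the `normalize` function, the `legacy_result` signal name, and the choice to default missing evidence to `weak` are all assumptions made for illustration.

```go
package main

import "fmt"

// Normalized holds the three normalized fields (hypothetical type name).
type Normalized struct {
	OutcomeStatus    string
	EvidenceStrength string
	Signal           string
}

// knownStatuses lists the six statuses named in the PR description.
var knownStatuses = map[string]bool{
	"accepted": true, "rejected": true, "pending": true,
	"ignored": true, "skipped": true, "unknown": true,
}

// normalize keeps any normalized field that is already set and fills the
// gaps from the legacy result string.
func normalize(legacyResult string, n Normalized) Normalized {
	if n.OutcomeStatus == "" {
		if knownStatuses[legacyResult] {
			n.OutcomeStatus = legacyResult
		} else {
			// Unrecognized legacy values are not promoted to a real outcome.
			n.OutcomeStatus = "unknown"
		}
	}
	if n.EvidenceStrength == "" {
		// Assumption: a record with no stated strength is treated conservatively.
		n.EvidenceStrength = "weak"
	}
	if n.Signal == "" {
		// Assumption: hypothetical marker for "derived from legacy result".
		n.Signal = "legacy_result"
	}
	return n
}

func main() {
	// Legacy-only record: every normalized field is derived.
	fmt.Printf("%+v\n", normalize("accepted", Normalized{}))
	// Already-normalized record: explicit fields win over the legacy result.
	fmt.Printf("%+v\n", normalize("accepted", Normalized{
		OutcomeStatus:    "unknown",
		EvidenceStrength: "weak",
		Signal:           "target_exists_only",
	}))
}
```

Letting explicit normalized fields win over the legacy `result` keeps older producers working while newer evaluators state their evidence directly.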
| File | Description |
| --- | --- |
| pkg/cli/outcome_evaluation.go | Adds normalized outcome model types and normalization logic. |
| pkg/cli/outcome_eval.go | Embeds normalized fields in reports; updates summary aggregation to use normalized status/strength and adds new buckets. |
| pkg/cli/outcome_eval_test.go | Adds Go tests for normalization, existence-only fallback behavior, summary bucketing, and JSONL emission. |
| pkg/cli/outcome_eval_jsonl.go | Emits normalized fields in JSONL and adds write error handling. |
| pkg/cli/outcome_eval_generic.go | Changes generic existence-only fallback to weak/unknown and makes GH API getter injectable for tests. |
| actions/setup/js/evaluate_outcomes.cjs | Adds JS normalization helper and propagates normalized fields into JSONL + fleet summary counters. |
| actions/setup/js/evaluate_outcomes.test.cjs | Adds JS tests for normalization behavior. |
| actions/setup/js/emit_outcome_spans.cjs | Adds normalized span attributes + status mapping and summary attributes for new buckets. |
| actions/setup/js/emit_outcome_spans.test.cjs | Updates span export tests for normalized attributes and new summary fields. |

Copilot's findings

  • Files reviewed: 9/9 changed files
  • Comments generated: 2

Comment thread actions/setup/js/emit_outcome_spans.cjs
Comment thread actions/setup/js/evaluate_outcomes.cjs
@github-actions
Contributor

Please resolve the unstable merge/check state so Copilot can keep this moving.

pr-sous-chef: https://github.com/github/gh-aw/actions/runs/26478924995

Generated by 👨‍🍳 PR Sous Chef · gpt54 33.6M ·

@github-actions
Contributor

@copilot review all comments and address the unresolved review feedback.

Generated by 👨‍🍳 PR Sous Chef · gpt54 7.1M ·

@pelikhan
Collaborator

@copilot resolve the merge conflicts in this pull request

Copilot AI and others added 2 commits May 27, 2026 11:19
…e-model

# Conflicts:
#	pkg/cli/outcome_eval.go

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Contributor Author

Copilot AI commented May 27, 2026

> @copilot resolve the merge conflicts in this pull request

Resolved. The only conflict was in pkg/cli/outcome_eval.go (OutcomeSummary struct — the new bucket fields from this branch conflicted with main's version). Kept all new fields (AcceptedStrong, AcceptedMedium, AcceptedWeak, FallbackExistsOnlyCount, CostPerAcceptedOutcome) and also fixed a stray extra argument in a test call that was left from development. All CLI tests pass.
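
For readers unfamiliar with the bucket fields, here is a rough Go sketch of how they might be filled during aggregation. The field names `AcceptedStrong`, `AcceptedMedium`, `AcceptedWeak`, and `FallbackExistsOnlyCount` come from the comment above; the `Evaluation` and `Summary` types, the `Accepted` total, the `summarize` function, and the sample signal names are assumptions, and `CostPerAcceptedOutcome` is left out because its inputs are not described here.

```go
package main

import "fmt"

// Evaluation is a hypothetical stand-in for one normalized outcome.
type Evaluation struct {
	OutcomeStatus    string
	EvidenceStrength string
	Signal           string
}

// Summary mirrors the bucket fields named in the comment above.
type Summary struct {
	Accepted                int
	AcceptedStrong          int
	AcceptedMedium          int
	AcceptedWeak            int
	FallbackExistsOnlyCount int
}

// summarize counts accepted outcomes by evidence strength and tracks
// existence-only fallbacks separately, so they never inflate Accepted.
func summarize(evals []Evaluation) Summary {
	var s Summary
	for _, e := range evals {
		if e.Signal == "target_exists_only" {
			s.FallbackExistsOnlyCount++
		}
		if e.OutcomeStatus != "accepted" {
			continue
		}
		s.Accepted++
		switch e.EvidenceStrength {
		case "strong":
			s.AcceptedStrong++
		case "medium":
			s.AcceptedMedium++
		case "weak":
			s.AcceptedWeak++
		}
	}
	return s
}

func main() {
	evals := []Evaluation{
		// "merged" and "approved" are made-up signal names for this example.
		{OutcomeStatus: "accepted", EvidenceStrength: "strong", Signal: "merged"},
		{OutcomeStatus: "accepted", EvidenceStrength: "medium", Signal: "approved"},
		{OutcomeStatus: "unknown", EvidenceStrength: "weak", Signal: "target_exists_only"},
	}
	fmt.Printf("%+v\n", summarize(evals))
}
```

With this input, the existence-only record raises only `FallbackExistsOnlyCount`, which is the separation the dashboards rely on.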

Copilot AI requested a review from pelikhan May 27, 2026 11:20
Comment thread actions/setup/js/emit_outcome_spans.cjs
@pelikhan pelikhan merged commit 737ae5a into main May 27, 2026
33 checks passed
@pelikhan pelikhan deleted the copilot/define-outcome-model branch May 27, 2026 11:37
Development

Successfully merging this pull request may close these issues.

[plan] Define shared outcome model and fallback semantics for safe output evaluation

4 participants