[BUG] never-worked baseline handling is not fully integrated into trusted CI reporting #1057

@Rahul-2k4

Sample platform commit (found at the bottom of each page) : 66f057a

In raising this issue, I confirm the following (please check boxes, eg [X]):

  • I have read and understood the contributors guide.
  • I have checked that the bug-fix I am reporting can be replicated, or that the feature I am suggesting isn't already present.
  • I have checked that the issue I'm posting isn't already reported.
  • I have checked that the issue I'm posting isn't already solved and that no duplicates exist among open or closed issues.
  • I have checked the pull requests tab for existing solutions/implementations to my issue/suggestion.

My familiarity with the project is as follows (check one, eg [X]):

  • I have never visited/used the platform.
  • I have used the platform just a couple of times.
  • I have used the platform extensively, but have not contributed previously.
  • I am an active contributor to the platform.

Summary

The recently added BaselineStatus model state still needs end-to-end integration in the CI and reporting flow.

Today the platform only partially understands never_worked:

  • baseline state can exist at the model and schema level, but reporting can still misclassify failures
  • baseline history should only be refreshed from trusted main-repo commit runs, not from unmerged PR or fork runs
  • never_worked needs to be evaluated per platform using last_passed_on_linux and last_passed_on_windows; otherwise a test that has passed only on Linux can be misreported on Windows
  • historical migration backfill should trust known pass history and avoid over-classifying ambiguous old failures
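A minimal sketch of the per-platform evaluation described above, not the platform's actual implementation. Only `BaselineStatus`, `never_worked`, `last_passed_on_linux` and `last_passed_on_windows` come from this issue; the enum values, the `BaselineRecord` class, the `has_trusted_run_on_*` flags and `status_for_platform` are hypothetical names used for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class BaselineStatus(Enum):
    # Assumed values; the real model may define them differently.
    UNKNOWN = "unknown"
    NEVER_WORKED = "never_worked"
    ESTABLISHED = "established"


@dataclass
class BaselineRecord:
    # Hypothetical record holding per-platform history for one regression test.
    last_passed_on_linux: Optional[int] = None    # id of the last passing trusted run
    last_passed_on_windows: Optional[int] = None
    has_trusted_run_on_linux: bool = False        # assumed flag, not from the issue
    has_trusted_run_on_windows: bool = False      # assumed flag, not from the issue


def status_for_platform(record: BaselineRecord, platform: str) -> BaselineStatus:
    """Evaluate the baseline for a single platform, ignoring the other one."""
    if platform not in ("linux", "windows"):
        raise ValueError(f"unsupported platform: {platform}")
    if getattr(record, f"last_passed_on_{platform}") is not None:
        # Trusted pass history exists on this platform.
        return BaselineStatus.ESTABLISHED
    if getattr(record, f"has_trusted_run_on_{platform}"):
        # A trusted run has evaluated the test here and it has never passed.
        return BaselineStatus.NEVER_WORKED
    # No trusted evidence either way, e.g. an ambiguous record after backfill.
    return BaselineStatus.UNKNOWN
```

Under these assumptions, a test with `last_passed_on_linux` set and no Windows pass is established on Linux and never_worked on Windows, while a backfilled record with no trusted history stays unknown on both.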

Expected behavior

  • completed main-repo commit runs refresh baseline status using full regression-result logic
  • PR and fork runs do not mutate shared baseline history
  • PR comments distinguish new failures, failures already failing on master, and tests that have never worked on the current platform
  • migration backfill marks tests as established only when trusted pass history exists; otherwise they remain unknown until refreshed by trusted runs
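The first three expectations above could be sketched roughly as follows. This is an illustration under stated assumptions, not the platform's code: `FailureKind`, `may_refresh_baseline`, `classify_failure`, the `run_type` strings and the precedence of never worked over the other two categories are all hypothetical.

```python
from enum import Enum


class FailureKind(Enum):
    # Hypothetical labels for the three categories a PR comment should separate.
    NEW_FAILURE = "new failure"
    ALREADY_FAILING_ON_MASTER = "already failing on master"
    NEVER_WORKED = "never worked on this platform"


def may_refresh_baseline(run_type: str, is_fork: bool, completed: bool) -> bool:
    """Return True only for completed commit runs on the main repository.

    PR runs and fork runs are untrusted here and must not mutate shared history.
    The "commit" / "pr" strings are assumed run-type names.
    """
    return completed and run_type == "commit" and not is_fork


def classify_failure(never_worked_on_platform: bool, failing_on_master: bool) -> FailureKind:
    """Classify one failed test in a PR run for the current platform."""
    if never_worked_on_platform:
        # Assumed precedence: a test with no pass on this platform is not a regression.
        return FailureKind.NEVER_WORKED
    if failing_on_master:
        return FailureKind.ALREADY_FAILING_ON_MASTER
    return FailureKind.NEW_FAILURE
```

Keeping the trust check in one predicate means the reporting path can classify failures for any run while only trusted runs are allowed to write baseline history.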

Why this matters

This keeps CI failure signals accurate. Without it, contributors can get misleading PR comments and unmerged runs can pollute the baseline used to classify regressions.

Verification

Focused controller and regression tests cover trusted baseline refresh, no PR baseline mutation, platform-specific never-worked reporting, and existing PR comment behavior.
