Skip to content

[ci-failure-scan] Tighten KBE filing rules and PR-search coverage#127961

Open
kotlarmilos wants to merge 7 commits intodotnet:mainfrom
kotlarmilos:kotlarmilos/ci-failure-scan-tighten
Open

[ci-failure-scan] Tighten KBE filing rules and PR-search coverage#127961
kotlarmilos wants to merge 7 commits intodotnet:mainfrom
kotlarmilos:kotlarmilos/ci-failure-scan-tighten

Conversation

@kotlarmilos
Copy link
Copy Markdown
Member

@kotlarmilos kotlarmilos commented May 8, 2026

Description

Refines the ci-failure-scan agentic-workflow prompt to address failure modes seen in past runs:

  • Generic ErrorMessage signatures (bare test names, exception types, or truncated prefixes) that match [PASS] / [SKIP] lines for the same test and turn the KBE into a false-positive matcher against passing builds.
  • Malformed JSON fences (4-backtick opens, mismatched fence lengths, multiple skeletons in one body) that cause Build Analysis to silently skip the issue.
  • Issues filed under the Known Build Error label with no JSON block at all, so nothing matches future failures.
  • Muting PRs that link to a non-existent issue number or duplicate an in-flight fix authored by a maintainer (because the search was scoped too narrowly to [ci-scan] PRs).
  • Muting PRs opened against issues whose area owners had already provided a PR
  • Wrong-KBE links where the candidate KBE matched only on test name but was filed against a different architecture / failure signature.

Changes

The body of .github/workflows/ci-failure-scan.md is reorganized into a clear walk-through:

  1. Two-pass KBE → PR flow is now a numbered six-step pre-flight walk (existing KBE / area tracker / muting PR / in-flight fix PR / issue resolves / mute is welcome) followed by an explicit action selection (file KBE, open muting PR, optionally open fix PR).
  2. KBE-match verification — four questions (test, signature, OS, architecture) the agent must answer before linking an existing KBE. Wrong answers mean filing a fresh KBE rather than reusing the wrong one.
  3. Body checks — eight explicit checks on the issue body covering the JSON fence, exact ```json opening, single-line/no-escapes signature, and a negative-match test against [PASS] / `[SKIP]` and build-time output.
  4. Bad → Good examples for both signature shape (bare test name, truncated prefix, bare exception type) and platform/csproj scope (linux-arm-only, single-arch NativeAOT, single stress mode).
  5. Coverage-discipline section trimmed to its unique contribution (pipeline ordering, per-pipeline tally, run summary). Redundant Submit section removed — its content is now covered by the numbered walk.

Test run results

Workflow run 25570821336 was dispatched against this branch (commit 3cd6399dd70, pre-Copilot-fixup) and completed successfully (~28 min). Outputs:

# Type Title Linked tracker
#127963 PR (draft) [ci-scan] Skip AsyncProfilerTests on Android and tvOS #127951
#127964 PR (draft) [ci-scan] Skip System.Net.Sockets IPv6 tests on Android #127565
#127965 Issue (KBE) [ci-scan] Known Build Error: System.Net.NameResolution DnsGetHostAddresses_LocalhostSubdomainWithTrailingDot fails on Android new
#127966 Issue (regression) [ci-scan] Test failure: XslCompiledTransformApiTests (82 tests) on all NativeAOT legs — PlatformNotSupportedException (Reflection.Emit) new
#127967 PR (draft) [ci-scan] Exclude Vector3Interop from GC stress #127827

Refines the ci-failure-scan agentic-workflow prompt to address failure
modes seen in past runs:

- Generic ErrorMessage signatures (bare test names, exception types, or
  truncated prefixes) that match [PASS]/[SKIP] lines for the same test
  and turn the KBE into a false-positive matcher against passing builds.
- Malformed JSON fences (4-backtick opens, mismatched fence lengths,
  multiple skeletons in one body) that cause Build Analysis to silently
  skip the issue.
- Issues filed under the Known Build Error label with no JSON block at
  all, so nothing matches future failures.
- Muting PRs that link to a non-existent issue number or duplicate an
  in-flight fix authored by a maintainer (because the search was scoped
  too narrowly to [ci-scan] PRs).
- Muting PRs opened against issues whose area owners had already
  objected to disabling the test.
- Wrong-KBE links where the candidate KBE matched only on test name but
  was filed against a different architecture / failure signature.

Reorganizes the prompt body into:

1. A numbered six-step pre-flight walk (existing KBE / area tracker /
   muting PR / in-flight fix PR / issue resolves / mute is welcome)
   followed by an explicit action selection.
2. A four-question KBE-match verification (test, signature, OS, arch).
3. Eight explicit body checks covering the JSON fence, exact opening,
   single-line/no-escape signature, and a negative-match test against
   [PASS]/[SKIP] and build-time output.
4. Bad/Good examples for signature shape and platform/csproj scope.

Coverage-discipline section trimmed to its unique contribution
(pipeline ordering, per-pipeline tally, run summary). Redundant
Submit section removed.

Only .github/workflows/ci-failure-scan.md is touched.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@dotnet-policy-service
Copy link
Copy Markdown
Contributor

Tagging subscribers to this area: @dotnet/runtime-infrastructure
See info in area-owners.md if you want to be subscribed.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR refactors the ci-failure-scan agent workflow prompt to tighten the “KBE then muting PR” decision flow and to reduce common false-positive / malformed-KBE failure modes (over-broad ErrorMessage, broken JSON fences, and too-narrow PR search).

Changes:

  • Replaces the previous per-failure instructions with an explicit six-step preflight walk (existing KBE, tracker, muting PR, in-flight fix PR, issue existence check, do-not-mute check) plus an action selection section.
  • Strengthens Known Build Error body requirements with a literal ```json example, an 8-point pre-submit checklist, and “Bad → Good” signature guidance.
  • Trims/rewrites the coverage discipline section to emphasize systematic per-pipeline processing and consistent per-signature outcomes.
Show a summary per file
File Description
.github/workflows/ci-failure-scan.md Reworks the workflow prompt’s per-failure decision procedure and KBE body/signature rules to reduce incorrect KBEs and improve PR/issue dedupe coverage.

Copilot's findings

  • Files reviewed: 1/1 changed files
  • Comments generated: 3

Comment thread .github/workflows/ci-failure-scan.md
Comment thread .github/workflows/ci-failure-scan.md Outdated
Comment thread .github/workflows/ci-failure-scan.md Outdated
- Step 2: do not treat a non-KBE area tracker as a substitute for a KBE.
  Build Analysis only matches issues with the Known Build Error label and
  a valid JSON body, so a plain tracker doesn't unblock PR CI on its own.
  Always file a KBE, and cross-link the tracker as Tracking: ... in both
  the KBE body and the muting PR body.

- Standardize the per-step outcome vocabulary. Steps 3-4 now record
  -> existing-PR #<n> (matching Coverage discipline) instead of the
  divergent -> already-covered: ... form.

- Anchor the regex examples in the Bad/Good signature table. The previous
  examples used unbounded .* and were unanchored, contradicting the
  surrounding rule. Replaced with anchored, character-class-bounded
  patterns ([^\n]* instead of .*, leading ^, word-boundary on test names).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@steveisok
Copy link
Copy Markdown
Member

Nice tightening — the failure modes you're addressing line up closely with material already curated in dotnet/arcade-skills. Even though the agentic workflow can't load skills or call out to PowerShell scripts today, three pieces there could either be cross-linked from the workflow body or lifted inline as the rules harden further:

1. ci-analysis/references/kbe-issue-creation.md

Covers most of what your new "Body checks" section is enforcing, with a few additions worth folding in:

  • JSON escaping rules"\", \\\, and the regex double-escape gotcha (\\. in JSON to get \. at the regex engine). Catches a class of malformed-fence issues your prose doesn't explicitly name.
  • Exact ErrorPattern regex flavorSingleline | IgnoreCase | NonBacktracking, 50 ms/line timeout. Useful when the agent reaches for regex over String.Contains.
  • ErrorMessage specificity rationale — "pattern length doesn't matter; specificity does." Same Bad → Good direction as your new examples, just framed around when to escalate from single-line to the multi-line array form (and the constraint that each array element must match a different line).

2. ci-analysis/scripts/Test-KnownIssuePattern.ps1

Implements the same matching logic Build Analysis uses (String.Contains ordinal/case-sensitive for ErrorMessage; Regex with the flags above for ErrorPattern) and emits a RESULT: PASS/FAIL + a validated JSON blob.

This is exactly the mandatory validation gate the new "negative-match against [PASS] / [SKIP]" check is trying to enforce by prose — it would deterministically catch the bare-test-name / truncated-prefix false-positive class instead of depending on the agent to remember to do it. Today it can't be invoked from the workflow (bash: allowlist excludes pwsh), but the algorithm is small enough to inline as a bash/jq snippet, and is a natural drop-in once pwsh lands in the allowlist.

3. known-issue-history

Mines userContentEdits on KBE issues to reconstruct the actual hit-count timeline. Useful as a second cross-check on the new four-question KBE-match block — before linking a candidate KBE, confirm it has actually been hitting in the window where this failure was observed. Pure gh api graphql, so the underlying query could be inlined into the workflow even without a script runner; helps cut the "matched on test name only, wrong arch / dormant signature" wrong-link class.

Happy to put up a follow-up that cross-links these from ci-failure-scan.md and inlines a minimal bash equivalent of the validation gate within the existing tool allowlist if that's useful.

Build Analysis matches on either ErrorMessage (literal String.Contains)
or ErrorPattern (regex), but populating both is undefined behavior — the
parser may use only one and the caller cannot predict which. The current
skeleton encouraged filling ErrorMessage with the test name and
ErrorPattern with the regex, producing the worst of both worlds: the
test-name substring matches every [PASS] line for that test on every
future run, muting legitimate regressions.

Replace the single skeleton with two alternative skeletons — one for
literal-substring match, one for regex — and require the agent to pick
exactly one field per JSON block and delete the other. Strengthen check
#6 to forbid bare test names in ErrorPattern regexes too, not just in
ErrorMessage literals.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings May 8, 2026 18:37
Adopt the substantive items @steveisok flagged from
dotnet/arcade-skills:

- Cross-link the canonical kbe-issue-creation.md reference at the head
  of the body-checks section, and Test-KnownIssuePattern.ps1 in the
  validate-before-filing block. Lifts the burden of keeping this prompt
  exhaustive while keeping the in-band rules as a self-contained
  fallback.
- Add a JSON-escaping body-check (item 9): " -> \", \\ for backslash,
  real newlines -> \n, and the regex double-escape gotcha (\\. in
  JSON -> \. at the engine).
- Document the exact ErrorPattern regex flavor (Singleline |
  IgnoreCase | NonBacktracking, 50ms/line timeout) in body-check 8.
- Document the ErrorMessage / ErrorPattern multi-line array form: each
  element matches a separate line in order, lines between OK, do not
  pad with generic tokens, do not mix ErrorMessage and ErrorPattern in
  the same array.
- Add an in-band 'Validate before filing' block with grep -F / grep -E
  smoke tests against the failure log on disk. grep -F reproduces the
  Build-Analysis ErrorMessage matcher (literal, ordinal, case-sensitive)
  exactly; grep -E approximates the regex matcher closely enough to
  flag over-broad patterns. Calls out that the canonical validator is
  the pwsh script and that pwsh is not in this workflow's allowlist.
- Add 'pattern length doesn't matter; specificity does' to the trailing
  guidance to discourage agents from shortening unique multi-line
  signatures into pithy one-liners.
- Add an optional fifth question to the KBE-match check: when the
  candidate KBE is older than ~14 days, confirm Build Analysis is still
  matching it via gh api graphql over userContentEdits.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot's findings

  • Files reviewed: 1/1 changed files
  • Comments generated: 4

Comment thread .github/workflows/ci-failure-scan.md Outdated
Comment thread .github/workflows/ci-failure-scan.md Outdated
Comment thread .github/workflows/ci-failure-scan.md
Comment thread .github/workflows/ci-failure-scan.md
@kotlarmilos
Copy link
Copy Markdown
Member Author

Nice tightening — the failure modes you're addressing line up closely with material already curated in dotnet/arcade-skills. Even though the agentic workflow can't load skills or call out to PowerShell scripts today, three pieces there could either be cross-linked from the workflow body or lifted inline as the rules harden further:

1. ci-analysis/references/kbe-issue-creation.md

Covers most of what your new "Body checks" section is enforcing, with a few additions worth folding in:

  • JSON escaping rules"\", \\\, and the regex double-escape gotcha (\\. in JSON to get \. at the regex engine). Catches a class of malformed-fence issues your prose doesn't explicitly name.
  • Exact ErrorPattern regex flavorSingleline | IgnoreCase | NonBacktracking, 50 ms/line timeout. Useful when the agent reaches for regex over String.Contains.
  • ErrorMessage specificity rationale — "pattern length doesn't matter; specificity does." Same Bad → Good direction as your new examples, just framed around when to escalate from single-line to the multi-line array form (and the constraint that each array element must match a different line).

2. ci-analysis/scripts/Test-KnownIssuePattern.ps1

Implements the same matching logic Build Analysis uses (String.Contains ordinal/case-sensitive for ErrorMessage; Regex with the flags above for ErrorPattern) and emits a RESULT: PASS/FAIL + a validated JSON blob.

This is exactly the mandatory validation gate the new "negative-match against [PASS] / [SKIP]" check is trying to enforce by prose — it would deterministically catch the bare-test-name / truncated-prefix false-positive class instead of depending on the agent to remember to do it. Today it can't be invoked from the workflow (bash: allowlist excludes pwsh), but the algorithm is small enough to inline as a bash/jq snippet, and is a natural drop-in once pwsh lands in the allowlist.

3. known-issue-history

Mines userContentEdits on KBE issues to reconstruct the actual hit-count timeline. Useful as a second cross-check on the new four-question KBE-match block — before linking a candidate KBE, confirm it has actually been hitting in the window where this failure was observed. Pure gh api graphql, so the underlying query could be inlined into the workflow even without a script runner; helps cut the "matched on test name only, wrong arch / dormant signature" wrong-link class.

Happy to put up a follow-up that cross-links these from ci-failure-scan.md and inlines a minimal bash equivalent of the validation gate within the existing tool allowlist if that's useful.

Thanks, inlined your suggestions here.

kotlarmilos and others added 2 commits May 8, 2026 20:48
The two sections were the same gate at different specificity: Verify
check #7 was the prose form ('would String.Contains accidentally
match...'), Validate was the executable form ('grep -F vs the log').
Merge them so the rule is stated once with its concrete how-to inline,
and drop the duplicate header.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Step 5: replace fictitious 'get_issue dotnet/runtime#<n>' with the
  actual gh-aw github MCP tool name and argument shape — issue_read
  with method=get and {owner, repo, issue_number}. Confirmed against
  github/github-mcp-server: the tool is registered as 'issue_read' and
  the existence-check method is 'get'.
- Step 6: drop hardcoded labels (do-not-mute, investigation-active,
  keep-failing) — none of these exist in dotnet/runtime, so the check
  silently no-ops. Reframe as a generic 'do not mute / disable' check
  on body and recent area-owner comments, with an explicit instruction
  to verify any label exists before relying on it.
- KBE template comment + regex skeleton: clarify 'single-line regex'
  meant 'JSON value is a single-line string'. Spell out that multi-line
  matching is achieved via the regex \n escape inside that single-line
  string or via the array form, not by embedding real newlines in the
  JSON.
- Bad to Good signature table: replace the literal ellipsis '...' in
  the ErrorMessage 'Good' examples with the array form. The previous
  examples (e.g. 'TestName ... SocketException') would have been
  matched as three literal dots by Build Analysis and never matched
  any log line. Add a leading paragraph stating that '...' in
  ErrorMessage is matched literally, not as a wildcard, and document
  which form each row represents (plain string -> ErrorMessage,
  prefixed ErrorPattern: -> ErrorPattern, JSON array -> ErrorMessage
  array form).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings May 8, 2026 18:50
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot's findings

  • Files reviewed: 1/1 changed files
  • Comments generated: 1

Comment thread .github/workflows/ci-failure-scan.md Outdated
Copilot review (line 377) flagged that the section said the body 'MUST
look exactly like the example below', but the example contained two
JSON fences (one ErrorMessage, one ErrorPattern). That contradicts the
later 'Verify the body' check #2 ('exactly one fenced JSON block') and
would have led an agent to copy both blocks.

Restructure as Template A (literal substring, the default) and
Template B (regex), each a self-contained body with a single fenced
JSON block. The header explicitly says 'pick exactly one'. Each
template's HTML comment narrates only the field it uses, removing the
'EXACTLY ONE of ErrorMessage or ErrorPattern' instruction that
previously had to be re-asserted from inside an example showing both.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

Status: No status

Development

Successfully merging this pull request may close these issues.

4 participants