[ci-failure-scan] Tighten KBE filing rules and PR-search coverage by kotlarmilos · Pull Request #127961 · dotnet/runtime

kotlarmilos · 2026-05-08T17:47:30Z

Description

Refines the ci-failure-scan agentic-workflow prompt to address failure modes seen in past runs:

Generic ErrorMessage signatures (bare test names, exception types, or truncated prefixes) that match [PASS] / [SKIP] lines for the same test and turn the KBE into a false-positive matcher against passing builds.
Malformed JSON fences (4-backtick opens, mismatched fence lengths, multiple skeletons in one body) that cause Build Analysis to silently skip the issue.
Issues filed under the Known Build Error label with no JSON block at all, so nothing matches future failures.
Muting PRs that link to a non-existent issue number or duplicate an in-flight fix authored by a maintainer (because the search was scoped too narrowly to [ci-scan] PRs).
Muting PRs opened against issues whose area owners had already provided a PR
Wrong-KBE links where the candidate KBE matched only on test name but was filed against a different architecture / failure signature.

Changes

The body of .github/workflows/ci-failure-scan.md is reorganized into a clear walk-through:

Two-pass KBE → PR flow is now a numbered six-step pre-flight walk (existing KBE / area tracker / muting PR / in-flight fix PR / issue resolves / mute is welcome) followed by an explicit action selection (file KBE, open muting PR, optionally open fix PR).
KBE-match verification — four questions (test, signature, OS, architecture) the agent must answer before linking an existing KBE. Wrong answers mean filing a fresh KBE rather than reusing the wrong one.
Body checks — eight explicit checks on the issue body covering the JSON fence, exact ```json opening, single-line/no-escapes signature, and a negative-match test against [PASS] / `[SKIP]` and build-time output.
Bad → Good examples for both signature shape (bare test name, truncated prefix, bare exception type) and platform/csproj scope (linux-arm-only, single-arch NativeAOT, single stress mode).
Coverage-discipline section trimmed to its unique contribution (pipeline ordering, per-pipeline tally, run summary). Redundant Submit section removed — its content is now covered by the numbered walk.

Test run results

Workflow run 25570821336 was dispatched against this branch (commit 3cd6399dd70, pre-Copilot-fixup) and completed successfully (~28 min). Outputs:

#	Type	Title	Linked tracker
#127963	PR (draft)	`[ci-scan] Skip AsyncProfilerTests on Android and tvOS`	#127951
#127964	PR (draft)	`[ci-scan] Skip System.Net.Sockets IPv6 tests on Android`	#127565
#127965	Issue (KBE)	`[ci-scan] Known Build Error: System.Net.NameResolution DnsGetHostAddresses_LocalhostSubdomainWithTrailingDot fails on Android`	new
#127966	Issue (regression)	`[ci-scan] Test failure: XslCompiledTransformApiTests (82 tests) on all NativeAOT legs — PlatformNotSupportedException (Reflection.Emit)`	new
#127967	PR (draft)	`[ci-scan] Exclude Vector3Interop from GC stress`	#127827

Refines the ci-failure-scan agentic-workflow prompt to address failure modes seen in past runs: - Generic ErrorMessage signatures (bare test names, exception types, or truncated prefixes) that match [PASS]/[SKIP] lines for the same test and turn the KBE into a false-positive matcher against passing builds. - Malformed JSON fences (4-backtick opens, mismatched fence lengths, multiple skeletons in one body) that cause Build Analysis to silently skip the issue. - Issues filed under the Known Build Error label with no JSON block at all, so nothing matches future failures. - Muting PRs that link to a non-existent issue number or duplicate an in-flight fix authored by a maintainer (because the search was scoped too narrowly to [ci-scan] PRs). - Muting PRs opened against issues whose area owners had already objected to disabling the test. - Wrong-KBE links where the candidate KBE matched only on test name but was filed against a different architecture / failure signature. Reorganizes the prompt body into: 1. A numbered six-step pre-flight walk (existing KBE / area tracker / muting PR / in-flight fix PR / issue resolves / mute is welcome) followed by an explicit action selection. 2. A four-question KBE-match verification (test, signature, OS, arch). 3. Eight explicit body checks covering the JSON fence, exact opening, single-line/no-escape signature, and a negative-match test against [PASS]/[SKIP] and build-time output. 4. Bad/Good examples for signature shape and platform/csproj scope. Coverage-discipline section trimmed to its unique contribution (pipeline ordering, per-pipeline tally, run summary). Redundant Submit section removed. Only .github/workflows/ci-failure-scan.md is touched. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

dotnet-policy-service · 2026-05-08T17:48:51Z

Tagging subscribers to this area: @dotnet/runtime-infrastructure
See info in area-owners.md if you want to be subscribed.

Copilot

Pull request overview

This PR refactors the ci-failure-scan agent workflow prompt to tighten the “KBE then muting PR” decision flow and to reduce common false-positive / malformed-KBE failure modes (over-broad ErrorMessage, broken JSON fences, and too-narrow PR search).

Changes:

Replaces the previous per-failure instructions with an explicit six-step preflight walk (existing KBE, tracker, muting PR, in-flight fix PR, issue existence check, do-not-mute check) plus an action selection section.
Strengthens Known Build Error body requirements with a literal ```json example, an 8-point pre-submit checklist, and “Bad → Good” signature guidance.
Trims/rewrites the coverage discipline section to emphasize systematic per-pipeline processing and consistent per-signature outcomes.

Show a summary per file

File	Description
.github/workflows/ci-failure-scan.md	Reworks the workflow prompt’s per-failure decision procedure and KBE body/signature rules to reduce incorrect KBEs and improve PR/issue dedupe coverage.

Copilot's findings

Files reviewed: 1/1 changed files
Comments generated: 3

- Step 2: do not treat a non-KBE area tracker as a substitute for a KBE. Build Analysis only matches issues with the Known Build Error label and a valid JSON body, so a plain tracker doesn't unblock PR CI on its own. Always file a KBE, and cross-link the tracker as Tracking: ... in both the KBE body and the muting PR body. - Standardize the per-step outcome vocabulary. Steps 3-4 now record -> existing-PR #<n> (matching Coverage discipline) instead of the divergent -> already-covered: ... form. - Anchor the regex examples in the Bad/Good signature table. The previous examples used unbounded .* and were unanchored, contradicting the surrounding rule. Replaced with anchored, character-class-bounded patterns ([^\n]* instead of .*, leading ^, word-boundary on test names). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

steveisok · 2026-05-08T18:28:09Z

Nice tightening — the failure modes you're addressing line up closely with material already curated in dotnet/arcade-skills. Even though the agentic workflow can't load skills or call out to PowerShell scripts today, three pieces there could either be cross-linked from the workflow body or lifted inline as the rules harden further:

1. `ci-analysis/references/kbe-issue-creation.md`

Covers most of what your new "Body checks" section is enforcing, with a few additions worth folding in:

JSON escaping rules — " → \", \ → \\, and the regex double-escape gotcha (\\. in JSON to get \. at the regex engine). Catches a class of malformed-fence issues your prose doesn't explicitly name.
Exact ErrorPattern regex flavor — Singleline | IgnoreCase | NonBacktracking, 50 ms/line timeout. Useful when the agent reaches for regex over String.Contains.
ErrorMessage specificity rationale — "pattern length doesn't matter; specificity does." Same Bad → Good direction as your new examples, just framed around when to escalate from single-line to the multi-line array form (and the constraint that each array element must match a different line).

2. `ci-analysis/scripts/Test-KnownIssuePattern.ps1`

Implements the same matching logic Build Analysis uses (String.Contains ordinal/case-sensitive for ErrorMessage; Regex with the flags above for ErrorPattern) and emits a RESULT: PASS/FAIL + a validated JSON blob.

This is exactly the mandatory validation gate the new "negative-match against [PASS] / [SKIP]" check is trying to enforce by prose — it would deterministically catch the bare-test-name / truncated-prefix false-positive class instead of depending on the agent to remember to do it. Today it can't be invoked from the workflow (bash: allowlist excludes pwsh), but the algorithm is small enough to inline as a bash/jq snippet, and is a natural drop-in once pwsh lands in the allowlist.

3. `known-issue-history`

Mines userContentEdits on KBE issues to reconstruct the actual hit-count timeline. Useful as a second cross-check on the new four-question KBE-match block — before linking a candidate KBE, confirm it has actually been hitting in the window where this failure was observed. Pure gh api graphql, so the underlying query could be inlined into the workflow even without a script runner; helps cut the "matched on test name only, wrong arch / dormant signature" wrong-link class.

Happy to put up a follow-up that cross-links these from ci-failure-scan.md and inlines a minimal bash equivalent of the validation gate within the existing tool allowlist if that's useful.

Build Analysis matches on either ErrorMessage (literal String.Contains) or ErrorPattern (regex), but populating both is undefined behavior — the parser may use only one and the caller cannot predict which. The current skeleton encouraged filling ErrorMessage with the test name and ErrorPattern with the regex, producing the worst of both worlds: the test-name substring matches every [PASS] line for that test on every future run, muting legitimate regressions. Replace the single skeleton with two alternative skeletons — one for literal-substring match, one for regex — and require the agent to pick exactly one field per JSON block and delete the other. Strengthen check #6 to forbid bare test names in ErrorPattern regexes too, not just in ErrorMessage literals. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

@steveisok

Adopt the substantive items @steveisok flagged from dotnet/arcade-skills: - Cross-link the canonical kbe-issue-creation.md reference at the head of the body-checks section, and Test-KnownIssuePattern.ps1 in the validate-before-filing block. Lifts the burden of keeping this prompt exhaustive while keeping the in-band rules as a self-contained fallback. - Add a JSON-escaping body-check (item 9): " -> \", \\ for backslash, real newlines -> \n, and the regex double-escape gotcha (\\. in JSON -> \. at the engine). - Document the exact ErrorPattern regex flavor (Singleline | IgnoreCase | NonBacktracking, 50ms/line timeout) in body-check 8. - Document the ErrorMessage / ErrorPattern multi-line array form: each element matches a separate line in order, lines between OK, do not pad with generic tokens, do not mix ErrorMessage and ErrorPattern in the same array. - Add an in-band 'Validate before filing' block with grep -F / grep -E smoke tests against the failure log on disk. grep -F reproduces the Build-Analysis ErrorMessage matcher (literal, ordinal, case-sensitive) exactly; grep -E approximates the regex matcher closely enough to flag over-broad patterns. Calls out that the canonical validator is the pwsh script and that pwsh is not in this workflow's allowlist. - Add 'pattern length doesn't matter; specificity does' to the trailing guidance to discourage agents from shortening unique multi-line signatures into pithy one-liners. - Add an optional fifth question to the KBE-match check: when the candidate KBE is older than ~14 days, confirm Build Analysis is still matching it via gh api graphql over userContentEdits. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot

Copilot's findings

Files reviewed: 1/1 changed files
Comments generated: 4

kotlarmilos · 2026-05-08T18:48:02Z

Nice tightening — the failure modes you're addressing line up closely with material already curated in dotnet/arcade-skills. Even though the agentic workflow can't load skills or call out to PowerShell scripts today, three pieces there could either be cross-linked from the workflow body or lifted inline as the rules harden further:

1. ci-analysis/references/kbe-issue-creation.md

Covers most of what your new "Body checks" section is enforcing, with a few additions worth folding in:

JSON escaping rules — " → \", \ → \\, and the regex double-escape gotcha (\\. in JSON to get \. at the regex engine). Catches a class of malformed-fence issues your prose doesn't explicitly name.

Exact ErrorPattern regex flavor — Singleline | IgnoreCase | NonBacktracking, 50 ms/line timeout. Useful when the agent reaches for regex over String.Contains.

ErrorMessage specificity rationale — "pattern length doesn't matter; specificity does." Same Bad → Good direction as your new examples, just framed around when to escalate from single-line to the multi-line array form (and the constraint that each array element must match a different line).

2. ci-analysis/scripts/Test-KnownIssuePattern.ps1

Implements the same matching logic Build Analysis uses (String.Contains ordinal/case-sensitive for ErrorMessage; Regex with the flags above for ErrorPattern) and emits a RESULT: PASS/FAIL + a validated JSON blob.

This is exactly the mandatory validation gate the new "negative-match against [PASS] / [SKIP]" check is trying to enforce by prose — it would deterministically catch the bare-test-name / truncated-prefix false-positive class instead of depending on the agent to remember to do it. Today it can't be invoked from the workflow (bash: allowlist excludes pwsh), but the algorithm is small enough to inline as a bash/jq snippet, and is a natural drop-in once pwsh lands in the allowlist.

3. known-issue-history

Mines userContentEdits on KBE issues to reconstruct the actual hit-count timeline. Useful as a second cross-check on the new four-question KBE-match block — before linking a candidate KBE, confirm it has actually been hitting in the window where this failure was observed. Pure gh api graphql, so the underlying query could be inlined into the workflow even without a script runner; helps cut the "matched on test name only, wrong arch / dormant signature" wrong-link class.

Happy to put up a follow-up that cross-links these from ci-failure-scan.md and inlines a minimal bash equivalent of the validation gate within the existing tool allowlist if that's useful.

Thanks, inlined your suggestions here.

The two sections were the same gate at different specificity: Verify check #7 was the prose form ('would String.Contains accidentally match...'), Validate was the executable form ('grep -F vs the log'). Merge them so the rule is stated once with its concrete how-to inline, and drop the duplicate header. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

- Step 5: replace fictitious 'get_issue dotnet/runtime#<n>' with the actual gh-aw github MCP tool name and argument shape — issue_read with method=get and {owner, repo, issue_number}. Confirmed against github/github-mcp-server: the tool is registered as 'issue_read' and the existence-check method is 'get'. - Step 6: drop hardcoded labels (do-not-mute, investigation-active, keep-failing) — none of these exist in dotnet/runtime, so the check silently no-ops. Reframe as a generic 'do not mute / disable' check on body and recent area-owner comments, with an explicit instruction to verify any label exists before relying on it. - KBE template comment + regex skeleton: clarify 'single-line regex' meant 'JSON value is a single-line string'. Spell out that multi-line matching is achieved via the regex \n escape inside that single-line string or via the array form, not by embedding real newlines in the JSON. - Bad to Good signature table: replace the literal ellipsis '...' in the ErrorMessage 'Good' examples with the array form. The previous examples (e.g. 'TestName ... SocketException') would have been matched as three literal dots by Build Analysis and never matched any log line. Add a leading paragraph stating that '...' in ErrorMessage is matched literally, not as a wildcard, and document which form each row represents (plain string -> ErrorMessage, prefixed ErrorPattern: -> ErrorPattern, JSON array -> ErrorMessage array form). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot

Copilot's findings

Files reviewed: 1/1 changed files
Comments generated: 1

Copilot review (line 377) flagged that the section said the body 'MUST look exactly like the example below', but the example contained two JSON fences (one ErrorMessage, one ErrorPattern). That contradicts the later 'Verify the body' check #2 ('exactly one fenced JSON block') and would have led an agent to copy both blocks. Restructure as Template A (literal substring, the default) and Template B (regex), each a self-contained body with a single fenced JSON block. The header explicitly says 'pick exactly one'. Each template's HTML comment narrates only the field it uses, removing the 'EXACTLY ONE of ErrorMessage or ErrorPattern' instruction that previously had to be re-asserted from inside an example showing both. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot AI review requested due to automatic review settings May 8, 2026 17:47

kotlarmilos requested review from a team and jeffhandley as code owners May 8, 2026 17:47

Copilot started reviewing on behalf of kotlarmilos May 8, 2026 17:47 View session

github-actions Bot added the area-Infrastructure label May 8, 2026

dotnet-policy-service Bot assigned kotlarmilos May 8, 2026

github-project-automation Bot added this to Runtime Infra May 8, 2026

kotlarmilos requested review from JanKrivanek, MichalStrehovsky, jkotas and vitek-karas May 8, 2026 17:49

kotlarmilos mentioned this pull request May 8, 2026

[ci-scan] Test failure: DnsGetHostAddresses_LocalhostSubdomainWithTrailingDot_ReturnsLoopback on android #127953

Open

Copilot AI reviewed May 8, 2026

View reviewed changes

Comment thread .github/workflows/ci-failure-scan.md

Comment thread .github/workflows/ci-failure-scan.md Outdated

Comment thread .github/workflows/ci-failure-scan.md Outdated

jkotas approved these changes May 8, 2026

View reviewed changes

Copilot AI review requested due to automatic review settings May 8, 2026 18:37

Copilot started reviewing on behalf of kotlarmilos May 8, 2026 18:37 View session

Copilot AI reviewed May 8, 2026

View reviewed changes

Comment thread .github/workflows/ci-failure-scan.md Outdated

Comment thread .github/workflows/ci-failure-scan.md Outdated

Comment thread .github/workflows/ci-failure-scan.md

Comment thread .github/workflows/ci-failure-scan.md

kotlarmilos and others added 2 commits May 8, 2026 20:48

Copilot AI review requested due to automatic review settings May 8, 2026 18:50

Copilot started reviewing on behalf of kotlarmilos May 8, 2026 18:51 View session

Copilot AI reviewed May 8, 2026

View reviewed changes

Comment thread .github/workflows/ci-failure-scan.md Outdated

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[ci-failure-scan] Tighten KBE filing rules and PR-search coverage#127961

[ci-failure-scan] Tighten KBE filing rules and PR-search coverage#127961
kotlarmilos wants to merge 7 commits intodotnet:mainfrom
kotlarmilos:kotlarmilos/ci-failure-scan-tighten

kotlarmilos commented May 8, 2026 •

edited

Loading

Uh oh!

dotnet-policy-service Bot commented May 8, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

steveisok commented May 8, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

kotlarmilos commented May 8, 2026

1. `ci-analysis/references/kbe-issue-creation.md`

2. `ci-analysis/scripts/Test-KnownIssuePattern.ps1`

3. `known-issue-history`

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

kotlarmilos commented May 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Changes

Test run results

Uh oh!

dotnet-policy-service Bot commented May 8, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Copilot's findings

Uh oh!

Uh oh!

Uh oh!

Uh oh!

steveisok commented May 8, 2026

1. ci-analysis/references/kbe-issue-creation.md

2. ci-analysis/scripts/Test-KnownIssuePattern.ps1

3. known-issue-history

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Copilot's findings

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

kotlarmilos commented May 8, 2026

1. ci-analysis/references/kbe-issue-creation.md

2. ci-analysis/scripts/Test-KnownIssuePattern.ps1

3. known-issue-history

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Copilot's findings

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

kotlarmilos commented May 8, 2026 •

edited

Loading

1. `ci-analysis/references/kbe-issue-creation.md`

2. `ci-analysis/scripts/Test-KnownIssuePattern.ps1`

3. `known-issue-history`

1. `ci-analysis/references/kbe-issue-creation.md`

2. `ci-analysis/scripts/Test-KnownIssuePattern.ps1`

3. `known-issue-history`