[ci-failure-scan] Tighten KBE filing rules and PR-search coverage#127961
[ci-failure-scan] Tighten KBE filing rules and PR-search coverage#127961kotlarmilos wants to merge 7 commits intodotnet:mainfrom
Conversation
Refines the ci-failure-scan agentic-workflow prompt to address failure modes seen in past runs: - Generic ErrorMessage signatures (bare test names, exception types, or truncated prefixes) that match [PASS]/[SKIP] lines for the same test and turn the KBE into a false-positive matcher against passing builds. - Malformed JSON fences (4-backtick opens, mismatched fence lengths, multiple skeletons in one body) that cause Build Analysis to silently skip the issue. - Issues filed under the Known Build Error label with no JSON block at all, so nothing matches future failures. - Muting PRs that link to a non-existent issue number or duplicate an in-flight fix authored by a maintainer (because the search was scoped too narrowly to [ci-scan] PRs). - Muting PRs opened against issues whose area owners had already objected to disabling the test. - Wrong-KBE links where the candidate KBE matched only on test name but was filed against a different architecture / failure signature. Reorganizes the prompt body into: 1. A numbered six-step pre-flight walk (existing KBE / area tracker / muting PR / in-flight fix PR / issue resolves / mute is welcome) followed by an explicit action selection. 2. A four-question KBE-match verification (test, signature, OS, arch). 3. Eight explicit body checks covering the JSON fence, exact opening, single-line/no-escape signature, and a negative-match test against [PASS]/[SKIP] and build-time output. 4. Bad/Good examples for signature shape and platform/csproj scope. Coverage-discipline section trimmed to its unique contribution (pipeline ordering, per-pipeline tally, run summary). Redundant Submit section removed. Only .github/workflows/ci-failure-scan.md is touched. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
Tagging subscribers to this area: @dotnet/runtime-infrastructure |
There was a problem hiding this comment.
Pull request overview
This PR refactors the ci-failure-scan agent workflow prompt to tighten the “KBE then muting PR” decision flow and to reduce common false-positive / malformed-KBE failure modes (over-broad ErrorMessage, broken JSON fences, and too-narrow PR search).
Changes:
- Replaces the previous per-failure instructions with an explicit six-step preflight walk (existing KBE, tracker, muting PR, in-flight fix PR, issue existence check, do-not-mute check) plus an action selection section.
- Strengthens Known Build Error body requirements with a literal
```jsonexample, an 8-point pre-submit checklist, and “Bad → Good” signature guidance. - Trims/rewrites the coverage discipline section to emphasize systematic per-pipeline processing and consistent per-signature outcomes.
Show a summary per file
| File | Description |
|---|---|
| .github/workflows/ci-failure-scan.md | Reworks the workflow prompt’s per-failure decision procedure and KBE body/signature rules to reduce incorrect KBEs and improve PR/issue dedupe coverage. |
Copilot's findings
- Files reviewed: 1/1 changed files
- Comments generated: 3
- Step 2: do not treat a non-KBE area tracker as a substitute for a KBE. Build Analysis only matches issues with the Known Build Error label and a valid JSON body, so a plain tracker doesn't unblock PR CI on its own. Always file a KBE, and cross-link the tracker as Tracking: ... in both the KBE body and the muting PR body. - Standardize the per-step outcome vocabulary. Steps 3-4 now record -> existing-PR #<n> (matching Coverage discipline) instead of the divergent -> already-covered: ... form. - Anchor the regex examples in the Bad/Good signature table. The previous examples used unbounded .* and were unanchored, contradicting the surrounding rule. Replaced with anchored, character-class-bounded patterns ([^\n]* instead of .*, leading ^, word-boundary on test names). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
Nice tightening — the failure modes you're addressing line up closely with material already curated in 1.
|
Build Analysis matches on either ErrorMessage (literal String.Contains) or ErrorPattern (regex), but populating both is undefined behavior — the parser may use only one and the caller cannot predict which. The current skeleton encouraged filling ErrorMessage with the test name and ErrorPattern with the regex, producing the worst of both worlds: the test-name substring matches every [PASS] line for that test on every future run, muting legitimate regressions. Replace the single skeleton with two alternative skeletons — one for literal-substring match, one for regex — and require the agent to pick exactly one field per JSON block and delete the other. Strengthen check #6 to forbid bare test names in ErrorPattern regexes too, not just in ErrorMessage literals. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Adopt the substantive items @steveisok flagged from dotnet/arcade-skills: - Cross-link the canonical kbe-issue-creation.md reference at the head of the body-checks section, and Test-KnownIssuePattern.ps1 in the validate-before-filing block. Lifts the burden of keeping this prompt exhaustive while keeping the in-band rules as a self-contained fallback. - Add a JSON-escaping body-check (item 9): " -> \", \\ for backslash, real newlines -> \n, and the regex double-escape gotcha (\\. in JSON -> \. at the engine). - Document the exact ErrorPattern regex flavor (Singleline | IgnoreCase | NonBacktracking, 50ms/line timeout) in body-check 8. - Document the ErrorMessage / ErrorPattern multi-line array form: each element matches a separate line in order, lines between OK, do not pad with generic tokens, do not mix ErrorMessage and ErrorPattern in the same array. - Add an in-band 'Validate before filing' block with grep -F / grep -E smoke tests against the failure log on disk. grep -F reproduces the Build-Analysis ErrorMessage matcher (literal, ordinal, case-sensitive) exactly; grep -E approximates the regex matcher closely enough to flag over-broad patterns. Calls out that the canonical validator is the pwsh script and that pwsh is not in this workflow's allowlist. - Add 'pattern length doesn't matter; specificity does' to the trailing guidance to discourage agents from shortening unique multi-line signatures into pithy one-liners. - Add an optional fifth question to the KBE-match check: when the candidate KBE is older than ~14 days, confirm Build Analysis is still matching it via gh api graphql over userContentEdits. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Thanks, inlined your suggestions here. |
The two sections were the same gate at different specificity: Verify check #7 was the prose form ('would String.Contains accidentally match...'), Validate was the executable form ('grep -F vs the log'). Merge them so the rule is stated once with its concrete how-to inline, and drop the duplicate header. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Step 5: replace fictitious 'get_issue dotnet/runtime#<n>' with the
actual gh-aw github MCP tool name and argument shape — issue_read
with method=get and {owner, repo, issue_number}. Confirmed against
github/github-mcp-server: the tool is registered as 'issue_read' and
the existence-check method is 'get'.
- Step 6: drop hardcoded labels (do-not-mute, investigation-active,
keep-failing) — none of these exist in dotnet/runtime, so the check
silently no-ops. Reframe as a generic 'do not mute / disable' check
on body and recent area-owner comments, with an explicit instruction
to verify any label exists before relying on it.
- KBE template comment + regex skeleton: clarify 'single-line regex'
meant 'JSON value is a single-line string'. Spell out that multi-line
matching is achieved via the regex \n escape inside that single-line
string or via the array form, not by embedding real newlines in the
JSON.
- Bad to Good signature table: replace the literal ellipsis '...' in
the ErrorMessage 'Good' examples with the array form. The previous
examples (e.g. 'TestName ... SocketException') would have been
matched as three literal dots by Build Analysis and never matched
any log line. Add a leading paragraph stating that '...' in
ErrorMessage is matched literally, not as a wildcard, and document
which form each row represents (plain string -> ErrorMessage,
prefixed ErrorPattern: -> ErrorPattern, JSON array -> ErrorMessage
array form).
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot review (line 377) flagged that the section said the body 'MUST look exactly like the example below', but the example contained two JSON fences (one ErrorMessage, one ErrorPattern). That contradicts the later 'Verify the body' check #2 ('exactly one fenced JSON block') and would have led an agent to copy both blocks. Restructure as Template A (literal substring, the default) and Template B (regex), each a self-contained body with a single fenced JSON block. The header explicitly says 'pick exactly one'. Each template's HTML comment narrates only the field it uses, removing the 'EXACTLY ONE of ErrorMessage or ErrorPattern' instruction that previously had to be re-asserted from inside an example showing both. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Description
Refines the
ci-failure-scanagentic-workflow prompt to address failure modes seen in past runs:ErrorMessagesignatures (bare test names, exception types, or truncated prefixes) that match[PASS]/[SKIP]lines for the same test and turn the KBE into a false-positive matcher against passing builds.Known Build Errorlabel with no JSON block at all, so nothing matches future failures.[ci-scan]PRs).Changes
The body of
.github/workflows/ci-failure-scan.mdis reorganized into a clear walk-through:[PASS]/ `[SKIP]` and build-time output.linux-arm-only, single-arch NativeAOT, single stress mode).Submitsection removed — its content is now covered by the numbered walk.Test run results
Workflow run 25570821336 was dispatched against this branch (commit
3cd6399dd70, pre-Copilot-fixup) and completed successfully (~28 min). Outputs:[ci-scan] Skip AsyncProfilerTests on Android and tvOS[ci-scan] Skip System.Net.Sockets IPv6 tests on Android[ci-scan] Known Build Error: System.Net.NameResolution DnsGetHostAddresses_LocalhostSubdomainWithTrailingDot fails on Android[ci-scan] Test failure: XslCompiledTransformApiTests (82 tests) on all NativeAOT legs — PlatformNotSupportedException (Reflection.Emit)[ci-scan] Exclude Vector3Interop from GC stress