feat(test): add failure triage for batch root-cause grouping#44
Open
SahilRakhaiya05 wants to merge 4 commits into
Open
feat(test): add failure triage for batch root-cause grouping#44SahilRakhaiya05 wants to merge 4 commits into
SahilRakhaiya05 wants to merge 4 commits into
Conversation
Add testsprite test failure triage --project <id> to group failed tests into root-cause clusters using existing M2.1 analysis fields. Returns a representative test per cluster, confidence score, and fix priority without downloading failure bundles. Includes grouping library, command wiring, unit/integration tests, docs, CHANGELOG entry, agent skill update, and help snapshot.
…iage - Fix test failure triage help snapshot default value quoting - Run prettier on changed files - Add filter and max-concurrency validation tests - Remove draft issue/PR markdown files from repo
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This PR adds a new CLI command:
The command groups failed tests into root-cause clusters instead of returning a flat list of unrelated failures. This helps agents and developers quickly identify the highest-priority issue, investigate one representative test first, and avoid fixing the same underlying problem multiple times.
Each cluster includes:
The command uses existing TestSprite APIs only, so no backend changes are required. It fetches lightweight failure summary data per test and does not download screenshots, videos, or full failure bundles.
Problem
Today, when a batch run fails many tests, the CLI and agents only see individual failures:
This makes agents and developers:
The CLI already has strong per-test analysis through
test failure get,test failure summary,rootCauseHypothesis,recommendedFixTarget, andfailureKind.What was missing is cross-test grouping after a batch failure.
Solution
test failure triageworks in three steps:The grouping algorithm uses the following signals:
recommendedFixTarget.referencefailureKind, such asnetwork_timeoutorinfrarootCauseHypothesisClusters are ordered by fix priority first, then by member count.
Command surface
Supported options:
--project <id>— required project ID--type frontend|backend— filter failed tests by type--filter <substr>— filter tests by name substring--max-concurrency <n>— parallel summary fetches, default 5--output json|text— machine or human output--endpoint-url <url>— override API hostAlso supports global flags such as
--dry-run,--profile,--verbose, and--debug.Recommended agent workflow
The agent skill was also updated to recommend triage before downloading bundles when multiple tests fail.
Implementation details
Added new grouping logic in:
This includes:
normalizeHypothesis()computeGroupKey()pickRepresentativeTestId()computeClusterConfidence()computeFixPriority()buildFailureClusters()renderFailureTriageText()Added command implementation in:
The command validates inputs, paginates failed tests, applies filters, fetches summaries with bounded concurrency, handles stale failed rows, and emits JSON or text output through the existing output system.
Test coverage
This PR adds 18 automated tests:
src/lib/failure-triage.test.tssrc/commands/test.test.tsCoverage includes:
Future work
Out of scope for this PR:
GET /projects/{id}/failures/clustersAPIrootCauseHypothesis--rerun-representatives --waitorchestration flagChecklist
#43