feat(test): add failure triage for batch root-cause grouping by SahilRakhaiya05 · Pull Request #44 · TestSprite/testsprite-cli

SahilRakhaiya05 · 2026-06-26T12:13:54Z

This PR adds a new CLI command:

testsprite test failure triage --project <project-id> --output json

The command groups failed tests into root-cause clusters instead of returning a flat list of unrelated failures. This helps agents and developers quickly identify the highest-priority issue, investigate one representative test first, and avoid fixing the same underlying problem multiple times.

Each cluster includes:

A human-readable label
A representative test to investigate first
All affected test IDs
A confidence score
A fix priority, where a lower number means higher priority
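
As a rough illustration of the fields listed above, one cluster could be modelled like this. The interface, its field names (other than `representativeTestId`, which appears in the workflow section), the 0 to 1 confidence range, and the `clusterProblems` helper are assumptions made for this sketch, not the CLI's actual output schema.

```typescript
// Hypothetical shape of one cluster; names are illustrative, not the CLI's real schema.
interface FailureCluster {
  label: string; // human-readable label
  representativeTestId: string; // test to investigate first
  testIds: string[]; // all affected test IDs
  confidence: number; // assumed here to lie in [0, 1]
  fixPriority: number; // lower number means higher priority
}

// Hypothetical sanity check a consumer might run before trusting a cluster.
function clusterProblems(c: FailureCluster): string[] {
  const problems: string[] = [];
  if (c.testIds.length === 0) {
    problems.push("cluster has no affected tests");
  }
  if (!c.testIds.includes(c.representativeTestId)) {
    problems.push("representative is not among the affected tests");
  }
  if (c.confidence < 0 || c.confidence > 1) {
    problems.push("confidence is outside the assumed 0..1 range");
  }
  return problems;
}

// Made-up example data.
const example: FailureCluster = {
  label: "Shared fix target: src/auth/login.ts",
  representativeTestId: "test_a",
  testIds: ["test_a", "test_b", "test_c"],
  confidence: 0.8,
  fixPriority: 1,
};

console.log(clusterProblems(example));
```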

The command uses existing TestSprite APIs only, so no backend changes are required. It fetches lightweight failure summary data per test and does not download screenshots, videos, or full failure bundles.

Problem

Today, when a batch run fails many tests, the CLI and agents only see individual failures:

testsprite test run --all --project proj_xxx --wait
# → tests failed, each reported separately

As a result, agents and developers have to:

Review many separate failed rows.
Guess which failure to investigate first.
Download multiple failure bundles.
Often fix the same underlying issue more than once.

The CLI already has strong per-test analysis through the test failure get and test failure summary commands and the rootCauseHypothesis, recommendedFixTarget, and failureKind fields.

What was missing is cross-test grouping after a batch failure.

Solution

test failure triage works in three steps:

Lists all failed tests for a project.
Fetches failure summaries for each failed test in parallel.
Groups failures client-side using deterministic heuristics.

The grouping algorithm uses the following signals:

Shared recommendedFixTarget.reference
Environment-wide failureKind, such as network_timeout or infra
Similar rootCauseHypothesis
Singleton fallback when no grouping signal exists

Clusters are ordered by fix priority first, then by member count.
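
The signals and ordering above could be sketched roughly as follows. This is not the code in src/lib/failure-triage.ts: the `FailureSummary` type, the `ENV_WIDE_KINDS` set, and the function names are hypothetical. The sketch also assumes that the signals are checked in the order listed, that "similar" hypotheses can be approximated by exact match after normalization, and that larger clusters come first when priorities tie.

```typescript
// Hypothetical summary shape built from the field names mentioned in this PR.
type FailureSummary = {
  testId: string;
  failureKind?: string;
  rootCauseHypothesis?: string;
  recommendedFixTarget?: { reference?: string };
};

// Assumption: only the two kinds named in the description are environment-wide.
const ENV_WIDE_KINDS = new Set(["network_timeout", "infra"]);

// Lowercase and collapse punctuation/whitespace so near-identical text matches.
function normalize(text: string): string {
  return text.toLowerCase().replace(/[^a-z0-9]+/g, " ").trim();
}

// Assumed precedence: fix target, then environment-wide kind, then hypothesis.
function groupKey(s: FailureSummary): string {
  const ref = s.recommendedFixTarget?.reference;
  if (ref) return `target:${ref}`;
  if (s.failureKind && ENV_WIDE_KINDS.has(s.failureKind)) {
    return `kind:${s.failureKind}`;
  }
  if (s.rootCauseHypothesis) {
    const normalized = normalize(s.rootCauseHypothesis);
    if (normalized) return `hypothesis:${normalized}`;
  }
  return `single:${s.testId}`; // singleton fallback
}

// Bucket test IDs by group key; Map preserves first-seen key order.
function groupFailures(summaries: FailureSummary[]): Map<string, string[]> {
  const groups = new Map<string, string[]>();
  for (const s of summaries) {
    const key = groupKey(s);
    const members = groups.get(key) ?? [];
    members.push(s.testId);
    groups.set(key, members);
  }
  return groups;
}

// Order by fix priority (ascending), then by member count (assumed descending).
function orderClusters<T extends { fixPriority: number; testIds: string[] }>(
  clusters: T[],
): T[] {
  return [...clusters].sort(
    (a, b) => a.fixPriority - b.fixPriority || b.testIds.length - a.testIds.length,
  );
}
```

Because every step is a pure function of the summaries, the same input always yields the same clusters, which is what makes the grouping deterministic.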

Command surface

testsprite test failure triage --project <project-id> [options]

Supported options:

--project <id> — required project ID
--type frontend|backend — filter failed tests by type
--filter <substr> — filter tests by name substring
--max-concurrency <n> — parallel summary fetches, default 5
--output json|text — machine or human output
--endpoint-url <url> — override API host

The command also supports global flags such as --dry-run, --profile, --verbose, and --debug.

Recommended agent workflow

# 1. Batch run fails
testsprite test run --all --project <project-id> --wait --output json

# 2. Triage failures into clusters
testsprite test failure triage --project <project-id> --output json

# 3. Download one bundle from the highest-priority representative test
testsprite test failure get <representativeTestId> --out ./.testsprite/failure

# 4. Fix the issue and rerun the representative first
testsprite test rerun <representativeTestId> --wait

# 5. Run full regression after the representative passes
testsprite test rerun --all --project <project-id> --wait

The agent skill was also updated to recommend triage before downloading bundles when multiple tests fail.
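
Between steps 2 and 3, an agent needs to pull the representative test ID out of the triage JSON. A minimal sketch of that step follows. The JSON schema is not shown in this description, so the `clusters` array and its field names are purely hypothetical, as is the `pickRepresentative` helper.

```typescript
// Hypothetical output shape; the real JSON schema is not documented here.
interface TriageCluster {
  representativeTestId: string;
  fixPriority: number; // lower number means higher priority
  testIds: string[];
}

interface TriageOutput {
  clusters: TriageCluster[];
}

// Return the representative of the highest-priority cluster, if any.
function pickRepresentative(rawJson: string): string | undefined {
  const parsed = JSON.parse(rawJson) as TriageOutput;
  // The description says clusters already arrive ordered; re-sorting is
  // only a defensive measure in this sketch.
  const ordered = [...parsed.clusters].sort(
    (a, b) => a.fixPriority - b.fixPriority || b.testIds.length - a.testIds.length,
  );
  return ordered[0]?.representativeTestId;
}

// Made-up sample standing in for the command's stdout.
const sample = JSON.stringify({
  clusters: [
    { representativeTestId: "test_b", fixPriority: 2, testIds: ["test_b"] },
    { representativeTestId: "test_a", fixPriority: 1, testIds: ["test_a", "test_c"] },
  ],
});

console.log(pickRepresentative(sample));
```

The returned ID is what would be passed to test failure get and test rerun in steps 3 and 4.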

Implementation details

Added new grouping logic in:

src/lib/failure-triage.ts

This includes:

normalizeHypothesis()
computeGroupKey()
pickRepresentativeTestId()
computeClusterConfidence()
computeFixPriority()
buildFailureClusters()
renderFailureTriageText()

Added command implementation in:

src/commands/test.ts

The command validates inputs, paginates failed tests, applies filters, fetches summaries with bounded concurrency, handles stale failed rows, and emits JSON or text output through the existing output system.
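
Fetching summaries with bounded concurrency can be done with a small worker-pool helper. The sketch below is one common way to write it, not necessarily how src/commands/test.ts does it. `mapWithConcurrency` is a hypothetical name, and the per-test fetch (including any handling of stale failed rows) is left to the caller-supplied function.

```typescript
// Run fn over items with at most `limit` calls in flight; results keep input order.
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  fn: (item: T) => Promise<R>,
): Promise<R[]> {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError("limit must be a positive integer");
  }
  const results = new Array<R>(items.length);
  let next = 0; // index of the next item to claim

  // Each worker repeatedly claims the next unprocessed index until none remain.
  async function worker(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      results[index] = await fn(items[index] as T);
    }
  }

  const workerCount = Math.min(limit, items.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

With the default of --max-concurrency 5, a call such as `mapWithConcurrency(failedTestIds, 5, fetchSummary)` would keep at most five requests open at a time; `failedTestIds` and `fetchSummary` are placeholders here. If any call rejects, the returned promise rejects too, so a real implementation may want to catch errors per item instead.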

Test coverage

This PR adds 18 automated tests:

11 unit tests in src/lib/failure-triage.test.ts
7 integration tests in src/commands/test.test.ts

Coverage includes:

Grouping by fix target
Grouping by failure kind
Grouping by hypothesis
Singleton fallback
Representative test selection
Cluster confidence and priority
Empty projects
Stale failed rows
Missing project validation
JSON and text output
Help surface

Future work

Out of scope for this PR:

Native GET /projects/{id}/failures/clusters API
Semantic embedding clustering on rootCauseHypothesis
Backend (BE) wave/cascade graph integration
--rerun-representatives --wait orchestration flag

Checklist

New command with JSON and text output
Uses existing APIs only
Deterministic grouping, no CLI LLM calls
18 automated tests added
Typecheck, lint, and build pass
Documentation updated
Agent skill updated
Help snapshot added
Manual production API smoke test completed

#43

Add testsprite test failure triage --project <id> to group failed tests into root-cause clusters using existing M2.1 analysis fields. Returns a representative test per cluster, confidence score, and fix priority without downloading failure bundles. Includes grouping library, command wiring, unit/integration tests, docs, CHANGELOG entry, agent skill update, and help snapshot.

SahilRakhaiya05 added 4 commits June 26, 2026 17:14

docs: add issue and PR templates for failure triage

6a17fa8

docs: update PR template with live API verification results

12b6df5

fix(ci): correct help snapshot and prettier formatting for failure tr…

62af870

…iage
- Fix test failure triage help snapshot default value quoting
- Run prettier on changed files
- Add filter and max-concurrency validation tests
- Remove draft issue/PR markdown files from repo

SahilRakhaiya05 wants to merge 4 commits into TestSprite:main from SahilRakhaiya05:feat/failure-triage
