
feat(test): add failure triage for batch root-cause grouping #44

Open
SahilRakhaiya05 wants to merge 4 commits into TestSprite:main from SahilRakhaiya05:feat/failure-triage

SahilRakhaiya05 commented Jun 26, 2026

This PR adds a new CLI command:

testsprite test failure triage --project <project-id> --output json

The command groups failed tests into root-cause clusters instead of returning a flat list of unrelated failures. This helps agents and developers quickly identify the highest-priority issue, investigate one representative test first, and avoid fixing the same underlying problem multiple times.

Each cluster includes:

  • A human-readable label
  • A representative test to investigate first
  • All affected test IDs
  • A confidence score
  • A fix priority (lower values mean higher priority)

The command uses existing TestSprite APIs only, so no backend changes are required. It fetches lightweight failure summary data per test and does not download screenshots, videos, or full failure bundles.
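For agents consuming the JSON output, one cluster might be modeled as below. This is a sketch only: the field names (`label`, `representativeTestId`, `testIds`, `confidence`, `fixPriority`), the 0 to 1 confidence range, the made-up example values, and the `clusterSummary` helper are assumptions for illustration, not the command's documented schema.

```typescript
// Hypothetical shape of one root-cause cluster; field names are assumptions.
interface FailureCluster {
  label: string; // human-readable label
  representativeTestId: string; // test to investigate first
  testIds: string[]; // all affected test IDs
  confidence: number; // assumed to lie in [0, 1]
  fixPriority: number; // lower value = fix sooner
}

// Hypothetical helper: render a one-line summary of a cluster.
function clusterSummary(c: FailureCluster): string {
  const pct = Math.round(c.confidence * 100);
  return `P${c.fixPriority} ${c.label}: ${c.testIds.length} test(s), ${pct}% confidence, start with ${c.representativeTestId}`;
}

// Made-up example values.
const example: FailureCluster = {
  label: "Login API returns 500",
  representativeTestId: "test_a",
  testIds: ["test_a", "test_b", "test_c"],
  confidence: 0.9,
  fixPriority: 1,
};

console.log(clusterSummary(example));
// P1 Login API returns 500: 3 test(s), 90% confidence, start with test_a
```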


Problem

Today, when a batch run fails many tests, the CLI and agents only see individual failures:

testsprite test run --all --project proj_xxx --wait
# → tests failed, each reported separately

This forces agents and developers to:

  1. Review many separate failed rows.
  2. Guess which failure to investigate first.
  3. Download multiple failure bundles.
  4. Risk fixing the same underlying issue more than once.

The CLI already provides strong per-test analysis through the test failure get and test failure summary commands and through the rootCauseHypothesis, recommendedFixTarget, and failureKind analysis fields.

What was missing is cross-test grouping after a batch failure.


Solution

test failure triage works in three steps:

  1. Lists all failed tests for a project.
  2. Fetches failure summaries for each failed test in parallel.
  3. Groups failures client-side using deterministic heuristics.
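The three steps can be sketched as a single async function with injected dependencies. `TriageDeps`, `listFailedTests`, `getFailureSummary`, and `groupKey` are hypothetical names standing in for the CLI's real API client and heuristics. For brevity the sketch fetches summaries with an unbounded `Promise.all`, whereas the command bounds parallelism with `--max-concurrency`.

```typescript
// Hypothetical shapes: the real CLI's API client and summary fields may differ.
interface FailedTest {
  id: string;
  name: string;
}

interface FailureSummary {
  testId: string;
  rootCauseHypothesis?: string;
}

interface TriageDeps {
  listFailedTests(projectId: string): Promise<FailedTest[]>; // step 1
  getFailureSummary(testId: string): Promise<FailureSummary>; // step 2
  groupKey(summary: FailureSummary): string; // step 3 heuristic
}

async function triage(projectId: string, deps: TriageDeps): Promise<Map<string, string[]>> {
  // 1. List all failed tests for the project.
  const failed = await deps.listFailedTests(projectId);

  // 2. Fetch every summary in parallel (unbounded here; the real command caps it).
  const summaries = await Promise.all(failed.map((t) => deps.getFailureSummary(t.id)));

  // 3. Group client-side: each key maps to the IDs of the tests that share it.
  const groups = new Map<string, string[]>();
  for (const s of summaries) {
    const key = deps.groupKey(s);
    const members = groups.get(key) ?? [];
    members.push(s.testId);
    groups.set(key, members);
  }
  return groups;
}

// Usage with in-memory fakes standing in for the API.
const fakeDeps: TriageDeps = {
  listFailedTests: async () => [
    { id: "t1", name: "login" },
    { id: "t2", name: "signup" },
    { id: "t3", name: "cart" },
  ],
  getFailureSummary: async (id) => ({
    testId: id,
    rootCauseHypothesis: id === "t3" ? "cart total wrong" : "auth service down",
  }),
  groupKey: (s) => s.rootCauseHypothesis ?? `singleton:${s.testId}`,
};

triage("proj_xxx", fakeDeps).then((groups) => console.log(groups));
```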

The grouping algorithm uses the following signals:

  • Shared recommendedFixTarget.reference
  • Environment-wide failureKind, such as network_timeout or infra
  • Similar rootCauseHypothesis
  • Singleton fallback when no grouping signal exists
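A deterministic group key can encode these signals, assuming they are checked in the order listed and that "similar" hypotheses are approximated by an exact match after normalization. The `SummarySignals` shape, the `groupKeyFor` and `normalize` names, and the normalization rules are illustrative assumptions; only the field names and the two failure kinds come from the description above.

```typescript
// Hypothetical subset of a failure summary; real field shapes may differ.
interface SummarySignals {
  testId: string;
  failureKind?: string;
  rootCauseHypothesis?: string;
  recommendedFixTarget?: { reference?: string };
}

// Failure kinds treated as environment-wide. These two come from the PR text;
// the real list may be longer.
const ENVIRONMENT_KINDS = new Set<string>(["network_timeout", "infra"]);

// Lowercase, turn punctuation runs into spaces, collapse whitespace.
function normalize(text: string): string {
  return text.toLowerCase().replace(/[^a-z0-9 ]+/g, " ").replace(/\s+/g, " ").trim();
}

function groupKeyFor(s: SummarySignals): string {
  const ref = s.recommendedFixTarget?.reference?.trim();
  if (ref) return `target:${ref}`; // 1. shared fix target

  if (s.failureKind !== undefined && ENVIRONMENT_KINDS.has(s.failureKind)) {
    return `kind:${s.failureKind}`; // 2. environment-wide failure kind
  }

  const hypothesis = s.rootCauseHypothesis ? normalize(s.rootCauseHypothesis) : "";
  if (hypothesis) return `hypothesis:${hypothesis}`; // 3. normalized hypothesis

  return `singleton:${s.testId}`; // 4. no signal: the test stands alone
}
```

Prefixing each key with its signal type keeps a fix-target string from ever colliding with an identical hypothesis string.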

Clusters are ordered by fix priority first, then by member count.
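The ordering rule maps onto a standard comparator. The description does not say which direction member count sorts in, so this sketch assumes larger clusters come first; the final tie-break on `label` is also an assumption, added only so equal clusters never swap between runs. `RankedCluster` and `compareClusters` are hypothetical names.

```typescript
// Hypothetical minimal cluster shape for ordering.
interface RankedCluster {
  label: string;
  fixPriority: number; // lower = higher priority
  testIds: string[];
}

function compareClusters(a: RankedCluster, b: RankedCluster): number {
  // Fix priority ascending: priority 1 before priority 2.
  if (a.fixPriority !== b.fixPriority) return a.fixPriority - b.fixPriority;
  // Member count descending (assumed direction).
  if (a.testIds.length !== b.testIds.length) return b.testIds.length - a.testIds.length;
  // Deterministic tie-break (assumption).
  return a.label < b.label ? -1 : a.label > b.label ? 1 : 0;
}

const clusters: RankedCluster[] = [
  { label: "checkout", fixPriority: 2, testIds: ["t1", "t2", "t3"] },
  { label: "auth", fixPriority: 1, testIds: ["t4"] },
  { label: "api", fixPriority: 1, testIds: ["t5", "t6"] },
];

const ordered = [...clusters].sort(compareClusters);
console.log(ordered.map((c) => c.label)); // [ 'api', 'auth', 'checkout' ]
```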


Command surface

testsprite test failure triage --project <project-id> [options]

Supported options:

  • --project <id> — required project ID
  • --type frontend|backend — filter failed tests by type
  • --filter <substr> — filter tests by name substring
  • --max-concurrency <n> — parallel summary fetches, default 5
  • --output json|text — machine or human output
  • --endpoint-url <url> — override API host

Also supports global flags such as --dry-run, --profile, --verbose, and --debug.
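Input validation for these options could look like the sketch below. `parseTriageOptions` and `TriageOptions` are hypothetical; the real command goes through the CLI's own option parser and error reporting. The default of 5 for `--max-concurrency` is taken from the list above, and `--output` is left unset when omitted rather than guessing its default.

```typescript
// Hypothetical parsed-options shape.
interface TriageOptions {
  project: string;
  type?: "frontend" | "backend";
  filter?: string;
  maxConcurrency: number;
  output?: "json" | "text";
}

function parseTriageOptions(raw: Record<string, string | undefined>): TriageOptions {
  const project = raw["project"]?.trim();
  if (!project) throw new Error("--project <id> is required");

  const type = raw["type"];
  if (type !== undefined && type !== "frontend" && type !== "backend") {
    throw new Error(`--type must be frontend or backend, got "${type}"`);
  }

  const mcRaw = raw["max-concurrency"] ?? "5"; // default 5, per the option list
  if (!/^\d+$/.test(mcRaw) || Number(mcRaw) < 1) {
    throw new Error(`--max-concurrency must be a positive integer, got "${mcRaw}"`);
  }

  const output = raw["output"];
  if (output !== undefined && output !== "json" && output !== "text") {
    throw new Error(`--output must be json or text, got "${output}"`);
  }

  // --filter is a free-form name substring, so it passes through unchanged.
  return { project, type, filter: raw["filter"], maxConcurrency: Number(mcRaw), output };
}
```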


Recommended agent workflow

# 1. Batch run fails
testsprite test run --all --project <project-id> --wait --output json

# 2. Triage failures into clusters
testsprite test failure triage --project <project-id> --output json

# 3. Download one bundle from the highest-priority representative test
testsprite test failure get <representativeTestId> --out ./.testsprite/failure

# 4. Fix the issue and rerun the representative first
testsprite test rerun <representativeTestId> --wait

# 5. Run full regression after the representative passes
testsprite test rerun --all --project <project-id> --wait
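An agent driving this workflow only needs the highest-priority representative to build steps 3 and 4. The `TriageJson` shape and the `nextCommands` helper below are assumptions about the JSON output, and capturing the command's stdout is left out; the emitted command strings are copied from the workflow above.

```typescript
// Hypothetical JSON shape of `test failure triage --output json`; real keys may differ.
interface TriageJson {
  clusters: {
    label: string;
    representativeTestId: string;
    fixPriority: number;
    testIds: string[];
  }[];
}

// Build steps 3 and 4 for the representative of the lowest-fixPriority cluster.
function nextCommands(json: TriageJson): string[] {
  if (json.clusters.length === 0) return []; // nothing to triage
  // Strict `<` keeps the first cluster on ties, preserving the command's own order.
  const top = json.clusters.reduce((best, c) => (c.fixPriority < best.fixPriority ? c : best));
  const id = top.representativeTestId;
  return [
    `testsprite test failure get ${id} --out ./.testsprite/failure`,
    `testsprite test rerun ${id} --wait`,
  ];
}
```

Because the command already orders clusters by fix priority, taking `clusters[0]` would usually suffice; the `reduce` keeps the sketch correct even if the array arrives unsorted.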

The agent skill was also updated to recommend triage before downloading bundles when multiple tests fail.


Implementation details

Added new grouping logic in:

src/lib/failure-triage.ts

This includes:

  • normalizeHypothesis()
  • computeGroupKey()
  • pickRepresentativeTestId()
  • computeClusterConfidence()
  • computeFixPriority()
  • buildFailureClusters()
  • renderFailureTriageText()
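As one example of what a deterministic `pickRepresentativeTestId()` could do, the sketch below prefers members whose summary carried a `recommendedFixTarget` and breaks ties on the smallest test ID. This selection rule, the `ClusterMember` shape, and the `pickRepresentative` name are all assumptions; the PR does not spell out the actual heuristic.

```typescript
// Hypothetical member record; real summary fields may differ.
interface ClusterMember {
  testId: string;
  hasFixTarget: boolean; // whether the summary carried a recommendedFixTarget
}

// Illustrative rule (an assumption): members with a concrete fix target first,
// then the lexicographically smallest test ID.
function pickRepresentative(members: ClusterMember[]): string {
  const [first] = [...members].sort((a, b) => {
    if (a.hasFixTarget !== b.hasFixTarget) return a.hasFixTarget ? -1 : 1;
    return a.testId < b.testId ? -1 : a.testId > b.testId ? 1 : 0;
  });
  if (first === undefined) throw new Error("cluster has no members");
  return first.testId;
}
```

Sorting a copy with a total order means the same members always yield the same representative, regardless of the order in which summaries finished fetching.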

Added command implementation in:

src/commands/test.ts

The command validates inputs, paginates failed tests, applies filters, fetches summaries with bounded concurrency, handles stale failed rows, and emits JSON or text output through the existing output system.
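Bounded concurrency is commonly implemented as a fixed pool of worker "lanes" pulling from a shared index, sketched below. `mapWithConcurrency` and `fetchSummaries` are hypothetical helpers, and treating a `null` summary as a stale failed row (listed as failed but with no summary available) is an assumption about what "stale" means here.

```typescript
// Run `worker` over `items` with at most `limit` calls in flight.
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  worker: (item: T) => Promise<R>,
): Promise<R[]> {
  if (!Number.isInteger(limit) || limit < 1) throw new RangeError("limit must be a positive integer");
  const results: R[] = new Array(items.length);
  let next = 0;
  async function runLane(): Promise<void> {
    while (next < items.length) {
      const i = next++; // single-threaded event loop: no two lanes claim the same index
      results[i] = await worker(items[i]);
    }
  }
  const lanes = Array.from({ length: Math.min(limit, items.length) }, () => runLane());
  await Promise.all(lanes);
  return results; // same order as `items`
}

type Summary = { testId: string; hypothesis: string };
// Hypothetical fetcher: null stands in for "summary no longer available".
type FetchSummary = (testId: string) => Promise<Summary | null>;

async function fetchSummaries(testIds: string[], limit: number, fetchSummary: FetchSummary) {
  const results = await mapWithConcurrency(testIds, limit, fetchSummary);
  const summaries = results.filter((r): r is Summary => r !== null);
  const stale = testIds.filter((_, i) => results[i] === null); // skipped, not fatal
  return { summaries, stale };
}
```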


Test coverage

This PR adds 18 automated tests:

  • 11 unit tests in src/lib/failure-triage.test.ts
  • 7 integration tests in src/commands/test.test.ts

Coverage includes:

  • Grouping by fix target
  • Grouping by failure kind
  • Grouping by hypothesis
  • Singleton fallback
  • Representative test selection
  • Cluster confidence and priority
  • Empty projects
  • Stale failed rows
  • Missing project validation
  • JSON and text output
  • Help surface

Future work

Out of scope for this PR:

  • Native GET /projects/{id}/failures/clusters API
  • Semantic embedding clustering on rootCauseHypothesis
  • Backend wave/cascade graph integration
  • --rerun-representatives --wait orchestration flag

Checklist

  • New command with JSON and text output
  • Uses existing APIs only
  • Deterministic grouping, no CLI LLM calls
  • 18 automated tests added
  • Typecheck, lint, and build pass
  • Documentation updated
  • Agent skill updated
  • Help snapshot added
  • Manual production API smoke test completed

#43

Add testsprite test failure triage --project <id> to group failed tests into root-cause clusters using existing M2.1 analysis fields. Returns a representative test per cluster, confidence score, and fix priority without downloading failure bundles.

Includes grouping library, command wiring, unit/integration tests, docs, CHANGELOG entry, agent skill update, and help snapshot.
…iage

- Fix test failure triage help snapshot default value quoting

- Run prettier on changed files

- Add filter and max-concurrency validation tests

- Remove draft issue/PR markdown files from repo