Commit 3f2eb72

Deploy: test-driver category-aware gap analysis + linux-sysadmin skills expansion
2 parents d6601c4 + c6906eb commit 3f2eb72

5 files changed

Lines changed: 332 additions & 7 deletions

Lines changed: 183 additions & 0 deletions
@@ -0,0 +1,183 @@
# Test Driver Category-Aware Gap Analysis — Implementation Plan

> **For agentic workers:** REQUIRED: Use superpowers:subagent-driven-development (if subagents available) or superpowers:executing-plans to implement this plan. Steps use checkbox (`- [ ]`) syntax for tracking.

**Goal:** Make test-driver's gap analysis find and fill gaps across all applicable test categories (unit, integration, e2e, contract, security, UI), not just unit tests.

**Architecture:** Three skill markdown files are edited. gap-analysis gets category-aware coverage mapping in Step 5. convergence-loop gets category-specific generation guidance in its GENERATE phase. test-design gets a new Section 8 on non-unit test design principles.

**Tech Stack:** Markdown skill files (no code, no tests, no build)

**Spec:** `docs/superpowers/specs/2026-03-16-test-driver-category-awareness-design.md`

---

## Chunk 1: All Tasks

### Task 1: Rewrite gap-analysis Step 5

**Files:**
- Modify: `plugins/test-driver/skills/gap-analysis/SKILL.md` (lines 81-89, current Step 5)

The current Step 5 has three category-blind bullet points. Replace it with a two-phase approach that classifies tests into categories, then maps coverage per source file per category. Also update Step 3 (line 65) to explicitly note that categorization output feeds Step 5.

- [ ] **Step 1: Update Step 3 to note categorization feeds Step 5**

In Step 3 (Inventory Existing Tests), the line "Categorize each test file by type based on directory structure (`tests/unit/`, `tests/integration/`) or pytest markers" should be expanded to clarify this classification is used by Step 5 for per-category coverage mapping.

Change line 65 from:

```
- Categorize each test file by type based on directory structure (`tests/unit/`, `tests/integration/`) or pytest markers
```

To:

```
- Categorize each test file by type based on directory structure (`tests/unit/`, `tests/integration/`) or pytest markers. This classification feeds Step 5's per-category coverage mapping.
```

- [ ] **Step 2: Replace Step 5 content**

Replace lines 81-89 (the current Step 5 heading and content) with:

```markdown
## Step 5: Map Coverage Per Category

For each source file and each applicable category (from the profile), determine whether test coverage exists in that specific category.

### Phase 1: Classify Existing Tests

Use the categorization from Step 3. Classification priority:

1. **Directory structure**: Test files under `tests/unit/`, `tests/integration/`, `tests/e2e/`, `tests/contract/`, `tests/security/`, `tests/ui/` are classified by their directory.
2. **Pytest markers**: Test files using `@pytest.mark.unit`, `@pytest.mark.integration`, etc. are classified by their markers. A file can belong to multiple categories if it has multiple markers.
3. **Conservative fallback**: Test files that have neither a category directory nor markers are classified as **unit**. This intentionally over-reports gaps for non-unit categories; under-reporting is the problem this methodology exists to solve.

### Phase 2: Per-Source-File, Per-Category Mapping

For each source file, for each applicable category:

- Is there a test file **classified in that category** (from Phase 1) that imports or references this source file?
- Use the same structural mapping techniques (import scanning, naming conventions, content grep) but scoped to the test files in that specific category.

A source file that has unit tests but no integration tests still has an **integration gap**. A source file with no tests in any category has gaps in every applicable category.

This is structural mapping (test file exists in the right category and references the source), not runtime coverage. Runtime coverage requires executing the test suite, which happens during the convergence loop.
```
- [ ] **Step 3: Verify Step 6 references are coherent**

Read Step 6 after the edit. The current text at line 93 says "For each source file missing test coverage in an applicable category, create a gap entry." This already works correctly with the new Step 5 output, since Step 5 now actually produces per-category data. No change needed to Step 6 unless it reads awkwardly after the edit.

- [ ] **Step 4: Commit**

```bash
git add plugins/test-driver/skills/gap-analysis/SKILL.md
git commit -m "fix(test-driver): make gap analysis category-aware in Step 5

Step 5 now classifies tests into categories (by directory, markers, or
conservative fallback to unit) and maps coverage per source file per
category. Source files with unit tests but no integration/security/etc
tests now correctly show gaps in those categories."
```
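Step 1 above quotes Step 3's marker-based categorization. For suites run with `pytest --strict-markers`, those category markers must be registered or collection fails. A hypothetical registration (the descriptions are assumptions, not part of this plan):

```ini
# pytest.ini — register the category markers used for classification
[pytest]
markers =
    unit: isolated function/class tests
    integration: tests exercising real components together
    e2e: full-stack workflow tests
    contract: API shape validation tests
    security: attack-vector tests
    ui: user-interface tests
```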
### Task 2: Add category-specific generation guidance to convergence-loop

**Files:**
- Modify: `plugins/test-driver/skills/convergence-loop/SKILL.md` (insert after GENERATE section, after line 54)

- [ ] **Step 1: Add category-specific subsection after GENERATE**

After line 54 (the last bullet of the GENERATE section: "Place test files according to the profile's discovery conventions"), insert:

```markdown

#### Category-Specific Generation

When generating tests for non-unit gaps, adapt the test approach to match the category:

| Category | Approach |
|----------|----------|
| **Unit** | Mock external dependencies. Test isolated function/class behavior. One function per test. |
| **Integration** | Use real components (test database, actual HTTP client, real service instances). Assert on observable outcomes across component boundaries, not internal state. |
| **E2E** | Full request lifecycle through the actual app stack with minimal mocking. Test critical user-facing workflows (e.g., authenticate, perform action, verify result). Accept slower execution. |
| **Contract** | Validate API response schemas, status codes, required fields, content-type headers, and error response shapes. Use schema validation (jsonschema, pydantic model parsing) rather than value equality. Tests should pass regardless of data state. |
| **Security** | Each test represents a specific attack vector: SQL injection in user inputs, auth token manipulation, accessing resources without credentials, accessing another user's resources. Assert the attack fails gracefully (proper error code, no data leakage in error messages). |
| **UI** | Use the framework's UI testing tool (pytest-qt, XCUITest, Charlotte). Interact via accessibility identifiers. Assert on what the user sees (text content, visibility, enabled state), not internal widget state. |

#### Category Ordering

When the gap report contains gaps across multiple categories, generate tests in this order:

1. **Unit** — fastest to write and run, catches the most bugs per iteration
2. **Integration** — validates component interactions
3. **Contract** / **Security** — validates API shape and attack resistance
4. **E2E** / **UI** — slowest, run last

Within each category, follow the gap report's priority ordering (high before medium before low).
```

- [ ] **Step 2: Commit**

```bash
git add plugins/test-driver/skills/convergence-loop/SKILL.md
git commit -m "feat(test-driver): add category-specific test generation guidance

The convergence loop's GENERATE phase now has explicit guidance for
writing integration, e2e, contract, security, and UI tests, plus a
category ordering preference for efficient convergence."
```
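The category ordering above, combined with the gap report's priority ordering, reduces to a two-level sort key. A minimal sketch, assuming hypothetical gap entries with `category` and `priority` fields (the tied tiers, contract/security and e2e/UI, are linearized arbitrarily within each tier):

```python
# Category tiers from the plan; lower index = generate earlier.
# Contract/security and e2e/ui are tied tiers in the plan; this list
# linearizes them arbitrarily within each tier.
CATEGORY_ORDER = ["unit", "integration", "contract", "security", "e2e", "ui"]
PRIORITY_ORDER = ["high", "medium", "low"]

def generation_order(gaps):
    """Sort gap entries: category tier first, then report priority."""
    return sorted(
        gaps,
        key=lambda g: (
            CATEGORY_ORDER.index(g["category"]),
            PRIORITY_ORDER.index(g["priority"]),
        ),
    )
```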
### Task 3: Add non-unit test design principles to test-design
130+
131+
**Files:**
132+
- Modify: `plugins/test-driver/skills/test-design/SKILL.md` (append after Section 7, around line 177)
133+
134+
- [ ] **Step 1: Add Section 8**
135+
136+
Append after the end of Section 7 (Meaningful Assertions):
137+
138+
```markdown
139+
140+
## 8. Non-Unit Test Design
141+
142+
Sections 1-7 apply universally, but some principles shift weight when writing non-unit tests.
143+
144+
### Integration Tests
145+
146+
Relax isolation (Section 1): the point of integration tests is verifying that components work together. Use real dependencies where feasible (test database, actual HTTP client, real service wiring). Keep test independence (each test sets up its own state), but don't mock the interactions you're trying to test.
147+
148+
Assert on observable outcomes across boundaries: data persisted correctly, response includes data assembled from multiple components, side effects propagated through the real dependency chain. Avoid asserting on internal state of intermediate components.
149+
150+
### Contract Tests
151+
152+
Test the shape, not the content. Assert on response structure (required fields present, correct types, proper status codes, expected content-type headers, error response format). Use schema validation (jsonschema, pydantic model parsing) rather than value equality.
153+
154+
Contract tests should pass regardless of what data is in the system. If a contract test breaks when test data changes, it's testing values, not shape.
155+
156+
### Security Tests
157+
158+
Each test represents one attack vector. Write the test as an attacker would attempt the attack: SQL injection in a user input field, manipulated auth tokens, requests without credentials, accessing another user's resources via ID enumeration.
159+
160+
Assert that the attack fails gracefully: proper HTTP error code (401/403, not 500), no sensitive data leaked in error messages or response bodies, no state corruption from the malicious input.
161+
162+
### E2E Tests
163+
164+
Test user-facing workflows from entry point to final result. Minimize mocking: the value of E2E tests is proving the full stack works together. Accept slower execution as the cost of this confidence.
165+
166+
Focus on critical paths (authenticate, perform primary action, verify result) rather than exhaustive feature coverage. A few high-quality E2E tests covering the main workflows are worth more than dozens covering edge cases.
167+
168+
### UI Tests
169+
170+
Test what the user sees and does, not implementation details. Click buttons, fill forms, navigate between screens, verify visible outcomes (text content, element visibility, enabled/disabled state).
171+
172+
Use accessibility identifiers or object names for element lookup, not CSS selectors or internal widget hierarchy. If a test breaks because the widget tree changed but the user experience didn't, the test is too tightly coupled to implementation.
173+
```
174+
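The contract-test principle of "shape, not content" can be illustrated with a small stdlib-only helper. This is a sketch under assumptions: the helper name and the example `USER_SHAPE` contract are hypothetical, and a real suite would more likely use jsonschema or pydantic as the section notes:

```python
def assert_matches_shape(payload, required_fields):
    """Assert each required field is present with the expected type,
    without asserting on values, so the test passes regardless of
    what data happens to be in the system."""
    for field, expected_type in required_fields.items():
        assert field in payload, f"missing required field: {field}"
        assert isinstance(payload[field], expected_type), (
            f"{field} should be {expected_type.__name__}, "
            f"got {type(payload[field]).__name__}"
        )

# Hypothetical contract for a user endpoint response
USER_SHAPE = {"id": int, "email": str, "roles": list}
```

Any payload with an `int` id, `str` email, and `list` roles passes, whatever the values; a payload with `id` as a string fails, which is exactly the shape-versus-values distinction.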
- [ ] **Step 2: Commit**

```bash
git add plugins/test-driver/skills/test-design/SKILL.md
git commit -m "feat(test-driver): add non-unit test design principles

Section 8 covers how universal test design principles shift for
integration, contract, security, e2e, and UI tests."
```
Lines changed: 71 additions & 0 deletions
@@ -0,0 +1,71 @@

# Test Driver: Category-Aware Gap Analysis

**Date:** 2026-03-16
**Scope:** Three skill files in `plugins/test-driver/skills/`
**Problem:** Gap analysis only finds unit test gaps despite profiles defining six test categories

## Problem Statement

The gap-analysis skill's Step 5 (Map Coverage) is category-blind. It checks whether *any* test file references a source file, but never checks which *category* of test provides that coverage. When `test_auth.py` has unit tests for `auth.py`, the structural mapping considers `auth.py` "covered" across all categories. This causes the gap report to miss integration, e2e, contract, and security gaps entirely.

The convergence loop then only generates unit tests because those are the only gaps it receives.

## Changes

### 1. gap-analysis/SKILL.md — Step 5: Per-Category Coverage Mapping

Replace the current three-point structural mapping with a two-phase approach.

**Phase 1: Classify existing test files into categories.**
Priority order for classification:
1. Directory structure: `tests/unit/`, `tests/integration/`, `tests/e2e/`, `tests/contract/`, `tests/security/`, `tests/ui/`
2. Pytest markers: `@pytest.mark.unit`, `@pytest.mark.integration`, etc.
3. Conservative fallback: if neither directory nor marker is present, classify all tests as "unit"

The conservative fallback ensures that unorganized test suites flag gaps for every non-unit applicable category. This may over-report gaps, but under-reporting is worse (the whole reason this fix exists).

**Phase 2: Per-source-file, per-category mapping.**
For each source file and each applicable category (from the profile): does a test file *classified in that category* reference this source file? A source file with unit tests but no integration tests has an integration gap.

Step 6 (prioritization) stays unchanged but now receives multi-category gap data, producing a report with gaps across all applicable categories.
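Once Phase 1 has classified the test files, Phase 2 reduces to set membership checks. A minimal sketch, assuming a hypothetical index mapping each category to the set of source files its tests reference (built from Phase 1 plus import scanning):

```python
def find_gaps(source_files, applicable_categories, test_index):
    """Return {source_file: [categories with no coverage]}.

    test_index maps category -> set of source files referenced by
    tests classified in that category.
    """
    gaps = {}
    for src in source_files:
        missing = [
            cat for cat in applicable_categories
            if src not in test_index.get(cat, set())
        ]
        if missing:
            gaps[src] = missing
    return gaps
```

With this shape, a source file covered only by unit tests still surfaces gaps for every other applicable category, which is the behavior the design calls for.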
### 2. convergence-loop/SKILL.md — GENERATE Phase: Category-Specific Guidance

Add a new subsection under GENERATE that tells Claude what distinguishes each category's tests:

| Category | Key Difference from Unit Tests |
|----------|-------------------------------|
| Unit | Mock external dependencies, test isolated function/class behavior |
| Integration | Use real components (test DB, actual services), test data flow across component boundaries |
| E2E | Full request lifecycle through the actual app stack, minimal mocking |
| Contract | Validate API response schemas, status codes, error shapes, content-type headers; use schema validation, not value equality |
| Security | Auth bypass, input injection (SQL/XSS/command), secrets in responses, rate limiting, CORS |
| UI | User interaction via framework tools (pytest-qt, XCUITest, Charlotte); assert on visible outcomes |

Add category ordering preference: unit first (fastest feedback loop), then integration, then contract/security, then e2e/UI. This keeps the convergence loop efficient.
### 3. test-design/SKILL.md — Non-Unit Test Design Principles

Add a new section (Section 8) covering what changes when writing non-unit tests:

- **Integration tests**: Relax isolation; the point is testing component interaction. Use real dependencies. Assert on observable outcomes across boundaries, not internal state.
- **Contract tests**: Test the shape, not the content. Assert on structure, required fields, types. Use schema validation (jsonschema, pydantic). Should pass regardless of data state.
- **Security tests**: Each test represents a specific attack vector. Assert the attack fails gracefully with proper error codes and no data leakage in error messages.
- **E2E tests**: Test user-facing workflows end-to-end. Minimize mocking. Accept slower execution. Focus on critical paths.
- **UI tests**: Test user interaction, not implementation. Use accessibility identifiers. Assert on what the user sees.
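The security-test bullet above can be made concrete with one attack-vector test. This is an illustrative sketch: `get_resource` is a hypothetical in-process stand-in for a real endpoint (a real suite would issue an actual HTTP request), and the token value and field names are assumptions:

```python
def get_resource(resource_id, auth_token=None):
    """Hypothetical handler standing in for a real endpoint."""
    if auth_token != "valid-token":
        # Fail gracefully: proper status, generic message, no internals leaked
        return 401, {"error": "authentication required"}
    return 200, {"id": resource_id, "owner": "alice", "secret": "s3cr3t"}

def test_request_without_credentials_fails_gracefully():
    # Attack vector: access a resource with no credentials at all
    status, body = get_resource("42")
    assert status == 401                         # proper error code, not a 500
    assert "secret" not in str(body).lower()     # no sensitive data leaked
    assert "traceback" not in str(body).lower()  # no internals in the error
```

Each additional attack vector (manipulated token, another user's resource ID, injected input) would be its own test in the same shape.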
## Files Modified

| File | Type of Change |
|------|---------------|
| `plugins/test-driver/skills/gap-analysis/SKILL.md` | Rewrite Step 5, minor adjustment to Step 6 |
| `plugins/test-driver/skills/convergence-loop/SKILL.md` | Add subsection under GENERATE |
| `plugins/test-driver/skills/test-design/SKILL.md` | Add Section 8 |

## What Does NOT Change

- Stack profiles (they already define applicable categories correctly)
- TEST_STATUS.json schema (already has per-category fields)
- The analyze command (it delegates to gap-analysis)
- testing-mindset skill (it only drives awareness, not execution)
- test-status skill (schema already supports multi-category data)

plugins/test-driver/skills/convergence-loop/SKILL.md

Lines changed: 24 additions & 0 deletions
@@ -53,6 +53,30 @@ Write 3-5 tests per batch, targeting the highest-priority unfilled gaps.
- Consult framework-specific plugins when available (e.g., `python-dev:python-testing-patterns`)
- Place test files according to the profile's discovery conventions

#### Category-Specific Generation

When generating tests for non-unit gaps, adapt the test approach to match the category:

| Category | Approach |
|----------|----------|
| **Unit** | Mock external dependencies. Test isolated function/class behavior. One function per test. |
| **Integration** | Use real components (test database, actual HTTP client, real service instances). Assert on observable outcomes across component boundaries, not internal state. |
| **E2E** | Full request lifecycle through the actual app stack with minimal mocking. Test critical user-facing workflows (e.g., authenticate, perform action, verify result). Accept slower execution. |
| **Contract** | Validate API response schemas, status codes, required fields, content-type headers, and error response shapes. Use schema validation (jsonschema, pydantic model parsing) rather than value equality. Tests should pass regardless of data state. |
| **Security** | Each test represents a specific attack vector: SQL injection in user inputs, auth token manipulation, accessing resources without credentials, accessing another user's resources. Assert the attack fails gracefully (proper error code, no data leakage in error messages). |
| **UI** | Use the framework's UI testing tool (pytest-qt, XCUITest, Charlotte). Interact via accessibility identifiers. Assert on what the user sees (text content, visibility, enabled state), not internal widget state. |

#### Category Ordering

When the gap report contains gaps across multiple categories, generate tests in this order:

1. **Unit** — fastest to write and run, catches the most bugs per iteration
2. **Integration** — validates component interactions
3. **Contract** / **Security** — validates API shape and attack resistance
4. **E2E** / **UI** — slowest, run last

Within each category, follow the gap report's priority ordering (high before medium before low).

### RUN

Execute the test suite using the command from the stack profile: