129 changes: 97 additions & 32 deletions .opencode/commands/investigate.md
@@ -3,51 +3,84 @@ description: Investigate a bug from a Sentry issue or error description, biasing
subtask: false
---

Load the `test-driven-investigation`, `investigation-notes`, `find-and-run-tests`,
`parent-project-skills`, and `dad-jokes` skills, then investigate: $ARGUMENTS

## Prerequisites

Use the Sentry MCP tools when given a Sentry issue ID or URL. The Sentry MCP connection requires a
user-specific `X-Sentry-Token` header configured in `~/.config/opencode/opencode.json` under
`mcp.sentry.headers`. If the Sentry tools fail with auth errors, tell the user to check their token
configuration and stop — do not guess at issue details.
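
A minimal sketch of the expected configuration file (the token value is a placeholder; only the `mcp.sentry.headers` path is taken from the text above):

```json
{
  "mcp": {
    "sentry": {
      "headers": {
        "X-Sentry-Token": "<your-user-token>"
      }
    }
  }
}
```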

## Parsing the argument

The argument can be:

- **A Sentry issue ID** (e.g., `6181478`) — fetch from Sentry
- **A Sentry short ID** (e.g., `EDGEWORKER-RUNTIME-4MS`) — fetch from Sentry
- **A Sentry URL** (e.g., `https://sentry.io/organizations/.../issues/...`) — extract the issue ID,
fetch from Sentry
- **A plain text description** (e.g., `"concurrent write()s not allowed" in kj/compat/http.c++`) —
skip Sentry, go straight to orientation
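
The four argument forms can be told apart mechanically. A minimal sketch, assuming simple pattern checks (the function name and exact patterns are illustrative, not part of the command):

```shell
# Hypothetical classifier for the investigate argument; patterns are assumptions.
classify_arg() {
  arg="$1"
  case "$arg" in
    https://sentry.io/*) echo "sentry-url"; return ;;   # extract issue ID, fetch from Sentry
  esac
  if printf '%s' "$arg" | grep -Eq '^[0-9]+$'; then
    echo "sentry-issue-id"                              # fetch from Sentry
  elif printf '%s' "$arg" | grep -Eq '^[A-Z0-9]+(-[A-Z0-9]+)+$'; then
    echo "sentry-short-id"                              # fetch from Sentry
  else
    echo "plain-text"                                   # skip Sentry, go to orientation
  fi
}
```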

## Steps

### 0. Create a tracking document

Create a tracking document in the `investigation-notes` tool to keep track of hypotheses, code read,
and test results. **Always** actively consult and update this document throughout to avoid losing
insights, going in circles, or forgetting what you've tried. See the "Investigation Notes" section
below for format and rules.

### 1. Extract the error

**If Sentry issue:**

1. Fetch the issue details with `sentry_get_sentry_issue`.
2. Fetch the most recent event with `sentry_list_sentry_issue_events` (limit 1), then
`sentry_get_sentry_event` to get the full stack trace.
3. Extract:
- The **error message** (assertion text, exception message, crash description)
- The **assertion/crash site** (file and line from the top of the stack)
- The **entry point** (the outermost workerd/KJ/capnp frame in the stack — where the operation
started)
- The **time range** of occurrences (when it started, if it's increasing in rate, etc.)
   - Identify the issue **status**: is it new, regressed, or longstanding?

**If plain text:** Parse the error message and file reference from the description.

**Output to user:** The error message, crash site, entry point, time range, and status. One short
paragraph. Do not go deeper yet.

### 2. Orient

Find three things:

1. **The crash site source.** Read the assertion/crash line and its immediate context (~50 lines).
Understand what invariant was violated and what state would cause it. If the crash is in a C++
class method, **use the `cross-reference` tool** to quickly locate the header, implementation
files, JSG registration, and test files for that class.

2. **Recent changes.** If the incident being investigated started, re-occurred, or increased in rate
recently, look at the git history around the crash site to see if recent changes may have caused
the bug. Use `git blame` to find when the crash line or the code around it was last modified, and
`git log` to see recent commits in that file.
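
As a runnable illustration of those history commands (the repo and file below are throwaway stand-ins, not real workerd paths):

```shell
# Build a throwaway repo so the commands below can run anywhere.
tmp=$(mktemp -d)
cd "$tmp"
git init -q .
git config user.email dev@example.com
git config user.name dev
printf 'KJ_REQUIRE(!writeInProgress);\n' > crash.c++
git add crash.c++
git commit -q -m "add crash site"

# Who last modified the crash line (line 1), and when?
git blame -L 1,1 -- crash.c++

# Recent commits touching the file containing the crash site:
git log --oneline -n 5 -- crash.c++
```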

3. **The test file.** Use `/find-test` on the source file containing the crash site (the
cross-reference output may already list relevant test files). If no test exists, identify the
nearest test file in the same directory.

4. **Existing feature tests.** Search for existing tests that exercise the _feature_ involved in the
bug — not just tests near the crash site file. The crash may be in `pipeline.c++` but the relevant
working test may be an integration test in a completely different directory. These existing tests
encode setup, verification, and framework patterns you need. They are your starting template.

5. **The build command.** Construct the exact `bazel test` invocation to run a single test case from
that test file.

**Output to user:** The crash site with a one-sentence explanation of the invariant, the test file
path, and the build command.

### 3. Hypothesize

@@ -57,43 +90,64 @@ Form a hypothesis in the format:

This does not need to be correct. It needs to be testable. State it to the user.

Ask for clarification or additional details if you cannot form a hypothesis with the information
you have. But do not ask for more information just to delay writing a test.

### 4. Write the test

**Start from an existing test if one exists** (from step 2.4). Clone it and modify the single
variable that your hypothesis targets (disable an autogate, change a config flag, alter the setup).
This is almost always faster and more correct than writing from scratch, because existing tests
already have the right verification (subrequest checks, expected log patterns, shutdown handling).

If no existing test is suitable, write a new one that:

- Sets up the minimum state to reach the crash site
- Performs the operation described in the hypothesis
- **Includes observable verification** — the test must check that the feature actually ran, not
just that nothing crashed. Use subrequest expectations, check for feature-specific log lines, or
verify side effects.
- Asserts the expected behavior (what _should_ happen if the bug didn't exist)

Keep it short. Prefer public API. Do not try to reproduce the full production call stack.

Do not interrupt your flow to investigate tangents while writing the test. If you
realize you need to understand something else to write the test, make a note of it
and move on — you can investigate it in the next iteration if the test doesn't
reproduce the bug.

### 5. Run the test

Build and run using the command from step 2. **Start the build immediately.** Do not read more code
before starting the build.

While waiting for the build, use parallel sub-agents to read code that would inform the **next**
test iteration if this one doesn't reproduce the bug.

### 6. Validate and iterate

After every test run:

1. **Always** update the tracking document (if using one).
2. **Always** check the test output for evidence the code path was exercised — feature-specific log
lines, subrequests, RPC calls. A test that passes with no evidence the feature ran is not a valid
result.

Based on the result:

- **Test fails as expected** → the mechanism is confirmed. Report findings to the user. Read code
with purpose to find the fix, not to find the bug.
- **Test passes with evidence the feature ran** → hypothesis was wrong. Adjust the hypothesis,
update the test, run again. Tell the user what you learned.
- **Test passes with NO evidence the feature ran** → the test is not exercising the code path. Do
not read more source code to explain why. Fix the test first — compare it against existing working
tests to find what's missing.
- **Test doesn't compile** → fix the compilation error and rerun. This is not a setback, it's a
normal part of the process.
- **Test crashes differently** → follow the new trail but note the divergence. Tell the user.

Repeat until the bug mechanism is confirmed or you've exhausted reasonable hypotheses (at which
point, report what you've tried and what you've ruled out).

### 7. Report

@@ -106,10 +160,21 @@

## Rules

- **Always work in parallel whenever possible.** Don't wait for the build to finish before reading
code that would inform the next test iteration. Use the build time to maximize your learning and
progress. Investigate multiple hypotheses in parallel if you can, but do not let this delay
writing and running tests.
- **Do not keep endlessly reading code before the first test is written and building.** If you hit
15 tool calls, write whatever test you can with your current understanding.
- **Do not re-read the same function more than twice.** If you catch yourself doing this, update
the tracking document to record findings and write a test immediately.
- **Do not trace the full call stack before writing a test.** The test will tell you if your
understanding is correct.
- **Every hypothesis must be tested, not just reasoned about.**
- **Update the tracking document with each iteration.** If a tracking document is being used, update
the hypotheses, code read, and test results sections so you have a clear record of your
investigation process. After compaction, **always** update the tracking document before continuing
to the next step.
- **Never** miss an opportunity for a good dad joke (using the `dad-jokes` skill). Don't overdo it.
  When summarizing, **always** preserve any jokes from the subagent output, and **always** include
the intro prefix ("Here's a dad joke for you:", etc.) so the user knows it's intentional.
60 changes: 60 additions & 0 deletions .opencode/skills/bazel-test-hygiene/SKILL.md
@@ -0,0 +1,60 @@
---
name: bazel-test-hygiene
description: Mandatory rules for running bazel tests during development. Load this skill before running any bazel test command, especially when validating fixes or verifying regression tests. Prevents false confidence from cached results, filter flags that silently match nothing, and partial test runs that miss breakage.
---

# Bazel Test Hygiene

## The Three Rules

### 1. Always disable caching

```bash
bazel test //... --nocache_test_results
```

**Why:** Bazel's action cache can serve stale test binaries even after you edit source files. Without `--nocache_test_results`, you may be running the OLD binary and seeing OLD results. This is not hypothetical — it has caused real false-positive/false-negative confusion in this repo.

**Always include `--nocache_test_results`.** No exceptions.

### 2. Keep it simple — no filter flags

Do NOT use `--test_arg='-f'` or similar filter flags to run individual test cases.

**Why:** KJ test's `-f` flag silently passes when zero tests match. If you typo the filter or the test name changes, bazel reports "PASSED" with zero tests actually run. This gives completely false confidence.

**Run the full test target.** If you need to check a specific test, look for its name in the full output. If the full suite is too slow, run the specific test _target_ (e.g., `//src/workerd/api:streams/standard-test@`), not a filtered subset within a target.

### 3. Run the full suite before claiming done

A single test target passing does not mean you haven't broken something else. Fixes to shared code (queue.c++, standard.c++, common.h) can break tests in completely different directories.

**Before claiming any fix is complete:**

```bash
bazel test //... --nocache_test_results
```

Check the final summary line: `Executed N out of N tests: N tests pass.` All N must match. If any test fails, the fix is not done.
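
The all-pass check can be scripted. A sketch using a BRE backreference so that all three counts must agree (the sample summary line is fabricated for illustration):

```shell
# Hypothetical captured summary line from a bazel run:
summary='Executed 42 out of 42 tests: 42 tests pass.'

# The backreferences (\1) force all three N's to be the same number.
if printf '%s\n' "$summary" |
    grep -q 'Executed \([0-9][0-9]*\) out of \1 tests: \1 tests pass\.'; then
  echo "all tests ran and passed"    # → prints this for the sample above
else
  echo "count mismatch - the fix is not done"
fi
```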

## Red-Green Verification for Regression Tests

When writing a regression test for a bug fix, you MUST verify the test actually catches the bug:

1. **Green:** Run `bazel test //... --nocache_test_results` — all tests pass (fix in place)
2. **Red:** Remove the fix, run `bazel test //... --nocache_test_results` — the new test(s) MUST fail
3. **Green:** Restore the fix, run `bazel test //... --nocache_test_results` — all tests pass again

If step 2 passes (test doesn't fail without the fix), the test is not testing what you think. Go back and fix the test.

**Do the red-green on the full suite**, not just the one target. This catches two problems at once: (a) the regression test actually detects the bug, and (b) the fix doesn't break anything else.

## Anti-Patterns

| Don't | Do instead |
| ----------------------------------------- | --------------------------------------------------- |
| `bazel test //target` (no cache flag) | `bazel test //target --nocache_test_results` |
| `--test_arg='-f' --test_arg='test name'` | Run the full target, grep output for test name |
| Run one target, claim fix is done | Run `//...`, check all-pass summary |
| Claim "tests pass" from a previous run | Run fresh, read fresh output |
| Trust filter-based "PASSED" at face value | Check that the expected test names appear in output |