129 changes: 97 additions & 32 deletions .opencode/commands/investigate.md
@@ -3,51 +3,84 @@ description: Investigate a bug from a Sentry issue or error description, biasing
subtask: false
---

Load the `test-driven-investigation`, `investigation-notes`, `find-and-run-tests`,
`parent-project-skills`, and `dad-jokes` skills, then investigate: $ARGUMENTS

## Prerequisites

Use the Sentry MCP tools when given a Sentry issue ID or URL. The Sentry MCP connection requires a
user-specific `X-Sentry-Token` header configured in `~/.config/opencode/opencode.json` under
`mcp.sentry.headers`. If the Sentry tools fail with auth errors, tell the user to check their token
configuration and stop — do not guess at issue details.
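
A minimal sketch of the expected configuration file (the token value is a placeholder; only the `mcp.sentry.headers` path is taken from the text above):

```json
{
  "mcp": {
    "sentry": {
      "headers": {
        "X-Sentry-Token": "<your-user-token>"
      }
    }
  }
}
```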

## Parsing the argument

The argument can be:

- **A Sentry issue ID** (e.g., `6181478`) — fetch from Sentry
- **A Sentry short ID** (e.g., `EDGEWORKER-RUNTIME-4MS`) — fetch from Sentry
- **A Sentry URL** (e.g., `https://sentry.io/organizations/.../issues/...`) — extract the issue ID,
fetch from Sentry
- **A plain text description** (e.g., `"concurrent write()s not allowed" in kj/compat/http.c++`) —
skip Sentry, go straight to orientation
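
The four argument forms can be told apart mechanically. A minimal sketch, assuming simple pattern checks (the function name and exact patterns are illustrative, not part of the command):

```shell
# Hypothetical classifier for the investigate argument; patterns are assumptions.
classify_arg() {
  arg="$1"
  case "$arg" in
    https://sentry.io/*) echo "sentry-url"; return ;;   # extract issue ID, fetch from Sentry
  esac
  if printf '%s' "$arg" | grep -Eq '^[0-9]+$'; then
    echo "sentry-issue-id"                              # fetch from Sentry
  elif printf '%s' "$arg" | grep -Eq '^[A-Z0-9]+(-[A-Z0-9]+)+$'; then
    echo "sentry-short-id"                              # fetch from Sentry
  else
    echo "plain-text"                                   # skip Sentry, go to orientation
  fi
}
```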

## Steps

### 0. Create a tracking document

Create a tracking document in the `investigation-notes` tool to keep track of hypotheses, code read,
and test results. **Always** actively consult and update this document throughout to avoid losing
insights, going in circles, or forgetting what you've tried. See the "Investigation Notes" section
below for format and rules.

### 1. Extract the error

**If Sentry issue:**

1. Fetch the issue details with `sentry_get_sentry_issue`.
2. Fetch the most recent event with `sentry_list_sentry_issue_events` (limit 1), then
`sentry_get_sentry_event` to get the full stack trace.
3. Extract:
- The **error message** (assertion text, exception message, crash description)
- The **assertion/crash site** (file and line from the top of the stack)
- The **entry point** (the outermost workerd/KJ/capnp frame in the stack — where the operation
started)
- The **time range** of occurrences (when it started, if it's increasing in rate, etc.)
   - Identify the issue **status**: is it new, regressed, or longstanding?

**If plain text:** Parse the error message and file reference from the description.

**Output to user:** The error message, crash site, entry point, time range, and status. One short
paragraph. Do not go deeper yet.

### 2. Orient

Find three things:

1. **The crash site source.** Read the assertion/crash line and its immediate context (~50 lines).
Understand what invariant was violated and what state would cause it. If the crash is in a C++
class method, **use the `cross-reference` tool** to quickly locate the header, implementation
files, JSG registration, and test files for that class.

2. **Recent changes.** If the incident being investigated started, re-occurred, or increased in rate
recently, look at the git history around the crash site to see if recent changes may have caused
the bug. Use `git blame` to find when the crash line or the code around it was last modified, and
`git log` to see recent commits in that file.
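
As a runnable illustration of those history commands (the repo and file below are throwaway stand-ins, not real workerd paths):

```shell
# Build a throwaway repo so the commands below can run anywhere.
tmp=$(mktemp -d)
cd "$tmp"
git init -q .
git config user.email dev@example.com
git config user.name dev
printf 'KJ_REQUIRE(!writeInProgress);\n' > crash.c++
git add crash.c++
git commit -q -m "add crash site"

# Who last modified the crash line (line 1), and when?
git blame -L 1,1 -- crash.c++

# Recent commits touching the file containing the crash site:
git log --oneline -n 5 -- crash.c++
```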

3. **The test file.** Use `/find-test` on the source file containing the crash site (the
cross-reference output may already list relevant test files). If no test exists, identify the
nearest test file in the same directory.

4. **Existing feature tests.** Search for existing tests that exercise the _feature_ involved in the
bug — not just tests near the crash site file. The crash may be in `pipeline.c++` but the relevant
working test may be an integration test in a completely different directory. These existing tests
encode setup, verification, and framework patterns you need. They are your starting template.

5. **The build command.** Construct the exact `bazel test` invocation to run a single test case from
that test file.

**Output to user:** The crash site with a one-sentence explanation of the invariant, the test file
path, and the build command.

### 3. Hypothesize

@@ -57,43 +90,64 @@ Form a hypothesis in the format:

This does not need to be correct. It needs to be testable. State it to the user.

Ask for clarification or additional details if you cannot form a hypothesis with the information
you have. But do not ask for more information just to delay writing a test.

### 4. Write the test

**Start from an existing test if one exists** (from step 2.4). Clone it and modify the single
variable that your hypothesis targets (disable an autogate, change a config flag, alter the setup).
This is almost always faster and more correct than writing from scratch, because existing tests
already have the right verification (subrequest checks, expected log patterns, shutdown handling).

If no existing test is suitable, write a new one that:

- Sets up the minimum state to reach the crash site
- Performs the operation described in the hypothesis
- **Includes observable verification** — the test must check that the feature actually ran, not
just that nothing crashed. Use subrequest expectations, check for feature-specific log lines, or
verify side effects.
- Asserts the expected behavior (what _should_ happen if the bug didn't exist)

Keep it short. Prefer public API. Do not try to reproduce the full production call stack.

Do not interrupt your flow to investigate tangents while writing the test. If you
realize you need to understand something else to write the test, make a note of it
and move on — you can investigate it in the next iteration if the test doesn't
reproduce the bug.

### 5. Run the test

Build and run using the command from step 2. **Start the build immediately.** Do not read more code
before starting the build.

While waiting for the build, use parallel sub-agents to read code that would inform the **next**
test iteration if this one doesn't reproduce the bug.

### 6. Validate and iterate

After every test run:

1. **Always** update the tracking document (if using one).
2. **Always** check the test output for evidence the code path was exercised — feature-specific log
lines, subrequests, RPC calls. A test that passes with no evidence the feature ran is not a valid
result.

Based on the result:

- **Test fails as expected** → the mechanism is confirmed. Report findings to the user. Read code
with purpose to find the fix, not to find the bug.
- **Test passes with evidence the feature ran** → hypothesis was wrong. Adjust the hypothesis,
update the test, run again. Tell the user what you learned.
- **Test passes with NO evidence the feature ran** → the test is not exercising the code path. Do
not read more source code to explain why. Fix the test first — compare it against existing working
tests to find what's missing.
- **Test doesn't compile** → fix the compilation error and rerun. This is not a setback, it's a
normal part of the process.
- **Test crashes differently** → follow the new trail but note the divergence. Tell the user.

Repeat until the bug mechanism is confirmed or you've exhausted reasonable hypotheses (at which
point, report what you've tried and what you've ruled out).

### 7. Report

@@ -106,10 +160,21 @@

## Rules

- **Always work in parallel whenever possible.** Don't wait for the build to finish before reading
code that would inform the next test iteration. Use the build time to maximize your learning and
progress. Investigate multiple hypotheses in parallel if you can, but do not let this delay
writing and running tests.
- **Do not keep endlessly reading code before the first test is written and building.** If you hit
15 tool calls, write whatever test you can with your current understanding.
- **Do not re-read the same function more than twice.** If you catch yourself doing this, update
the tracking document to record findings and write a test immediately.
- **Do not trace the full call stack before writing a test.** The test will tell you if your
understanding is correct.
- **Every hypothesis must be tested, not just reasoned about.**
- **Update the tracking document with each iteration.** If a tracking document is being used, update
the hypotheses, code read, and test results sections so you have a clear record of your
investigation process. After compaction, **always** update the tracking document before continuing
to the next step.
- **Never** miss an opportunity for a good dad joke (using the `dad-jokes` skill). Don't overdo it.
  When summarizing, **always** preserve any jokes from the subagent output, and **always** include
the intro prefix ("Here's a dad joke for you:", etc.) so the user knows it's intentional.
60 changes: 60 additions & 0 deletions .opencode/skills/bazel-test-hygiene/SKILL.md
@@ -0,0 +1,60 @@
---
name: bazel-test-hygiene
description: Mandatory rules for running bazel tests during development. Load this skill before running any bazel test command, especially when validating fixes or verifying regression tests. Prevents false confidence from cached results, filter flags that silently match nothing, and partial test runs that miss breakage.
---

# Bazel Test Hygiene

## The Three Rules

### 1. Always disable caching

```bash
bazel test //... --nocache_test_results
```

**Why:** Bazel's action cache can serve stale test binaries even after you edit source files. Without `--nocache_test_results`, you may be running the OLD binary and seeing OLD results. This is not hypothetical — it has caused real false-positive/false-negative confusion in this repo.

**Always include `--nocache_test_results`.** No exceptions.

### 2. Keep it simple — no filter flags

Do NOT use `--test_arg='-f'` or similar filter flags to run individual test cases.

**Why:** KJ test's `-f` flag silently passes when zero tests match. If you typo the filter or the test name changes, bazel reports "PASSED" with zero tests actually run. This gives completely false confidence.

**Run the full test target.** If you need to check a specific test, look for its name in the full output. If the full suite is too slow, run the specific test _target_ (e.g., `//src/workerd/api:streams/standard-test@`), not a filtered subset within a target.

### 3. Run the full suite before claiming done

A single test target passing does not mean you haven't broken something else. Fixes to shared code (queue.c++, standard.c++, common.h) can break tests in completely different directories.

**Before claiming any fix is complete:**

```bash
bazel test //... --nocache_test_results
```

Check the final summary line: `Executed N out of N tests: N tests pass.` All N must match. If any test fails, the fix is not done.
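
The all-pass check can be scripted. A sketch using a BRE backreference so that all three counts must agree (the sample summary line is fabricated for illustration):

```shell
# Hypothetical captured summary line from a bazel run:
summary='Executed 42 out of 42 tests: 42 tests pass.'

# The backreferences (\1) force all three N's to be the same number.
if printf '%s\n' "$summary" |
    grep -q 'Executed \([0-9][0-9]*\) out of \1 tests: \1 tests pass\.'; then
  echo "all tests ran and passed"    # → prints this for the sample above
else
  echo "count mismatch - the fix is not done"
fi
```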

## Red-Green Verification for Regression Tests

When writing a regression test for a bug fix, you MUST verify the test actually catches the bug:

1. **Green:** Run `bazel test //... --nocache_test_results` — all tests pass (fix in place)
2. **Red:** Remove the fix, run `bazel test //... --nocache_test_results` — the new test(s) MUST fail
3. **Green:** Restore the fix, run `bazel test //... --nocache_test_results` — all tests pass again

If step 2 passes (test doesn't fail without the fix), the test is not testing what you think. Go back and fix the test.

**Do the red-green on the full suite**, not just the one target. This catches two problems at once: (a) the regression test actually detects the bug, and (b) the fix doesn't break anything else.

## Anti-Patterns

| Don't | Do instead |
| ----------------------------------------- | --------------------------------------------------- |
| `bazel test //target` (no cache flag) | `bazel test //target --nocache_test_results` |
| `--test_arg='-f' --test_arg='test name'` | Run the full target, grep output for test name |
| Run one target, claim fix is done | Run `//...`, check all-pass summary |
| Claim "tests pass" from a previous run | Run fresh, read fresh output |
| Trust filter-based "PASSED" at face value | Check that the expected test names appear in output |