Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions docs/knowledge/INDEX.md
Original file line number Diff line number Diff line change
Expand Up @@ -32,6 +32,7 @@ One-line pointers into the evergreen knowledge base. Newest entries at the top o

## Cross-cutting

- [Log-key allowlist (test-time gate)](../../internal/relay/log_allowlist.go) — closed `allowedLogKeys` set in `internal/relay/log_allowlist.go` paired with `TestLogKeysAreAllowlisted` in `internal/relay/log_keys_test.go`; AST-walks every non-test `.go` file in the package and fails if a `logger.{Info,Warn,Error,Debug}` call carries a key absent from the set, a non-string-literal key, or a `Logger.With/LogAttrs/Log` selector the walker can't parse. Fourth defence layer for threat-model § *Log hygiene* (alongside spec, code review, structural absence) (#36).
- [Threat model](../threat-model.md) — operational threats to the relay-as-deployed-process: deploy, supply chain, DoS, log hygiene, cert handling, TLS config, error leakage. Complements (does not replace) the protocol spec's wire-protocol Security model.
- [Project memory](../PROJECT-MEMORY.md) — what's built, patterns established, current state.
- [Lessons](../lessons.md) — gotchas worth carrying forward.
41 changes: 41 additions & 0 deletions docs/knowledge/codebase/36.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,41 @@
# Ticket #36 — AST-walk test enforces log-key allowlist

Fourth defence layer for `docs/threat-model.md` § *Log hygiene*. Before this ticket the rule lived in three places that all rely on human attention: the threat-model doc, code review of each new log call, and the structural absence of token-bearing values in routing-path code (`docs/lessons.md` "Credentials the relay does not validate…"). One careless PR re-introduces a leak. This ticket adds a deterministic test-time gate: any `logger.{Info,Warn,Error,Debug}` call in `internal/relay/` whose key is absent from a closed in-repo allowlist fails `go test`, so the rule lands in CI rather than in production-log review.

## Implementation

- **`internal/relay/log_allowlist.go`** — unexported `var allowedLogKeys = map[string]struct{}{…}` with the seven keys that cover every existing call site: `server_id`, `conn_id`, `binary_conn_id`, `device_name`, `remote`, `binary_version`, `err`. Lives in a non-test `.go` file (not `_test.go`) so the threat-model doc can point at a stable production path. File comment names the test and the doc section so a developer adding a key sees the full contract without leaving the file.
- **`internal/relay/log_keys_test.go`** — single test, pure stdlib (`go/parser`, `go/ast`, `go/token`, `strconv`, `path/filepath`). `filepath.Glob("*.go")` from cwd, drop `_test.go` files, `parser.ParseFile(..., parser.SkipObjectResolution)` each, `ast.Inspect` for `*ast.CallExpr` whose `Fun` is a `*ast.SelectorExpr` with `Sel.Name ∈ {Info, Warn, Error, Debug}`. Skip-the-test-file is essential — the test itself contains string literals that look like log keys.
- **Selector-name match, not receiver type.** Walker matches by `Sel.Name` only, so `slog.Info(...)` is gated identically to `logger.Info(...)`. No `go/types` resolution; intentional — false positives from an unrelated type that happens to expose `Info/Warn/Error/Debug` are not present in the package today and would surface as a `file:line` violation if introduced later.
- **First-arg-must-be-string-literal narrowing.** A matched call's `Args[0]` must itself be `*ast.BasicLit` with `Kind == token.STRING`. This filters `http.Error(w, msg, code)` and anything else that shares a method name but starts with a non-string arg. The narrowing is documented in the test docstring; a future log call with a computed msg would slip past it, and the convention (and this filter) gets revisited in that PR.
- **Three distinct violation classes, all accumulated and reported together** (single trailing `t.Fatalf("found %d log-hygiene violation(s)", n)` so a developer who misnames a key in three places fixes them in one pass):
1. *Key not in allowlist* — `"%s: log key %q is not in allowedLogKeys (see internal/relay/log_allowlist.go and docs/threat-model.md § Log hygiene)"`. Names file path, line, and key without `-v`.
2. *Non-literal key argument* — `"%s: non-literal log key argument in %s call (dynamic key defeats the allowlist; use a literal string key)"`. Catches `slog.String(dynamicKey, v)`, `fmt.Sprintf("k_%s", id)`, bare variables. Stops walking that call's args (pairing can't be trusted past the first bad key).
3. *Unsupported method shape* — `"%s: %s.%s used; this method is not gated by TestLogKeysAreAllowlisted. Either convert the call to positional Info/Warn/Error/Debug or extend the walker."` for any selector with `Sel.Name ∈ {With, LogAttrs, Log}`. The "future contributor switches to `Logger.With(...).Info(...)` and bypasses the gate" hedge — ~6 lines that pay their freight the first time anyone touches log structure.
- **`docs/threat-model.md` § Log hygiene** gains a one-line **Enforcement** paragraph pointing at the allowlist file and the test as the canonical machine-checkable rule. The existing *Residual risk* and *Future hardening* paragraphs stay — the test catches authoring-time drift; runtime redaction (still future) catches operational drift and value-side leaks.

## Threats this gates, and what it deliberately does not

Captured in the test docstring and the architect's security review (verdict PASS). Summary:

- **Caught structurally** — dynamic keys (`slog.String(dynKey, v)`, `fmt.Sprintf`-built keys, bare variables); switches to `Logger.With/LogAttrs/Log` shapes the walker can't parse; package-level `slog.Info(...)` (same selector match).
- **Acknowledged out of scope, by design** — value-side leaks (an allowlisted key like `err` carrying a string with a token in it). The test inspects keys, not values; value-side redaction is the future `slog` middleware named in the threat-model's *Future hardening* line. The Enforcement paragraph deliberately says "carries a key absent from that set" — keys only.
- **Acknowledged trust-layer, not structural** — allowlist tampering (a PR adds `"token"` to `allowedLogKeys`) and `t.Skip()`-style test disablement. Both visible in diff; code review owns them, same trust model as every other in-repo policy file.
- **Acknowledged scope boundary** — the test walks `internal/relay/` only. `cmd/pyrycode-relay/main.go` has handler construction and no `logger.{Info,Warn,Error,Debug}` calls today; a future log call there extends the test (or carbons it). Recorded in the ticket's *Out of Scope* block.

## Negative-path check (manual, not committed)

Per the test docstring: temporarily add e.g. `"forbidden_key", "x"` to a `logger.Info` call, re-run `go test ./internal/relay/ -run TestLogKeysAreAllowlisted`, confirm the failure names the right file:line and key without `-v`, revert. Documented in the test file so a future contributor can repeat it. No permanent negative-path fixture — committing a file that contains a forbidden log key would itself be the pattern the test is supposed to ban.

## Lessons learned

- **"Stochastic spec → deterministic test" pairs naturally with the threat-model layer.** The pattern this ticket establishes: when a threat-model bullet enumerates a closed set ("must not log X, may log Y"), the closed set wants a machine-checkable home in the source tree, and the doc bullet becomes a pointer at the canonical rule. The doc keeps stating intent; the source file states the exact names; the test gates drift. This is the same shape as `internal/protocol/closecodes/codes.go` (the closed set of WS close codes) — when you find yourself writing "the list of X is Y, Z, W" in a doc, ask whether the list wants to live in a `.go` file the source can be tested against. (Belt-and-suspenders fabric rule from `CLAUDE.md`: the threat-model rule is the stochastic agent layer; this test is the deterministic backstop. Different fabric, by design.)
- **Selector-name AST match is sufficient where the package has a stable receiver vocabulary.** No `go/types` resolution needed for a single-package gate of this shape. The walker is ~100 lines of stdlib; the cost of type resolution (build the package, run the type checker, handle build-tag combinatorics) would have been the dominant complexity for no real precision gain. Reach for `go/types` when the same check has to span receivers from multiple packages or interact with generics — not for in-package selector-name gates.
- **Narrow the matcher *before* the test runs against the live corpus, not after.** The first-arg-must-be-string-literal filter was added during implementation when `http.Error(w, msg, code)` showed up as a false positive. The general rule: a structural walker that matches by selector name will pick up unrelated methods sharing the name; lean on argument-shape filters (here, "first arg is a string literal — i.e., the slog msg convention") to scope it down. Cheaper than learning the false-positive list by running the test and triaging output.
- **Accumulate-and-report-all beats fail-at-first.** Three violation classes, all surfaced in one run, single `t.Fatalf` with count. The "developer who misnames a key in one file probably misnames it in three" guess turned out right in pre-commit testing; the negative-path check (one synthetic forbidden key) genuinely is a single-violation scenario, but a real refactor introducing a new key name hits every call site that uses it. The fix-loop shortens by a factor equal to the call-site count.

## Cross-links

- [Spec: 36-log-key-allowlist-test](../../specs/architecture/36-log-key-allowlist-test.md) — architect's design and security review (verdict: PASS).
- [Threat model § Log hygiene](../../threat-model.md) — the doc rule this test enforces; the new **Enforcement** paragraph points back to the allowlist file.
- [`docs/lessons.md`](../../lessons.md) — pre-2026-05-11 historical entry on layered defence (spec, code review, structural absence); this ticket adds the fourth layer.
Loading