Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 3 additions & 0 deletions docs/PROJECT-MEMORY.md
Original file line number Diff line number Diff line change
Expand Up @@ -53,6 +53,9 @@ Stateless WebSocket router between mobile clients and pyry binaries. Internet-ex
- **Capture process-state timestamps in `main` after `flag.Parse()`, not as package-level vars.** `startedAt := time.Now()` lives inside `main` and is passed into the handler factory. A package-level `var startedAt = time.Now()` would fire at import time — before flag parsing, before `--version` early-returns — and be wrong for short-lived test binaries and any future deferred-serve setup. Adopted in #10.
- **Digest-pin third-party base images with a `# Tracks: <upstream-tag>` sibling comment.** `FROM image:tag@sha256:…` cannot be silently shifted by a tag-swap attack between Renovate bumps; the `# Tracks:` line tells a reviewer (or Renovate's diff) what the digest is supposed to correspond to, so a digest-changed-while-tag-comment-unchanged proposal is structurally reviewable. Same shape applies to any future supply-chain pin (action SHAs, downloaded toolchain archives) — never pin by digest alone, always pair with the human-readable tag the digest tracks. Adopted in `Dockerfile` for both `golang:1.26-bookworm` and `gcr.io/distroless/static-debian12:nonroot` (#32).
- **Belt-and-suspenders invariants survive base-layer regressions.** When a base image already enforces a security property upstream (distroless `:nonroot` sets `USER 65532`), re-state the property in our own layer (`USER nonroot:nonroot`) so a future base swap that drops the upstream property doesn't silently regress us. Both layers would need to regress simultaneously for the property to lapse. Cost is one line; benefit is regression resistance under unattended dependency updates. Adopted in the runtime stage of `Dockerfile` (#32).
- **Type-owned background goroutine: `ticker + done chan + WaitGroup`, idempotent + synchronous `Close`.** When a type owns a long-lived bookkeeping goroutine (sweep/eviction loop, not a per-conn read pump), start it from the constructor and stop it via the type's own `Close()`. Shape: `closeOnce.Do(func(){ close(done) })` then `wg.Wait()` **outside** the once — putting `wg.Wait()` inside would let a second concurrent caller return early while shutdown is still in flight. The wiring site registers `defer x.Close()` at the composition root. Adopted in `IPRateLimiter` (#50). Distinct from "Per-conn goroutines exit cleanly via LIFO defers" — that pattern is for goroutines whose lifecycle is owned by an HTTP handler; this one is for goroutines whose lifecycle is owned by the type itself.
- **When one timestamp must serve two semantic purposes, split it.** In `IPRateLimiter` (#50), refill math needs an anchor that advances in token-sized chunks (preserving fractional elapsed time for the next call) while eviction needs an anchor that updates on every poll (so an actively-hammered/denied bucket isn't dropped and recycled for a fresh burst). A single `last`-of-anything cannot do both without one purpose corrupting the other. Two fields (`refillLast`, `pollLast`) decouple them at zero structural cost and make a load-bearing security invariant (no free-burst-recycling) structurally true rather than carefully maintained. Generalisable: whenever a timestamp's "should it update here?" question gets conflicting answers from different callers, the answer is two timestamps.
- **Injectable wall-clock as an unexported `now func() time.Time` field, defaulting to `time.Now`.** When a type's correctness depends on `time.Now` and tests need determinism without sleeping, give the struct a package-private `now func() time.Time` field set to `time.Now` in the constructor; in-package tests assign a fake (`l.now = fakeClock.Now`) directly. No exported `WithClock` option, no clock interface, no constructor variants. The single eviction goroutine reads `l.now` on every tick; assignment before the first tick is race-free in practice and `-race` tests cover the steady-state. Adopted in `IPRateLimiter` (#50); applies to any future timer-driven primitive whose tests need fake-clock determinism.

## Conventions

Expand Down
48 changes: 48 additions & 0 deletions docs/knowledge/codebase/50.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,48 @@
# Ticket #50 — per-IP token-bucket rate-limit type with eviction

Standalone primitive for pre-upgrade attempt throttling. Ships the limiter type only; IP extraction and upgrade-handler wiring are separate tickets (#47 et al.). Closes the third leg of `docs/threat-model.md` § *DoS resistance* (connection floods, fork-bomb retry) — but only once the wiring ticket actually applies it; this ticket changes nothing about the running system.

## Implementation

- **`internal/relay/ratelimit.go`** — `IPRateLimiter`: per-IP token-bucket map under one `sync.Mutex`; integer-token refill math (`time.Duration` per token, no float drift); single owned eviction goroutine started in `NewIPRateLimiter`; `Allow(ip string) bool`, `Close()`, unexported `sweep`/`evictLoop`. Constructor is `NewIPRateLimiter(refillEvery, burst, evictionInterval)`. The `now` clock is a package-private `func() time.Time` field (`time.Now` in production; in-package tests assign a fake directly — no exported `WithClock` constructor). No new dependencies; stdlib only.
- **`internal/relay/ratelimit_test.go`** — 10 tests: `BurstThenDeny`, `RefillAfterAdvance`, `FractionalRefillPreserved`, `PerIPIsolation`, `EvictionDropsIdleBucket`, `EvictionPreservesActiveBucket` (regression-guard for the two-field invariant), `EvictionGoroutineRunsAndDrops` (only test with real-clock sleeping), `CloseIsIdempotent`, `CloseStopsEvictionGoroutine`, `RaceFreedom` (32 goroutines × 200 ops, sweep on 1ms interval, run with `go test -race -count=20 -run TestIPRateLimiter_RaceFreedom ./internal/relay`). Fake clock is local to the test file; constructor helper `newTestLimiter` registers `t.Cleanup(l.Close)`.
- **No changes to `cmd/pyrycode-relay/main.go`** — per the AC, this is the primitive only. Wiring is #47's job.

## Two-field invariant (the load-bearing design choice)

Each bucket holds two timestamps:

- `refillLast` — anchored to the last *credited refill*. Advances in integer-token chunks (`refillLast += tokensAdd * refillEvery`, **not** to `now`), preserving the fractional remainder of elapsed time for the next call.
- `pollLast` — anchored to the last *call to `Allow`*, set to `now` on every call **including denies**.

Eviction predicate is `now - pollLast >= burst * refillEvery`. The invariant — that an evicted bucket is at capacity at the moment of eviction — follows: since `pollLast` is the most recent `Allow`, no calls have happened since, so tokens are unchanged and `refillLast ≤ pollLast`, so `now - refillLast ≥ burst * refillEvery`, which credits ≥ burst tokens (clamped to burst). A single `last`-of-anything field could not simultaneously preserve fractional refill time *and* keep an actively-hammered (denied) bucket resident — separating the two concerns is what makes free-burst-recycling structurally impossible (`TestIPRateLimiter_EvictionPreservesActiveBucket` is the regression guard).

## Eviction goroutine lifecycle

`time.NewTicker(evictionInterval)` + `done chan struct{}` + `sync.WaitGroup`. `Close` is idempotent via `sync.Once` (closes `done` once) and synchronous via `wg.Wait()` (a second concurrent `Close` also blocks until the goroutine exits). `sweep` is unexported but in-package: tests construct the limiter with `evictionInterval=time.Hour` (goroutine effectively never fires), drive a fake clock manually, and call `sweep()` directly. One test (`TestIPRateLimiter_EvictionGoroutineRunsAndDrops`) uses real wall-clock + short interval to verify the goroutine path end-to-end.

The shape is *related but distinct* from the "per-conn goroutine exits via LIFO defers" pattern (`docs/PROJECT-MEMORY.md`): that pattern is for goroutines whose lifecycle is owned by an HTTP handler. The limiter's eviction goroutine is owned by the limiter type itself and stopped via its own `Close()` — the wiring ticket (#47) is what registers `defer limiter.Close()` at the composition root.

## Concurrency model

- Single `sync.Mutex` (not RWMutex): `Allow` always writes (token decrement and/or pollLast update), so RWMutex's overhead is unjustified.
- Eviction goroutine takes the same `Mutex` during sweep; sweep holds the lock for the duration of one full map walk. Map iteration with concurrent `delete` is spec-safe.
- The eviction goroutine does not call any externally-supplied callback. It mutates `l.buckets` only; no `Send`, no `Close` on application objects.
- Clock-backwards (NTP step backwards): `elapsed < 0` ⇒ `tokensAdd ≤ 0`, refill skipped; `now - pollLast < threshold` (negative), so no premature eviction. Safe-by-default.

## Parameter contracts (documented, not enforced)

- `refillEvery > 0`, `burst ≥ 1`, `evictionInterval > 0` — out-of-bounds values are a wiring bug. Limiter does not validate; `time.NewTicker` panics loudly on `evictionInterval ≤ 0`, which is the right loud-failure (matches `docs/PROJECT-MEMORY.md` § *Loud failure over silent correction*).

## Out of scope

- IP-extraction helper (sibling ticket).
- Upgrade-handler middleware wiring, `--trust-x-forwarded-for` flag, 429 response shape, deny-side log line, policy defaults — all #47.
- `docs/threat-model.md` § *DoS resistance* update — #47 lands first (until then, no rate limit is applied to traffic; updating the doc here would falsely claim the threat is mitigated).
- Per-server-id or per-token rate limiting — separate threat surface.
- Multi-instance shared-state rate limiting — v1 is single-instance.

## Cross-links

- [Spec: 50-ip-rate-limiter](../../specs/architecture/50-ip-rate-limiter.md) — architect's design and security review (verdict: PASS, no findings).
- [Threat model § DoS resistance](../../threat-model.md) — names the threat this primitive begins to mitigate (alongside #29's per-frame cap and #30's max-phones cap).
12 changes: 12 additions & 0 deletions docs/lessons.md
Original file line number Diff line number Diff line change
Expand Up @@ -93,3 +93,15 @@ The runtime image used in #32 ships with the binary and nothing else: no `sh`, n
## `autocert.Manager.TLSConfig()` doesn't set `MinVersion`

`gosec` G402 fires on `make lint` if you use it raw. Wrap it in a helper that pins `MinVersion = tls.VersionTLS12` (or 1.3) before handing it to `http.Server`. Centralising the override means a future bump is a one-line change. Source: `relay.TLSConfig` (#9).

## Integer token-bucket refill: advance the anchor by token-sized chunks, not to `now`

After crediting `tokensAdd := int(elapsed / refillEvery)` tokens, the natural-looking move is `refillLast = now`. That discards the fractional remainder of `elapsed` (the slice less than one `refillEvery`), over-throttling slightly because the next call has to wait the full interval again instead of just the leftover. The correct move is `refillLast += time.Duration(tokensAdd) * refillEvery`, which preserves the fractional time for the next call and keeps the steady-state rate exactly `1 / refillEvery`. Security-positive either way (the bug only over-throttles, never under), but the chunked form is the one that matches the documented rate. Source: `IPRateLimiter.Allow` (#50).

## `wg.Wait()` belongs OUTSIDE `closeOnce.Do(close(done))`

The natural shape for "idempotent close that stops one goroutine" is `closeOnce.Do(func(){ close(done); wg.Wait() })`. That works for a single caller, but a second concurrent `Close()` skips the `Do` body entirely and returns immediately — while the goroutine is still running and the first caller is still inside `wg.Wait()`. Caller two now races past `Close` thinking shutdown is complete. Hoist `wg.Wait()` outside the once: `closeOnce.Do(func(){ close(done) }); wg.Wait()`. Now both callers block on the WaitGroup until the goroutine exits; both return synchronously. Matches the "Close is idempotent and synchronous" expectation on `WSConn.Close`. Source: `IPRateLimiter.Close` (#50).

## Map iteration with `delete(m, k)` is spec-safe; `for range` snapshot is not required

The Go spec guarantees that deleting the current key during `for k, v := range m { ... delete(m, k) ... }` is safe — the deleted entry is omitted from subsequent iterations, and entries not yet visited may or may not be visited (which is fine when you're filtering by a predicate that any-order eviction satisfies). No need to build a `toDelete` slice first. Source: `IPRateLimiter.sweep` (#50); confirmed by the Go spec §"For statements with `range` clause".
Loading