feat(relay): cap WS upgrade handshake at 5s (#35)#92
Conversation
Tighten ReadHeaderTimeout from 10s to 5s on every http.Server the relay constructs so a slow-loris peer cannot pin a socket + serving goroutine on the pre-upgrade window. Post-upgrade slow peers are covered by #15 (per-frame deadlines) and #7 (heartbeat); per-IP attempt rate is bounded by #50. Threat model framing: docs/threat-model.md § DoS resistance (doc update belongs to the documentation phase, not this commit). Four wiring sites move together so the metrics-listener invariant documented in metrics_listen.go ("timeouts match the public listener … that is the safest default") holds: - cmd/pyrycode-relay/main.go: insecureListen srv, autocert httpsSrv, autocert httpSrv (each carries the literal at the wiring site per the policy-values-in-main convention). - internal/relay/metrics_listen.go: NewMetricsServer return literal. A new behavioural test (internal/relay/slow_upgrade_test.go) pins the net/http ReadHeaderTimeout contract: an http.Server with a finite ReadHeaderTimeout closes a peer that stalls mid-header. Uses a bare http.Server on a 127.0.0.1:0 listener (not httptest.NewServer, which cannot set ReadHeaderTimeout). Production literal (5s) is verified by code review at each wiring site, matching how every other inline policy value in main.go is guarded. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Code Review: #35Decision: PASS FindingsNone. SummaryImplementation matches the spec exactly. The four The Security review: spec's
|
Threat model § DoS resistance now cites ReadHeaderTimeout: 5s and names all four http.Server instances the relay constructs (insecure, autocert TLS, autocert HTTP-01, loopback metrics) — explicitly framing the bound as covering only the pre-upgrade window and pointing at #15 + #7 as the post-upgrade complements. autocert-tls.md and metrics-listener.md update the cited timeout-table values from 10s to 5s and note the cross-site invariant the metrics listener's function comment encodes (all four wiring sites moved together). New per-ticket note codebase/35.md records the four wiring sites, the choice of 5s (matches nhooyr's 5s close-handshake constant used by drainDeadline; leaves observable margin for legitimate mobile peers), the behavioural test that pins the net/http ReadHeaderTimeout contract, and the lesson about ReadTimeout/WriteTimeout no longer applying once websocket.Accept hijacks the connection (so ReadHeaderTimeout is the only stdlib-level bound on the pre-upgrade window). No new feature/decision/architecture doc was created, so INDEX.md is untouched per the sole-writer convention. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
What
Tightens
ReadHeaderTimeoutfrom 10s → 5s on everyhttp.Serverthe relay constructs (three incmd/pyrycode-relay/main.go, one ininternal/relay/metrics_listen.go). This bounds the pre-upgrade window — TCP accept to full HTTP header receipt — so a slow-loris peer cannot pin a socket + serving goroutine on the internet-exposed surface. Oncewebsocket.Accepthijacks the connection,ReadHeaderTimeoutno longer applies; post-upgrade slow peers are already covered by per-frame deadlines (#15) and heartbeat (#7).The metrics-listener moves together with the public listeners because the function comment in
metrics_listen.goexplicitly encodes the invariant ("timeouts match the public listener … that is the safest default"). Loopback traffic has no slow-loris threat, but keeping the four sites synchronised preserves the rationale the comment captures.Issue
Closes #35. Architect spec:
docs/specs/architecture/35-ws-upgrade-handshake-timeout.md(security review: PASS).Testing
internal/relay/slow_upgrade_test.go—TestHTTPServerReadHeaderTimeoutClosesSlowPeerconstructs a barehttp.Serveron a127.0.0.1:0listener with a shortReadHeaderTimeout(150ms), dials it, writes a partial header block (no terminating\r\n\r\n), and asserts the server closes the connection before the test's own bounded read deadline. An atomic counter fails the test loudly if the handler ever runs (signal that the timeout did not fire).internal/relay/metrics_listen_test.gopinned-value assertion adjusted from10*time.Second→5*time.Second.go test -race ./...,go vet ./...,go build ./cmd/pyrycode-relayall clean locally.Architecture compliance
const, no helper — per the policy-values-in-main convention named in the spec and indocs/PROJECT-MEMORY.md.srv) in the same shape as the existingdrainDeadlinecomment; the autocerthttpsSrv/httpSrvsit visibly adjacent and don't need their own comment, and the metrics file's existing block already explains the duplicated-value rationale.relay.Shutdown(relay: graceful shutdown on SIGTERM — drain WS connections before exit #31).net/httpcontract the relay relies on; the production literal (5s) is verified by code review at each wiring site, matching how every other inline policy value inmain.gois guarded.docs/threat-model.mdcurrently cites the 10s value verbatim; per the spec, that update belongs to the documentation phase and is not part of this PR.🤖 Generated with Claude Code