Skip to content

feat(relay): cap WS upgrade handshake at 5s (#35)#92

Merged
ilmoniemi merged 3 commits into
mainfrom
feature/35
May 13, 2026
Merged

feat(relay): cap WS upgrade handshake at 5s (#35)#92
ilmoniemi merged 3 commits into
mainfrom
feature/35

Conversation

@ilmoniemi
Copy link
Copy Markdown
Contributor

What

Tightens ReadHeaderTimeout from 10s → 5s on every http.Server the relay constructs (three in cmd/pyrycode-relay/main.go, one in internal/relay/metrics_listen.go). This bounds the pre-upgrade window — TCP accept to full HTTP header receipt — so a slow-loris peer cannot pin a socket + serving goroutine on the internet-exposed surface. Once websocket.Accept hijacks the connection, ReadHeaderTimeout no longer applies; post-upgrade slow peers are already covered by per-frame deadlines (#15) and heartbeat (#7).

The metrics-listener moves together with the public listeners because the function comment in metrics_listen.go explicitly encodes the invariant ("timeouts match the public listener … that is the safest default"). Loopback traffic has no slow-loris threat, but keeping the four sites synchronised preserves the rationale the comment captures.

Issue

Closes #35. Architect spec: docs/specs/architecture/35-ws-upgrade-handshake-timeout.md (security review: PASS).

Testing

  • New: internal/relay/slow_upgrade_test.goTestHTTPServerReadHeaderTimeoutClosesSlowPeer constructs a bare http.Server on a 127.0.0.1:0 listener with a short ReadHeaderTimeout (150ms), dials it, writes a partial header block (no terminating \r\n\r\n), and asserts the server closes the connection before the test's own bounded read deadline. An atomic counter fails the test loudly if the handler ever runs (signal that the timeout did not fire).
  • Updated: internal/relay/metrics_listen_test.go pinned-value assertion adjusted from 10*time.Second5*time.Second.
  • go test -race ./..., go vet ./..., go build ./cmd/pyrycode-relay all clean locally.

Architecture compliance

  • Literal sits at each wiring site — no shared const, no helper — per the policy-values-in-main convention named in the spec and in docs/PROJECT-MEMORY.md.
  • Single comment block above the first occurrence (insecure-listen srv) in the same shape as the existing drainDeadline comment; the autocert httpsSrv/httpSrv sit visibly adjacent and don't need their own comment, and the metrics file's existing block already explains the duplicated-value rationale.
  • No new goroutines, no new exported symbols, no new dependencies, no change to relay.Shutdown (relay: graceful shutdown on SIGTERM — drain WS connections before exit #31).
  • Behavioural test verifies the net/http contract the relay relies on; the production literal (5s) is verified by code review at each wiring site, matching how every other inline policy value in main.go is guarded.
  • docs/threat-model.md currently cites the 10s value verbatim; per the spec, that update belongs to the documentation phase and is not part of this PR.

🤖 Generated with Claude Code

ilmoniemi and others added 2 commits May 13, 2026 11:42
Tighten ReadHeaderTimeout from 10s to 5s on every http.Server the
relay constructs so a slow-loris peer cannot pin a socket + serving
goroutine on the pre-upgrade window. Post-upgrade slow peers are
covered by #15 (per-frame deadlines) and #7 (heartbeat); per-IP
attempt rate is bounded by #50. Threat model framing:
docs/threat-model.md § DoS resistance (doc update belongs to the
documentation phase, not this commit).

Four wiring sites move together so the metrics-listener invariant
documented in metrics_listen.go ("timeouts match the public listener
… that is the safest default") holds:

  - cmd/pyrycode-relay/main.go: insecureListen srv, autocert httpsSrv,
    autocert httpSrv (each carries the literal at the wiring site per
    the policy-values-in-main convention).
  - internal/relay/metrics_listen.go: NewMetricsServer return literal.

A new behavioural test (internal/relay/slow_upgrade_test.go) pins the
net/http ReadHeaderTimeout contract: an http.Server with a finite
ReadHeaderTimeout closes a peer that stalls mid-header. Uses a bare
http.Server on a 127.0.0.1:0 listener (not httptest.NewServer, which
cannot set ReadHeaderTimeout). Production literal (5s) is verified by
code review at each wiring site, matching how every other inline
policy value in main.go is guarded.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@ilmoniemi
Copy link
Copy Markdown
Contributor Author

Code Review: #35

Decision: PASS

Findings

None.

Summary

Implementation matches the spec exactly. The four ReadHeaderTimeout literals (three in cmd/pyrycode-relay/main.go at lines 198/249/261, one in internal/relay/metrics_listen.go:90) all move 10s → 5s in lock-step, preserving the "duplicated, identical" invariant the metrics-listener comment encodes. The wiring-site comment in main.go:185-194 is well-shaped: it names the value, what it bounds, the post-upgrade complements (#15, #7), and the threat-model citation — without leaking implementation detail into the policy.

The slow_upgrade_test.go test pins the standard-library contract at a short value (150ms + 500ms slack), uses an atomic counter to assert the handler never ran (correct negative-shape check), accepts both EOF and reset-style closures as valid signals, and explicitly fails on a client-side deadline trip with a clear diagnostic. Passes locally under -race. metrics_listen_test.go:105 is updated to assert the new value.

Security review: spec's ## Security review section is present with verdict PASS and a complete category walk; the diff matches the design — no new log lines that could leak token material, no filesystem or subprocess changes, no crypto change. The pre-upgrade window carries no application credentials (token parsing happens post-websocket.Accept), so the timeout shortening has no token-handling impact.

go vet clean, new test passes under -race.

Threat model § DoS resistance now cites ReadHeaderTimeout: 5s and names
all four http.Server instances the relay constructs (insecure, autocert
TLS, autocert HTTP-01, loopback metrics) — explicitly framing the bound
as covering only the pre-upgrade window and pointing at #15 + #7 as the
post-upgrade complements.

autocert-tls.md and metrics-listener.md update the cited timeout-table
values from 10s to 5s and note the cross-site invariant the metrics
listener's function comment encodes (all four wiring sites moved
together).

New per-ticket note codebase/35.md records the four wiring sites, the
choice of 5s (matches nhooyr's 5s close-handshake constant used by
drainDeadline; leaves observable margin for legitimate mobile peers),
the behavioural test that pins the net/http ReadHeaderTimeout contract,
and the lesson about ReadTimeout/WriteTimeout no longer applying once
websocket.Accept hijacks the connection (so ReadHeaderTimeout is the
only stdlib-level bound on the pre-upgrade window).

No new feature/decision/architecture doc was created, so INDEX.md is
untouched per the sole-writer convention.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@ilmoniemi ilmoniemi merged commit 281c38e into main May 13, 2026
2 of 3 checks passed
@ilmoniemi ilmoniemi deleted the feature/35 branch May 13, 2026 08:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

relay: WS upgrade handshake timeout — cap upgrade time to bound slow-loris exposure

1 participant