Skip to content

relay: SetReadLimit on WSConn — bound per-frame size on both forwarders #29

@ilmoniemi

Description

@ilmoniemi

User Story

As a relay operator, I want every WebSocket frame the relay reads from a peer bounded by a deliberate per-frame size cap, so that a misbehaving or malicious phone or binary cannot pin memory in the read buffer or amplify backpressure across forwarders by sending oversized frames.

Context

docs/knowledge/features/ws-conn-adapter.md § Out of scope explicitly flags this as a follow-up:

No per-message size cap on Read. Inherited from *websocket.Conn's default (nhooyr's 32 MiB read limit). A deliberate SetReadLimit policy is a follow-up so it covers both forwarders.

32 MiB per frame is too generous for a phone↔binary message channel where envelope payloads are typically well under 64 KiB. Phone messages are typed text + structured envelopes; backfill chunks per the protocol spec are bounded; image/blob messages (if any) flow out-of-band.

Setting the cap at WSConn construction makes the policy cover every reader through the adapter — both the phone-side forwarder (StartPhoneForwarder, #25) and the binary-side forwarder (StartBinaryForwarder, #26, now landed) — without each call site needing to know.

Acceptance Criteria

  • A per-frame read-size cap is configured on every *websocket.Conn wrapped by WSConn, before any Read is performed against it. The mechanism is *websocket.Conn.SetReadLimit (or an equivalent applied at WSConn construction time).
  • The cap is a single policy value defined as a literal in cmd/pyrycode-relay/main.go and threaded through the existing ServerHandler and ClientHandler constructors (same pattern as the 30*time.Second grace duration on ServerHandler). One literal in one place; no package-level constant in internal/relay.
  • WSConn.Read returns a non-nil error when the peer sends a frame whose payload exceeds the cap, AND the underlying connection is closed by the library (no further reads succeed). The specific error type is library-dependent and is NOT asserted on — only the non-nil contract.
  • The cap value is justified in the ticket spec by reference to pyrycode/pyrycode/docs/protocol-mobile.md § Message envelope; 256 KiB is the proposed starting value but the architect may revise after deriving the upper bound from the protocol spec.
  • A unit test in internal/relay/ws_conn_test.go (using the existing startEcho/httptest.NewServer harness) verifies that a frame exceeding the cap surfaces as a non-nil Read error on the receiving WSConn and that subsequent reads fail. A second test verifies that a frame at-or-below the cap is delivered intact.
  • On /v1/client, when the phone sends an oversized frame, StartPhoneForwarder returns; the handler's existing defer { UnregisterPhone; Close; log } runs unchanged — i.e. no new cleanup paths are introduced. Verifiable by inspection of client_endpoint.go (no defer additions) and by the existing forwarder test pattern continuing to pass.
  • make vet, make test -race, and make build are clean.

Technical Notes

  • Construction-site application is the natural seam: NewWSConn is the choke point through which both /v1/server and /v1/client reach the wire. Applying at handler entry (i.e. calling c.SetReadLimit between websocket.Accept and NewWSConn) would also work but duplicates the call across handlers; thread the value into NewWSConn instead.
  • Test call sites that pass *WSConn (ws_conn_test.go, heartbeat_test.go, any forward-test fakes) will need their NewWSConn invocations updated for the new signature. These are mechanical one-line edits, not substantive changes.
  • nhooyr.io/websocket's SetReadLimit documented behaviour: when exceeded, the next Read returns a non-nil error and the library closes the connection with StatusMessageTooBig (1009). The relay does not need to emit its own close code — the library handles it.
  • Out of scope (separate tickets):
    • Per-IP / per-server-id rate limiting
    • Slow-loris hardening on the WS upgrade handshake
    • Differentiating phone vs. binary caps (single value covers both for now; revisit only if binary-side framing turns out to need a different bound in practice)
    • Making the cap operator-configurable via flag (current pattern is literal-at-wiring-site; promote only when a second operator needs a different value)

Size Estimate

S — single-file behavioural change in ws_conn.go (SetReadLimit call + signature change), single-literal wiring in main.go, mechanical one-line signature updates at four existing call sites (server_endpoint.go, client_endpoint.go, two test files), and one new test pair in ws_conn_test.go. Estimated <100 lines of production code; tests scale linearly.

The work passes the one-sentence-no-"and" test: "Apply a per-frame read-size cap at WSConn construction, threaded from the wiring site."

Metadata

Metadata

Assignees

No one assigned

    Labels

    security-sensitiveTouches auth, crypto, or internet-exposed input pathssize:sSmall ticket: <100 lines production code

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions