Skip to content

feat(relay): /v1/server WS upgrade with header gate and server-id claim (#16)#19

Merged
ilmoniemi merged 3 commits into
mainfrom
feature/16
May 8, 2026
Merged

feat(relay): /v1/server WS upgrade with header gate and server-id claim (#16)#19
ilmoniemi merged 3 commits into
mainfrom
feature/16

Conversation

@ilmoniemi
Copy link
Copy Markdown
Contributor

What

Implements the binary-side WebSocket upgrade endpoint at /v1/server:

  • internal/relay/server_endpoint.go exports ServerHandler(reg, logger) http.Handler. Pre-upgrade gate on X-Pyrycode-Server, X-Pyrycode-Version, and User-Agent (all must be non-empty); failures return 400 with no body and the registry untouched.
  • On success, websocket.Accept upgrades with OriginPatterns: ["*"] (the custom-header gate structurally excludes browser-driven CSWSH). The conn is wrapped via NewWSConn with a server-<id>-<8 hex> conn-id sourced from crypto/rand.
  • reg.ClaimServer enforces first-claim-wins; a losing claim closes the WS with code 4409 and reason "server-id already claimed". The conflict path closes the underlying *websocket.Conn directly (WSConn.Close always sends StatusNormalClosure); a comment in source names this principled exception.
  • The handler holds the connection open via Conn.CloseRead until the peer disconnects, then releases the slot in a defer.

cmd/pyrycode-relay/main.go adds one mux.Handle("/v1/server", …) line next to /healthz.

Issue

Closes #16

Testing

internal/relay/server_endpoint_test.go covers all five AC scenarios:

  • TestServerEndpoint_ValidUpgrade_RegistersBinary — successful upgrade, conn-id shape (server-s1- + 8 hex chars).
  • TestServerEndpoint_HeaderGate_400 — table-driven, missing/empty for each of the three required headers; uses req.Header[k] = nil for the User-Agent missing case to suppress Go's default UA injection so the test exercises the wire-level missing case.
  • TestServerEndpoint_DuplicateClaim_4409 — second dial reads back a websocket.CloseError with Code == 4409 and the documented reason; first conn remains registered.
  • TestServerEndpoint_PeerClose_ReleasesSlot — peer-side close releases the slot; re-claim succeeds.
  • TestServerEndpoint_WrongMethod_NoPanic — GET without upgrade headers and POST both return 4xx with no panic and registry empty.

Verified locally:

  • go test -race ./...
  • go vet ./...
  • go build ./cmd/pyrycode-relay

Architecture compliance

Follows the spec at docs/specs/architecture/16-server-endpoint.md:

  • Single exported symbol (ServerHandler), file-local randHex8 and remoteHost helpers.
  • Header validation runs before any WebSocket state is allocated; registry untouched on every error path.
  • Log fields exactly match the threat-model hygiene rules: server_claimed includes binary_version; server_id_conflict deliberately omits it to avoid creating a version-disclosure oracle.
  • defer { ReleaseServer; wsconn.Close; log } is registered after the successful ClaimServer so the conflict path cannot trigger a release on a slot we don't hold.
  • No new dependencies. No changes to registry.go, ws_conn.go, or any other existing file beyond the single mux registration line.

ilmoniemi added 2 commits May 8, 2026 21:55
…im (#16)

Implements the binary-side WebSocket upgrade endpoint. Validates
X-Pyrycode-Server, X-Pyrycode-Version, and User-Agent before invoking
websocket.Accept; on success, wraps the conn in WSConn (with a
crypto/rand-suffixed conn-id) and registers it via Registry.ClaimServer.
Returns close code 4409 with reason "server-id already claimed" when the
slot is already held. Holds the connection open via CloseRead until the
peer disconnects; releases the slot in a defer.

Logs structured fields only (server_id, binary_version, remote on
claim; server_id and remote on conflict; server_id on release) per
docs/threat-model.md log-hygiene rules.

Closes #16
@ilmoniemi
Copy link
Copy Markdown
Contributor Author

Code Review: #16

Decision: PASS

Architect's security-review section is present in docs/specs/architecture/16-server-endpoint.md (Verdict: PASS, full categories walked) — security-sensitive obligation #1 satisfied. CI is green (security and test jobs both pass). Implementation closely matches the spec; tests cover the five AC bullets.

Findings

  • [NIT] internal/relay/server_endpoint.go:3-10 — the close-codes comment block sits between package relay and import (…) separated from both by blank lines, so it is unattached and won't surface in go doc. Either fold it into the ServerHandler doc comment, or hang it off a named const (e.g. const statusServerIDConflict websocket.StatusCode = 4409) which also addresses the bare 4409 magic number on line 61.
  • [NIT] internal/relay/server_endpoint.go:53-72 — the conflict path closes the underlying *websocket.Conn directly (correct, to send 4409) but never invokes wsconn.Close() afterwards, so the cancel func held by WSConn.closeCtx is left un-invoked. No real leak — WithCancel(Background()) spawns no goroutine and the struct goes out of scope — but a trailing wsconn.Close() after c.Close(websocket.StatusCode(4409), …) would tidy the bookkeeping (the second c.Close(StatusNormalClosure, "") inside WSConn.Close is a harmless no-op on an already-closed conn). The spec's "stillborn WSConn, invariant preserved in spirit" comment is reasonable, just mildly inconsistent with WSConn's documented contract.
  • [NIT] internal/relay/server_endpoint_test.go:97 — the empty_User-Agent row uses req.Header.Set("User-Agent", ""), but Go's (*Request).write suppresses an empty User-Agent from the wire (it has explicit treatment versus arbitrary headers). The handler still sees an empty header and returns 400, so the assertion holds, but the row exercises the missing-header path rather than the explicit-empty-string path. The custom-header rows (empty_X-Pyrycode-Server, empty_X-Pyrycode-Version) actually do send the empty value over the wire — those genuinely cover the empty-string branch.

Summary

A clean, spec-faithful implementation of the binary-side WebSocket upgrade. Header gate runs pre-Accept; registry untouched on every error path; first-claim-wins enforced via ClaimServer; defer ordering puts release after successful claim so a conflict never ghost-releases the legitimate slot; CloseRead keeps control-frames flowing so the conn observes peer-side closes; conn-id uses crypto/rand (forward-compat hardening, not security-critical here); log fields match docs/threat-model.md § Log hygiene exactly, with binary_version deliberately omitted on the conflict path. Tests cover all five AC bullets. Findings above are NIT-level — none block merge.

)

- Add features/server-endpoint.md covering the binary-side ingress: header
  gate pre-upgrade, ClaimServer integration, 4409 conflict path,
  CloseRead-held connection until #6's frame loop lands.
- Add ADR-0005 capturing the rule that application close codes go through
  the underlying *websocket.Conn (stillborn-WSConn window), keeping
  WSConn.Close fixed to StatusNormalClosure.
- Update INDEX with new feature and ADR.
- Update PROJECT-MEMORY: mark /v1/server done; add patterns for pre-upgrade
  validation, close-code emission, defer-after-claim ordering, and the
  CloseRead bridge for long-lived WS handlers.
- Add four lessons: control-frame requirement, close-code emission rule,
  defer ordering for ReleaseServer, and crypto/rand panic-as-fatal idiom.
@ilmoniemi ilmoniemi merged commit d99b1f9 into main May 8, 2026
2 checks passed
@ilmoniemi ilmoniemi deleted the feature/16 branch May 8, 2026 19:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

relay: WS upgrade for /v1/server — accept binary connection, validate headers, claim server-id

1 participant