Skip to content

feat(relay): cap phones per server-id (#30)#43

Merged
ilmoniemi merged 3 commits into
mainfrom
feature/30
May 13, 2026
Merged

feat(relay): cap phones per server-id (#30)#43
ilmoniemi merged 3 commits into
mainfrom
feature/30

Conversation

@ilmoniemi
Copy link
Copy Markdown
Contributor

What

Bound the 1:N phone slice per server-id with a configurable cap (production: 16). A misbehaving binary — or one whose token has leaked to many phones — can no longer grow phones[serverID] without limit, and the per-frame fanout cost in StartBinaryForwarder is bounded above by maxPhones.

  • internal/relay/registry.go — new sentinel ErrPhonesAtCap; new RegisterPhoneCapped(serverID, conn, max int) error with cap-check and append under one write lock; RegisterPhone delegates with max=0 (no cap).
  • internal/relay/client_endpoint.goClientHandler gains maxPhones int. Over-cap registrations are closed on the underlying *websocket.Conn with code 4429 and reason "too many phones for server-id" (stillborn-WSConn pattern, symmetric with the 4404 branch). New log event phone_register_at_cap carries server_id + remote; no token, no device-name.
  • cmd/pyrycode-relay/main.go — passes literal 16 at the wiring site (phone + tablet + several desktops + headroom).

Issue

Closes #30.

Testing

  • TestRegisterPhoneCapped_BoundaryAndRejection — registering exactly max phones succeeds; the next attempt returns ErrPhonesAtCap and does not append.
  • TestRegisterPhoneCapped_RecoveryAfterUnregister — after UnregisterPhone brings the count below max, the next RegisterPhoneCapped succeeds again.
  • TestRegisterPhoneCapped_PerServerIDIndependent — the cap is per-server-id; a second server-id can register up to its own cap independently.
  • TestRegisterPhone_NoCapAfterDelegation — pins the wrapper invariant (RegisterPhoneRegisterPhoneCapped with max=0).
  • TestClientEndpoint_AtCap_4429 — integration test confirming the 4429 close code and reason reach the wire when the cap is exceeded; the in-cap phones remain registered.
  • TestRegistry_RaceFreedom extended to exercise RegisterPhoneCapped under -race.
  • go test -race ./... clean; go vet ./... clean; go build ./cmd/pyrycode-relay clean.

Architecture compliance

  • Follows the spec's dual-method design (RegisterPhone + RegisterPhoneCapped) to avoid a 23-call-site mechanical edit on the existing RegisterPhone signature.
  • maxPhones lives at the wiring site (mirrors the 30*time.Second grace on the line above); not a package constant.
  • Error sentinel + errors.Is branching follows the existing ErrServerIDConflict / ErrNoServer pattern; stillborn-WSConn close direct on *websocket.Conn follows ADR-0005.
  • No change to the data path (StartPhoneForwarder / StartBinaryForwarder) or to the 4404 / 4409 close-code semantics.

Coordination

WS close code 4429 is not yet listed in protocol-mobile.md's close-code table; per the architect's Open Questions §1 the relay change ships independently of a coordinated spec-side PR adding 4429 (phone clients tolerate unknown 44xx codes per RFC 6455). No interlock.

🤖 Generated with Claude Code

@ilmoniemi
Copy link
Copy Markdown
Contributor Author

Code Review: #30

Decision: PASS

Findings

  • [NIT] internal/relay/client_endpoint_test.go:306const cap = 2 shadows the predeclared identifier cap. The spec named the constant cap, so the developer is just following directions; calling it out here so a future Edit doesn't get tripped by the shadow. Non-blocking.

What I verified

Spec compliance. Every item in the "Done means" section is present:

  • ErrPhonesAtCap sentinel exported alongside the existing two (registry.go:21-26).
  • RegisterPhoneCapped(serverID, conn, max int) error added; RegisterPhone delegates with max=0 (registry.go:200-225). Cap check and append are atomic under one r.mu.Lock(). ErrNoServer precedence over ErrPhonesAtCap is preserved (no-server check runs first).
  • ClientHandler gains maxPhones int; the errors.Is(_, ErrPhonesAtCap) branch closes the underlying *websocket.Conn with websocket.StatusCode(4429) and reason "too many phones for server-id"; log event phone_register_at_cap carries server_id + remote only, symmetric with the 4404 line. Stillborn-WSConn pattern matches ADR-0005 and the sibling 4404 branch.
  • cmd/pyrycode-relay/main.go:51 passes 16 at the wiring site.
  • Four registry unit tests + one /v1/client integration test, all matching the spec's templates.

Blast radius. ClientHandler has three call sites total (1 prod + 2 test); all are updated in this diff. RegisterPhone's signature is unchanged, so the 23 existing call sites across healthz_test.go, forward_test.go, and registry_test.go need no edits. This is the explicit value of the dual-method design over a signature change.

Architect's non-gating SHOULD FIX is addressed. TestRegistry_RaceFreedom now includes a RegisterPhoneCapped call in the inner loop (registry_test.go:565), plus the matching unregister cleanup at :568.

Security walk (security-sensitive label):

  • No tokens or device-names in the at-cap log line — field set is identical to phone_register_no_server.
  • Close reason "too many phones for server-id" carries no cap value or other state that would aid probing.
  • Cap-check + append atomic under one write lock — TOCTOU race is structurally impossible (same shape as ClaimServer's first-claim-wins, already covered by TestRegistry_RaceFreedom).
  • maxPhones <= 0 degrades to "no cap" — same contract on both sides; no panic, no clamp, no path to inject a negative value (compile-time literal at the wiring site).
  • No new goroutines; no new lock ordering; no new file/subprocess/crypto surface.
  • The 4429 close happens in the stillborn-WSConn window before the heartbeat goroutine spawns and before the unregister-defer is registered — no leak, no orphan unregister.

Build/test: go vet ./..., go test -race -count=1 ./internal/relay/, and go build ./cmd/pyrycode-relay all clean.

Summary

Faithful spec implementation. Dual-method design (RegisterPhone wrapping RegisterPhoneCapped) keeps the existing 23 call sites untouched. Cap-rejection is atomic and observable; the new 4429 close path is structurally symmetric with the existing 4404 path. The protocol-spec PR for 4429 is tracked separately per Open Questions §1 and does not block this merge.

ilmoniemi and others added 3 commits May 13, 2026 08:47
Bound the 1:N phone slice per server-id so a misbehaving binary — or one
whose token has leaked to many phones — cannot grow phones[serverID]
without limit and inflate per-frame fanout cost.

- Add ErrPhonesAtCap sentinel and RegisterPhoneCapped(serverID, conn, max)
  to Registry. RegisterPhone now delegates with max=0 (no cap). The cap
  check and the slice append run under one write lock; race shape mirrors
  ClaimServer's first-claim-wins.
- ClientHandler gains a maxPhones parameter. Over-cap registrations are
  rejected with WS close code 4429 and reason "too many phones for
  server-id" on the underlying *websocket.Conn (stillborn-WSConn pattern,
  symmetric with the 4404 branch). Log event phone_register_at_cap
  carries server_id + remote; no token, no device-name.
- Wiring: cmd/pyrycode-relay/main.go passes 16 (phone + tablet + several
  desktops + headroom) at construction; tests pass 0 to disable the cap.
- Tests: four new registry unit tests (boundary, recovery, per-server-id
  independence, wrapper-delegation invariant) and one integration test
  asserting 4429 reaches the wire. TestRegistry_RaceFreedom now also
  exercises RegisterPhoneCapped under the race detector.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Per-ticket note in codebase/30.md plus updates to the client-endpoint and
connection-registry feature docs and the knowledge index for the new
ErrPhonesAtCap sentinel, RegisterPhoneCapped, the 4429 close code, and the
phone_register_at_cap log event.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@ilmoniemi ilmoniemi merged commit c70544c into main May 13, 2026
3 checks passed
@ilmoniemi ilmoniemi deleted the feature/30 branch May 13, 2026 05:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

relay: cap phones per server-id — bound the 1:N phone slice

1 participant