Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion cmd/pyrycode-relay/main.go
Original file line number Diff line number Diff line change
Expand Up @@ -47,7 +47,7 @@ func main() {

mux := http.NewServeMux()
mux.Handle("/healthz", relay.NewHealthzHandler(reg, Version, startedAt))
mux.Handle("/v1/server", relay.ServerHandler(reg, logger))
mux.Handle("/v1/server", relay.ServerHandler(reg, logger, 30*time.Second))

if *insecureListen != "" {
logger.Info("starting", "version", Version, "mode", "insecure", "listen", *insecureListen)
Expand Down
5 changes: 3 additions & 2 deletions docs/PROJECT-MEMORY.md
Original file line number Diff line number Diff line change
Expand Up @@ -14,7 +14,7 @@ Stateless WebSocket router between mobile clients and pyry binaries. Internet-ex
| TLS via autocert in `--domain` mode (`NewAutocertManager`, `EnforceHost`, `TLSConfig`, `ErrCacheDirInsecure`) | Done (#9) | `internal/relay/tls.go`, `cmd/pyrycode-relay/main.go` |
| `WSConn` adapter (`nhooyr.io/websocket.Conn` → registry `Conn`; per-conn write mutex; `Close`-cancelled context; 10s `Send` deadline) | Done (#15) | `internal/relay/ws_conn.go` |
| `/healthz` JSON endpoint (`status`, `version`, `connected_binaries`, `connected_phones`, `uptime_seconds`; `Cache-Control: no-store`; unauthenticated) | Done (#10) | `internal/relay/healthz.go`, `cmd/pyrycode-relay/main.go` |
| WS upgrade on `/v1/server` (header gate pre-upgrade; `ClaimServer`; `4409` on conflict; `CloseRead`-held until #6) | Done (#16) | `internal/relay/server_endpoint.go`, `cmd/pyrycode-relay/main.go` |
| WS upgrade on `/v1/server` (header gate pre-upgrade; `ClaimServer`; `4409` on conflict; `CloseRead`-held until #6; disconnect defers a 30s grace release via `ScheduleReleaseServer`) | Done (#16, #21) | `internal/relay/server_endpoint.go`, `cmd/pyrycode-relay/main.go` |
| WS upgrade on `/v1/client` | Not started | — |
| Header validation (`x-pyrycode-server`, `x-pyrycode-version`, `user-agent` on `/v1/server`; `x-pyrycode-token` on `/v1/client`) | `/v1/server` done (#16); `/v1/client` not started | `internal/relay/server_endpoint.go` |
| Frame forwarding using the routing envelope | Not started | — |
Expand All @@ -35,7 +35,8 @@ Stateless WebSocket router between mobile clients and pyry binaries. Internet-ex
- **Handler factories return `http.Handler`, not `http.HandlerFunc`.** `NewHealthzHandler(reg, version, startedAt) http.Handler` keeps construction factory-shaped so a future ticket adding per-handler state (logger, injectable clock) can do so without touching the call site in `main`. Adopted in `/healthz` (#10).
- **Validate adversarial input *before* allocating WS state.** Header-gate checks (presence + non-empty) run before `websocket.Accept`; a malformed request never receives an upgrade response and the registry is untouched on every error path. Adopted in `/v1/server` (#16); the same shape applies to `/v1/client` token validation (#5).
- **Application WS close codes are emitted on the underlying `*websocket.Conn`, not via `WSConn`.** `WSConn.Close()` always emits `StatusNormalClosure` — no policy parameters. Handlers that need `4409` / `4401` / `4404` call `c.Close(websocket.StatusCode(code), reason)` directly in the stillborn-WSConn window (after construction, before any `Send`). See [ADR-0005](knowledge/decisions/0005-application-close-codes-via-underlying-conn.md). Adopted in `/v1/server` (#16); applies to `/v1/client` (#5).
- **`defer { Release; Close; log }` is registered AFTER the successful claim**, never before. A conflict path (or any pre-claim error) returns before the defer is in place, so we never `ReleaseServer` on a slot we don't hold. Adopted in `/v1/server` (#16); same structural ordering applies to `/v1/client`'s phone registration.
- **`defer { ScheduleRelease; Close; log }` is registered AFTER the successful claim**, never before. A conflict path (or any pre-claim error) returns before the defer is in place, so we never schedule a release on a slot we don't hold. Originally `ReleaseServer` (#16); swapped to `ScheduleReleaseServer(serverID, grace)` in #21 so a quick reconnect lands back in the same slot. The structural rule sharpened in #21 — pre-grace this was a benign no-op when violated; now violating it would arm a stray timer for an id the handler never owned. Same ordering applies to `/v1/client`'s phone registration.
- **Policy values live at the wiring site, not as package-level constants.** `30*time.Second` for the grace window is a literal in `cmd/pyrycode-relay/main.go`, threaded into `ServerHandler` as a constructor parameter — it appears exactly once, the value is policy (matches protocol spec), and inlining keeps the protocol-spec linkage visible where the relay is composed. Tests pass ms-scale durations through the same parameter. Promote to a package-level constant only when a second wiring entry point needs the same value. Adopted in `/v1/server` grace duration (#21).
- **Pointer-identity for stale `time.AfterFunc` fires.** `time.Timer.Stop()` returns false if the timer's func has already started executing. Wrap each pending timer in a small struct and store the wrapper pointer in a map; the `AfterFunc` closure captures the wrapper pointer and asserts `map[key] == self` under the lock before acting. If a faster goroutine replaced the entry, the pointer no longer matches and the closure no-ops. Capturing the `*time.Timer` directly forces a self-referential local var (assigned after `AfterFunc` returns) which trips the race detector under stress; the wrapper avoids that. Adopted in `Registry.ScheduleReleaseServer` (#20). Same shape applies to any "one cancellable timer per key" pattern.
- **Hold long-lived WS handlers open with `c.CloseRead(r.Context())` plus `<-readCtx.Done()` until the real read loop lands.** `CloseRead` drains-and-discards frames (including control frames — pings must be processed for the connection to observe a peer-side close). The frame loop ticket replaces `<-readCtx.Done()` with the actual read body in the same call site. Adopted in `/v1/server` (#16) pending #6.
- **Capture process-state timestamps in `main` after `flag.Parse()`, not as package-level vars.** `startedAt := time.Now()` lives inside `main` and is passed into the handler factory. A package-level `var startedAt = time.Now()` would fire at import time — before flag parsing, before `--version` early-returns — and be wrong for short-lived test binaries and any future deferred-serve setup. Adopted in #10.
Expand Down
4 changes: 2 additions & 2 deletions docs/knowledge/INDEX.md
Original file line number Diff line number Diff line change
Expand Up @@ -4,7 +4,7 @@ One-line pointers into the evergreen knowledge base. Newest entries at the top o

## Features

- [`/v1/server` WS upgrade](features/server-endpoint.md) — binary-side ingress: validates `X-Pyrycode-Server` / `X-Pyrycode-Version` / `User-Agent` pre-upgrade, claims the slot, emits `4409` on conflict, holds the conn via `CloseRead` until #6's frame loop replaces it.
- [`/v1/server` WS upgrade](features/server-endpoint.md) — binary-side ingress: validates `X-Pyrycode-Server` / `X-Pyrycode-Version` / `User-Agent` pre-upgrade, claims the slot, emits `4409` on conflict, holds the conn via `CloseRead` until #6's frame loop replaces it; on disconnect schedules a 30s grace release so a quick reconnect inherits the slot (#21).
- [`/healthz` JSON endpoint](features/healthz.md) — unauthenticated `GET /healthz` returning `{status, version, connected_binaries, connected_phones, uptime_seconds}`; `Cache-Control: no-store`, body bounded ≈135 bytes.
- [WSConn adapter](features/ws-conn-adapter.md) — wraps `nhooyr.io/websocket.Conn` to satisfy the registry's `Conn`; owns the per-conn write mutex and a `Close`-cancelled context with a 10s per-`Send` deadline.
- [Connection registry](features/connection-registry.md) — thread-safe `Registry` (server-id → binary 1:1, server-id → phones 1:N) with `Conn` interface, snapshot-returning `PhonesFor`, sentinel errors for `4409` / `4404`, deferred-release `ScheduleReleaseServer` with grace-window reclaim semantics.
Expand All @@ -16,7 +16,7 @@ One-line pointers into the evergreen knowledge base. Newest entries at the top o
- [ADR-0006: Grace window IS the reclaim path](decisions/0006-grace-period-as-reclaim-path.md) — `ClaimServer` during a pending grace timer succeeds (not `4409`); pointer-identity wrapper defends against stale `time.AfterFunc` fires after `Stop()`.
- [ADR-0005: Application WS close codes go through the underlying `*websocket.Conn`](decisions/0005-application-close-codes-via-underlying-conn.md) — `WSConn.Close()` stays minimal (`StatusNormalClosure`); handlers emit `4409` / `4401` / `4404` directly on the `*websocket.Conn` in stillborn-WSConn paths.
- [ADR-0004: WS library choice and adapter context strategy](decisions/0004-ws-library-and-adapter-context-strategy.md) — `nhooyr.io/websocket` v1.8.x; adapter-owned context cancelled by `Close`, 10s per-`Send` deadline; registry's `Send` signature stays context-free.
- [ADR-0003: Connection registry as a passive store](decisions/0003-connection-registry-passive-store.md) — single RWMutex, snapshot returns, no callbacks, narrow `ReleaseServer` so grace logic (#8) can wrap it.
- [ADR-0003: Connection registry as a passive store](decisions/0003-connection-registry-passive-store.md) — single RWMutex, snapshot returns, no callbacks, narrow `ReleaseServer` so grace logic (#20/#21) can wrap it.
- [ADR-0002: Explicit-failure 404 on `:80` instead of HTTPS redirect](decisions/0002-autocert-explicit-failure-on-port-80.md) — pass `http.NotFoundHandler()` to `autocert.Manager.HTTPHandler`, not `nil`; loud failure over silent scheme-switching.
- [ADR-0001: Routing envelope shape and opacity](decisions/0001-routing-envelope-shape-and-opacity.md) — `json.RawMessage` for the inner frame; sentinel errors; validate at boundary, never parse payloads.

Expand Down
2 changes: 1 addition & 1 deletion docs/knowledge/features/connection-registry.md
Original file line number Diff line number Diff line change
Expand Up @@ -78,7 +78,7 @@ The duration `d` is fully trusted — degenerate values (`d <= 0` fires immediat
- **One `sync.RWMutex` covers all three maps** (`binaries`, `phones`, `timers`). Mutating methods take `Lock`; lookups take `RLock`. Sharding is a possible later optimisation but irrelevant at v1 (tens to hundreds of conns, sub-microsecond critical sections, no measured contention).
- **Conn callbacks are never invoked under the lock,** with one documented exception: `UnregisterPhone` calls `ConnID()` while holding the write lock to scan for the target. The `Conn` contract requires `ConnID` to be a non-blocking getter.
- **Broadcast pattern.** A caller wanting to fan out a frame to all phones for a server-id calls `PhonesFor` (returns a freshly allocated copy) and iterates the snapshot calling `Send` per conn — no registry lock held during I/O.
- **No callbacks, no channels.** The registry is a passive store. Notification of binary loss / new phone arrival happens elsewhere (the WS upgrade handlers and #8's grace timer).
- **No callbacks, no channels.** The registry is a passive store. Notification of binary loss / new phone arrival happens elsewhere (the WS upgrade handlers and the grace timer the registry arms in `ScheduleReleaseServer`).

## Invariants the registry enforces

Expand Down
Loading