diff --git a/docs/PROJECT-MEMORY.md b/docs/PROJECT-MEMORY.md
index 987761e..fcbaea2 100644
--- a/docs/PROJECT-MEMORY.md
+++ b/docs/PROJECT-MEMORY.md
@@ -9,10 +9,10 @@ Stateless WebSocket router between mobile clients and pyry binaries. Internet-ex
 | Go skeleton | Done | `cmd/pyrycode-relay/main.go`, `internal/relay/doc.go` |
 | `http.Server` with explicit timeouts (gosec G114) | Done | `cmd/pyrycode-relay/main.go` |
 | Routing envelope wrapper type (`Envelope`, `Marshal`, `Unmarshal`, sentinel errors) | Done (#1) | `internal/relay/envelope.go` |
+| Connection registry (`Conn`, `Registry`; 1:1 binary, 1:N phones; sentinel errors for `4409`/`4404`; race-tested) | Done (#3) | `internal/relay/registry.go` |
 | TLS via autocert in `--domain` mode (`NewAutocertManager`, `EnforceHost`, `TLSConfig`, `ErrCacheDirInsecure`) | Done (#9) | `internal/relay/tls.go`, `cmd/pyrycode-relay/main.go` |
 | WS upgrade on `/v1/server` and `/v1/client` | Not started | — |
 | Header validation (`x-pyrycode-server`, `x-pyrycode-token`) | Not started | — |
-| Connection registry (server-id ↔ binary, server-id ↔ phones) | Not started | — |
 | Frame forwarding using the routing envelope | Not started | — |
 | `conn_id` generation scheme | Not started | — |
 | Threat model doc — operational surface (deploy, supply chain, DoS, log hygiene, cert handling, TLS, error leakage) | Done (#11) | `docs/threat-model.md` |
@@ -25,6 +25,8 @@ Stateless WebSocket router between mobile clients and pyry binaries. Internet-ex
 - **Stdlib + `golang.org/x/crypto`.** First non-stdlib dep arrived in #9 (`acme/autocert`). Keep the dependency surface deliberate; new deps need a justification.
 - **Loud failure over silent correction.** `--domain` and `--insecure-listen` are mutually exclusive and explicit. `ErrCacheDirInsecure` refuses to start on a world/group-readable cert cache rather than re-chmoding it. `:80` returns `404` for non-challenge traffic instead of redirecting to HTTPS (ADR-0002).
 - **Tests live in the same package** (`package relay`, not `relay_test`) so they can `errors.Is` against unexported sentinels if needed and reach private helpers without re-exporting.
+- **Passive in-memory stores guard mutation under one RWMutex; reads return copies, never references.** `PhonesFor` allocates a fresh slice so callers do slow work (broadcast, `Send` over the network) without holding the registry lock. Adopted in `internal/relay/registry.go` (#3); the same shape applies to any future "shared map of conns" type.
+- **Interface methods called under the lock are documented as non-blocking getters.** The registry's `Conn.ConnID()` is invoked under the write lock during `UnregisterPhone`; `Send` and `Close` are never called while the lock is held. Pattern: state the contract on the interface, never call something that could block on I/O while a mutex is held.
 
 ## Conventions
 
diff --git a/docs/knowledge/INDEX.md b/docs/knowledge/INDEX.md
index 00e41af..3d7a5e1 100644
--- a/docs/knowledge/INDEX.md
+++ b/docs/knowledge/INDEX.md
@@ -4,11 +4,13 @@ One-line pointers into the evergreen knowledge base. Newest entries at the top o
 
 ## Features
 
+- [Connection registry](features/connection-registry.md) — thread-safe `Registry` (server-id → binary 1:1, server-id → phones 1:N) with `Conn` interface, snapshot-returning `PhonesFor`, sentinel errors for `4409` / `4404`.
 - [Autocert TLS](features/autocert-tls.md) — `--domain` mode wiring: `:443` WSS via Let's Encrypt + `:80` ACME http-01, host gates, cache-dir permission policy, TLS 1.2 floor.
 - [Routing envelope](features/routing-envelope.md) — typed Go wrapper (`Envelope`, `Marshal`, `Unmarshal`) for the `{conn_id, frame}` wire shape exchanged between relay and binary.
 
 ## Decisions
 
+- [ADR-0003: Connection registry as a passive store](decisions/0003-connection-registry-passive-store.md) — single RWMutex, snapshot returns, no callbacks, narrow `ReleaseServer` so grace logic (#8) can wrap it.
 - [ADR-0002: Explicit-failure 404 on `:80` instead of HTTPS redirect](decisions/0002-autocert-explicit-failure-on-port-80.md) — pass `http.NotFoundHandler()` to `autocert.Manager.HTTPHandler`, not `nil`; loud failure over silent scheme-switching.
 - [ADR-0001: Routing envelope shape and opacity](decisions/0001-routing-envelope-shape-and-opacity.md) — `json.RawMessage` for the inner frame; sentinel errors; validate at boundary, never parse payloads.
 
diff --git a/docs/knowledge/decisions/0003-connection-registry-passive-store.md b/docs/knowledge/decisions/0003-connection-registry-passive-store.md
new file mode 100644
index 0000000..4b72e02
--- /dev/null
+++ b/docs/knowledge/decisions/0003-connection-registry-passive-store.md
@@ -0,0 +1,59 @@
+# ADR-0003: Connection registry as a passive store
+
+**Status:** Accepted (#3)
+**Date:** 2026-05-08
+
+## Context
+
+The relay needs one canonical structure that owns "who is connected as what for which server-id." Every routing-layer ticket (#4 server upgrade, #5 client upgrade, #6 forwarding, #7 heartbeat, #8 grace, #10 health) reads or writes through it. The shape comes from `docs/architecture.md`: 1:1 server-id → binary with first-claim-wins, 1:N server-id → phones gated on a binary holding the slot.
+
+The design space had several axes — what the registry *owns*, what locks it takes, how it returns data, and where the grace period lives.
+
+## Decision
+
+`internal/relay/registry.go` defines a `Conn` interface, a `Registry` type with one `sync.RWMutex` over both maps, seven methods (`ClaimServer`, `ReleaseServer`, `RegisterPhone`, `UnregisterPhone`, `BinaryFor`, `PhonesFor`, `Counts`), and two sentinel errors (`ErrServerIDConflict`, `ErrNoServer`). The registry is a **passive store**: it never invokes `Send` / `Close` on conns it tracks, never times anything, never calls back to anyone. Stdlib only.
+
+Rules adopted:
+
+1. **One RWMutex over both maps.** No sharding, no per-server-id locks.
+2. **`Conn` interface lives in `relay` package.** Defined at the consumer.
+3. **`PhonesFor` returns a freshly allocated copy.** Callers iterate the snapshot lock-free.
+4. **`UnregisterPhone` calls `ConnID()` under the lock.** The `Conn` contract requires it to be a non-blocking getter.
+5. **`ReleaseServer` does not touch phones.** Orphan phones survive a release; their owner must unregister them.
+6. **Sentinel errors over wrapped errors.** Same idiom as ADR-0001 (#1).
+
+## Rationale
+
+### One RWMutex, not per-map or sharded
+
+Two separate locks (one for `binaries`, one for `phones`) invite ordering bugs (always-A-then-B) for no measurable gain — `RegisterPhone` already needs both, and the critical sections are sub-microsecond map operations. Sharding by server-id is a possible later optimisation but irrelevant at v1: the relay handles tens to hundreds of concurrent connections, not millions, and lock contention is unobserved.
+
+### `Conn` defined in `relay`, not at the consumer
+
+Conventional Go style says "define interfaces at the consumer." Here there is one consumer package — `internal/relay` — and the Conn impls themselves will live in a `relay/ws` subpackage in #4/#5. Defining `Conn` in `relay` avoids an import cycle and matches the practical scope (one consumer).
+
+### `PhonesFor` returns a copy, not a reference
+
+Returning the underlying slice would force callers to broadcast under the registry lock or risk a data race against `RegisterPhone` / `UnregisterPhone`. A copy is cheap (pointer-sized entries, typically 1–3 phones per server-id) and lets callers do the slow thing — `Send` over the network — without holding the lock. The Conn references inside the copy are still live; that's intentional, and it's what makes the broadcast pattern work.
+
+### `ConnID` is a getter, called under the lock
+
+Alternative: copy the slice, drop the lock, scan for the matching `ConnID`, re-acquire the lock to remove. This creates a TOCTOU window — a phone added between snapshot and removal could be erroneously skipped or removed. The cleaner contract is "`ConnID` is a non-blocking getter" stated on the interface; if a future Conn impl violates that, fix the impl. Documented; tested only structurally.
+
+### `ReleaseServer` is narrow
+
+It removes the binary entry and nothing else. The grace ticket (#8) needs phones to remain reachable for 30 seconds across a binary disconnect — so cleanup of orphan phones cannot live inside `ReleaseServer`. Keeping release immediate-and-narrow lets #8 layer policy on top without changing the registry's API. If a future ticket wants auto-close-phones-on-release, it adds a wrapper, not a flag on this method.
+
+### Sentinel errors
+
+Same idiom as ADR-0001. `errors.Is` for branching, no string matching, downstream WS handlers map sentinels → close codes (`4404`, `4409`).
+
+## Consequences
+
+- The registry is the single canonical place for "who is connected." Future routing-layer code reads/writes through it; ad-hoc maps in #4–#10 are a code-review smell.
+- Concurrency is provable from the lock graph (one node) plus the documented invariant that `Send` / `Close` never run under the lock. Race-test coverage (32 goroutines × 200 ops) plus `make test`'s default `-race` mode is the enforcement.
+- The registry is testable without WebSockets — a 5-line `fakeConn` is enough. WS-library choice in #4/#5 doesn't reach back into this code.
+- `Conn` is a contract: any future Conn impl must honour non-blocking `ConnID`, idempotent `Close`, and per-impl write serialisation in `Send`. Those properties belong in #4/#5's spec.
+- Per-server connection caps are explicitly NOT here. #5 owns that.
+- Orphan phones after `ReleaseServer` are a documented state. #8 will own the grace timer that decides when to close them.
+- Linear scan in `UnregisterPhone` is correct for v1 (1–3 phones per server-id). If profiling later flags it, switch to `map[connID]Conn` per server-id; the public API doesn't change.
diff --git a/docs/knowledge/features/connection-registry.md b/docs/knowledge/features/connection-registry.md
new file mode 100644
index 0000000..e38e13d
--- /dev/null
+++ b/docs/knowledge/features/connection-registry.md
@@ -0,0 +1,104 @@
+# Connection registry
+
+The relay's routing core needs one canonical structure that owns "who is connected as what for which server-id." Every routing-layer ticket downstream — `/v1/server` upgrade (#4), `/v1/client` upgrade (#5), frame forwarding (#6), heartbeat (#7), grace period (#8), health (#10) — reads or writes through this registry.
+
+Two non-negotiable shapes flow from `docs/architecture.md`:
+
+1. **server-id → binary is 1:1, first-claim-wins.** A second binary claiming a held server-id is a `4409` conflict.
+2. **server-id → phones is 1:N, gated on a binary holding the slot.** A phone whose server-id has no binary is a `4404` no-server.
+
+The registry is a **passive in-memory store**. It holds opaque `Conn` handles, never speaks WebSocket, never invokes `Send` / `Close` on the conns it tracks, and never times anything out. The 30-second grace period on binary disconnect (#8) wraps the registry rather than living inside it.
+
+## API
+
+Package `internal/relay` (`registry.go`):
+
+```go
+type Conn interface {
+    ConnID() string         // stable, non-blocking; called under the registry lock
+    Send(msg []byte) error  // never called under the registry lock
+    Close()                 // idempotent; never called under the registry lock
+}
+
+type Registry struct { /* sync.RWMutex + binaries map + phones map */ }
+
+func NewRegistry() *Registry
+
+func (r *Registry) ClaimServer(serverID string, conn Conn) error
+func (r *Registry) ReleaseServer(serverID string) (released bool)
+func (r *Registry) RegisterPhone(serverID string, conn Conn) error
+func (r *Registry) UnregisterPhone(serverID string, connID string)
+func (r *Registry) BinaryFor(serverID string) (Conn, bool)
+func (r *Registry) PhonesFor(serverID string) []Conn
+func (r *Registry) Counts() (binaries, phones int)
+```
+
+Sentinel errors (branch with `errors.Is`):
+
+| Error | Returned by | Maps to |
+|---|---|---|
+| `ErrServerIDConflict` | `ClaimServer` when slot already held | WS close `4409` |
+| `ErrNoServer` | `RegisterPhone` when no binary holds the slot | WS close `4404` |
+
+The mapping to close codes is informational; the registry doesn't know about WebSockets — the upgrade handlers translate.
+
+## Concurrency model
+
+- **One `sync.RWMutex` covers both maps.** Mutating methods take `Lock`; lookups take `RLock`. Sharding is a possible later optimisation but irrelevant at v1 (tens to hundreds of conns, sub-microsecond critical sections, no measured contention).
+- **Conn callbacks are never invoked under the lock,** with one documented exception: `UnregisterPhone` calls `ConnID()` while holding the write lock to scan for the target. The `Conn` contract requires `ConnID` to be a non-blocking getter.
+- **Broadcast pattern.** A caller wanting to fan out a frame to all phones for a server-id calls `PhonesFor` (returns a freshly allocated copy) and iterates the snapshot calling `Send` per conn — no registry lock held during I/O.
+- **No callbacks, no channels.** The registry is a passive store. Notification of binary loss / new phone arrival happens elsewhere (the WS upgrade handlers and #8's grace timer).
+
+## Invariants the registry enforces
+
+- `ClaimServer` is atomic check-then-set under the write lock; concurrent claims for the same server-id cannot both succeed.
+- `RegisterPhone` is atomic check-binary-then-append under the write lock; a phone cannot land in the map for a server-id whose binary was released between check and append.
+- `UnregisterPhone` removes by `ConnID` via swap-and-truncate, nils the trailing slot for GC hygiene, and `delete`s the map key when the slice empties — no orphaned empty slices.
+- `PhonesFor` returns nil for unknown server-ids or empty slices — callers don't have to distinguish "unknown" from "known with zero phones."
+- `PhonesFor` returns a copy: mutating the returned slice cannot affect registry state.
+- `Counts` is internally consistent for one call (read under RLock); two concurrent calls may observe different values.
+
+## What the registry deliberately does NOT do
+
+- **Does not close `Conn` handles.** `ReleaseServer` and `UnregisterPhone` only remove map entries. Connection lifetime is owned by the WS upgrade handlers.
+- **Does not deduplicate phones by `ConnID`.** Caller invariant: each `Conn` is registered once.
+- **Does not bound the per-server-id phone slice.** Per-server caps belong to the WS upgrade layer (#5).
+- **Does not propagate binary loss to phones.** When `ReleaseServer` runs, the phones slice for that server-id is left in place. The owner of phones must `UnregisterPhone` (and choose whether to close them) when their server's binary is gone. This is intentional: the grace ticket (#8) needs phones to remain reachable across the 30-second disconnect window.
+- **Does not validate `serverID` content.** Length, charset, prefix — owned by `x-pyrycode-server` header validation in #4. The registry treats any string as an opaque map key.
+
+## Adversarial framing
+
+The registry has no networking and no direct exposure to adversarial input. Adversaries reach it only through future tickets (#4/#5) that pass strings and `Conn` handles into its API. Notes for the layers above:
+
+- **`serverID` of pathological size or content** is harmless to the registry (just a map key) but the WS upgrade layer must cap header lengths.
+- **`Conn.ConnID()` returning unstable values** would cause `UnregisterPhone` to remove the wrong phone (or none). Documented as a contract on the `Conn` interface; violation is a leak, not a cross-tenant data leak.
+- **Repeated `RegisterPhone` for the same server-id** is unbounded memory growth, gated only by the requirement that a binary holds the slot. Per-server connection caps are #5's job.
+- **Slow `ConnID` blocking the write lock** is the cost of the documented non-blocking contract; copying the slice and scanning outside the lock would create a TOCTOU window where a phone could be added between snapshot and removal.
+
+## Testing
+
+`internal/relay/registry_test.go`, `package relay`. Tests use a minimal `fakeConn` (id, sent buffer, closed flag — no synchronisation; the race test gives each goroutine its own fakes).
+
+Functional cases (one subtest per AC bullet):
+
+- `ClaimServer` first-wins / second-conflicts; `BinaryFor` still resolves to the first conn.
+- `ReleaseServer` reclaim — release returns `true`, second release of an unheld id returns `false`, claim again with a different conn succeeds.
+- `RegisterPhone` requires a binary (`ErrNoServer` until `ClaimServer`).
+- `UnregisterPhone` removes by `ConnID`, leaves siblings intact, no-op on unknown id.
+- `PhonesFor` snapshot isolation: mutating the returned slice doesn't affect the registry; later registrations don't appear in earlier snapshots.
+- `Counts` across the full lifecycle including the `ReleaseServer` orphan case (`(0, 1)` while phones survive a release).
+- `UnregisterPhone` deletes the empty-slice map entry.
+
+Race coverage: one `TestRegistry_RaceFreedom` hammers `ClaimServer` / `BinaryFor` / `RegisterPhone` / `PhonesFor` / `UnregisterPhone` / `Counts` / `ReleaseServer` from 32 goroutines × 200 ops over four contended server-ids. The test deliberately ignores returned errors — `ErrServerIDConflict` under contention is expected behaviour, not a failure. The point is the absence of `DATA RACE` reports.
+
+`make test` runs `-race` by default. The `-count=20` invocation in the AC is a manual stress command, documented in the test's doc comment, not a knob in the test code:
+
+```sh
+go test -race -count=20 -run TestRegistry_RaceFreedom ./internal/relay
+```
+
+## Related
+
+- [ADR-0003: Connection registry as a passive store](../decisions/0003-connection-registry-passive-store.md) — single RWMutex, snapshot returns, no callbacks, why grace logic lives outside.
+- [Routing envelope](routing-envelope.md) — the wrapper used by frame forwarding (#6) once the registry is wired up.
+- [Architecture overview](../../architecture.md) — where the registry fits in the data flow.
diff --git a/docs/lessons.md b/docs/lessons.md
index de5bd4a..d93fbe7 100644
--- a/docs/lessons.md
+++ b/docs/lessons.md
@@ -14,6 +14,18 @@ When unmarshalling `{"frame": null}` into a struct with `Frame json.RawMessage`,
 
 `encoding/json` re-encodes `RawMessage` through the outer marshaller and may strip insignificant whitespace. Tests asserting opacity should canonicalise both sides with `json.Compact` before `bytes.Equal`, otherwise they flake on formatting. Source: routing-envelope tests (#1).
 
+## Slice swap-and-truncate needs an explicit nil for GC hygiene
+
+Removing an element from `[]Conn` (or any slice of pointer-bearing values) by `slice[i] = slice[len(slice)-1]; slice = slice[:len(slice)-1]` leaves the original last element reachable from the underlying array's spare capacity, pinning whatever it references in memory. Set `slice[len(slice)-1] = nil` *before* re-slicing. Source: `Registry.UnregisterPhone` (#3). Catches: connection handles can be heavy structs; a leaked `Conn` keeps a goroutine and a socket buffer alive.
+
+## Race-test count is a CI-runner knob, not test code
+
+The AC asked for `go test -race -count=20` coverage. `count` belongs on the command line, not as a loop inside the test. `make test` already runs `-race`; document the manual stress invocation in the test's doc comment so future contributors know how to reproduce locally. Source: `TestRegistry_RaceFreedom` (#3).
+
+## A passive registry returns copies, callers do the slow work outside the lock
+
+When a shared map holds connection handles (or anything you'll iterate to do I/O over), the read accessor should allocate and return a fresh copy. Callers can then `Send` / `Close` / fan out without holding the lock — and concurrent mutators don't trip the iteration. The cost is one allocation per call; the benefit is no foot-gun where a slow `Send` blocks every other registry caller. Source: `Registry.PhonesFor` (#3).
+
 ## `autocert.Manager.HTTPHandler(nil)` redirects to HTTPS — it does not 404
 
 The autocert docs and the `nil` argument are easy to misread as "no fallback → 404." Actual behaviour: GET/HEAD get a `302` to HTTPS; other methods get `400`. To get an explicit 404 (or any other handler), pass it explicitly: `mgr.HTTPHandler(http.NotFoundHandler())`. See ADR-0002. Source: autocert TLS wiring (#9).
diff --git a/docs/specs/architecture/3-connection-registry.md b/docs/specs/architecture/3-connection-registry.md
new file mode 100644
index 0000000..ad38dda
--- /dev/null
+++ b/docs/specs/architecture/3-connection-registry.md
@@ -0,0 +1,399 @@
+# Spec — Connection registry (#3)
+
+## Files to read first
+
+- `internal/relay/envelope.go` — sentinel-error idiom, doc-comment style, `package relay` framing. The new file mirrors this shape.
+- `internal/relay/envelope_test.go` — `package relay` (not `_test`) so tests can `errors.Is` against unexported helpers; table-driven rejections via `errors.Is`.
+- `internal/relay/doc.go` — package doc explicitly names "per-server connection registry" as a routing-core concern. The new file is its second pillar.
+- `docs/architecture.md:18-23` — what the relay does: 1:1 server-id → binary, 1:N server-id → phones, frame forwarding. The registry's invariants flow from these bullets.
+- `docs/PROJECT-MEMORY.md:22-26` — established patterns: sentinel errors via `errors.Is`, stdlib only, package-internal tests.
+- `Makefile:13-14` — `make test` runs `go test -race ./...`. Race coverage isn't a CI nicety here; it's the default mode every PR is judged against.
+- `go.mod` — Go 1.26.2, stdlib only. `sync` and `errors` are the only imports the production file needs.
+
+## Context
+
+The relay's routing core needs a single canonical structure that owns "who is connected as what for which server-id." Every routing-layer ticket downstream — `/v1/server` upgrade (#4), `/v1/client` upgrade (#5), frame forwarding (#6), heartbeat (#7), grace period (#8), health (#10) — reads or writes through it. Without this type each downstream ticket would invent its own ad-hoc map and lock; with it they share one contract and one race-tested implementation.
+
+Two non-negotiable shapes (per the protocol spec and `docs/architecture.md`):
+
+1. **Server-id → binary is 1:1, first-claim-wins.** A second binary claiming a held server-id is a `4409` conflict. The 30-second grace period on disconnect is a separate ticket (#8) and **wraps** the registry rather than living inside it; the registry itself implements immediate semantics (claim/release with no timer).
+2. **Server-id → phones is 1:N, gated on a binary holding the slot.** A phone whose server-id has no binary is a `4404` no-server. Phones can only be registered when the slot is held.
+
+The registry holds **opaque connection handles** (a `Conn` interface) so the WS upgrade tickets can pick the WebSocket library independently and tests can use mocks. Concrete `Conn` implementations land in #4 and #5; this ticket ships only the interface.
+
+## Design
+
+### Package & files
+
+- New file: `internal/relay/registry.go`
+- New test file: `internal/relay/registry_test.go`
+- Both `package relay` — same convention as `envelope.go` / `envelope_test.go`. Tests can branch on unexported state via `errors.Is` and reach package-private helpers if needed.
+
+No edits to `doc.go`, `envelope.go`, `cmd/pyrycode-relay/main.go`, or `docs/`. The registry has no consumers yet; wiring lands in #4/#5.
+
+### The `Conn` interface
+
+```go
+// Conn is the registry's view of a WebSocket connection. Real implementations
+// land in the /v1/server (#4) and /v1/client (#5) upgrade tickets; tests use
+// mocks.
+//
+// All three methods MUST be safe to call from any goroutine. ConnID and Send
+// MUST NOT block on locks the registry itself takes — the registry never
+// invokes Conn methods while holding its own lock, but callers iterating a
+// PhonesFor snapshot may.
+type Conn interface {
+    // ConnID returns the relay-assigned connection identifier. Stable for the
+    // lifetime of the connection. The format is decided in a later ticket;
+    // the registry treats it as an opaque key for slice lookups.
+    ConnID() string
+
+    // Send delivers a fully-formed wire message. Implementations are
+    // responsible for serialising writes to the underlying socket. Errors
+    // are returned to the caller; the registry does not interpret them.
+    Send(msg []byte) error
+
+    // Close terminates the connection. Idempotent. Safe to call concurrently
+    // with Send (the implementation must handle the race).
+    Close()
+}
+```
+
+Three methods, named precisely after the AC. The interface is defined in the registry's package because consumers (frame forwarding, etc.) all live in `internal/relay`; defining at the consumer is moot when there is one consumer package. (Keeping the interface in `relay` also avoids an import cycle once `cmd/pyrycode-relay/main.go` wires up real implementations from a `relay/ws` subpackage in #4/#5.)
+
+### The `Registry` type
+
+```go
+// Registry maps server-ids to a single binary connection (1:1) and to a list
+// of phone connections (1:N). All methods are safe to call concurrently.
+//
+// The registry holds Conn handles by reference and never inspects their
+// internal state. It does not call any Conn method while holding its own
+// lock — close/send-fan-out is the caller's responsibility, performed
+// against a PhonesFor snapshot.
+type Registry struct {
+    mu       sync.RWMutex
+    binaries map[string]Conn
+    phones   map[string][]Conn
+}
+
+// NewRegistry constructs an empty Registry.
+func NewRegistry() *Registry
+```
+
+One `sync.RWMutex` covers both maps. Sharding by server-id is a possible later optimisation but irrelevant at v1 — the relay handles tens to hundreds of concurrent connections, not millions, and the lock is held for sub-microsecond critical sections (map lookup, slice append/copy). Two separate locks would invite ordering bugs (always-A-then-B) for no measurable gain.
+
+### Methods
+
+```go
+// ClaimServer registers conn as the binary for serverID. First-claim-wins:
+// the second concurrent caller for the same serverID receives
+// ErrServerIDConflict. The conflicting caller's conn is left untouched
+// (registry does not Close it).
+//
+// Use ReleaseServer to free the slot. ClaimServer does NOT inspect or
+// modify the phones slice for serverID; phones registered before a release
+// remain in the map until they are explicitly UnregisterPhone'd by their
+// owner, or removed by a higher-layer cleanup (#8 grace period).
+func (r *Registry) ClaimServer(serverID string, conn Conn) error
+
+// ReleaseServer removes the binary entry for serverID. Returns true if a
+// binary held the slot, false otherwise. Does NOT close the connection;
+// caller owns conn lifecycle. Does NOT touch the phones slice — see the
+// "Open questions" section for why this is intentional and how #8 will
+// layer cleanup on top.
+func (r *Registry) ReleaseServer(serverID string) (released bool)
+
+// RegisterPhone appends conn to the phones slice for serverID. Returns
+// ErrNoServer if no binary currently holds serverID. Does not deduplicate
+// by ConnID — the caller is responsible for not registering the same
+// phone twice.
+func (r *Registry) RegisterPhone(serverID string, conn Conn) error
+
+// UnregisterPhone removes the phone whose ConnID equals connID from the
+// slice for serverID. No-op if no matching phone is present (including
+// when no slice exists for serverID). Does NOT close the connection;
+// caller owns conn lifecycle.
+//
+// When the slice becomes empty as a result, the entry is deleted from the
+// phones map so Counts and PhonesFor see no orphaned empty slices.
+func (r *Registry) UnregisterPhone(serverID string, connID string)
+
+// BinaryFor returns the binary holding serverID, if any.
+func (r *Registry) BinaryFor(serverID string) (Conn, bool)
+
+// PhonesFor returns a snapshot of the phones registered for serverID.
+// The returned slice is freshly allocated; the caller may iterate, append,
+// or otherwise mutate it without affecting the registry's internal state
+// or holding any registry lock. Returns nil for an unknown serverID.
+//
+// The Conn handles inside the slice are the same references the registry
+// holds; calling Send/Close on them affects the live connection.
+func (r *Registry) PhonesFor(serverID string) []Conn
+
+// Counts returns the number of binaries currently claimed and the total
+// number of phone connections summed across all server-ids. For #10's
+// health endpoint.
+func (r *Registry) Counts() (binaries, phones int)
+```
+
+### Sentinel errors
+
+```go
+var (
+    // ErrServerIDConflict is returned by ClaimServer when serverID is
+    // already held by another binary. Maps to WS close code 4409.
+    ErrServerIDConflict = errors.New("relay: server-id already claimed")
+
+    // ErrNoServer is returned by RegisterPhone when no binary holds the
+    // requested serverID. Maps to WS close code 4404.
+    ErrNoServer = errors.New("relay: no binary for server-id")
+)
+```
+
+Same idiom as `envelope.go`: callers use `errors.Is`. The mapping to close codes is informational; the registry doesn't know about WebSockets and won't return close codes itself — the WS upgrade handlers do that translation.
+
+### Algorithms
+
+**ClaimServer** — write lock, single-check map, set or error.
+
+```
+mu.Lock(); defer mu.Unlock()
+if _, ok := binaries[serverID]; ok { return ErrServerIDConflict }
+binaries[serverID] = conn
+return nil
+```
+
+**ReleaseServer** — write lock, delete-and-report.
+
+```
+mu.Lock(); defer mu.Unlock()
+if _, ok := binaries[serverID]; !ok { return false }
+delete(binaries, serverID)
+return true
+```
+
+**RegisterPhone** — write lock, gate on binary presence, append.
+
+```
+mu.Lock(); defer mu.Unlock()
+if _, ok := binaries[serverID]; !ok { return ErrNoServer }
+phones[serverID] = append(phones[serverID], conn)
+return nil
+```
+
+**UnregisterPhone** — write lock, linear scan by ConnID, remove preserving order is unnecessary; use swap-and-truncate.
+
+```
+mu.Lock(); defer mu.Unlock()
+slice, ok := phones[serverID]
+if !ok { return }
+for i, c := range slice {
+    if c.ConnID() == connID {
+        slice[i] = slice[len(slice)-1]
+        slice[len(slice)-1] = nil  // GC hygiene; Conn may be a heavy struct.
+        slice = slice[:len(slice)-1]
+        if len(slice) == 0 {
+            delete(phones, serverID)
+        } else {
+            phones[serverID] = slice
+        }
+        return
+    }
+}
+```
+
+ConnID is called *while holding the lock* — that's allowed because the contract on `Conn` says `ConnID` must not block (it's a getter). `Send` and `Close`, which can block on the network, are never called from inside any registry method.
+
+Linear scan is correct for v1: a single server-id holds a small number of phones (typically 1–3 per user across devices). If profiling later shows hot spots, switch to `map[connID]Conn` per server-id; the API doesn't change.
+
+**BinaryFor** — read lock, map lookup.
+
+```
+mu.RLock(); defer mu.RUnlock()
+c, ok := binaries[serverID]
+return c, ok
+```
+
+**PhonesFor** — read lock, copy.
+
+```
+mu.RLock(); defer mu.RUnlock()
+src := phones[serverID]
+if len(src) == 0 { return nil }
+out := make([]Conn, len(src))
+copy(out, src)
+return out
+```
+
+The `nil` return on empty is deliberate: callers that range over the result iterate zero times either way; the caller never has to distinguish "unknown server-id" from "known with zero phones" because `RegisterPhone` makes those the same state (the registry only stores non-empty slices, courtesy of `UnregisterPhone`'s `delete` step).
+
+**Counts** — read lock, len + summed iteration.
+
+```
+mu.RLock(); defer mu.RUnlock()
+b := len(binaries)
+p := 0
+for _, s := range phones { p += len(s) }
+return b, p
+```
+
+Iteration is acceptable: `Counts` is called by the health endpoint, not on the hot path. Tracking running totals is a premature optimisation.
+
+### Concurrency model
+
+- **One `sync.RWMutex` for the whole registry.** All write methods take `Lock`; all read methods take `RLock`.
+- **No fairness assumptions.** Go's `sync.RWMutex` does not guarantee starvation-freedom for writers; the registry's API doesn't expose that. The five-method surface area is small enough that lock contention is unobserved in practice.
+- **No callbacks, no channels.** The registry is a passive store. Notification of binary loss / new phone arrival happens elsewhere (the WS upgrade handlers and #8's grace timer).
+- **Conn methods are never invoked under the lock.** Callers wanting to broadcast a frame to all phones for a server-id must:
+  1. Call `PhonesFor` to get a snapshot.
+  2. Iterate the snapshot and call `Send` on each — no registry lock held.
+  This pattern is the entire reason `PhonesFor` returns a copy.
+- **`ConnID()` IS called under the lock** by `UnregisterPhone`. Documented in `Conn`'s doc comment as a non-blocking, lock-free operation.
+
+### Error handling
+
+Two sentinels (`ErrServerIDConflict`, `ErrNoServer`). No wrapping with `fmt.Errorf` is needed — there is no underlying error to chain. No panics. No logging. Library code; the caller chooses the response.
+
+### Lifecycle expectations the registry does NOT enforce
+
+These are listed so the developer doesn't add defensive code for them:
+
+- The registry does not close `Conn` handles on its own. `ReleaseServer` and `UnregisterPhone` remove map entries but never call `Close`. Callers (WS upgrade handlers) own the connection lifetime.
+- The registry does not deduplicate phones by `ConnID` on `RegisterPhone`. Caller invariant: each `Conn` is registered once.
+- The registry does not bound the per-server-id phone slice. Per-server connection limits belong to the WS upgrade layer (rate limiting / cap headers, not this ticket's concern).
+- The registry does not propagate binary loss to phones. When `ReleaseServer` runs, phones for that server-id remain in the map — the layer that owns phones must `UnregisterPhone` them (and choose whether to close them) when their server's binary is gone. This is intentional: the grace ticket (#8) needs that decoupling to keep phones alive across a 30-second window.
+
+## Testing strategy
+
+`internal/relay/registry_test.go`, `package relay`, `testing` only.
+
+### Mock `Conn`
+
+A minimal in-package test helper:
+
+```go
+type fakeConn struct {
+    id     string
+    sent   [][]byte  // optional, for forwarding tests in #6; harmless here.
+    closed bool
+}
+
+func (c *fakeConn) ConnID() string         { return c.id }
+func (c *fakeConn) Send(msg []byte) error  { c.sent = append(c.sent, msg); return nil }
+func (c *fakeConn) Close()                 { c.closed = true }
+```
+
+`fakeConn` has no synchronisation. The tests do not concurrently mutate the same `fakeConn` — the registry's race tests register/unregister distinct fakes per goroutine (or read-only fakes via `BinaryFor`/`PhonesFor`). This is a deliberate scope cut: the registry's race coverage proves the registry is race-free with well-behaved Conns, not that fakeConn is race-free.
+
+### Functional tests (subtest names map 1:1 to AC bullets)
+
+Each is a focused unit test using `errors.Is` for error assertions; no string compare.
+
+1. **`TestClaimServer_FirstWinsSecondConflicts`** — claim succeeds, second claim with a different `Conn` returns `ErrServerIDConflict`, original `BinaryFor` lookup still resolves to the first conn.
+2. **`TestReleaseServer_ReclaimSucceeds`** — claim, release (returns `true`), claim again with a different conn (returns `nil`), `BinaryFor` resolves to the new conn. Then release of an unheld id returns `false`.
+3. **`TestRegisterPhone_RequiresBinary`** — `RegisterPhone("s1", phone)` with no binary returns `ErrNoServer`; after `ClaimServer("s1", bin)`, the same call succeeds.
+4. **`TestUnregisterPhone_RemovesByConnID`** — register three phones, `UnregisterPhone` the middle one, `PhonesFor` returns exactly the other two and the removed phone is absent. `UnregisterPhone` with an unknown connID is a no-op.
+5. **`TestPhonesFor_SnapshotIsolation`** — register two phones, take the snapshot, mutate the snapshot (overwrite an element with a different `*fakeConn`), assert `PhonesFor` again still returns the original two. Also: register a third phone after the snapshot was taken, assert the original snapshot still has length 2.
+6. **`TestCounts_AcrossLifecycle`** — assert `(0,0)` initial, after `ClaimServer` `(1,0)`, after `RegisterPhone × 2` `(1,2)`, after `UnregisterPhone × 1` `(1,1)`, after `ReleaseServer` `(0,1)` (orphan phone retained — see open questions), after the orphan's `UnregisterPhone` `(0,0)`.
+7. **`TestUnregisterPhone_RemovesEmptySlice`** — register one phone, unregister it, assert `PhonesFor` returns `nil` AND the underlying `phones` map has no entry for that server-id (verifiable via `Counts` returning 0 phones; this AC is structural so a length check is sufficient).
+
+### Race coverage
+
+One test, focused on hammering the public API under `-race`:
+
+```go
+func TestRegistry_RaceFreedom(t *testing.T) {
+    t.Parallel()
+    r := NewRegistry()
+
+    const goroutines = 32
+    const opsPer = 200
+
+    var wg sync.WaitGroup
+    wg.Add(goroutines)
+    for g := 0; g < goroutines; g++ {
+        g := g
+        go func() {
+            defer wg.Done()
+            sid := fmt.Sprintf("s-%d", g%4)  // four contended server-ids
+            bin := &fakeConn{id: fmt.Sprintf("b-%d", g)}
+            for i := 0; i < opsPer; i++ {
+                _ = r.ClaimServer(sid, bin)
+                _, _ = r.BinaryFor(sid)
+                _ = r.RegisterPhone(sid, &fakeConn{id: fmt.Sprintf("p-%d-%d", g, i)})
+                _ = r.PhonesFor(sid)
+                r.UnregisterPhone(sid, fmt.Sprintf("p-%d-%d", g, i))
+                _, _ = r.Counts()
+                _ = r.ReleaseServer(sid)
+            }
+        }()
+    }
+    wg.Wait()
+}
+```
+
+The ticket asks for `-race -count=20`. The Makefile already runs `-race`; `count=20` is a CI-level invocation, not a test code knob. **The developer should NOT add `-count` flags to the test file.** Document the manual command in the test's doc comment so future contributors know how to stress it locally:
+
+```go
+// Run with: go test -race -count=20 -run TestRegistry_RaceFreedom ./internal/relay
+```
+
+The race test deliberately ignores returned errors: a `ClaimServer` that returns `ErrServerIDConflict` is expected behaviour under contention, not a test failure. The point is the absence of `DATA RACE` reports from the runtime.
+
+### What we deliberately do not test
+
+- **`Conn.Send` / `Conn.Close` invocation under lock** — the registry never calls them; there is no behaviour to assert. A "lock-not-held" assertion would require introspecting runtime state, which Go does not portably expose.
+- **Per-server-id phone bound** — out of scope; later ticket.
+- **`Conn.ConnID()` panic safety** — registry assumes it doesn't panic; matches the rest of the codebase's "trust your interfaces" stance.
+
+## Open questions
+
+1. **Should `ReleaseServer` clear the phones slice for that server-id?** The AC says it removes the binary entry; nothing about phones. The chosen semantic is **no, leave phones in place** — orphan phones survive a release until their owner unregisters them. Rationale: the grace period (#8) needs phones to remain reachable across a 30-second binary disconnect window, and the simplest way to allow that without coupling #8 into the registry is to keep release immediate-and-narrow. If a future ticket decides orphan phones should be auto-closed on release, it adds a `ReleaseServerAndClosePhones` method or wraps `ReleaseServer` in higher-layer logic — the public API in this ticket stays minimal.
+2. **`Counts` race-vs-snapshot consistency.** `Counts` is read under `RLock` and is internally consistent for one call. Two concurrent calls may observe different values — that's fine for a health endpoint; no caller needs cross-call consistency.
+3. **Should we keep an `Empty()` or `Has(serverID)` helper?** No. `BinaryFor` already returns `(Conn, bool)`; that's the existence test. Adding more helpers is anticipatory.
+
+## Out of scope (re-stated, for the developer)
+
+- No edits to `cmd/pyrycode-relay/main.go`. The registry has no consumers in this ticket.
+- No WS upgrade, no header validation, no `conn_id` generation — see #4, #5, and the dedicated conn-id ticket.
+- No grace period, no heartbeat, no metrics beyond `Counts`. See #7, #8, #10.
+- No external dependencies. `sync`, `errors`, plus `fmt`/`testing`/`sync` for tests.
+
+## Done means
+
+- `internal/relay/registry.go` exists with `Conn`, `Registry`, `NewRegistry`, the seven methods named in the AC, and the two sentinel errors. Every exported symbol carries a doc comment that names its concurrency contract (whether the lock is held, what the caller may/may not do with returned values).
+- `internal/relay/registry_test.go` covers the seven functional cases plus the race-freedom test. The race test runs clean under `go test -race -count=20 -run TestRegistry_RaceFreedom ./internal/relay`.
+- `make vet`, `make test`, `make build` all clean from the repo root.
+- One commit on `feature/3`: `feat(relay): connection registry (#3)`.
+
+---
+
+## Security review (security-sensitive label)
+
+### Threat surface for THIS ticket
+
+The registry has no networking and no direct exposure to adversarial input. Adversaries only reach it through future tickets (#4/#5 WS upgrades) that pass strings (`serverID`, `connID`) and `Conn` handles into its API. The review below confirms the registry's API does not turn well-behaved adversarial input from the WS layer into a security issue — it cannot vouch for the WS layer itself.
+
+### Categories walked
+
+- **Trust boundaries.** The registry trusts (a) `serverID` is whatever the WS upgrade handler decided to accept (header validation is a separate ticket; until it lands, the registry does not pretend to validate IDs); (b) `connID` returned by `Conn.ConnID()` is unique among connections registered for the same server-id (caller invariant — the conn-id generation ticket guarantees this); (c) `Conn` implementations honour the methods' contracts (non-blocking `ConnID`, idempotent `Close`). Each trust is documented in the doc comment of the API surface that depends on it. **Finding:** registry does not validate `serverID` content (length, charset). This is correct for this ticket — validation belongs at the WS upgrade layer (`x-pyrycode-server` header check). The registry's behaviour is well-defined for any string, including empty strings or strings with embedded NULs: they're just map keys. Enforced at the boundary (`internal/relay/registry.go`), not deeper.
+- **Resource exhaustion.** Per-server-id phone slice is unbounded. An adversary who could repeatedly call `RegisterPhone` for the same server-id could grow the slice without bound and consume memory. **Finding:** acceptable in this ticket because (a) `RegisterPhone` is gated on `ErrNoServer`, so an adversary first needs a binary holding the slot — which means the adversary has compromised the binary or its owner connected the binary themselves; (b) the WS upgrade layer (#5) is the natural place to enforce per-server connection caps. Mentioned in "Lifecycle expectations" so #5's developer doesn't assume the registry caps phones.
+- **Race conditions / TOCTOU.** ClaimServer's "check then set" is atomic under the write lock — a second claimant cannot slip between `binaries[serverID]` lookup and assignment. RegisterPhone's "check binary then append" is also atomic under the write lock. Reads (`BinaryFor`, `PhonesFor`, `Counts`) are atomic under the read lock. There is no API that returns a value the caller is supposed to act on before re-entering the registry — which is what would create a TOCTOU window. **Finding:** the API shape eliminates TOCTOU at this layer.
+- **Lock-order / deadlock.** Registry takes only its own `mu`. Never invokes `Send` or `Close` while holding the lock. `ConnID` is invoked under the lock by `UnregisterPhone` — documented contract on `Conn` says `ConnID` must not block. **Finding:** lock graph is a single node; cannot deadlock with itself, cannot deadlock with caller locks because no callbacks are made under the lock except the `ConnID` getter.
+- **Snapshot isolation / data leaks via aliasing.** `PhonesFor` returns a copy via `make` + `copy`. Callers cannot observe in-progress mutation by dereferencing slice header bytes. The Conn handles inside the copy are still the same references the registry holds — that's by design (broadcast pattern), and the Conn's own state is not registry state. **Finding:** matches the AC's stated invariant; tested explicitly in `TestPhonesFor_SnapshotIsolation`.
+- **Memory leaks under adversarial workload.** `UnregisterPhone` deletes the phones map entry when the slice empties, preventing growth of orphaned `[]Conn{}` entries. Released binaries are deleted from the map. The only documented leak path is "phones survive `ReleaseServer`" — addressed in open question (1) and intentionally left for the layer that owns phone lifetime.
+- **Information disclosure.** Counts returns aggregate ints; no per-server-id info. No method returns a Conn that wasn't already known to the caller (you can only retrieve Conns for server-ids you've already registered/claimed). **Finding:** API is not a sidechannel for enumerating other tenants.
+- **Panics / nil-deref.** Maps are initialised in `NewRegistry`. No `*Conn` dereferences (interface, not pointer). No slice index past length. **Finding:** none observed.
+
+### Adversarial framings considered
+
+- *"What if an attacker sends a server-id of 100 KB?"* — Registry treats it as a map key. Memory cost is one allocation per unique key. Capping the header length is the WS upgrade layer's job; the registry does not need to second-guess.
+- *"What if a malicious `Conn` impl returns different `ConnID()` values on each call?"* — `UnregisterPhone` linear-scans by ConnID; if `ConnID()` is unstable, the wrong phone (or no phone) may be removed. The `Conn` doc comment specifies "stable for the lifetime of the connection" — this is a contract the WS handler must honour. If a future Conn impl violates it, the bug manifests as a phone leak in the registry, not a security boundary failure (no other Conn's data is exposed).
+- *"What if a binary disconnects without releasing?"* — Registry holds a stale `Conn` reference until something calls `ReleaseServer`. The grace ticket (#8) is responsible for this; the registry's contract makes no promises about liveness.
+- *"What if two goroutines `UnregisterPhone` the same phone simultaneously?"* — Both take the write lock serially. The first finds and removes; the second finds nothing and is a no-op. No double-free, no panic.
+- *"Slow `Conn.ConnID()` blocks `UnregisterPhone` while holding the lock and starves other callers."* — Yes, that's the cost of the documented contract. The `Conn` doc comment names `ConnID` as non-blocking; if a future impl violates that, fix the impl. The registry could copy the slice and scan outside the lock to defend against this, but that creates a new TOCTOU window (a phone could be added after the snapshot and erroneously not be removed). The cleaner contract — and the one this spec adopts — is "`ConnID` is a getter."
+
+### Verdict: PASS
+
+The registry's API and lock discipline do not introduce vulnerabilities under any adversarial framing examined. The findings flagged above are out-of-scope by design (per-server caps, header validation, conn-id generation) and are documented in the spec so downstream tickets pick them up. The `security-sensitive` label is justified by the registry's role on every routing path; the review is correspondingly thorough even though this ticket itself does no I/O.
diff --git a/internal/relay/registry.go b/internal/relay/registry.go
new file mode 100644
index 0000000..f3256c7
--- /dev/null
+++ b/internal/relay/registry.go
@@ -0,0 +1,188 @@
+package relay
+
+import (
+	"errors"
+	"sync"
+)
+
+// Sentinel errors returned by Registry methods. Callers branch on these with
+// errors.Is. The mapping to WebSocket close codes is informational; the
+// registry has no knowledge of WebSockets and the WS upgrade handlers do
+// the translation.
+var (
+	// ErrServerIDConflict is returned by ClaimServer when serverID is
+	// already held by another binary. Maps to WS close code 4409.
+	ErrServerIDConflict = errors.New("relay: server-id already claimed")
+
+	// ErrNoServer is returned by RegisterPhone when no binary holds the
+	// requested serverID. Maps to WS close code 4404.
+	ErrNoServer = errors.New("relay: no binary for server-id")
+)
+
+// Conn is the registry's view of a WebSocket connection. Real implementations
+// land in the /v1/server (#4) and /v1/client (#5) upgrade tickets; tests use
+// mocks.
+//
+// All three methods MUST be safe to call from any goroutine. ConnID and Send
+// MUST NOT block on locks the registry itself takes — the registry never
+// invokes Send or Close while holding its own lock, but it does call ConnID
+// under the lock during UnregisterPhone, so ConnID must be a non-blocking
+// getter.
+type Conn interface {
+	// ConnID returns the relay-assigned connection identifier. Stable for
+	// the lifetime of the connection. The registry treats it as an opaque
+	// key for slice lookups and calls it under its internal lock, so the
+	// implementation must not block.
+	ConnID() string
+
+	// Send delivers a fully-formed wire message. Implementations are
+	// responsible for serialising writes to the underlying socket. Errors
+	// are returned to the caller; the registry does not interpret them.
+	Send(msg []byte) error
+
+	// Close terminates the connection. Idempotent. Safe to call
+	// concurrently with Send (the implementation must handle the race).
+	Close()
+}
+
+// Registry maps server-ids to a single binary connection (1:1) and to a list
+// of phone connections (1:N). All methods are safe to call concurrently.
+//
+// The registry holds Conn handles by reference and never inspects their
+// internal state. It does not call Send or Close on a Conn while holding its
+// own lock — broadcast / close fan-out is the caller's responsibility,
+// performed against a PhonesFor snapshot.
+type Registry struct {
+	mu       sync.RWMutex
+	binaries map[string]Conn
+	phones   map[string][]Conn
+}
+
+// NewRegistry constructs an empty Registry.
+func NewRegistry() *Registry {
+	return &Registry{
+		binaries: make(map[string]Conn),
+		phones:   make(map[string][]Conn),
+	}
+}
+
+// ClaimServer registers conn as the binary for serverID. First-claim-wins:
+// a second concurrent caller for the same serverID receives
+// ErrServerIDConflict and the conflicting caller's conn is left untouched
+// (the registry does not Close it).
+//
+// Use ReleaseServer to free the slot. ClaimServer does NOT inspect or modify
+// the phones slice for serverID; phones registered before a release remain
+// in the map until they are explicitly UnregisterPhone'd by their owner, or
+// removed by a higher-layer cleanup (#8 grace period).
+func (r *Registry) ClaimServer(serverID string, conn Conn) error {
+	r.mu.Lock()
+	defer r.mu.Unlock()
+	if _, ok := r.binaries[serverID]; ok {
+		return ErrServerIDConflict
+	}
+	r.binaries[serverID] = conn
+	return nil
+}
+
+// ReleaseServer removes the binary entry for serverID. Returns true if a
+// binary held the slot, false otherwise. Does NOT close the connection;
+// the caller owns conn lifecycle. Does NOT touch the phones slice — orphan
+// phones survive a release until their owner unregisters them, so the grace
+// period (#8) can keep phones reachable across a binary disconnect window.
+func (r *Registry) ReleaseServer(serverID string) (released bool) {
+	r.mu.Lock()
+	defer r.mu.Unlock()
+	if _, ok := r.binaries[serverID]; !ok {
+		return false
+	}
+	delete(r.binaries, serverID)
+	return true
+}
+
+// RegisterPhone appends conn to the phones slice for serverID. Returns
+// ErrNoServer if no binary currently holds serverID. The registry does not
+// deduplicate by ConnID — the caller is responsible for not registering the
+// same phone twice.
+func (r *Registry) RegisterPhone(serverID string, conn Conn) error {
+	r.mu.Lock()
+	defer r.mu.Unlock()
+	if _, ok := r.binaries[serverID]; !ok {
+		return ErrNoServer
+	}
+	r.phones[serverID] = append(r.phones[serverID], conn)
+	return nil
+}
+
+// UnregisterPhone removes the phone whose ConnID equals connID from the
+// slice for serverID. No-op if no matching phone is present (including when
+// no slice exists for serverID). Does NOT close the connection; the caller
+// owns conn lifecycle.
+//
+// When the slice becomes empty as a result, the entry is deleted from the
+// phones map so Counts and PhonesFor see no orphaned empty slices.
+func (r *Registry) UnregisterPhone(serverID string, connID string) {
+	r.mu.Lock()
+	defer r.mu.Unlock()
+	slice, ok := r.phones[serverID]
+	if !ok {
+		return
+	}
+	for i, c := range slice {
+		if c.ConnID() != connID {
+			continue
+		}
+		slice[i] = slice[len(slice)-1]
+		slice[len(slice)-1] = nil
+		slice = slice[:len(slice)-1]
+		if len(slice) == 0 {
+			delete(r.phones, serverID)
+		} else {
+			r.phones[serverID] = slice
+		}
+		return
+	}
+}
+
+// BinaryFor returns the binary holding serverID, if any.
+func (r *Registry) BinaryFor(serverID string) (Conn, bool) {
+	r.mu.RLock()
+	defer r.mu.RUnlock()
+	c, ok := r.binaries[serverID]
+	return c, ok
+}
+
+// PhonesFor returns a snapshot of the phones registered for serverID. The
+// returned slice is freshly allocated; the caller may iterate, append, or
+// otherwise mutate it without affecting the registry's internal state or
+// holding any registry lock. Returns nil for an unknown serverID or when no
+// phones are registered.
+//
+// The Conn handles inside the slice are the same references the registry
+// holds; calling Send or Close on them affects the live connection.
+func (r *Registry) PhonesFor(serverID string) []Conn {
+	r.mu.RLock()
+	defer r.mu.RUnlock()
+	src := r.phones[serverID]
+	if len(src) == 0 {
+		return nil
+	}
+	out := make([]Conn, len(src))
+	copy(out, src)
+	return out
+}
+
+// Counts returns the number of binaries currently claimed and the total
+// number of phone connections summed across all server-ids. For the health
+// endpoint (#10). One call is internally consistent; two concurrent calls
+// may observe different values.
+func (r *Registry) Counts() (binaries, phones int) {
+	r.mu.RLock()
+	defer r.mu.RUnlock()
+	b := len(r.binaries)
+	p := 0
+	for _, s := range r.phones {
+		p += len(s)
+	}
+	return b, p
+}
diff --git a/internal/relay/registry_test.go b/internal/relay/registry_test.go
new file mode 100644
index 0000000..b1cf0e4
--- /dev/null
+++ b/internal/relay/registry_test.go
@@ -0,0 +1,255 @@
+package relay
+
+import (
+	"errors"
+	"fmt"
+	"sync"
+	"testing"
+)
+
+// fakeConn is a minimal in-package Conn for tests. The registry's race
+// coverage proves the registry is race-free with well-behaved Conns; it does
+// not vouch for fakeConn under concurrent mutation, so tests register
+// distinct fakes per goroutine.
+type fakeConn struct {
+	id     string
+	sent   [][]byte
+	closed bool
+}
+
+func (c *fakeConn) ConnID() string        { return c.id }
+func (c *fakeConn) Send(msg []byte) error { c.sent = append(c.sent, msg); return nil }
+func (c *fakeConn) Close()                { c.closed = true }
+
+func TestClaimServer_FirstWinsSecondConflicts(t *testing.T) {
+	t.Parallel()
+	r := NewRegistry()
+
+	first := &fakeConn{id: "b-1"}
+	second := &fakeConn{id: "b-2"}
+
+	if err := r.ClaimServer("s1", first); err != nil {
+		t.Fatalf("first ClaimServer: unexpected err: %v", err)
+	}
+	if err := r.ClaimServer("s1", second); !errors.Is(err, ErrServerIDConflict) {
+		t.Fatalf("second ClaimServer: got %v, want errors.Is(_, ErrServerIDConflict)", err)
+	}
+	got, ok := r.BinaryFor("s1")
+	if !ok {
+		t.Fatal("BinaryFor: missing entry after conflict")
+	}
+	if got.ConnID() != "b-1" {
+		t.Errorf("BinaryFor: got conn id %q, want %q (original)", got.ConnID(), "b-1")
+	}
+}
+
+func TestReleaseServer_ReclaimSucceeds(t *testing.T) {
+	t.Parallel()
+	r := NewRegistry()
+
+	first := &fakeConn{id: "b-1"}
+	second := &fakeConn{id: "b-2"}
+
+	if err := r.ClaimServer("s1", first); err != nil {
+		t.Fatalf("ClaimServer: %v", err)
+	}
+	if released := r.ReleaseServer("s1"); !released {
+		t.Fatal("ReleaseServer: got false, want true")
+	}
+	if err := r.ClaimServer("s1", second); err != nil {
+		t.Fatalf("re-ClaimServer: %v", err)
+	}
+	got, ok := r.BinaryFor("s1")
+	if !ok || got.ConnID() != "b-2" {
+		t.Errorf("BinaryFor after reclaim: got (%v, %v), want (b-2, true)", got, ok)
+	}
+	if released := r.ReleaseServer("unknown"); released {
+		t.Error("ReleaseServer of unheld id: got true, want false")
+	}
+}
+
+func TestRegisterPhone_RequiresBinary(t *testing.T) {
+	t.Parallel()
+	r := NewRegistry()
+
+	phone := &fakeConn{id: "p-1"}
+	if err := r.RegisterPhone("s1", phone); !errors.Is(err, ErrNoServer) {
+		t.Fatalf("RegisterPhone without binary: got %v, want errors.Is(_, ErrNoServer)", err)
+	}
+
+	if err := r.ClaimServer("s1", &fakeConn{id: "b-1"}); err != nil {
+		t.Fatalf("ClaimServer: %v", err)
+	}
+	if err := r.RegisterPhone("s1", phone); err != nil {
+		t.Fatalf("RegisterPhone after binary: unexpected err: %v", err)
+	}
+}
+
+func TestUnregisterPhone_RemovesByConnID(t *testing.T) {
+	t.Parallel()
+	r := NewRegistry()
+
+	if err := r.ClaimServer("s1", &fakeConn{id: "b-1"}); err != nil {
+		t.Fatalf("ClaimServer: %v", err)
+	}
+	p1 := &fakeConn{id: "p-1"}
+	p2 := &fakeConn{id: "p-2"}
+	p3 := &fakeConn{id: "p-3"}
+	for _, p := range []*fakeConn{p1, p2, p3} {
+		if err := r.RegisterPhone("s1", p); err != nil {
+			t.Fatalf("RegisterPhone %q: %v", p.id, err)
+		}
+	}
+
+	r.UnregisterPhone("s1", "p-2")
+
+	got := r.PhonesFor("s1")
+	if len(got) != 2 {
+		t.Fatalf("PhonesFor: got len=%d, want 2", len(got))
+	}
+	ids := map[string]bool{}
+	for _, c := range got {
+		ids[c.ConnID()] = true
+	}
+	if !ids["p-1"] || !ids["p-3"] || ids["p-2"] {
+		t.Errorf("PhonesFor: got ids=%v, want {p-1, p-3}", ids)
+	}
+
+	r.UnregisterPhone("s1", "does-not-exist")
+	if got := r.PhonesFor("s1"); len(got) != 2 {
+		t.Errorf("PhonesFor after no-op unregister: got len=%d, want 2", len(got))
+	}
+	r.UnregisterPhone("unknown-server", "p-1")
+}
+
+func TestPhonesFor_SnapshotIsolation(t *testing.T) {
+	t.Parallel()
+	r := NewRegistry()
+
+	if err := r.ClaimServer("s1", &fakeConn{id: "b-1"}); err != nil {
+		t.Fatalf("ClaimServer: %v", err)
+	}
+	p1 := &fakeConn{id: "p-1"}
+	p2 := &fakeConn{id: "p-2"}
+	if err := r.RegisterPhone("s1", p1); err != nil {
+		t.Fatalf("RegisterPhone p1: %v", err)
+	}
+	if err := r.RegisterPhone("s1", p2); err != nil {
+		t.Fatalf("RegisterPhone p2: %v", err)
+	}
+
+	snap := r.PhonesFor("s1")
+	if len(snap) != 2 {
+		t.Fatalf("snap len: got %d, want 2", len(snap))
+	}
+	snap[0] = &fakeConn{id: "evil"}
+
+	again := r.PhonesFor("s1")
+	if len(again) != 2 {
+		t.Fatalf("again len: got %d, want 2", len(again))
+	}
+	for _, c := range again {
+		if c.ConnID() == "evil" {
+			t.Fatal("registry observed mutation through snapshot")
+		}
+	}
+
+	if err := r.RegisterPhone("s1", &fakeConn{id: "p-3"}); err != nil {
+		t.Fatalf("RegisterPhone p3: %v", err)
+	}
+	if len(snap) != 2 {
+		t.Errorf("original snapshot length changed: got %d, want 2", len(snap))
+	}
+}
+
+func TestCounts_AcrossLifecycle(t *testing.T) {
+	t.Parallel()
+	r := NewRegistry()
+
+	assert := func(stage string, wantB, wantP int) {
+		t.Helper()
+		b, p := r.Counts()
+		if b != wantB || p != wantP {
+			t.Errorf("%s: Counts=(%d,%d), want (%d,%d)", stage, b, p, wantB, wantP)
+		}
+	}
+
+	assert("initial", 0, 0)
+	if err := r.ClaimServer("s1", &fakeConn{id: "b-1"}); err != nil {
+		t.Fatalf("ClaimServer: %v", err)
+	}
+	assert("after claim", 1, 0)
+
+	if err := r.RegisterPhone("s1", &fakeConn{id: "p-1"}); err != nil {
+		t.Fatalf("RegisterPhone p1: %v", err)
+	}
+	if err := r.RegisterPhone("s1", &fakeConn{id: "p-2"}); err != nil {
+		t.Fatalf("RegisterPhone p2: %v", err)
+	}
+	assert("after 2 phones", 1, 2)
+
+	r.UnregisterPhone("s1", "p-1")
+	assert("after unregister 1", 1, 1)
+
+	if released := r.ReleaseServer("s1"); !released {
+		t.Fatal("ReleaseServer: got false, want true")
+	}
+	// Orphan phones survive ReleaseServer by design (see open question 1).
+	assert("after release (orphan retained)", 0, 1)
+
+	r.UnregisterPhone("s1", "p-2")
+	assert("after orphan unregister", 0, 0)
+}
+
+func TestUnregisterPhone_RemovesEmptySlice(t *testing.T) {
+	t.Parallel()
+	r := NewRegistry()
+
+	if err := r.ClaimServer("s1", &fakeConn{id: "b-1"}); err != nil {
+		t.Fatalf("ClaimServer: %v", err)
+	}
+	if err := r.RegisterPhone("s1", &fakeConn{id: "p-1"}); err != nil {
+		t.Fatalf("RegisterPhone: %v", err)
+	}
+	r.UnregisterPhone("s1", "p-1")
+
+	if got := r.PhonesFor("s1"); got != nil {
+		t.Errorf("PhonesFor after empty: got %v, want nil", got)
+	}
+	if _, p := r.Counts(); p != 0 {
+		t.Errorf("phones count after empty: got %d, want 0", p)
+	}
+}
+
+// TestRegistry_RaceFreedom hammers the public API from many goroutines and
+// asserts the absence of DATA RACE reports under -race.
+//
+// Run with: go test -race -count=20 -run TestRegistry_RaceFreedom ./internal/relay
+func TestRegistry_RaceFreedom(t *testing.T) {
+	t.Parallel()
+	r := NewRegistry()
+
+	const goroutines = 32
+	const opsPer = 200
+
+	var wg sync.WaitGroup
+	wg.Add(goroutines)
+	for g := 0; g < goroutines; g++ {
+		g := g
+		go func() {
+			defer wg.Done()
+			sid := fmt.Sprintf("s-%d", g%4)
+			bin := &fakeConn{id: fmt.Sprintf("b-%d", g)}
+			for i := 0; i < opsPer; i++ {
+				_ = r.ClaimServer(sid, bin)
+				_, _ = r.BinaryFor(sid)
+				_ = r.RegisterPhone(sid, &fakeConn{id: fmt.Sprintf("p-%d-%d", g, i)})
+				_ = r.PhonesFor(sid)
+				r.UnregisterPhone(sid, fmt.Sprintf("p-%d-%d", g, i))
+				_, _ = r.Counts()
+				_ = r.ReleaseServer(sid)
+			}
+		}()
+	}
+	wg.Wait()
+}
diff --git a/pyrycode-relay b/pyrycode-relay
new file mode 100755
index 0000000..9fb20d4
Binary files /dev/null and b/pyrycode-relay differ