Skip to content

feat(relay): connection registry (#3)#12

Merged
ilmoniemi merged 5 commits into
mainfrom
feature/3
May 8, 2026
Merged

feat(relay): connection registry (#3)#12
ilmoniemi merged 5 commits into
mainfrom
feature/3

Conversation

@ilmoniemi
Copy link
Copy Markdown
Contributor

What

Adds internal/relay/registry.go with:

Lock discipline: Send and Close are never invoked while the registry holds its mutex. ConnID is invoked under the lock by UnregisterPhone — the Conn doc comment names it as a non-blocking getter. PhonesFor returns a freshly-allocated slice so broadcast fan-out runs lock-free against the snapshot.

Issue

Closes #3.

Testing

internal/relay/registry_test.go (package relay, stdlib testing only):

  • TestClaimServer_FirstWinsSecondConflicts
  • TestReleaseServer_ReclaimSucceeds
  • TestRegisterPhone_RequiresBinary
  • TestUnregisterPhone_RemovesByConnID
  • TestPhonesFor_SnapshotIsolation
  • TestCounts_AcrossLifecycle (asserts orphan phones survive ReleaseServer per spec open-question 1)
  • TestUnregisterPhone_RemovesEmptySlice
  • TestRegistry_RaceFreedom — 32 goroutines × 200 ops hammering the public API

Verified locally:

  • + "go vet ./..." + clean
  • + "go test -race ./..." + clean
  • + "go test -race -count=20 -run TestRegistry_RaceFreedom ./internal/relay" + clean
  • + "go build ./cmd/pyrycode-relay" + succeeds

Architecture compliance

  • Mirrors the doc-comment / sentinel-error idiom from + "envelope.go" + ; tests in package relay per established convention.
  • Stdlib only (sync, errors in production; errors, fmt, sync, testing in tests).
  • UnregisterPhone deletes the map entry when the slice empties (no orphan empty slices observable via Counts / PhonesFor).
  • ReleaseServer is intentionally narrow — does not touch phones, leaving the grace period (relay: 30-second grace period on binary disconnect before releasing server-id #8) free to layer cleanup on top.
  • No edits to doc.go, envelope.go, cmd/pyrycode-relay/main.go, or docs/ — registry has no consumers in this ticket.

🤖 Generated with Claude Code

ilmoniemi and others added 2 commits May 8, 2026 20:08
Adds internal/relay/registry.go with the Conn interface and a thread-safe
Registry mapping server-ids to a single binary connection (1:1) and to a
list of phone connections (1:N). Sentinel errors ErrServerIDConflict and
ErrNoServer let callers branch via errors.Is.

Tests cover the AC: first-claim-wins semantics, reclaim after release,
phone gating on a held server-id, removal-by-ConnID with sibling
isolation, snapshot isolation for PhonesFor, lifecycle Counts, empty-
slice cleanup, and a race-freedom hammer test.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@ilmoniemi
Copy link
Copy Markdown
Contributor Author

Code Review: #3

Decision: PASS

Findings

None blocking. Implementation matches the architect's spec line-for-line.

Verification

  • go vet ./... — clean
  • go test -race ./internal/relay — clean
  • go test -race -count=20 -run TestRegistry_RaceFreedom ./internal/relay — clean
  • go build ./cmd/pyrycode-relay — clean

What I checked

Go correctness

  • ClaimServer / ReleaseServer / RegisterPhone are atomic check-then-mutate under the write lock — no TOCTOU window for first-claim-wins or ErrNoServer gating.
  • UnregisterPhone's swap-and-truncate (registry.go:135-137) handles the i == len-1 edge case correctly: self-copy, nil the last slot, truncate. GC-clears the freed reference per spec.
  • PhonesFor returns a make+copy snapshot; broadcast fan-out can run lock-free against it. TestPhonesFor_SnapshotIsolation exercises both element overwrite and post-snapshot growth.
  • Counts summed iteration is fine for a health-endpoint caller; not on the hot path.

Concurrency

  • Single sync.RWMutex, no callbacks under the lock. ConnID() is the only Conn method invoked under the lock — documented contract on the interface (registry.go:27-30, 32-36) names it as a non-blocking getter.
  • Send and Close are never invoked from registry methods — verified by reading every method.
  • Race test (32 goroutines × 200 ops on 4 contended server-ids) hammers the public API and passes clean under -count=20.

Idiom & style

  • Sentinel errors with errors.Is, mirroring envelope.go. Doc comments on every exported symbol name the concurrency contract.
  • Stdlib only (sync, errors in production; errors, fmt, sync, testing in tests) — matches go.mod constraint.
  • Tests in package relay per established convention.

Security (security-sensitive label)

  • Architect's spec contains a ## Security review section with verdict PASS, so the design-level pass is on record. I walked the diff against the same checklist:
    • No tokens, secrets, or headers in log lines (no logging in registry at all — appropriate for a passive store).
    • No file I/O, no subprocess calls, no crypto, no network code in this diff.
    • Trust boundaries respected: registry treats serverID as an opaque map key (validation belongs at the WS upgrade layer, per spec). Per-server phone bound deferred to relay: WS upgrade for /v1/client — accept phone connection, look up server-id, register #5 per spec — documented in lifecycle expectations.
    • Lock-order graph is a single node; no deadlock possibility.
    • UnregisterPhone clears the freed slice slot to nil before truncation — no stale Conn reference held in the underlying array.

Test coverage

  • All seven AC cases covered. Race test runnable manually via the documented go test -race -count=20 invocation; no count flag bled into the test code (correct — -count is a CI knob, not a test-source knob).
  • TestCounts_AcrossLifecycle correctly asserts the orphan-phones-survive-release semantic from spec open-question 1.

Summary

Tight, well-documented implementation of a foundational data structure. Lock discipline is conservative and correct. Doc comments make the concurrency contract legible to future consumers (#4-#10). Ready to merge.

ilmoniemi added 2 commits May 8, 2026 20:14
Add feature doc and ADR for the connection registry shipped in #3.
Update INDEX, PROJECT-MEMORY (mark #3 done, add passive-store and
under-lock-getter patterns), and lessons (slice swap GC hygiene,
race-count placement, copy-on-read accessor pattern).
Resolves additive doc conflicts from concurrent dispatch of #3 (registry),
#9 (autocert), and #11 (threat-model):

- docs/PROJECT-MEMORY.md: keep registry, autocert, and threat-model rows;
  drop superseded "TLS / autocert | Not started" placeholder; merge all
  Patterns established bullets.
- docs/knowledge/INDEX.md: keep both new feature bullets; renumber the
  registry ADR link to 0003.
- docs/lessons.md: keep all six new lesson sections (registry + autocert).
- docs/knowledge/decisions/0002-connection-registry-passive-store.md ->
  0003-connection-registry-passive-store.md (autocert ADR landed first
  via #9 / PR #13 and owns 0002).
- docs/knowledge/features/connection-registry.md: update internal ADR
  link to 0003.

No code changes. make vet / make test -race / make build all clean.
@ilmoniemi ilmoniemi merged commit 47098ee into main May 8, 2026
2 checks passed
@ilmoniemi ilmoniemi deleted the feature/3 branch May 8, 2026 17:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

relay: connection registry — server-id → binary + server-id → [phone] thread-safe maps

1 participant