User Story
As an operator monitoring the relay, I want /healthz to return JSON with the relay's version and current connection counts, so that uptime checks have a structured signal beyond "200 OK" and dashboards can graph load over time.
Context
The current /healthz handler returns "ok\n" as plain text. This ticket upgrades it to JSON with structured fields. It is foundational for any external monitoring (Uptime Kuma, Healthchecks.io, custom Prometheus exporters in future).
The endpoint is unauthenticated by design: it is public-readable so that off-host probes can hit it without secret distribution. Exposing live connection counts to anonymous callers is an accepted tradeoff (operators value the signal more than the small amount of operational intel it leaks). Per-server-id breakdowns are explicitly out of scope because those would leak identifying information.
Acceptance Criteria
GET /healthz returns HTTP 200 with Content-Type: application/json.
The response body is a JSON object with exactly these five fields:
{ "status": "ok", "version": "0.1.0", "connected_binaries": 3, "connected_phones": 12, "uptime_seconds": 4512 }
status is the string "ok" for every response in v1 (no degraded/unhealthy states yet).
version matches the relay's build-time version string (the same value --version prints).
connected_binaries is the count of currently-claimed binary connections; connected_phones is the total phone connections summed across all server-ids. Both reflect the live registry at request time.
uptime_seconds is a non-negative integer measuring seconds since the relay started serving requests.
The response body is under 200 bytes for typical counts, since the endpoint gets hit often.
A test exercises the handler end-to-end: it asserts the status code, the content-type, and that all five fields are present and well-typed.
A test with a populated registry confirms connected_binaries and connected_phones track the registry's actual state.
Technical Notes
The Version package var in cmd/pyrycode-relay/main.go already exists and is overridden via -ldflags. Reuse it; don't introduce a parallel source of truth.
(*relay.Registry).Counts() (added in #3, "relay: connection registry — server-id → binary + server-id → [phone] thread-safe maps") returns (binaries, phones int), which makes it the natural source for the two count fields. The handler will need a registry handle, which main already constructs.
Capture the start time once at main() entry and pass it (alongside the registry) into whatever constructs the handler. Don't read it from a package-level var startTime = time.Now(): that initializer runs during package initialization, before main() and before any flag parsing, which is misleading for short-lived test binaries and for any future scenario where the relay defers serving until after setup work.
No authentication on /healthz. No rate-limiting in this ticket either; if probe storms become a problem, that's a separate ticket.
Out of Scope
Per-server-id breakdowns (would leak server-ids).
Latency histograms, request counters, or other Prometheus-style metrics (separate ticket if/when we want a /metrics endpoint).
Alternate status values (degraded, unhealthy); v1 is binary: OK or no response.
Size Estimate
XS — ~30 LOC handler + ~30 LOC tests.