relay: /healthz returns version + connected-binary count + connected-phone count (JSON) #10

@ilmoniemi

User Story

As an operator monitoring the relay, I want /healthz to return JSON with the relay's version and current connection counts, so that uptime checks have a structured signal beyond "200 OK" and dashboards can graph load over time.

Context

The current /healthz handler returns plain-text "ok\n". This ticket upgrades it to JSON with structured fields. This is foundational for any external monitoring (Uptime Kuma, Healthchecks.io, or custom Prometheus exporters in the future).

The endpoint is unauthenticated by design — public-readable so that off-host probes can hit it without secret distribution. Exposing live connection counts to anonymous callers is an accepted tradeoff (operators value the signal more than the small amount of operational intel it leaks). Per-server-id breakdowns are explicitly out of scope because those would leak identifying information.

Acceptance Criteria

  • GET /healthz returns HTTP 200 with Content-Type: application/json.
  • The response body is a JSON object with exactly these five fields:
    {
      "status": "ok",
      "version": "0.1.0",
      "connected_binaries": 3,
      "connected_phones": 12,
      "uptime_seconds": 4512
    }
  • status is the string "ok" for every response in v1 (no degraded/unhealthy states yet).
  • version matches the relay's build-time version string (the same value --version prints).
  • connected_binaries is the count of currently-claimed binary connections; connected_phones is the total phone connections summed across all server-ids. Both reflect the live registry at request time.
  • uptime_seconds is a non-negative integer measuring seconds since the relay started serving requests.
  • Response body is under 200 bytes for typical counts — the endpoint gets hit often.
  • A test exercises the handler end-to-end: asserts the status code, content-type, and that all five fields are present and well-typed.
  • A test with a populated registry confirms connected_binaries and connected_phones track the registry's actual state.

Technical Notes

  • The Version package var in cmd/pyrycode-relay/main.go already exists and is overridden via -ldflags — reuse it; don't introduce a parallel source of truth.
  • (*relay.Registry).Counts() (added in relay: connection registry — server-id → binary + server-id → [phone] thread-safe maps #3) returns (binaries, phones int) — the natural source for the two count fields. The handler will need a registry handle, which main already constructs.
  • Capture the start time once at main() entry and pass it (alongside the registry) into whatever constructs the handler. Don't read it from a package-level var startTime = time.Now(): that initializer runs during package initialization, before main() and any flag parsing, which is misleading for short-lived test binaries and for any future scenario where the relay defers serving until after setup work.
  • No authentication on /healthz. No rate-limiting in this ticket either — if probe storms become a problem, that's a separate ticket.

Out of Scope

  • Per-server-id breakdowns (would leak server-ids).
  • Latency histograms, request counters, or other Prometheus-style metrics (separate ticket if/when we want a /metrics endpoint).
  • Alternate status values (degraded, unhealthy) — v1 is binary OK or no-response.
  • Auth or rate-limiting on the endpoint.

Size Estimate

XS — ~30 LOC handler + ~30 LOC tests.

Metadata

    Labels

    security-sensitive (Touches auth, crypto, or internet-exposed input paths)
    size:xs (Tiny ticket: <30 lines production code)
