relay: startup self-check refuses to run as multi-instance deploy #65

## User Story
As an operator who accidentally runs `fly scale count 3` (or any equivalent on another host), I want the relay to refuse to start on any replica past the first with a loud, specific error, so that silent server-id routing breakage is caught before phones connect.

## Context
The connection registry (`internal/relay/registry.go`) is in-memory and per-process, so two replicas hold disjoint registries: a phone connected to replica A cannot reach a binary connected to replica B. With Docker on Fly.io, scaling out is a one-line command (`fly scale count 3`). Without a runtime guard, an operator or AI agent could scale out and silently break server-id routing for every phone/binary pair that lands on different replicas.
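The failure mode can be illustrated with a minimal sketch. The `registry` type and its methods below are hypothetical stand-ins, not the actual contents of `internal/relay/registry.go`; the only assumption carried over from the ticket is that each process holds its own in-memory map.

```go
package main

import "fmt"

// registry is a hypothetical stand-in for the per-process, in-memory
// connection registry. The real type may look quite different.
type registry struct {
	binaries map[string]string // server-id -> connection label
}

func newRegistry() *registry {
	return &registry{binaries: make(map[string]string)}
}

// register records a binary's connection under its server-id.
func (r *registry) register(serverID, conn string) {
	r.binaries[serverID] = conn
}

// lookup reports whether this process knows a connection for serverID.
func (r *registry) lookup(serverID string) (string, bool) {
	conn, ok := r.binaries[serverID]
	return conn, ok
}

func main() {
	// Two replicas means two processes, hence two independent registries.
	replicaA, replicaB := newRegistry(), newRegistry()

	// The binary happens to connect to replica B.
	replicaB.register("server-123", "binary-conn")

	// A phone that lands on replica A cannot find it; one on B can.
	_, okA := replicaA.lookup("server-123")
	_, okB := replicaB.lookup("server-123")
	fmt.Println("replica A can route:", okA) // prints "replica A can route: false"
	fmt.Println("replica B can route:", okB) // prints "replica B can route: true"
}
```

Nothing in either process reports an error here, which is why the breakage is silent without a startup guard.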

This ticket is the **deterministic backstop** in the belt-and-suspenders pair noted in `docs/PROJECT-MEMORY.md`; the doc half (sibling #64, already merged) is the stochastic guard.

## Acceptance Criteria
- [ ] At process startup, the relay performs a self-check intended to detect that it is part of a multi-instance deploy.
- [ ] On positive detection, the process refuses to start (non-zero exit) and emits an `ERROR`-level log whose message contains the substring `multi-instance deploy detected` and names the bypass env var `PYRYCODE_RELAY_SINGLE_INSTANCE` so an operator can grep for it.
- [ ] Setting `PYRYCODE_RELAY_SINGLE_INSTANCE=1` causes the self-check to be skipped, and startup proceeds. Bypass is intended for emergencies and migration windows.
- [ ] When no multi-instance signal is present (the common single-instance case), startup proceeds unchanged. Existing single-instance deployments must not regress.
- [ ] Tests cover at minimum: (a) multi-instance signal present → refuses to start with the documented error substring; (b) `PYRYCODE_RELAY_SINGLE_INSTANCE=1` → starts; (c) no multi-instance signal → starts.
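A minimal sketch of the decision logic implied by these criteria, assuming the detection signal is injected as a function. The names `selfCheck`, `refusalMessage`, and the `detect` callback are hypothetical; the actual detection mechanism is left to the architect, so `detect` is only a placeholder for it.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// bypassEnvVar is the literal name fixed by the ticket.
const bypassEnvVar = "PYRYCODE_RELAY_SINGLE_INSTANCE"

// refusalMessage is a hypothetical wording; the ticket only requires the
// substring "multi-instance deploy detected" and the bypass variable name.
const refusalMessage = "multi-instance deploy detected: set " +
	bypassEnvVar + "=1 to bypass the startup self-check"

// selfCheck returns an error when startup should be refused.
// getenv and detect are injected so the three required cases can be
// tested without touching the real process environment.
func selfCheck(getenv func(string) string, detect func() bool) error {
	if getenv(bypassEnvVar) == "1" {
		return nil // explicit operator bypass: skip the check entirely
	}
	if detect() {
		return errors.New(refusalMessage)
	}
	return nil // no multi-instance signal: start as before
}

func main() {
	// Placeholder detector that never fires; a real one would inspect
	// whatever local-only signal is eventually chosen.
	noSignal := func() bool { return false }

	if err := selfCheck(os.Getenv, noSignal); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("self-check passed")
}
```

Checking the bypass before calling `detect` keeps the "skipped" wording of the third criterion literal: with the bypass set, the detector is never invoked.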

## Technical Notes
- The bypass env var name `PYRYCODE_RELAY_SINGLE_INSTANCE` is fixed — sibling #64 has merged and committed that literal name (and the value `1`) into `docs/architecture.md`. Do not rename; if a rename is genuinely needed, raise it as a fresh ticket against `docs/architecture.md` first.
- The parent body (#39) sketched two candidate detection mechanisms:
  1. an env-var contract, `PYRYCODE_RELAY_ASSERT_SINGLE_INSTANCE=true`, set by the deploy manifest, with refuse-to-start when it is absent;
  2. host-exposed signals such as `FLY_MACHINE_ID`, combined with a count assertion.

  Mechanism (1) would regress existing deployments that don't set the assertion (violating AC #4), and the repo has no `fly.toml` or other host manifest under our control today (the deploy artifact is `Dockerfile` only). Mechanism (2), or any equivalent local-only host-env signal, is therefore the natural choice. Architect to confirm.
- No new runtime dependency on platform-specific SDKs/API clients (e.g. no Fly machines API client). The detection mechanism must be either an env-var contract or a local-only signal (env vars exposed by the host).
- The check belongs in the `cmd/pyrycode-relay/main.go` startup path or a small helper in `internal/relay/`. Architect's call.

## Out of scope
- Implementing shared-registry support (Redis / NATS / sticky-session).
- Health-check integration with platform autoscalers.

## Size Estimate
XS

Split from #39.

