feat(relay): refuse to boot on multi-instance deploy (#65) #93
`CheckSingleInstance` is the deterministic backstop to the prose half in docs/architecture.md § Single-instance constraint (#64): when `FLY_APP_NAME` is non-empty (the Fly platform signal) and `PYRYCODE_RELAY_SINGLE_INSTANCE` is not exactly "1", the relay refuses to start with a structured ERROR log and exit code 2.

The in-memory connection registry is per-process; a silent `fly scale count` greater than 1 would split server-id routing across replicas.

The bypass env var is registered in `envContracts` so that a typo like `=true` fails `CheckEnvConfig` first.

Wiring: between `CheckEnvConfig` and `CheckInsecureListenInProduction` in cmd/pyrycode-relay/main.go.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
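The decision order in this commit message can be sketched as follows. This is a minimal illustration, not the code in `internal/relay/single_instance.go`: the exact signature, the wording after the sentinel text, and the `main` demo are assumptions. Only the env-var names, the sentinel name, and the `multi-instance deploy detected` substring come from the PR.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrMultiInstanceDeployDetected is the sentinel callers can branch on via errors.Is.
var ErrMultiInstanceDeployDetected = errors.New("multi-instance deploy detected")

// envSingleInstanceBypass names the operator's single-instance assertion variable.
const envSingleInstanceBypass = "PYRYCODE_RELAY_SINGLE_INSTANCE"

// CheckSingleInstance returns nil when there is no platform signal or when the
// bypass is exactly "1"; otherwise it returns an error wrapping the sentinel.
// The injected getenv keeps the helper testable without touching the real env.
func CheckSingleInstance(getenv func(string) string) error {
	if getenv("FLY_APP_NAME") == "" {
		return nil // no Fly signal: nothing to refuse
	}
	if getenv(envSingleInstanceBypass) == "1" {
		return nil // operator asserted single-instance intent
	}
	// Assumed message shape: sentinel text plus the literal bypass name.
	return fmt.Errorf("%w: set %s=1 to assert single-instance intent",
		ErrMultiInstanceDeployDetected, envSingleInstanceBypass)
}

func main() {
	env := map[string]string{"FLY_APP_NAME": "relay-example"} // hypothetical app name
	err := CheckSingleInstance(func(k string) string { return env[k] })
	fmt.Println(errors.Is(err, ErrMultiInstanceDeployDetected), err)
}
```

Wrapping the sentinel with `%w` is one way to keep the message free to carry the bypass name while staying branchable through `errors.Is`.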
Code Review: #65
Decision: FAIL
Findings
Summary
Implementation matches the spec faithfully: decision order, sentinel error contents, env-config registry row, wiring placement between
`envSingleInstanceBypass` holds the literal env-var name string `PYRYCODE_RELAY_SINGLE_INSTANCE`, not a credential. gosec G101 flags it at LOW confidence; suppress with an inline `#nosec` annotation naming the rule and the reason so the suppression scope is narrow.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
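For illustration, a narrowly scoped suppression of this kind might look like the sketch below. The justification wording is an assumption; the `#nosec <rule> -- <reason>` form is gosec's documented inline syntax. The flag is presumably triggered by the `pass` inside `Bypass`, though the PR does not say so.

```go
package main

import "fmt"

// envSingleInstanceBypass is the NAME of an environment variable, not a secret.
// The annotation names only G101, so other gosec rules still apply to this line.
const envSingleInstanceBypass = "PYRYCODE_RELAY_SINGLE_INSTANCE" // #nosec G101 -- env-var name, not a credential

func main() {
	fmt.Println(envSingleInstanceBypass)
}
```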
Code Review: #65
Decision: PASS
Findings
None.
Summary
Clean implementation that matches the architect's spec exactly:
Security review (security-sensitive label)
Architect's
CI
- new feature doc: single-instance startup self-check (contract, wiring sequence, behaviour matrix, scope/limits, threat model)
- new per-ticket note codebase/65.md (implementation, patterns, lessons learned including the gosec G101 false-positive)
- env-config-validator.md updated: second row in the registry, ordering rationale generalised to all downstream Check* helpers, behaviour matrix now covers both env vars
- production-mode.md wiring sequence: CheckSingleInstance inserted between CheckEnvConfig and CheckInsecureListenInProduction; the Config-struct deferral note bumped to five startup checks
- INDEX.md: one-line summary at the top of Features
What
Adds `CheckSingleInstance`: a boot-time refusal that fires when the relay detects it is running on a multi-instance-capable platform (today: Fly.io, signalled by a non-empty `FLY_APP_NAME`) AND the operator has not asserted single-instance intent via `PYRYCODE_RELAY_SINGLE_INSTANCE=1`. On positive detection the relay logs an ERROR containing the substring `multi-instance deploy detected` and the literal bypass env-var name, then exits with code 2.

This is the deterministic backstop half of the belt-and-suspenders pair noted in `docs/PROJECT-MEMORY.md`. The doc half (#64) shipped earlier: prose in `docs/architecture.md` § Single-instance constraint that pins the literal env-var name and value used here.

Implementation follows the spec at `docs/specs/architecture/65-startup-multi-instance-check.md` exactly:

- `internal/relay/single_instance.go`: new helper, signature mirrors `CheckInsecureListenInProduction`. Exports `ErrMultiInstanceDeployDetected` (branchable via `errors.Is`).
- `internal/relay/env_config.go`: one new row in `envContracts` for the bypass env var, so a typo like `PYRYCODE_RELAY_SINGLE_INSTANCE=true` fails `CheckEnvConfig` with the structured per-key error before `CheckSingleInstance` reads the value.
- `cmd/pyrycode-relay/main.go`: wired between `CheckEnvConfig` and `CheckInsecureListenInProduction`. Rationale per spec § Wiring order: deploy-shape misconfiguration is more fundamental than production-mode flag misconfiguration; a typo'd bypass surfaces as a structured config error first.

Issue
Closes #65.
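To make the `envContracts` row described under What more concrete, here is a rough sketch of how a registry row could reject `PYRYCODE_RELAY_SINGLE_INSTANCE=true` before `CheckSingleInstance` ever reads it. The `envContract` struct, its fields, the allowed-value set, and the error wording are all hypothetical; the PR only states that a row exists and that a malformed value fails with a structured per-key error.

```go
package main

import "fmt"

// envContract is a hypothetical shape for one registry row.
type envContract struct {
	Key     string
	Allowed []string // exact values accepted when the variable is set
}

// envContracts is a stand-in registry holding only the bypass row.
// Assumption: "1" is the sole accepted value.
var envContracts = []envContract{
	{Key: "PYRYCODE_RELAY_SINGLE_INSTANCE", Allowed: []string{"1"}},
}

// CheckEnvConfig rejects any set variable whose value is not in its row.
// Unset variables pass, so the bypass stays optional.
func CheckEnvConfig(getenv func(string) string) error {
	for _, c := range envContracts {
		v := getenv(c.Key)
		if v == "" {
			continue
		}
		ok := false
		for _, a := range c.Allowed {
			if v == a {
				ok = true
				break
			}
		}
		if !ok {
			return fmt.Errorf("invalid value for %s: got %q, want one of %q", c.Key, v, c.Allowed)
		}
	}
	return nil
}

func main() {
	typo := func(string) string { return "true" }
	fmt.Println(CheckEnvConfig(typo))
}
```

Because this check runs first, the operator sees an error naming the malformed key rather than a generic multi-instance refusal.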
Testing
- `internal/relay/single_instance_test.go`: table-driven matrix covering AC #5(a/b/c) plus guard-rails for the bypass-set-no-signal no-op case and the non-exact-"1" bypass values (`"true"`, `"0"`, `" 1"`). Separate assertions lock the error message contents to the AC #2 substrings (`multi-instance deploy detected` and `PYRYCODE_RELAY_SINGLE_INSTANCE`) and verify `errors.Is` branchability.
- `internal/relay/env_config_test.go`: adds `TestCheckEnvConfig_SingleInstanceBypassMalformedValue` to lock the new `envContracts` row.

Architecture compliance
- `CheckRunningAsRoot` / `CheckInsecureListenInProduction` (injected-`getenv` seam).
- `internal/relay` (`envSingleInstanceBypass`), not duplicated at the call site, matching the `envProductionMode` convention.
- `CheckInsecureListenInProduction` block (`logger.Error` + `"refusing to start: …"` + structured fields + `return 2`).
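The wiring order and the refusal shape (`logger.Error`, a `"refusing to start: …"` message, structured fields, `return 2`) can be sketched as below. This compacts the per-check blocks into a loop purely for brevity, which is not how the PR describes `main.go`. The `check` type, `stubChecks`, `newLogger`, the field names, and the use of `log/slog` are assumptions; the two neighbouring checks are stubbed as no-ops.

```go
package main

import (
	"errors"
	"log/slog"
	"os"
)

// check pairs a startup check with its name (hypothetical helper type).
type check struct {
	name string
	run  func(getenv func(string) string) error
}

// stubChecks lists the checks in the order the PR describes.
// Only CheckSingleInstance has real logic here; the others are placeholders.
func stubChecks() []check {
	noop := func(func(string) string) error { return nil }
	return []check{
		{name: "CheckEnvConfig", run: noop},
		{name: "CheckSingleInstance", run: func(getenv func(string) string) error {
			if getenv("FLY_APP_NAME") != "" && getenv("PYRYCODE_RELAY_SINGLE_INSTANCE") != "1" {
				return errors.New("multi-instance deploy detected")
			}
			return nil
		}},
		{name: "CheckInsecureListenInProduction", run: noop},
	}
}

func newLogger() *slog.Logger {
	return slog.New(slog.NewTextHandler(os.Stderr, nil))
}

// runStartupChecks stops at the first failing check, logs a structured
// ERROR, and returns exit code 2; it returns 0 when every check passes.
func runStartupChecks(logger *slog.Logger, getenv func(string) string, checks []check) int {
	for _, c := range checks {
		if err := c.run(getenv); err != nil {
			logger.Error("refusing to start: "+err.Error(), "check", c.name)
			return 2
		}
	}
	return 0
}

func main() {
	os.Exit(runStartupChecks(newLogger(), os.Getenv, stubChecks()))
}
```

Returning the exit code instead of calling `os.Exit` inside the helper keeps the sequence testable, which matches the injected-`getenv` seam the PR relies on.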