
feat(relay): refuse to boot as effective uid 0 in production mode (#78) #85

Merged: ilmoniemi merged 3 commits into main from feature/78 on May 13, 2026

@ilmoniemi (Contributor)

What

Adds a deterministic in-process backstop for the CI non-root-build contract: when PYRYCODE_RELAY_PRODUCTION=1 AND syscall.Geteuid() == 0, the relay refuses to start and exits with status 2 before any listener is opened. Without this check, a docker run --user 0, a Kubernetes securityContext: { runAsUser: 0 }, or a missing or overridden USER directive in the Dockerfile at deploy time would silently run the internet-facing process as root.

Three pieces, all minimal:

  • internal/relay/production.go: new exported sentinel ErrRunningAsRoot and check function CheckRunningAsRoot(geteuid func() int, getenv func(string) string) error, a sibling to ErrInsecureListenInProduction / CheckInsecureListenInProduction from #77. Reuses IsProductionMode, so PYRYCODE_RELAY_PRODUCTION is not read a second time.
  • internal/relay/production_test.go: table-driven matrix (non-prod + uid 0 → nil; prod + uid 1000 → nil; prod + uid 0 → sentinel; prod + nobody uid 65534 → nil) plus TestErrRunningAsRoot_IsBranchable, which parallels the #77 sibling test. fakeGeteuid mirrors fakeGetenv.
  • cmd/pyrycode-relay/main.go: wired in immediately after CheckInsecureListenInProduction and before CheckCapabilities, with structured log fields err, env_var=PYRYCODE_RELAY_PRODUCTION, effective_uid=syscall.Geteuid(), and a fix hint naming the two valid resolutions. Exit code 2 matches the sibling block.

Issue

Closes #78.

Testing

  • go test -race ./... — all green (full suite).
  • go vet ./... — clean.
  • go build ./cmd/pyrycode-relay — builds.
  • New tests: TestCheckRunningAsRoot_Matrix (4 rows) and TestErrRunningAsRoot_IsBranchable. Tests use the injected func() int seam — no re-exec as root, no t.Setenv, t.Parallel()-safe.
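The injected-seam matrix can be sketched as follows. fakeGeteuid and fakeGetenv are the helper names from the PR, but their shapes here are assumptions, and the check is condensed so the example stands alone. It is written as a plain program walking the table so it runs with go run; in the repository each row would presumably be a t.Run subtest calling t.Parallel(), which is safe precisely because nothing touches process state.

```go
package main

import (
	"errors"
	"fmt"
)

var ErrRunningAsRoot = errors.New("running as root in production")

// Condensed copy of the check so this example is self-contained.
func CheckRunningAsRoot(geteuid func() int, getenv func(string) string) error {
	if getenv("PYRYCODE_RELAY_PRODUCTION") == "1" && geteuid() == 0 {
		return ErrRunningAsRoot
	}
	return nil
}

// fakeGeteuid returns a geteuid seam that always reports uid.
// ASSUMPTION: the PR's helper shape is not shown; this one is illustrative.
func fakeGeteuid(uid int) func() int {
	return func() int { return uid }
}

// fakeGetenv returns a getenv seam backed by a map (nil map = empty env).
func fakeGetenv(env map[string]string) func(string) string {
	return func(key string) string { return env[key] }
}

var prod = map[string]string{"PYRYCODE_RELAY_PRODUCTION": "1"}

// matrix mirrors the four rows described in the PR.
var matrix = []struct {
	name string
	env  map[string]string
	uid  int
	want error
}{
	{"non-prod, uid 0", nil, 0, nil},
	{"prod, uid 1000", prod, 1000, nil},
	{"prod, uid 0", prod, 0, ErrRunningAsRoot},
	{"prod, nobody uid 65534", prod, 65534, nil},
}

// runMatrix returns the names of rows whose result did not match.
// errors.Is(nil, nil) is true, so nil expectations compare correctly.
func runMatrix() []string {
	var failed []string
	for _, row := range matrix {
		got := CheckRunningAsRoot(fakeGeteuid(row.uid), fakeGetenv(row.env))
		if !errors.Is(got, row.want) {
			failed = append(failed, row.name)
		}
	}
	return failed
}

func main() {
	if failed := runMatrix(); len(failed) > 0 {
		fmt.Println("FAIL:", failed)
		return
	}
	fmt.Println("all", len(matrix), "rows pass") // prints "all 4 rows pass"
}
```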

Architecture compliance

Follows docs/specs/architecture/78-refuse-boot-as-root-in-production.md verbatim:

  • Sentinel name (ErrRunningAsRoot), check signature (CheckRunningAsRoot(geteuid, getenv)), uid source (injected func() int), and exit code (2) — all as specified.
  • Reuses IsProductionMode from #77; no duplicate env-var read.
  • Wiring order CheckEnvConfig → CheckInsecureListenInProduction → CheckRunningAsRoot → CheckCapabilities preserved (security review § Adversarial framing).
  • Log fields cover the AC's effective_uid + env_var=PYRYCODE_RELAY_PRODUCTION requirement. env_var carries the name, not the value, per the #77 precedent and the security review's "Error messages, logs, telemetry" guidance; this keeps unvalidated operator strings out of centralised logs.
  • No new dependencies, no panic, no commented-out code. syscall.Geteuid works on linux and darwin without build tags.

🤖 Generated with Claude Code

ilmoniemi and others added 2 commits May 13, 2026 09:39
Adds a deterministic in-process backstop for the CI non-root-build
contract: when PYRYCODE_RELAY_PRODUCTION=1 AND syscall.Geteuid() == 0,
the relay refuses to start with exit 2 before any listener is opened.
A `docker run --user 0` or a missing/overridden USER directive at deploy
time would otherwise silently run the internet-facing process as root.

Mirrors the sibling pattern from #77: exported sentinel ErrRunningAsRoot,
CheckRunningAsRoot(geteuid, getenv) with injected seams for tests, and
structured log fields (effective_uid, env_var, fix) at the call site.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@ilmoniemi (Contributor, Author)

Code Review: #78

Decision: PASS

Findings

None.

Summary

Implementation tracks the architect's spec verbatim:

  • ErrRunningAsRoot sentinel and CheckRunningAsRoot(geteuid, getenv) added in internal/relay/production.go alongside the #77 siblings, in the same style (single sentinel, no wrapping, errors.Is-branchable). Godoc comments name the function that returns the sentinel and justify fail-fast (production.go:48-77).
  • Production-mode read is delegated to IsProductionMode — no duplicate env read.
  • Wiring in cmd/pyrycode-relay/main.go:73-87 lands in the exact slot specified (after CheckInsecureListenInProduction, before CheckCapabilities), with the wiring-order comment pointing back to the spec. Structured log includes err, env_var, effective_uid, and fix; exits with status 2 matching the sibling block's convention.
  • Tests in production_test.go:125-198 cover the 4-row matrix from the spec (non-prod + uid 0 → nil, prod + 1000 → nil, prod + 0 → sentinel, prod + 65534 → nil) plus the errors.Is branchability test paralleling #77. All are t.Parallel()-safe and use the injected fakeGeteuid / fakeGetenv seams: no process-uid mutation, no re-exec.

Security review (label-gated)

Spec contains a complete ## Security review section with verdict PASS and a structured findings list (trust boundaries, secrets, file ops, subprocess, crypto, network/I/O, logs, concurrency, threat-model alignment, adversarial framing). Architect's required pass is present.

Security goggles on the diff itself:

  • No tokens, secrets, or env-var values in logs — env_var field is the name string only; effective_uid is a kernel-supplied int (log-injection structurally impossible).
  • No file ops, no subprocess, no crypto, no new listeners.
  • The check runs before any listener is opened, as required.
  • No // #nosec annotations, no shell-out, no math/rand.

Local go test -race ./internal/relay/... is green. CI is pending; trusting the test/security/image-scan jobs to land green before merge.

The double syscall.Geteuid() call (once in the check, once in the log site at main.go:84) is intentional per spec — the process cannot mutate its own euid between adjacent syscalls and the cost is negligible. Not a finding.

Folds the second consumer of the PYRYCODE_RELAY_PRODUCTION contract into
the production-mode feature doc (CheckRunningAsRoot + ErrRunningAsRoot),
adds per-ticket codebase note for #78, and refreshes the INDEX entry.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
ilmoniemi merged commit 15f552c into main on May 13, 2026 (4 checks passed)
ilmoniemi deleted the feature/78 branch on May 13, 2026 at 06:47