Skip to content

feat(relay): validate required env vars at boot (#80)#84

Merged
ilmoniemi merged 3 commits into
mainfrom
feature/80
May 13, 2026
Merged

feat(relay): validate required env vars at boot (#80)#84
ilmoniemi merged 3 commits into
mainfrom
feature/80

Conversation

@ilmoniemi
Copy link
Copy Markdown
Contributor

What

Adds a boot-time, structured env-var validator that runs over an explicit contract table and aborts with a per-key reason on the first missing/malformed value.

  • New internal/relay/env_config.go:
    • ErrInvalidConfigSentinel package-level sentinel (branchable via errors.Is).
    • *ErrInvalidConfig{Key, Reason} struct error with a custom Is so errors.Is(err, ErrInvalidConfigSentinel) returns true without double-printing the prefix; field extraction via errors.As.
    • envContracts registry — single source of truth for "what env vars the relay reads". Today: one row, PYRYCODE_RELAY_PRODUCTION (optional, exact-"1" or-unset).
    • CheckEnvConfig(lookup func(string) (string, bool)) erroros.LookupEnv shape so missing-but-required is distinguishable from present-but-empty.
    • Package-private checkEnvConfigWith seam so tests can exercise the required: true branch without a production registry entry.
  • cmd/pyrycode-relay/main.go: wired between the --domain/--insecure-listen flag guard and CheckInsecureListenInProduction. Logs env_var + reason via errors.As, exits 2 on failure. Ordering is load-bearing — running this before CheckInsecureListenInProduction prevents IsProductionMode's silent-non-production fallback from being reached with an unvalidated env. No value field in the log line (per security review's forward-looking caveat).

Issue

Closes #80. Architecture: docs/specs/architecture/80-validate-env-vars-at-boot.md.

Testing

internal/relay/env_config_test.go:

  • TestCheckEnvConfig_ValidEnvReturnsNil — unset and "1" both pass.
  • TestCheckEnvConfig_MalformedValueReturnsStructuredError — covers "true", "0", "yes", " 1", "1 ", "PRODUCTION", ""; asserts errors.Is sentinel match, errors.As extraction, Key and Reason shape (mirrors IsProductionMode value matrix so a future loosening of the "1"-exact rule breaks both at once).
  • TestCheckEnvConfig_MissingRequiredKey — exercises the required: true branch via checkEnvConfigWith with a synthetic contract table (no production row is required today).
  • TestCheckEnvConfig_IgnoresUnregisteredKeys — locks in "validator polices its declared contract, not arbitrary process env".
  • TestErrInvalidConfigSentinel_IsBranchableerrors.Is + errors.As dual contract.

Verification:

go vet ./...           # clean
go test -race ./...    # ok
go build ./cmd/pyrycode-relay

All tests are t.Parallel() and never touch process env (injected fakeLookup).

Architecture compliance

  • Type name ErrInvalidConfig per the AC's literal wording; sentinel named ErrInvalidConfigSentinel to distinguish.
  • Inline per-key validators in the contract table (no separate package), per the ticket's Technical Notes.
  • os.LookupEnv shape (func(string) (string, bool)) per the AC — required to distinguish missing-but-required from present-but-empty. IsProductionMode keeps its func(string) string getter; the two seams coexist.
  • Wiring order verified in cmd/pyrycode-relay/main.go: env-config check → CheckInsecureListenInProductionCheckCapabilities.
  • Security-review findings honoured: no raw env-var value in the log fields (only err, env_var, reason); reason string for PYRYCODE_RELAY_PRODUCTION is safe to echo because the legitimate value is the literal "1".

🤖 Generated with Claude Code

ilmoniemi and others added 2 commits May 13, 2026 09:24
Introduce CheckEnvConfig + envContracts registry and wire into main.go
ahead of CheckInsecureListenInProduction so that a typo like
PYRYCODE_RELAY_PRODUCTION=true is caught before IsProductionMode's
silent-non-production fallback can be reached.

ErrInvalidConfig carries the offending Key and Reason; branchable via
ErrInvalidConfigSentinel; matched via errors.As for structured logging.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@ilmoniemi
Copy link
Copy Markdown
Contributor Author

Code Review: #80

Decision: PASS

Findings

  • [SHOULD FIX] internal/relay/env_config.go:41-42 — The envContract.validate doc comment claims "validate receives the raw value when the var is present and non-empty; the registry walk handles the unset and present-but-empty cases." The implementation in checkEnvConfigWith (lines 78-92) does not separate the present-but-empty case — when present == true, it calls c.validate(val) directly even if val == "". This is intentional (the malformed-value test deliberately covers value: "" for PYRYCODE_RELAY_PRODUCTION), but the doc comment contradicts it. Update the comment to describe the actual contract (validate receives the raw value whenever the var is present, including empty strings; the registry walk only handles the unset case). The spec's § Design item 3 has the same outdated wording — worth a follow-up note when archiving the spec.
  • [NIT] cmd/pyrycode-relay/main.go:50 — comment references docs/specs/architecture/80-validate-env-vars-at-boot.md, but the file lives at docs/specs/architecture/80-env-config-validator.md. Stale path.

Summary

All five ACs are satisfied. ErrInvalidConfig + ErrInvalidConfigSentinel give the required errors.Is / errors.As dual contract. The contract table is the single source of truth — verified by grep that cmd/ and internal/ have no other os.Getenv / os.LookupEnv reads beyond the two registered sites. Wiring order is correct: CheckEnvConfig runs before CheckInsecureListenInProduction, so a typo'd PYRYCODE_RELAY_PRODUCTION=true aborts at exit 2 instead of falling through IsProductionMode's silent-non-production fallback. Tests cover all four AC branches plus the sentinel-branchability case; t.Parallel() is applied; no os.Setenv / t.Setenv. Security review section in the spec passes the architect's checklist; the log fields (err, env_var, reason) match the architect's "no value field" rule. go vet and go test -race ./... are green. The two findings above are doc-only and do not block merge.

Add feature doc and per-ticket codebase note for the boot-time env-var
validator. Cross-link from production-mode.md so readers know the
malformed-value cases listed there are now caught by CheckEnvConfig
before IsProductionMode is consulted. Update INDEX.md.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@ilmoniemi ilmoniemi merged commit ab3b6ed into main May 13, 2026
3 checks passed
@ilmoniemi ilmoniemi deleted the feature/80 branch May 13, 2026 06:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

relay: validate required env vars at boot, fail-fast on missing/malformed

1 participant