relay: refuse to boot when Linux effective capabilities exceed allowlist #79

@ilmoniemi

User Story

As an operator deploying pyrycode-relay in a Linux container, I want the relay to refuse to start when its effective Linux capabilities exceed an explicit allowlist, so that a container runtime that grants stray capabilities (CAP_SYS_ADMIN, CAP_NET_ADMIN, etc.) fails loudly at boot rather than silently accepting elevated privileges.

Context

Container runtimes can grant capabilities via --cap-add, capability bounding sets, or default Docker profiles. Because these grants happen at deploy time, a misconfigured deployment that gives the relay extra capabilities is invisible to CI build scans, and the relay runs with more privilege than intended.

This ticket adds a Linux-only boot-time check that parses the effective capability set from /proc/self/status (the CapEff: line) and aborts if any capability outside an explicit allowlist is present. On non-Linux platforms (darwin dev runs), the check is a no-op with a single startup log line.
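As a rough illustration of the parsing step described above, the sketch below extracts the CapEff: mask from status-style text. The names parseCapEff and errNoCapEff are hypothetical, not taken from the repo, and the sample content is made up; the real check would read /proc/self/status on Linux only.

```go
package main

import (
	"bufio"
	"errors"
	"fmt"
	"strconv"
	"strings"
)

// errNoCapEff is a hypothetical error for status content that has no CapEff: line.
var errNoCapEff = errors.New("no CapEff: line found")

// parseCapEff scans /proc/self/status-style content and returns the
// effective capability mask from its "CapEff:" line.
func parseCapEff(status string) (uint64, error) {
	sc := bufio.NewScanner(strings.NewReader(status))
	for sc.Scan() {
		line := sc.Text()
		if !strings.HasPrefix(line, "CapEff:") {
			continue
		}
		// The value is a hex string without a 0x prefix, separated by whitespace.
		field := strings.TrimSpace(strings.TrimPrefix(line, "CapEff:"))
		mask, err := strconv.ParseUint(field, 16, 64)
		if err != nil {
			return 0, fmt.Errorf("parse CapEff %q: %w", field, err)
		}
		return mask, nil
	}
	if err := sc.Err(); err != nil {
		return 0, err
	}
	return 0, errNoCapEff
}

func main() {
	// Illustrative content in the shape of /proc/self/status.
	sample := "Name:\trelay\nCapPrm:\t0000000000000400\nCapEff:\t0000000000000400\n"
	mask, err := parseCapEff(sample)
	fmt.Printf("mask=%#x err=%v\n", mask, err)
}
```

Taking the content as a string, rather than opening the file inside the function, keeps the parser portable and testable on darwin dev machines.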

Related: #9 (ErrCacheDirInsecure boot-time refusal pattern).

Acceptance Criteria

  • An exported sentinel ErrUnexpectedCapability exists in internal/relay, branchable via errors.Is, alongside an exported allowlist of capabilities the relay legitimately needs. The error message identifies the unexpected capability (bit position and symbolic name where known) and the allowlist contents. Architect determines the allowlist contents by reading current code — at minimum CAP_NET_BIND_SERVICE if the relay binds privileged ports without setcap, otherwise the empty set.
  • On Linux, a check function parses the CapEff: hex mask from /proc/self/status and returns ErrUnexpectedCapability if any bit outside the allowlist is set, nil otherwise. On non-Linux platforms the check is a no-op that returns nil and logs "skipping linux-only capability check on <GOOS>" exactly once at startup. The Linux/non-Linux split is handled at compile time (build tags); this ticket is the first introduction of that pattern in the repo, so the architect chooses the file naming.
  • The check is wired into cmd/pyrycode-relay/main.go after flag parse, before any listener is started, with fail-fast on error and a structured log line that names the unexpected capability and the operator fix (--cap-drop or equivalent).
  • Unit tests cover: (a) empty CapEff → nil; (b) only-allowlisted CapEff → nil; (c) CapEff with one bit outside allowlist → sentinel; (d) malformed /proc/self/status content → wrapped error (not a panic). The CapEff string is injected via a test seam rather than touching real /proc.
  • Capability check has no production-mode predicate — it runs in every environment (including dev) because stray capabilities are never legitimate. Sibling tickets (relay: refuse to boot with --insecure-listen in production mode #77, relay: refuse to boot as effective uid 0 in production mode #78) introduce the PYRYCODE_RELAY_PRODUCTION=1 contract; this ticket does not consume it.
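One possible shape for the sentinel and the test seam in the criteria above is sketched below. Only ErrUnexpectedCapability is named by the ticket; checkCapEff, isUnexpectedCapability, capNetBindService, and the choice of a uint64 mask for the allowlist are assumptions for illustration. The sketch reports bit positions only and treats "empty CapEff" as an all-zero mask.

```go
package main

import (
	"errors"
	"fmt"
	"math/bits"
	"strconv"
	"strings"
)

// ErrUnexpectedCapability is the sentinel named in the ticket.
var ErrUnexpectedCapability = errors.New("unexpected Linux capability")

// capNetBindService is bit 10 in the kernel's capability numbering.
const capNetBindService = 10

// checkCapEff is a hypothetical test seam: it takes the CapEff hex string
// directly instead of reading /proc, plus the allowlist as a bit mask.
func checkCapEff(capEffHex string, allowed uint64) error {
	mask, err := strconv.ParseUint(strings.TrimSpace(capEffHex), 16, 64)
	if err != nil {
		// Malformed input yields a wrapped parse error, not the sentinel.
		return fmt.Errorf("parse CapEff %q: %w", capEffHex, err)
	}
	extra := mask &^ allowed // bits set in mask but absent from the allowlist
	if extra == 0 {
		return nil
	}
	var unexpected []string
	for rest := extra; rest != 0; rest &= rest - 1 { // clear lowest set bit each pass
		unexpected = append(unexpected, strconv.Itoa(bits.TrailingZeros64(rest)))
	}
	return fmt.Errorf("%w: bits [%s] set, allowlist mask 0x%016x",
		ErrUnexpectedCapability, strings.Join(unexpected, ","), allowed)
}

// isUnexpectedCapability shows how callers can branch on the sentinel.
func isUnexpectedCapability(err error) bool {
	return errors.Is(err, ErrUnexpectedCapability)
}

func main() {
	allowed := uint64(1) << capNetBindService // assumed allowlist; the ticket leaves this to the architect
	for _, in := range []string{"0000000000000000", "0000000000000400", "0000000000200400", "not-hex"} {
		err := checkCapEff(in, allowed)
		fmt.Printf("%-18s sentinel=%-5v err=%v\n", in, isUnexpectedCapability(err), err)
	}
}
```

The four inputs in main correspond to test cases (a) through (d). In cmd/pyrycode-relay/main.go, a non-nil result would be logged with the operator fix and followed by a non-zero exit before any listener starts.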

Technical Notes

  • /proc/self/status's CapEff: line is a 64-bit hex mask. Linux capability bit positions are stable kernel ABI.
  • Whether to also check CapPrm / CapBnd / CapInh is architect's call — CapEff is the minimum.
  • No existing _linux.go / _other.go files in the repo; this ticket establishes the convention.
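Because the bit positions are stable, a mask can be decoded into symbolic names with a fixed lookup table. The sketch below is illustrative only: capNames covers a small subset of the kernel's numbering, and describeMask and capName are hypothetical helpers, not existing repo code.

```go
package main

import (
	"fmt"
	"math/bits"
)

// capNames maps a subset of capability bit positions to their symbolic
// names, following the kernel's capability numbering. A real table would
// list every capability the kernel defines.
var capNames = map[int]string{
	0:  "CAP_CHOWN",
	10: "CAP_NET_BIND_SERVICE",
	12: "CAP_NET_ADMIN",
	13: "CAP_NET_RAW",
	21: "CAP_SYS_ADMIN",
}

// capName returns the symbolic name for a bit, or a placeholder when the
// bit is not in the table.
func capName(bit int) string {
	if name, ok := capNames[bit]; ok {
		return name
	}
	return "unknown"
}

// describeMask lists every set bit in mask, lowest first, as "NAME (bit N)".
func describeMask(mask uint64) []string {
	var out []string
	for rest := mask; rest != 0; rest &= rest - 1 { // clear lowest set bit each pass
		bit := bits.TrailingZeros64(rest)
		out = append(out, fmt.Sprintf("%s (bit %d)", capName(bit), bit))
	}
	return out
}

func main() {
	// 0x200400 has bits 10 and 21 set.
	for _, entry := range describeMask(0x200400) {
		fmt.Println(entry)
	}
}
```

Falling back to "unknown" with the bit position keeps the error message useful when a newer kernel defines capabilities the table does not yet list.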

Size Estimate

S

Split from #42.

Labels

  • security-sensitive: Touches auth, crypto, or internet-exposed input paths
  • size:s: Small ticket: <100 lines production code
