Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
79 changes: 79 additions & 0 deletions .github/workflows/security-scan.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,79 @@
name: security-scan

# Re-runs the same two security scanners that gate PRs (ci.yml's
# `image-scan` job from #68 and `govulncheck` from `security` in #41)
# against the latest `main` SHA on a daily schedule. The goal is to
# surface CVEs disclosed against deps that have NOT changed since the
# last PR — without a periodic re-scan these are invisible until the
# next time someone bumps the affected dep. A failing run produces a
# red row in the Actions list; auto-issue filing on top of that signal
# lands in #73.
on:
schedule:
# 06:00 UTC daily. Off-peak for US/EU/JP and keeps the
# disclosure-to-detection window under ~24h. See § Schedule
# cadence in 72-periodic-security-scan.md for rationale.
- cron: '0 6 * * *'
workflow_dispatch:

# `contents: read` is the only permission needed: checkout + read-only
# scans. The `issues: write` widening that #73 needs lives in #73,
# scoped to its single issue-filing step, NOT here.
permissions:
contents: read

jobs:
image-scan:
runs-on: ubuntu-latest
# Belt-and-suspenders mirror of ci.yml's image-scan job: even
# though the workflow-level block already grants only
# `contents: read`, a future top-level escalation must not
# silently widen this job. Same convention as ci.yml.
permissions:
contents: read
steps:
# Explicit `ref: main` so a future change to the default-branch
# name or a manual `workflow_dispatch` invocation from a feature
# branch still scans `main` (the ticket's AC #2 names `main`).
- uses: actions/checkout@v6
with:
ref: main
- name: build image
# Same artifact construction as ci.yml's image-scan job.
run: docker build -t pyrycode-relay:${{ github.sha }} .
# Tracks: aquasecurity/trivy-action@v0.36.0
# Pinned by commit SHA so a tag-swap upstream cannot change
# what runs against our image between Renovate bumps. Mirror
# of ci.yml's image-scan pin — must move in lockstep with that
# file when Renovate bumps the action.
- name: trivy image scan
uses: aquasecurity/trivy-action@ed142fd0673e97e23eac54620cfb913e5ce36c25
with:
image-ref: pyrycode-relay:${{ github.sha }}
format: table
severity: CRITICAL,HIGH
ignore-unfixed: true
vuln-type: os,library
exit-code: '1'

govulncheck:
runs-on: ubuntu-latest
permissions:
contents: read
steps:
- uses: actions/checkout@v6
with:
ref: main
- uses: actions/setup-go@v6
with:
go-version: '1.26.x'
# Tracks: golang.org/x/vuln/cmd/govulncheck@v1.1.4
# Mirror of ci.yml's `security` job pin — must move in lockstep
# with that file when Renovate bumps the tag. Same rationale
# for a semver tag vs commit SHA documented in #41's spec
# (Go module checksum-DB makes a tag effectively immutable for
# `go install`).
- name: install govulncheck
run: go install golang.org/x/vuln/cmd/govulncheck@v1.1.4
- name: govulncheck
run: govulncheck ./...
2 changes: 1 addition & 1 deletion docs/knowledge/INDEX.md
Original file line number Diff line number Diff line change
Expand Up @@ -6,7 +6,7 @@ One-line pointers into the evergreen knowledge base. Newest entries at the top o

- [Connection-count gauges](features/connection-count-gauges.md) — `pyrycode_relay_connected_binaries` and `pyrycode_relay_connected_phones` exposed via a pull-based `prometheus.Collector` reading `Registry.Counts()` on each scrape; zero edits to `registry.go`; scalar (no labels) by design — `{server="..."}` would carry the attacker-influenced `x-pyrycode-server` header onto the metrics surface, which threat-model § Log hygiene forbids; stale grace-expiry fires can't move the gauge because the pointer-identity guard (ADR-0006) keeps the maps unchanged and the gauge IS the map size; race-tested against 16 mutator goroutines + a tight-loop scraper under `-race`. First collector wired into the #59 seam (#61).
- [Metrics registry (scaffolding)](features/metrics-registry.md) — private `*prometheus.Registry` + `NewMetricsHandler` factory wrapping `promhttp.HandlerFor` (text format only; OpenMetrics off; `HandlerOpts.Registry: reg` keeps `promhttp_metric_handler_*` off `DefaultRegisterer`). Seam shape for siblings: per-concern collector struct in its own file, constructed by a helper taking `prometheus.Registerer` (no mega-struct, no package-level vars) — first instantiated by #61's `connectionsCollector`. Listener still pending (#60). Structural defence against default-registry leaks via `TestMetricsRegistry_NoGlobalRegistrarLeak` (#59).
- [Docker image](features/docker-image.md) — portable OCI artifact: multi-stage `Dockerfile` builds a fully-static binary (`CGO_ENABLED=0`, `-trimpath -s -w`) into `distroless/static-debian12:nonroot`; both base images digest-pinned with `# Tracks:` comments; exposes `:80`/`:443` and declares `/var/lib/relay/autocert` volume; host-specific wiring (TLS policy, ports, volumes, healthcheck) is #38's problem (#32). PR-time Trivy CVE scan against the just-built image lives in CI as the `image-scan` job, fails on **fixable** CRITICAL/HIGH only (`ignore-unfixed: true`), action pinned by commit SHA with `# Tracks: <tag>` comment mirroring the Dockerfile pin convention; intentional overlap with `govulncheck` (source-reachability vs. shipped-artifact) (#68).
- [Docker image](features/docker-image.md) — portable OCI artifact: multi-stage `Dockerfile` builds a fully-static binary (`CGO_ENABLED=0`, `-trimpath -s -w`) into `distroless/static-debian12:nonroot`; both base images digest-pinned with `# Tracks:` comments; exposes `:80`/`:443` and declares `/var/lib/relay/autocert` volume; host-specific wiring (TLS policy, ports, volumes, healthcheck) is #38's problem (#32). PR-time Trivy CVE scan against the just-built image lives in CI as the `image-scan` job, fails on **fixable** CRITICAL/HIGH only (`ignore-unfixed: true`), action pinned by commit SHA with `# Tracks: <tag>` comment mirroring the Dockerfile pin convention; intentional overlap with `govulncheck` (source-reachability vs. shipped-artifact) (#68). Both scanners are also re-run daily against `main` via `.github/workflows/security-scan.yml` (cron + `workflow_dispatch`) so disclosed CVEs against unchanged deps surface within ≤24h rather than staying invisible until the next bump (#72).
- [Binary-side frame forwarder](features/binary-forwarder.md) — per-binary read pump: unwraps each inbound routing envelope, linear-scans `PhonesFor(serverID)` for `env.ConnID`, writes `env.Frame` verbatim to that phone; opaque inner bytes; synchronous (handler discards the return); diverges from #25 in error policy — unknown `conn_id`, malformed envelope, phone `Send` error all log+continue (a single bad frame never tears down the binary); replaced `/v1/server`'s `CloseRead` placeholder (#26).
- [WebSocket heartbeat](features/heartbeat.md) — per-conn goroutine on both endpoints sends RFC 6455 ping every 30s; closes with `1011 "heartbeat timeout"` if no pong within 30s. Detects half-open TCP within 60s; ctx-cancel exit path leaves close to the handler defer (#7).
- [Phone-side frame forwarder](features/phone-forwarder.md) — per-phone read pump: wraps each inbound phone frame in the routing envelope keyed by the phone's `conn_id` and `Send`s it to the binary holding `serverID`; opaque inner bytes; synchronous (handler discards the return); replaced `/v1/client`'s `CloseRead` placeholder; added `WSConn.Read` (single-caller) (#25).
Expand Down
54 changes: 54 additions & 0 deletions docs/knowledge/codebase/72.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,54 @@
# Ticket #72 — periodic security-scan cron workflow

Closes the post-merge invisibility window for CVEs disclosed against dependencies the relay has not touched since the last PR. Adds `.github/workflows/security-scan.yml`, a daily scheduled workflow that re-runs both PR-time scanners — the Trivy runtime-image scan from `ci.yml`'s `image-scan` job (#68) and `govulncheck ./...` from `ci.yml`'s `security` job (#41) — against the latest `main` SHA. A regression now surfaces as a red row in the Actions list within ≤24h of disclosure; auto-issue filing on top of that signal lands in #73.

## Implementation

- **New file `.github/workflows/security-scan.yml`** — single workflow with two parallel jobs (`image-scan`, `govulncheck`) on independent runners. No edits to `ci.yml`; no reusable workflow / composite action (AC #5: clarity over DRY at v1, ~25 duplicated lines per scanner is below the abstraction threshold).
- **Schedule.** `cron: '0 6 * * *'` (06:00 UTC daily) + `workflow_dispatch:` for manual re-runs. Daily, not weekly: CVE disclosure cadence is continuous and weekly would create a six-day invisibility window. 06:00 UTC is off-peak across US/EU/JP so the cron doesn't contend with PR-time CI.
- **`permissions: contents: read` at the workflow level AND on each job.** Belt-and-suspenders mirror of `ci.yml`'s convention — a future top-level escalation (e.g. someone wiring an `id-token: write` step in some other job) must not silently widen these jobs. The `issues: write` widening that #73 will need stays in #73, scoped to its single step.
- **Both jobs `actions/checkout@v6` with explicit `ref: main`.** A manual `workflow_dispatch` from a feature branch still scans `main`, and a future default-branch rename can't silently re-target the scan.
- **`image-scan` job.** `docker build -t pyrycode-relay:${{ github.sha }} .` then `aquasecurity/trivy-action@ed142fd0673e97e23eac54620cfb913e5ce36c25` (commit-SHA pinned, tracks `v0.36.0`) with the same inputs as `ci.yml`: `severity: CRITICAL,HIGH`, `ignore-unfixed: true`, `vuln-type: os,library`, `exit-code: '1'`, `format: table`. Policy must be identical to PR-time or "vulnerable" means different things on different triggers.
- **`govulncheck` job.** `actions/setup-go@v6` (go 1.26.x) → `go install golang.org/x/vuln/cmd/govulncheck@v1.1.4` → `govulncheck ./...`. Same semver pin as `ci.yml`'s `security` job.
- **Lockstep pin convention.** Both pin lines carry `# Tracks: <upstream>@<tag>` comments naming the source-of-truth in `ci.yml`. When Renovate bumps either pin in `ci.yml`, the corresponding line in `security-scan.yml` MUST move in the same PR; the comments are the structural smell-detector for drift.
- **`gosec` is NOT ported.** The PR-time `security` job runs both `gosec` and `govulncheck`; only the latter audits disclosed CVEs against unchanged code. `gosec` is a static analyser whose findings depend on the source tree, not on the vuln DB — a periodic re-run against unchanged `main` produces identical output every day. Re-running it on cron would be pure noise.

## Patterns established

- **Periodic re-scan of any merge-gating supply-chain scanner.** When a scanner gates PRs on disclosed-CVEs-vs-unchanged-deps, it also belongs on a periodic schedule — PR-time only covers the PR moment, and the dep set sits stable for weeks at a time. The cron workflow is a sibling to the PR-time workflow, not a replacement; same scanner, same flags, same pins, different trigger. Static analysers (`gosec`, linters) do NOT belong on this schedule; their output is a function of code, not a function of time.
- **Parallel jobs per audited surface, not one combined job.** `image-scan` (artifact-side, OS packages + Go-binary content) and `govulncheck` (source-side, reachable Go modules) get separate jobs so the Actions-list row identifies which surface flagged. Mirrors the structure already in `ci.yml`. Same shape future surfaces (SBOM, Cosign verification) should follow.
- **Lockstep convention for duplicated pins across workflows.** When the same third-party reference is pinned in two workflow files for clarity-over-DRY reasons, each pin carries a `# Tracks: …` comment naming the sibling-of-truth in plain English. A Renovate bump that touches one file but not the other is then a one-token diff with a co-signal (the comment doesn't match the run line), reviewable at PR time. Programmatic lint (one-line `grep | sort -u | wc -l` equality) is the deterministic backstop, deferred per evidence-based-fix-selection until silent drift is actually observed.
- **Cron + `workflow_dispatch:` for any periodic CI workflow.** The dispatch trigger is what makes manual re-runs and pre-merge dry-runs (`gh workflow run --ref <branch>`) possible without editing the trigger surface. Free permission-wise (no new tokens, no new scopes), so include it by default.

## Verification

AC #6 explicitly scopes verification to the green path; the vulnerable-state path is exercised by #73.

1. **Pre-merge dry run on the feature branch.** `gh workflow run security-scan.yml --ref feature/72` — `workflow_dispatch` works on any branch containing the file. Both jobs run green; step logs match what the PR-time `image-scan` and `security` jobs produced on the most recent merged PR (modulo vuln-DB timestamps).
2. **Post-merge manual invocation against `main`.** After this PR ships, the Actions tab → `security-scan` workflow → "Run workflow" on `main` completes both jobs green.

Neither verification step requires a code change to `main`.

## Out of scope (delegated)

- **Auto-issue filing on a red run.** Lives in #73 (depends on this ticket). The `issues: write` permission widening lives there, scoped to that one step.
- **Slack / email notification.** Auto-filed GitHub issue (in #73) is the notification mechanism.
- **SARIF upload to GitHub's Security tab.** Would require `security-events: write`, widens the minimum-permissions posture; deferred to its own ticket if/when wanted.
- **Per-release-branch scans.** Only `main` is scanned; the relay has no long-lived release branches yet.
- **Lockstep-drift lint.** Deterministic safety net for the pin-sync convention. Deferred per evidence-based-fix-selection — no observed drift yet.
- **Reusable workflow / composite action.** Considered and rejected at v1: AC #5 specifies clarity over DRY, parameter surface would be empty (callers pass identical inputs), Renovate handles bumps across two files fine. Re-evaluate if a third caller appears or parameter surfaces diverge.

## Lessons learned

- **A scanner that gates merges needs a periodic complement, or it has a built-in invisibility window.** PR-time scanning means "what's vulnerable as of this PR's diff"; the dep set sits stable between PRs while the vuln DB updates continuously. Without a periodic re-scan the gap between disclosure and the next dependency bump is silent. Same logic should apply to any future supply-chain gate (Cosign verification, SBOM diff, etc.) — schedule the same scanner against `main` on a cadence shorter than the typical inter-bump interval.
- **Pick cadence against disclosure rate, not against runner cost.** Daily costs ~6 free-tier minutes per day; weekly would save five days of runner time at the cost of a six-day disclosure-to-detection window. The runner cost is rounding error; the detection window is the operational risk. When the variable being optimized is "time-to-signal," err shorter unless a concrete cost pressure says otherwise.
- **Duplicate two workflow files rather than abstract prematurely.** A reusable workflow with an empty parameter surface is pure indirection; the abstraction earns its keep only once parameter surfaces diverge or a third caller appears. The `# Tracks:` lockstep comment is the cheap defence against drift in the meantime — a one-line comment beats a third workflow file. Same pattern as the Dockerfile digest / Trivy SHA / `govulncheck` semver pins: pick the cheapest mechanism that makes drift visible at review time, escalate to a lint only on observed evidence.
- **Don't reflexively include every PR-time check in the periodic mirror.** `gosec` lives in the PR-time `security` job and was the natural candidate to port; it doesn't belong on cron because its output is a function of the source tree, not the vuln DB. The mirror-only-disclosed-CVE-scanners rule is load-bearing: porting `gosec` would have produced a daily green job that adds zero signal and crowds the Actions list. Read each PR-time check's failure mode before mirroring it.

## Cross-links

- [Architect spec](../../specs/architecture/72-periodic-security-scan.md) — full design rationale, alternatives considered, security review.
- [Ticket #68 codebase notes](68.md) — PR-time Trivy image scan, source of the `image-scan` job this workflow mirrors.
- [Ticket #41 codebase notes](41.md) — `govulncheck` semver pin, source of the `govulncheck` job this workflow mirrors.
- [Feature: Docker image](../features/docker-image.md) § *CI image scanning* — the runtime-image surface this cron covers post-merge.
- [Threat model](../../threat-model.md) — § *Supply chain — Go dependencies*; periodic re-scan closes the post-merge invisibility window the residual-risk paragraph names.
4 changes: 3 additions & 1 deletion docs/knowledge/features/docker-image.md
Original file line number Diff line number Diff line change
Expand Up @@ -74,7 +74,9 @@ The image layer contributes hardening to three operational surfaces (see `docs/t

## CI image scanning

PR-time scanning is wired in `.github/workflows/ci.yml` as the `image-scan` job (#68). Each PR builds the image locally as `pyrycode-relay:${{ github.sha }}` and runs `aquasecurity/trivy-action` (commit-SHA pinned, `# Tracks: <upstream-tag>` comment alongside — same convention as the Dockerfile base-image digest pins) against it. The job fails on **fixable** CRITICAL/HIGH CVEs only (`ignore-unfixed: true`); unfixed CVEs print but don't block. Covers `os,library` vuln types — distroless's OS package set plus Go-binary content Trivy re-derives, intentionally overlapping with `govulncheck`'s source-reachability view. Periodic re-scan (catches CVEs disclosed after merge against unchanged bases) is a follow-up ticket.
PR-time scanning is wired in `.github/workflows/ci.yml` as the `image-scan` job (#68). Each PR builds the image locally as `pyrycode-relay:${{ github.sha }}` and runs `aquasecurity/trivy-action` (commit-SHA pinned, `# Tracks: <upstream-tag>` comment alongside — same convention as the Dockerfile base-image digest pins) against it. The job fails on **fixable** CRITICAL/HIGH CVEs only (`ignore-unfixed: true`); unfixed CVEs print but don't block. Covers `os,library` vuln types — distroless's OS package set plus Go-binary content Trivy re-derives, intentionally overlapping with `govulncheck`'s source-reachability view.

Post-merge scanning lives in `.github/workflows/security-scan.yml` as a separate workflow on a daily `0 6 * * *` UTC cron + `workflow_dispatch` (#72). Same scanner, same pins (lockstep with `ci.yml` via `# Tracks:` comments), same flags — re-runs both the image-side Trivy job and the source-side `govulncheck` job against the latest `main` SHA, so a CVE disclosed against deps that haven't changed since the last PR surfaces as a red row in the Actions list within ≤24h instead of staying invisible until the next dep bump. Auto-issue filing on a red cron run is a follow-up (#73).

## Cross-links

Expand Down
Loading