Skip to content

ci: auto-file GitHub issue when periodic security-scan finds a regression (#73)#75

Merged
ilmoniemi merged 3 commits into
mainfrom
feature/73
May 12, 2026
Merged

ci: auto-file GitHub issue when periodic security-scan finds a regression (#73)#75
ilmoniemi merged 3 commits into
mainfrom
feature/73

Conversation

@ilmoniemi
Copy link
Copy Markdown
Contributor

What

Extend .github/workflows/security-scan.yml so a failing daily cron run opens a labelled GitHub issue instead of just producing a red row in the Actions list.

  • Both scanners now emit JSON (trivy.json, govulncheck.json) so the body can be machine-rendered without table-column regex fragility. Policy flags (severity, ignore-unfixed, vuln-type, exit-code) are unchanged — same definition of a finding as PR-time.
  • Each scanner job gains two if: failure() post-steps: a jq-based renderer that writes issue-title.txt + issue-body.md, and an actions/upload-artifact@v5 step (guarded by hashFiles(...) != '' so scanner-infrastructure failures with no parseable findings produce no artifact and no spurious issue).
  • New file-issue job (needs: [image-scan, govulncheck], if: failure()) downloads the artifacts via actions/download-artifact@v5, runs a dedup gh issue list --search "in:title \"<full-title>\"" against open security-sensitive issues, and posts via gh issue create --body-file (NEVER --body "<...>", so CVE-derived content never passes through shell-argument expansion).
  • issues: write is granted at the job level on file-issue only — the workflow-level block stays contents: read and the scanner jobs keep their belt-and-suspenders contents: read (AC relay: connection registry — server-id → binary + server-id → [phone] thread-safe maps #3).
  • Workflow-level concurrency: security-scan with cancel-in-progress: false (the block deferred from relay: periodic security-scan cron workflow (re-runs Trivy + govulncheck against main) #72) so overlapping scheduled/manual runs queue rather than truncate artifacts mid-upload.

Issue

Closes #73. Architecture spec: docs/specs/architecture/73-auto-issue-on-security-regression.md.

Testing

This is a CI workflow change — no Go code touched, no unit tests added. The spec's testing strategy is operator-driven against the deployed workflow:

  1. Clean-main green path (AC relay: WS upgrade for /v1/server — accept binary connection, validate headers, claim server-id #4 first half). After merge, hit "Run workflow" on the Actions tab against main. Expect: both scanners green, file-issue skipped (failure() false), no issue filed.
  2. Vulnerable-state path (AC relay: WS upgrade for /v1/server — accept binary connection, validate headers, claim server-id #4 second half). On a throwaway branch (e.g. verify/73-vulnerable-state), pin the Dockerfile base image to a digest with a known fixable CRITICAL/HIGH CVE (or add a temporary golang.org/x/text v0.3.7 import reachable from main() to surface a GO-... advisory). gh workflow run security-scan.yml --ref verify/73-vulnerable-state. Expect: the failing scanner job goes red, file-issue runs, one issue is filed with the deterministic title, security-sensitive label, body containing the CVE/vuln id + component + workflow run URL. Trigger a second run on the same branch; expect duplicate suppressed for: <title> in the file-issue log and no second issue. Close the test issue, delete the branch.
  3. Pre-merge sanity (optional). Temporarily add a pull_request: trigger to observe a green run on this PR, then remove before merge.

Architecture compliance

Security-review section in the architect spec covers trust boundaries (CVE-DB content → gh issue create --body-file), token scoping (github.token, job-scoped permissions), adversarial scenarios (markdown injection bounded by GitHub's renderer; concurrent runs coalesced; dedup false-positive/-negative both accepted).

🤖 Generated with Claude Code

ilmoniemi and others added 2 commits May 12, 2026 20:49
…sion (#73)

Extend `.github/workflows/security-scan.yml` so a red cron run gains an
owner via a labelled GitHub issue instead of scrolling past in Actions.

- Switch both scanners to JSON output (`format: json` for Trivy,
  `-json` for govulncheck) so the rendered issue body can name the
  primary CVE/vuln id and affected component without fragile table
  parsing. Policy flags (severity, ignore-unfixed, vuln-type,
  exit-code) are unchanged — same definition of a finding as PR-time.
- Add per-scanner `if: failure()` render + upload-artifact steps that
  pre-bake an `issue-title.txt` + `issue-body.md` pair and hand them
  off as `regression-image-scan` / `regression-govulncheck` artifacts.
- Add a `file-issue` job (`needs: [image-scan, govulncheck]`,
  `if: failure()`) that downloads the artifacts and posts via
  `gh issue create --body-file` (never `--body "<...>"`). `gh issue
  list --search "in:title \"<full-title>\""` provides best-effort
  dedup against open `security-sensitive` issues.
- `issues: write` is scoped to the `file-issue` job ONLY — the
  workflow-level block stays `contents: read` and the scanner jobs
  keep their belt-and-suspenders `contents: read` (AC #3).
- Add workflow-level `concurrency: security-scan` (deferred from #72)
  with `cancel-in-progress: false` so overlapping scheduled/manual
  runs queue rather than truncate artifacts mid-upload.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@ilmoniemi
Copy link
Copy Markdown
Contributor Author

Code Review: #73

Decision: PASS

Security-sensitive checklist

  • security-sensitive label is present on relay: auto-file GitHub issue when periodic security-scan finds a regression #73.
  • Spec at docs/specs/architecture/73-auto-issue-on-security-regression.md contains a ## Security review section with verdict PASS and a complete findings list (trust boundaries, tokens, file ops, subprocess, crypto, network, logs, concurrency, threat-model alignment, adversarial scenarios). Architect's security-review obligation satisfied.
  • Diff matches the spec's security-relevant invariants (see Findings).

AC verification

Security-goggles pass on the diff

  • gh issue create uses --body-file (:260), never --body "<...>" — CVE-derived content never traverses shell-argument expansion.
  • GH_TOKEN: ${{ github.token }} (:227) is the workflow-issued token; token never echoed, logged, or compared. No secrets.*, no PAT.
  • All multi-line shell steps use set -euo pipefail (render + file-issue) or set -o pipefail (govulncheck-invocation step, per the spec's explicit prescription).
  • shopt -s nullglob (:231) makes the artifacts/regression-* loop empty-safe when only one scanner failed or both produced no parseable findings.
  • hashFiles('issue-title.txt') != '' (:118, :192) guards uploads so a scanner-infrastructure failure (no parseable JSON) produces no artifact and no spurious issue.
  • concurrency: { group: security-scan, cancel-in-progress: false } (:32-34) coalesces overlapping runs without truncating mid-upload — the deferral named in relay: periodic security-scan cron workflow (re-runs Trivy + govulncheck against main) #72's spec is now closed.

Findings

  • [NIT] .github/workflows/security-scan.yml:108-111 and :181-184 — in the flood scenario the head -100 cap is meant to handle (>100 unique findings), jq … | sort -u | head -100 can fail under bash's default pipefail (head exits → sort writes to closed pipe → SIGPIPE → pipeline exit 141 → set -e aborts the step). issue-title.txt was already written, so the hashFiles(...) upload guard still trips and file-issue posts an issue with a truncated body (the fenced ``` opened by printf is never closed because the brace group exits mid-write). Implementation mirrors the spec verbatim — flagging for visibility; defer-to-architect if a fix is wanted (e.g. (jq … | sort -u || true) | head -100, or capture to a temp file then head -100 file).
  • [NIT] .github/workflows/security-scan.yml:245--search "in:title \"$title\"" interpolates a CVE-derived $title into a quoted GitHub search query. CVE IDs (CVE-YYYY-NNNNN) and Go vuln IDs (GO-YYYY-NNNN) plus typical Go module paths are ASCII-only with no backslashes or unbalanced quotes, so practically safe. The spec already classes this as accepted residual (worst case: a false-negative dedup files one duplicate, bounded blast radius).

Summary

Three jobs, two trust tiers, clean privilege split. The artifact-handoff design correctly keeps issues: write off the scanner jobs (AC #3) while letting the filer post pre-rendered content without re-scanning. Loud failures (set -euo pipefail, no silent dedup-miss) match the project's loud-failure convention. The two NITs above are corner cases the architect could fold into a follow-up; neither blocks merge.

- docs/knowledge/codebase/73.md — implementation summary, patterns established
  (privilege split via artifact handoff, --body-file for machine-generated bodies,
  job-level permission widening, deterministic-title dedup, shopt -s nullglob),
  verification flows, and lessons learned (passive→active signal conversion,
  shell-argument expansion as threat surface, lockstep-via-comment).
- docs/knowledge/features/docker-image.md § CI image scanning — now names the
  file-issue job, the artifact-handoff privilege split, and the dedup mechanism.
- docs/threat-model.md § Supply chain — cites #73 alongside #72 as the
  actionability layer on the periodic re-scan control.
- docs/knowledge/INDEX.md — extends the docker-image entry with the #73 summary.
@ilmoniemi ilmoniemi merged commit d7fe982 into main May 12, 2026
3 checks passed
@ilmoniemi ilmoniemi deleted the feature/73 branch May 12, 2026 17:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

relay: auto-file GitHub issue when periodic security-scan finds a regression

1 participant