feat(skills): gate deploys on catalogue consistency and SkillSpector scan#909
feat(skills): gate deploys on catalogue consistency and SkillSpector scan#909DevelopmentCats wants to merge 8 commits into
Conversation
Adds a READMEVALIDATION_ONLINE=1 mode to cmd/readmevalidation that shallow-clones every owner/repo@ref declared in registry/<namespace>/skills/README.md and verifies: - every catalogue override slug exists upstream as skills/<slug>/SKILL.md, - each upstream SKILL.md has the required name and description frontmatter per the agentskills.io v0.2.0 specification, - no slug is provided by more than one source within the same namespace. The mode is off by default so go run ./cmd/readmevalidation keeps working without network access. CI enables it when network is available. Lifts the shallow-clone shape from coder/registry-server/cmd/build/skills.go so the verifier matches what the deploy pipeline will actually ingest. Branch-only refs for now, matching that pipeline's current behavior.
New helper that walks every registry/<namespace>/skills/README.md, extracts each unique owner/repo@ref from the YAML frontmatter, and runs NVIDIA SkillSpector over the upstream GitHub URL. One SARIF file per source is written to ./sarif (configurable via the first positional argument). The script does not gate on findings. It exits non-zero only when skillspector itself crashes. Severity gating lives in the workflow so the deploy policy is visible alongside the deploy step.
Extends the existing validate-readme-files CI job with an online phase that runs cmd/readmevalidation with READMEVALIDATION_ONLINE=1 when the PR touches skills paths. Adds a scan-skills job that downloads and runs NVIDIA SkillSpector over every declared upstream source and uploads SARIF to GitHub Code scanning, failing on findings at the error level. Refactors deploy-registry into three jobs: - verify: runs the same online validator + SkillSpector before deploy. - deploy: now needs: verify, so gcloud builds triggers run only fires when the catalogue is consistent and the scan is clean. - open-issue-on-failure: posts to a single rolling tracker issue (label skills-gate) via JasonEtco/create-an-issue with update_existing, so repeat failures update the same issue rather than spamming. A dorny/paths-filter step drives both the online phase and the scan-skills job so PRs that do not touch skills pay no extra cost.
- Drop go-git/v5 dependency and use the system git binary via exec.Command. The library pulled go 1.25.0 transitively, breaking golangci-lint v2.1.6 which is built against go 1.24. - Apply prettier-plugin-sh formatting to scan-skill-sources.sh. - Disable actions/setup-go's default module cache in the deploy verify job to satisfy zizmor's cache-poisoning rule on jobs that hold security-events: write.
SkillSpector is not on PyPI yet, so the install must reference the NVIDIA/SkillSpector repository directly. Pinned at commit 2eb844780ab163f01468ecf142c40a2ec0fcaec0 to keep the policy reproducible. Will switch to a pip-published version when one ships. Also bumps actions/setup-python from v5.6.0 to v6.2.0 to get off the deprecated Node 20 action runtime, and pins the latest pip before installing skillspector so its hatchling build env has the recent packaging release that supports the SPDX license expression in its pyproject.toml.
SkillSpector exits non-zero when its scan finds anything to report, even though it still writes the SARIF file. Treating that as a crash short-circuits SARIF upload and prevents the workflow's severity jq step from making the policy decision. Inverts the check: scan_failed is now set only when the SARIF file is missing or empty, which is the genuine 'tool crashed' signal.
|
You are seeing this message because GitHub Code Scanning has recently been set up for this repository, or this pull request contains the workflow file for the Code Scanning tool. What Enabling Code Scanning Means:
For more information about GitHub Code Scanning, check out the documentation. |
Will be removed once the SkillSpector severity policy is locked in.
SkillSpector flags 13 error-level findings against coder/skills@main, all in the intentionally credential-handling setup skill (GitHub device-flow scripts and references). These are real patterns matching SkillSpector's PE3 'Credential Access' and E2 'Env Variable Harvesting' rules, but they describe expected behavior for an installer skill rather than a vulnerability. Initial policy is warn-only: SARIF still flows to Code scanning so the findings are visible on PRs, the severity gate still runs and prints, but continue-on-error keeps the workflow green. A follow-up PR can either tune the rules in coder/skills, add SkillSpector suppressions, or flip continue-on-error to false to make the gate strict. Also drops the debug upload-artifact step now that the SARIF shape is known.
bpmct
left a comment
There was a problem hiding this comment.
how does this scanning help us if its at deploy time and there are dynamically changing repos?
Its probably not the most elegant solution, but the idea is that anytime we would deploy the registry it would run this skill verification on each skills repo and if it fails the deployment will just fail and create an issue. I was fiddling with some stuff while researching on skill scanning last night and how we would implement this, and this is a WIP currently |
Summary
Adds a two-stage gate in front of every registry deploy so the agent-skills
catalogue cannot ship broken or unsafe content:
cmd/readmevalidationwith aREADMEVALIDATION_ONLINE=1mode that shallow-clones every declaredowner/repo@ref, verifies everysources[].skillsslug exists upstreamas
skills/<slug>/SKILL.md, validates eachSKILL.mdfrontmatteragainst the agentskills.io v0.2.0 specification (required
nameanddescription), and rejects duplicate slugs across sources within thesame namespace. Strict gate, blocks on failure.
(static-only,
--no-llm) over each unique upstream source repo anduploads SARIF to GitHub Code scanning. Warn-only gate for now, see
"Initial scan policy" below.
Sequential gating: offline catalogue checks → online catalogue checks →
SkillSpector. If anything fails (including the SkillSpector gate once it
is flipped to strict), the deploy's
gcloud builds triggers runstepnever fires and a single rolling tracker issue (label
skills-gate) isopened or updated by
JasonEtco/create-an-issue, so repeat failuresupdate the same issue rather than spamming.
Initial scan policy: warn-only
SkillSpector flags 13 error-level findings against
coder/skills@mainon the first run, all in the intentionally credential-handling setup
skill:
PE3 Credential Accessinskills/setup/scripts/github-device-fetch.shPE3 Credential Accessinskills/setup/scripts/github-device-poll.shE2 Env Variable Harvestinginskills/setup/references/coder-agents.mdPE3 Credential Accessintest/claude.sh,test/codex.shThese are real pattern matches but expected behavior for an installer
skill that orchestrates a GitHub device-flow login. Rather than
rubber-stamping them or holding this PR for upstream cleanup, the
severity step is shipped as
continue-on-error: true. Findings stillflow to Code scanning so they show up as PR annotations and Security
tab alerts. A follow-up PR can:
coder/skills(add SkillSpector ignore comments, move testscripts out of the scan path, restructure the device-flow helpers).
continue-on-error: falsein bothci.yamlanddeploy-registry.yamlto make the gate strict.What this solves
The registry-server build pipeline currently drops catalogue overrides
for upstream slugs it cannot find, silently. Upstream rename, deletion,
or branch-flip is invisible until a contributor notices a 404. This PR
makes the gap loud and blocks the deploy on the consistency check.
What this does NOT solve
@main. Pinning to SHAs or tagsdepends on coder/registry-server PR chore: remove it wrappers from required variables tests #442 follow-up chore: add CI step for validating contributor READMEs #1 landing first
(the
cloneSkillSourceRepobranch-only bug).upstream at deploy time. A future follow-up can move the scan inside
the deploy job to close it further.
Files
cmd/readmevalidation/coderskills.govalidateAllCoderSkillsOnline,fetchSkillsSourceContent,verifyReadmeAgainstSources,validateSkillMarkdownFrontmatter). FactorsvalidateAllCoderSkillsto expose parsed readmes. Uses the systemgitbinary viaexec.Commandto avoid pulling in a Go Git library.cmd/readmevalidation/main.goREADMEVALIDATION_ONLINE=1.cmd/readmevalidation/readmefiles.govalidationPhaseUpstream.scripts/scan-skill-sources.shrepo@reftuples, runs SkillSpector per source, writes SARIF. Treats missing-SARIF as the only true tool crash; severity gating lives in the workflow..github/workflows/ci.yamldorny/paths-filteronvalidate-readme-files+ newscan-skillsjob, both skipped on PRs that do not touch skills paths. SkillSpector severity step iscontinue-on-error: true..github/workflows/deploy-registry.yamlverify→deploy+open-issue-on-failure. Same SkillSpector severity policy asci.yaml..github/ISSUE_TEMPLATE/skills-failure.mdJasonEtco/create-an-issuewithupdate_existing: true.Verification
Local, before pushing, against current
coder/skills@main:Failure path (injected
nonexistent-slugintosources[].skills):CI status on this PR: all seven checks green, including the new
Validate README files(online catalogue check),Scan skill sources with SkillSpector(warn-only gate, SARIF uploaded), andskillspector(Code scanning ingest).
Validate Terraform outputflaked twice onregistry.terraform.ioprovider downloads (Error while installing coder/coder v2.18.0: ... failed to retrieve authentication checksums for provider); passed on the third retry.Open follow-ups
one is available. Currently pinned at NVIDIA/SkillSpector
2eb844780ab163f01468ecf142c40a2ec0fcaec0; SkillSpector is not onPyPI yet.
coder/skills@main,then flip
continue-on-error: falsein both workflows.cloneSkillSourceRepoincoder/registry-serveraccepts tagsand SHAs, switch sources from
@mainto immutable refs and add aRenovate-style bumper.
Implementation plan and decision log
The full plan that drove this PR (option space, the choice to shell out
to
gitrather than vendorgo-git, why we extendcmd/readmevalidationinstead of adding a new binary, severity policy reasoning) lives in
plans/skills-scan-pr-plan.mdand is available on request.This PR was created with help from Coder Agents.