Skip to content

feat(skills): gate deploys on catalogue consistency and SkillSpector scan#909

Closed
DevelopmentCats wants to merge 8 commits into
mainfrom
cat/skills-scan-gate
Closed

feat(skills): gate deploys on catalogue consistency and SkillSpector scan#909
DevelopmentCats wants to merge 8 commits into
mainfrom
cat/skills-scan-gate

Conversation

@DevelopmentCats
Copy link
Copy Markdown
Collaborator

@DevelopmentCats DevelopmentCats commented Jun 2, 2026

Summary

Adds a two-stage gate in front of every registry deploy so the agent-skills
catalogue cannot ship broken or unsafe content:

  1. Catalogue consistency. Extends cmd/readmevalidation with a
    READMEVALIDATION_ONLINE=1 mode that shallow-clones every declared
    owner/repo@ref, verifies every sources[].skills slug exists upstream
    as skills/<slug>/SKILL.md, validates each SKILL.md frontmatter
    against the agentskills.io v0.2.0 specification (required name and
    description), and rejects duplicate slugs across sources within the
    same namespace. Strict gate, blocks on failure.
  2. Security scan. Runs NVIDIA SkillSpector
    (static-only, --no-llm) over each unique upstream source repo and
    uploads SARIF to GitHub Code scanning. Warn-only gate for now, see
    "Initial scan policy" below.

Sequential gating: offline catalogue checks → online catalogue checks →
SkillSpector. If anything fails (including the SkillSpector gate once it
is flipped to strict), the deploy's gcloud builds triggers run step
never fires and a single rolling tracker issue (label skills-gate) is
opened or updated by JasonEtco/create-an-issue, so repeat failures
update the same issue rather than spamming.

Initial scan policy: warn-only

SkillSpector flags 13 error-level findings against coder/skills@main
on the first run, all in the intentionally credential-handling setup
skill:

  • 5x PE3 Credential Access in skills/setup/scripts/github-device-fetch.sh
  • 1x PE3 Credential Access in skills/setup/scripts/github-device-poll.sh
  • 1x E2 Env Variable Harvesting in skills/setup/references/coder-agents.md
  • 2x PE3 Credential Access in test/claude.sh, test/codex.sh
  • 4 others on the same setup files

These are real pattern matches but expected behavior for an installer
skill that orchestrates a GitHub device-flow login. Rather than
rubber-stamping them or holding this PR for upstream cleanup, the
severity step is shipped as continue-on-error: true. Findings still
flow to Code scanning so they show up as PR annotations and Security
tab alerts. A follow-up PR can:

  • Tune coder/skills (add SkillSpector ignore comments, move test
    scripts out of the scan path, restructure the device-flow helpers).
  • Flip continue-on-error: false in both ci.yaml and
    deploy-registry.yaml to make the gate strict.

What this solves

The registry-server build pipeline currently drops catalogue overrides
for upstream slugs it cannot find, silently. Upstream rename, deletion,
or branch-flip is invisible until a contributor notices a 404. This PR
makes the gap loud and blocks the deploy on the consistency check.

What this does NOT solve

Files

Path Change
cmd/readmevalidation/coderskills.go New online verification path (validateAllCoderSkillsOnline, fetchSkillsSourceContent, verifyReadmeAgainstSources, validateSkillMarkdownFrontmatter). Factors validateAllCoderSkills to expose parsed readmes. Uses the system git binary via exec.Command to avoid pulling in a Go Git library.
cmd/readmevalidation/main.go Runs the online check after the offline check when READMEVALIDATION_ONLINE=1.
cmd/readmevalidation/readmefiles.go Adds validationPhaseUpstream.
scripts/scan-skill-sources.sh Iterates skills READMEs, extracts unique repo@ref tuples, runs SkillSpector per source, writes SARIF. Treats missing-SARIF as the only true tool crash; severity gating lives in the workflow.
.github/workflows/ci.yaml dorny/paths-filter on validate-readme-files + new scan-skills job, both skipped on PRs that do not touch skills paths. SkillSpector severity step is continue-on-error: true.
.github/workflows/deploy-registry.yaml Refactored into verifydeploy + open-issue-on-failure. Same SkillSpector severity policy as ci.yaml.
.github/ISSUE_TEMPLATE/skills-failure.md Single tracker template used by JasonEtco/create-an-issue with update_existing: true.

Verification

Local, before pushing, against current coder/skills@main:

$ go build ./cmd/readmevalidation
$ READMEVALIDATION_ONLINE=1 ./readmevalidation
...
[info]  verifying upstream skill sources  num_unique_sources=1
[info]  upstream skill source verification passed  num_files=1  num_sources=1

Failure path (injected nonexistent-slug into sources[].skills):

exit=1
[erro]  Error during "Upstream skill source verification" phase of README validation:
   - "registry/coder/skills/README.md": sources[0].skills["nonexistent-slug"]:
     slug not found upstream at coder/skills/skills/nonexistent-slug/SKILL.md
     (repo=coder/skills ref=main)

CI status on this PR: all seven checks green, including the new
Validate README files (online catalogue check), Scan skill sources with SkillSpector (warn-only gate, SARIF uploaded), and skillspector
(Code scanning ingest). Validate Terraform output flaked twice on
registry.terraform.io provider downloads (Error while installing coder/coder v2.18.0: ... failed to retrieve authentication checksums for provider); passed on the third retry.

Open follow-ups

  • Move SkillSpector from a git-pinned commit to a tagged release when
    one is available. Currently pinned at NVIDIA/SkillSpector
    2eb844780ab163f01468ecf142c40a2ec0fcaec0; SkillSpector is not on
    PyPI yet.
  • Triage the warn-only SkillSpector findings in coder/skills@main,
    then flip continue-on-error: false in both workflows.
  • Once cloneSkillSourceRepo in coder/registry-server accepts tags
    and SHAs, switch sources from @main to immutable refs and add a
    Renovate-style bumper.
Implementation plan and decision log

The full plan that drove this PR (option space, the choice to shell out
to git rather than vendor go-git, why we extend cmd/readmevalidation
instead of adding a new binary, severity policy reasoning) lives in
plans/skills-scan-pr-plan.md and is available on request.

This PR was created with help from Coder Agents.

Adds a READMEVALIDATION_ONLINE=1 mode to cmd/readmevalidation that
shallow-clones every owner/repo@ref declared in
registry/<namespace>/skills/README.md and verifies:

- every catalogue override slug exists upstream as skills/<slug>/SKILL.md,
- each upstream SKILL.md has the required name and description frontmatter
  per the agentskills.io v0.2.0 specification,
- no slug is provided by more than one source within the same namespace.

The mode is off by default so go run ./cmd/readmevalidation keeps working
without network access. CI enables it when network is available.

Lifts the shallow-clone shape from coder/registry-server/cmd/build/skills.go
so the verifier matches what the deploy pipeline will actually ingest.
Branch-only refs for now, matching that pipeline's current behavior.
New helper that walks every registry/<namespace>/skills/README.md, extracts
each unique owner/repo@ref from the YAML frontmatter, and runs
NVIDIA SkillSpector over the upstream GitHub URL. One SARIF file per source
is written to ./sarif (configurable via the first positional argument).

The script does not gate on findings. It exits non-zero only when
skillspector itself crashes. Severity gating lives in the workflow so
the deploy policy is visible alongside the deploy step.
Extends the existing validate-readme-files CI job with an online phase
that runs cmd/readmevalidation with READMEVALIDATION_ONLINE=1 when the
PR touches skills paths. Adds a scan-skills job that downloads and runs
NVIDIA SkillSpector over every declared upstream source and uploads
SARIF to GitHub Code scanning, failing on findings at the error level.

Refactors deploy-registry into three jobs:

- verify: runs the same online validator + SkillSpector before deploy.
- deploy: now needs: verify, so gcloud builds triggers run only fires
  when the catalogue is consistent and the scan is clean.
- open-issue-on-failure: posts to a single rolling tracker issue
  (label skills-gate) via JasonEtco/create-an-issue with update_existing,
  so repeat failures update the same issue rather than spamming.

A dorny/paths-filter step drives both the online phase and the scan-skills
job so PRs that do not touch skills pay no extra cost.
- Drop go-git/v5 dependency and use the system git binary via
  exec.Command. The library pulled go 1.25.0 transitively, breaking
  golangci-lint v2.1.6 which is built against go 1.24.
- Apply prettier-plugin-sh formatting to scan-skill-sources.sh.
- Disable actions/setup-go's default module cache in the deploy
  verify job to satisfy zizmor's cache-poisoning rule on jobs that
  hold security-events: write.
SkillSpector is not on PyPI yet, so the install must reference the
NVIDIA/SkillSpector repository directly. Pinned at commit
2eb844780ab163f01468ecf142c40a2ec0fcaec0 to keep the policy
reproducible. Will switch to a pip-published version when one ships.

Also bumps actions/setup-python from v5.6.0 to v6.2.0 to get off the
deprecated Node 20 action runtime, and pins the latest pip before
installing skillspector so its hatchling build env has the recent
packaging release that supports the SPDX license expression in its
pyproject.toml.
SkillSpector exits non-zero when its scan finds anything to report,
even though it still writes the SARIF file. Treating that as a crash
short-circuits SARIF upload and prevents the workflow's severity jq
step from making the policy decision. Inverts the check: scan_failed
is now set only when the SARIF file is missing or empty, which is
the genuine 'tool crashed' signal.
@github-advanced-security
Copy link
Copy Markdown

You are seeing this message because GitHub Code Scanning has recently been set up for this repository, or this pull request contains the workflow file for the Code Scanning tool.

What Enabling Code Scanning Means:

  • The 'Security' tab will display more code scanning analysis results (e.g., for the default branch).
  • Depending on your configuration and choice of analysis tool, future pull requests will be annotated with code scanning analysis results.
  • You will be able to see the analysis results for the pull request's branch on this overview once the scans have completed and the checks have passed.

For more information about GitHub Code Scanning, check out the documentation.

Will be removed once the SkillSpector severity policy is locked in.
SkillSpector flags 13 error-level findings against coder/skills@main,
all in the intentionally credential-handling setup skill (GitHub
device-flow scripts and references). These are real patterns
matching SkillSpector's PE3 'Credential Access' and E2 'Env Variable
Harvesting' rules, but they describe expected behavior for an
installer skill rather than a vulnerability.

Initial policy is warn-only: SARIF still flows to Code scanning so
the findings are visible on PRs, the severity gate still runs and
prints, but continue-on-error keeps the workflow green. A follow-up
PR can either tune the rules in coder/skills, add SkillSpector
suppressions, or flip continue-on-error to false to make the gate
strict.

Also drops the debug upload-artifact step now that the SARIF shape
is known.
Copy link
Copy Markdown
Member

@bpmct bpmct left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how does this scanning help us if its at deploy time and there are dynamically changing repos?

@DevelopmentCats
Copy link
Copy Markdown
Collaborator Author

Member

Its probably not the most elegant solution, but the idea is that anytime we would deploy the registry it would run this skill verification on each skills repo and if it fails the deployment will just fail and create an issue. I was fiddling with some stuff while researching on skill scanning last night and how we would implement this, and this is a WIP currently

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants