supply chain security collector

discover what supply chain security practices are visible across open source projects.

point it at any set of GitHub repos (or the entire CNCF landscape) and get back:

which projects publish SBOMs, signatures, and attestations in their GitHub releases
which GitHub Actions workflows reference cosign, syft, trivy, codeql, and 20+ other security tools
per-repo and per-project summaries, queryable in SQL

the output is a DuckDB database + Parquet files. query it, graph it, or feed it to anything.

what this covers today

this tool collects from GitHub's GraphQL API only. it sees:

release assets (SBOMs, .sig files, attestations, cosign bundles)
GitHub Actions workflow files (tool references via full-text search)
branch protection rules, SECURITY-INSIGHTS.yml, security advisories
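
as a rough illustration of what "tool references via full-text search" can mean, here is a minimal sketch. it is not the project's implementation: the `TOOLS` list and the `findToolReferences` helper are hypothetical, and the real pattern catalog lives in docs/detection-reference.md.

```typescript
// Hypothetical subset of tool names; the real catalog is larger.
const TOOLS = ["cosign", "syft", "trivy", "codeql"] as const;
type Tool = (typeof TOOLS)[number];

// Return the tools whose names appear anywhere in a workflow file's text.
// Plain case-insensitive substring matching: it can over-match (a comment
// that mentions a tool) and under-match (a tool invoked through a wrapper).
function findToolReferences(workflowText: string): Tool[] {
  const haystack = workflowText.toLowerCase();
  return TOOLS.filter((tool) => haystack.indexOf(tool) !== -1);
}

const exampleWorkflow = `
jobs:
  release:
    steps:
      - uses: sigstore/cosign-installer@v3
      - uses: anchore/sbom-action@v0
      - run: trivy image my-image:latest
`;

// Matches cosign and trivy. The SBOM action runs syft internally, but the
// string "syft" never appears in the file, so it is not detected.
const found = findToolReferences(exampleWorkflow);
```

the missed syft reference is the kind of gap that makes text-based detection a lower bound.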

what it doesn't see (yet)

many projects ship security artifacts through channels this tool can't reach:

OCI registries — container image signatures via cosign (cosign verify)
package managers — npm provenance, PyPI attestations, Go module checksums
non-GitHub CI — Prow (Kubernetes), Azure Pipelines, Jenkins, CircleCI
GitHub Attestations API — actions/attest-build-provenance (not yet queried)
GoReleaser / release tooling — config-driven signing that doesn't surface in workflow grep

every number this tool produces is a lower bound. absence of evidence ≠ evidence of absence. a project showing zero artifacts here may be signing everything — just not through a channel we collect from.

quick start

npm install
export GITHUB_PERSONAL_ACCESS_TOKEN=ghp_your_token_here
npm test

that's it. `npm test` runs the collector against 3 CNCF projects (Kubernetes, Harbor, Jaeger) and produces a DuckDB database, Parquet files, and analysis tables.

then look at what you got:

# query the database directly
duckdb output/test-three-projects/current/database.db \
  -c "SELECT nameWithOwner, has_sbom_artifact, uses_cosign, uses_codeql FROM agg_repo_summary"

# generate a markdown report
npm run report -- --database output/test-three-projects/current/database.db

what you get

output/test-three-projects/current/
├── database.db                                    # DuckDB database with all tables
├── parquet/                                       # all tables as Parquet files
│   ├── base_repositories.parquet                  # normalized entities
│   ├── base_releases.parquet
│   ├── base_release_assets.parquet
│   ├── base_workflows.parquet
│   ├── agg_repo_summary.parquet                   # analysis results
│   ├── agg_workflow_tools.parquet
│   ├── agg_artifact_patterns.parquet
│   └── ...
├── raw-responses.GetRepoDataExtendedInfo.jsonl    # API audit trail
├── security-insights-sboms.csv                    # extracted SBOM declarations
└── security-insights-attestations.csv             # extracted attestations

generate a report

npm run report -- --database output/test-three-projects/current/database.db

Produces a structured markdown report with executive summary, tool adoption landscape, SBOM/signing coverage, and maturity-based recommendations.

run the full CNCF landscape

npm run fetch:landscape    # download latest CNCF landscape metadata
npm start                  # collect + analyze ~230 projects

how it works

GitHub GraphQL API → TypeScript normalizers → DuckDB (base_* tables) → SQL models → analysis (agg_* tables)

Stage 1 — Collection & Normalization (src/neo.ts): Fetches from GitHub's GraphQL API, transforms nested responses into flat relational base_* tables using typed normalizers, writes to DuckDB + Parquet.

Stage 2 — Analysis (src/analyze.ts): Runs numbered SQL models against base_* tables to produce agg_* aggregation tables detecting security patterns.
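
To make the Stage 1 flattening step concrete, here is a minimal sketch of a normalizer. It assumes a simplified response shape and hypothetical row and column names (`ReleaseRow`, `repository_id`, and so on); the project's actual normalizers in src/normalizers/ may be structured differently.

```typescript
// Assumed, simplified shape of a nested GraphQL response.
interface RepoResponse {
  repository: {
    id: string;
    nameWithOwner: string;
    releases: {
      nodes: Array<{
        id: string;
        tagName: string;
        releaseAssets: { nodes: Array<{ id: string; name: string }> };
      }>;
    };
  };
}

// Flat rows, one interface per table; foreign keys replace nesting.
// Column names here are illustrative, not the project's schema.
interface ReleaseRow { id: string; repository_id: string; tag_name: string }
interface ReleaseAssetRow { id: string; release_id: string; name: string }

function normalizeReleases(res: RepoResponse): {
  releases: ReleaseRow[];
  assets: ReleaseAssetRow[];
} {
  const releases: ReleaseRow[] = [];
  const assets: ReleaseAssetRow[] = [];
  for (const release of res.repository.releases.nodes) {
    releases.push({
      id: release.id,
      repository_id: res.repository.id,
      tag_name: release.tagName,
    });
    for (const asset of release.releaseAssets.nodes) {
      assets.push({ id: asset.id, release_id: release.id, name: asset.name });
    }
  }
  return { releases, assets };
}

// Made-up sample data for illustration only.
const sample: RepoResponse = {
  repository: {
    id: "repo-1",
    nameWithOwner: "example/app",
    releases: {
      nodes: [
        {
          id: "rel-1",
          tagName: "v1.0.0",
          releaseAssets: {
            nodes: [
              { id: "asset-1", name: "app.tar.gz" },
              { id: "asset-2", name: "app.tar.gz.sig" },
            ],
          },
        },
      ],
    },
  },
};

const flat = normalizeReleases(sample);
```

Each returned array maps onto one relational table, which is what lets the Stage 2 SQL models join releases to their assets without unpacking nested JSON.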

what it detects

| Category | Examples |
| --- | --- |
| SBOM artifacts | SPDX, CycloneDX in release assets |
| Signing artifacts | `.sig`, `.asc`, cosign signatures |
| Attestations | SLSA provenance, in-toto, VEX, sigstore bundles |
| CI/CD security tools | cosign, syft, trivy, codeql, snyk, grype, docker-scout, fossa, dependabot, renovate |
| Security Insights | SECURITY-INSIGHTS.yml parsing (SBOMs, attestations declared) |

See docs/detection-reference.md for the full pattern catalog.
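
For a sense of how release asset names could map to the categories above, here is a minimal sketch. The regular expressions are illustrative guesses, not the project's patterns, and `classifyAsset` is a hypothetical helper; the authoritative catalog is docs/detection-reference.md.

```typescript
type ArtifactKind = "sbom" | "signature" | "attestation";

// Illustrative file name patterns only (assumptions, not the real catalog).
const ARTIFACT_PATTERNS: Array<{ kind: ArtifactKind; pattern: RegExp }> = [
  { kind: "sbom", pattern: /(sbom|spdx|cyclonedx|\.cdx\.json$)/i },
  { kind: "signature", pattern: /\.(sig|asc)$/i },
  { kind: "attestation", pattern: /(provenance|intoto|in-toto|\.vex\.|\.sigstore(\.json)?$)/i },
];

// Return every category whose pattern matches the file name.
// One asset can fall into several categories (for example a signed SBOM).
function classifyAsset(fileName: string): ArtifactKind[] {
  const kinds: ArtifactKind[] = [];
  for (const entry of ARTIFACT_PATTERNS) {
    if (entry.pattern.test(fileName)) kinds.push(entry.kind);
  }
  return kinds;
}

const sbomKinds = classifyAsset("app_1.0.0_linux_amd64.spdx.json");
const signedSbomKinds = classifyAsset("sbom.spdx.json.sig");
const provenanceKinds = classifyAsset("multiple.intoto.jsonl");
const plainKinds = classifyAsset("app.tar.gz");
```

Name-based matching like this only sees what a project chooses to call its files, so an artifact with an unconventional name would go uncounted.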

input formats

Two formats, auto-detected:

Simple — just repos:

[
  {"owner": "sigstore", "name": "cosign"},
  {"owner": "anchore", "name": "syft"}
]

Rich — with CNCF project metadata (generated from landscape.yml):

[
  {
    "project_name": "Kubernetes",
    "repos": [{"owner": "kubernetes", "name": "kubernetes", "primary": true}],
    "maturity": "graduated",
    "category": "Orchestration & Management",
    "has_security_audits": true
  }
]

Test files in input/:

| File | Content |
| --- | --- |
| `test-single-project.json` | Kubernetes (1 repo) |
| `test-three-projects.json` | Kubernetes, Harbor, Jaeger (3 maturities) |
| `test-simple-format.json` | cosign, syft (simple format, no metadata) |
| `cncf-full-landscape.json` | Full CNCF landscape (~230 projects) |
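
One way the two formats could be told apart is by inspecting the first array element. The sketch below is an assumption about how auto-detection might work, not the project's code; `detectFormat` and `listRepos` are hypothetical names, and the field names are taken from the examples above.

```typescript
interface RepoRef { owner: string; name: string; primary?: boolean }
interface ProjectEntry {
  project_name: string;
  repos: RepoRef[];
  maturity?: string;
  category?: string;
  has_security_audits?: boolean;
}
type InputFormat = "simple" | "rich";

// Decide the format from the shape of the first element.
function detectFormat(input: unknown): InputFormat {
  if (!Array.isArray(input) || input.length === 0) {
    throw new Error("input must be a non-empty JSON array");
  }
  const first = input[0];
  if (typeof first !== "object" || first === null) {
    throw new Error("input entries must be objects");
  }
  const entry = first as Record<string, unknown>;
  if (typeof entry.project_name === "string" && Array.isArray(entry.repos)) {
    return "rich";
  }
  if (typeof entry.owner === "string" && typeof entry.name === "string") {
    return "simple";
  }
  throw new Error("unrecognized input format");
}

// Flatten either format to a plain list of repos.
function listRepos(input: unknown): RepoRef[] {
  if (detectFormat(input) === "simple") return input as RepoRef[];
  const repos: RepoRef[] = [];
  for (const project of input as ProjectEntry[]) {
    for (const repo of project.repos) repos.push(repo);
  }
  return repos;
}

const simpleInput: unknown = JSON.parse(
  '[{"owner": "sigstore", "name": "cosign"}, {"owner": "anchore", "name": "syft"}]'
);
const richInput: unknown = JSON.parse(
  '[{"project_name": "Kubernetes", "repos": [{"owner": "kubernetes", "name": "kubernetes", "primary": true}], "maturity": "graduated"}]'
);

const simpleRepos = listRepos(simpleInput);
const richRepos = listRepos(richInput);
```

Checking only the first element keeps the sketch short; a stricter version would validate every entry.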

table layers

base_* — normalized entities from GraphQL (repositories, releases, release_assets, workflows, branch_protection_rules, cncf_projects, cncf_project_repos, security_md, si_documents, si_sboms)
agg_* — analysis output (repo_summary, workflow_tools, artifact_patterns, cncf_project_summary, executive_summary, tool_summary, and more)
raw_* — full GraphQL responses preserved in database

analyzing data

# DuckDB CLI
duckdb output/test-three-projects/current/database.db \
  -c "SELECT nameWithOwner, uses_cosign, uses_codeql, has_sbom_artifact FROM agg_repo_summary"

# Run analysis on an existing database
npm run analyze -- --database output/test-three-projects/current/database.db

# Generate markdown report
npm run report -- --database output/test-three-projects/current/database.db

# Build property graph (LadybugDB) for Cypher queries
npm run graph -- --database output/test-three-projects/current/database.db
npm run graph:list                 # list available Cypher queries
npm run graph:query -- graduated-no-signing   # run a specific query

Any tool that reads Parquet works too — the parquet/ directory has every table.

npm scripts

| Command | Description |
| --- | --- |
| `npm test` | Quick test (3 CNCF projects) |
| `npm start` | Full CNCF landscape (~230 projects) |
| `npm run test:single` | Single project (Kubernetes) |
| `npm run test:simple` | Simple format (2 repos, no metadata) |
| `npm run collect` | Custom collection (`ts-node src/neo.ts` with flags) |
| `npm run analyze` | Run SQL analysis on existing database |
| `npm run report` | Generate markdown report from database |
| `npm run graph` | Build LadybugDB property graph |
| `npm run graph:query` | Run Cypher queries against graph |
| `npm run fetch:landscape` | Download latest CNCF landscape data |
| `npm run lint` | ESLint check |
| `npm run typecheck` | TypeScript type check |
| `npm run codegen` | Regenerate types from GraphQL schema |
| `npm run clean` | Remove output/, cache, generated files |

advanced usage

npm run collect -- \
  --input your-repos.json \
  --queries GetRepoDataExtendedInfo \
  --parallel \
  --analyze

CLI flags: --input <file>, --queries <name>, --parallel, --analyze, --maturity <graduated|incubating|sandbox>, --repo-scope <primary|all>
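
The flag list above can be read as a small options object. The following sketch shows one possible way to parse it, purely as an illustration: `CollectOptions`, `parseFlags`, and the validation behavior are assumptions, and src/neo.ts may handle its arguments quite differently.

```typescript
type Maturity = "graduated" | "incubating" | "sandbox";
type RepoScope = "primary" | "all";

// Hypothetical options shape derived from the documented flags.
interface CollectOptions {
  input?: string;
  queries?: string;
  parallel: boolean;
  analyze: boolean;
  maturity?: Maturity;
  repoScope?: RepoScope;
}

function isMaturity(v: string): v is Maturity {
  return v === "graduated" || v === "incubating" || v === "sandbox";
}

function isRepoScope(v: string): v is RepoScope {
  return v === "primary" || v === "all";
}

// Read the value following the flag at index i; fail if it is missing
// or looks like another flag.
function valueAfter(argv: string[], i: number): string {
  const value = argv[i + 1];
  if (value === undefined || value.indexOf("--") === 0) {
    throw new Error(argv[i] + " requires a value");
  }
  return value;
}

function parseFlags(argv: string[]): CollectOptions {
  const opts: CollectOptions = { parallel: false, analyze: false };
  for (let i = 0; i < argv.length; i++) {
    switch (argv[i]) {
      case "--parallel":
        opts.parallel = true;
        break;
      case "--analyze":
        opts.analyze = true;
        break;
      case "--input":
        opts.input = valueAfter(argv, i++);
        break;
      case "--queries":
        opts.queries = valueAfter(argv, i++);
        break;
      case "--maturity": {
        const v = valueAfter(argv, i++);
        if (!isMaturity(v)) throw new Error("invalid --maturity: " + v);
        opts.maturity = v;
        break;
      }
      case "--repo-scope": {
        const v = valueAfter(argv, i++);
        if (!isRepoScope(v)) throw new Error("invalid --repo-scope: " + v);
        opts.repoScope = v;
        break;
      }
      default:
        throw new Error("unknown flag: " + argv[i]);
    }
  }
  return opts;
}

// Mirrors the invocation shown above.
const parsed = parseFlags([
  "--input", "your-repos.json",
  "--queries", "GetRepoDataExtendedInfo",
  "--parallel",
  "--analyze",
]);
```

Rejecting unknown flags and invalid enum values up front is a design choice of this sketch, not documented behavior of the tool.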

extending

New GraphQL query: Create a .graphql file → run npm run codegen → write a normalizer in src/normalizers/ → register it in ArtifactWriter.ts. See docs/adding-new-queries.md.

New analysis: Add numbered SQL file in sql/models/ → register in SecurityAnalyzer.ts. See sql/README.md.

Other GraphQL APIs: The collection layer is generic. Swap src/api.ts endpoint, write new queries and normalizers. Normalizers are hand-written (not auto-generated) — each transforms nested GraphQL responses into flat relational arrays.

interactive explorer

browse the full CNCF landscape data in your browser — no backend required:

Live Explorer →

DuckDB-WASM loads Parquet files directly in the browser. Write SQL, see charts, explore 236 projects interactively. Includes a 15-query pre-built library and an exploration journal.

presentation

this tool was built to support a CNCF TAG Security presentation (April 2026). all materials are in the repo:

Presentation materials — deck, findings report, strategy docs, diagrams
Project history — full timeline with annotated Mermaid charts
Key findings — what we found across 236 projects

documentation

Adding New Queries — step-by-step extension guide
Detection Reference — supply chain security pattern catalog
Data Model — table schemas and relationships
Output Architecture — output format and directory structure
SQL Analysis — SQL model architecture
Codegen Guide — GraphQL code generation
Project Milestones — project history and evolution

project structure

src/
├── neo.ts                  # CLI entry point, collection orchestrator
├── analyze.ts              # Analysis CLI
├── api.ts                  # GitHub GraphQL client
├── ArtifactWriter.ts       # DuckDB + Parquet writer
├── SecurityAnalyzer.ts     # SQL model execution engine
├── ReportGenerator.ts      # Markdown report generator
├── report-cli.ts           # Report CLI
├── normalizers/            # Query-specific normalizers (hand-written)
├── graphql/                # GraphQL query definitions
├── graph/                  # LadybugDB property graph integration
└── generated/              # GraphQL codegen output (git-ignored)

sql/models/                 # Numbered SQL analysis models (00-05)
input/                      # Test and input data files
cypher/                     # Standalone Cypher query files

requirements

Node 18+
GitHub Personal Access Token (set GITHUB_PERSONAL_ACCESS_TOKEN)
Python 3.12 (only for Jupyter notebooks)

license

MIT
