Add security hardening, fuzzing, and release automation by Neverdecel · Pull Request #34 · Neverdecel/CodeRAG

Neverdecel · 2026-06-16T19:14:51Z

Summary

This PR adds comprehensive security hardening, continuous fuzzing infrastructure, and automated release workflows to CodeRAG. It introduces keyless Sigstore signing for releases, Python SAST analysis via bandit, fuzzing of the source chunker via Atheris, and ClusterFuzzLite/OSS-Fuzz integration.

Key Changes

Security & Release Automation:

Release workflow (release.yml): Automated build, sign (keyless Sigstore), and publish of sdist + wheel artifacts on version tags with OpenSSF Scorecard compliance
Bandit SAST job (security.yml): Added Python-focused static security analysis gating on MEDIUM+ severity findings
Least privilege permissions: Refactored docker-beta.yml and codeql.yml to apply write scopes only at the job level, not workflow-wide (OpenSSF Token-Permissions compliance)

Fuzzing Infrastructure:

Atheris fuzzer (fuzz/fuzz_chunk_file.py): Continuous fuzzing harness for chunk_file() (the most exposed parser), with bounded runs on PRs and longer weekly bursts; checks that chunking never crashes and that the output chunks satisfy structural invariants
Fuzz workflow (fuzz.yml): Triggers on chunker-related changes or weekly schedule; runs 50k iterations on PRs, 600s on schedule
ClusterFuzzLite/OSS-Fuzz integration (.clusterfuzzlite/): Dockerfile and build script for integration with continuous fuzzing platforms
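
As a rough illustration of the harness shape described above, here is a minimal sketch. It is not the contents of fuzz/fuzz_chunk_file.py: the `Chunk` type, the `chunk_file` signature, the line-window stand-in chunker, and the specific invariants are all assumptions made up for the example. Only `atheris.Setup`, `atheris.Fuzz`, and the `TestOneInput` entry-point convention come from Atheris itself.

```python
import sys
from dataclasses import dataclass


@dataclass
class Chunk:
    """Hypothetical chunk record; the real CodeRAG type may differ."""
    start_line: int  # 1-based, inclusive
    end_line: int    # 1-based, inclusive
    text: str


def chunk_file(path: str, source: str, window: int = 40) -> list[Chunk]:
    """Stand-in for the real chunker (assumed signature): fixed-size line windows."""
    lines = source.splitlines()
    chunks = []
    for start in range(0, len(lines), window):
        part = lines[start:start + window]
        chunks.append(Chunk(start + 1, start + len(part), "\n".join(part)))
    return chunks


def check_invariants(source: str, chunks: list[Chunk]) -> None:
    """Assumed structural invariants: in-bounds, ordered, non-overlapping spans."""
    total = len(source.splitlines())
    prev_end = 0
    for chunk in chunks:
        assert 1 <= chunk.start_line <= chunk.end_line <= total
        assert chunk.start_line > prev_end
        prev_end = chunk.end_line


def TestOneInput(data: bytes) -> None:
    """Fuzz entry point: any exception escaping here is reported as a crash."""
    source = data.decode("utf-8", errors="replace")
    # Let the first byte pick a file extension so several languages get exercised.
    suffix = (".py", ".ts", ".go")[data[0] % 3] if data else ".py"
    chunks = chunk_file("fuzz" + suffix, source)
    check_invariants(source, chunks)


def run_fuzzer() -> None:
    """Start the fuzzing loop (needs the atheris package installed).

    A real harness would import the chunker inside
    `with atheris.instrument_imports():` so that it is coverage-instrumented.
    """
    import atheris

    atheris.Setup(sys.argv, TestOneInput)
    atheris.Fuzz()
```

Calling `run_fuzzer()` from a script hands control to libFuzzer. The 50k-iteration and 600s limits mentioned above would presumably map to libFuzzer's `-runs=50000` and `-max_total_time=600` flags, though the workflow file is the authority on that.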

Housekeeping:

Renamed LICENSE-2.0.txt → LICENSE and updated all references (pyproject.toml, Dockerfile, README.md)
Added # nosec B104 annotation in http_api.py for a false-positive security finding (host classification, not socket binding)
Fixed SQL query formatting in sqlite_store.py (line continuation)

Notable Details

Fuzzer uses atheris.instrument_imports() to enable coverage-guided fuzzing of CodeRAG's chunking logic
Release artifacts include .sigstore bundles for provenance verification via sigstore verify
Bandit runs with -ll (MEDIUM+ only) to reduce noise while catching real issues
All GitHub Actions use pinned commit SHAs for supply-chain security

https://claude.ai/code/session_01VgY3wMWzuBw6QFNivhXZYy

…zzing, signed releases)

Address the low-scoring OpenSSF Scorecard checks (overall 6.5 at 075f9e0):

Token-Permissions (0 -> 10):
- docker-beta.yml: drop top-level write scopes to `contents: read`; move packages/id-token/attestations write to the build job only.
- codeql.yml: declare a top-level `permissions: contents: read`.

Fuzzing (0):
- Add an Atheris fuzz target for the source chunker (the most exposed parser), plus ClusterFuzzLite config (.clusterfuzzlite/) and a bounded fuzz CI workflow.

Signed-Releases:
- Add release.yml: on a v* tag, build sdist+wheel, keyless-sign each with Sigstore (OIDC), and publish a GitHub Release with the .sigstore bundles.

SAST (8):
- Add a bandit job to security.yml; annotate two confirmed false positives (parameterized SQL IN-clause; a public-host classifier) with `# nosec`.

License (9 -> 10):
- Rename LICENSE-2.0.txt -> LICENSE (standard name) and update references in pyproject.toml, README.md, Dockerfile.

https://claude.ai/code/session_01VgY3wMWzuBw6QFNivhXZYy
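
To illustrate why a parameterized SQL IN-clause can be a false positive for a SAST tool, here is a minimal sketch. The table, column, and function names are hypothetical and not taken from sqlite_store.py, and the check ID B608 (Bandit's string-built-SQL check) is an assumption about which finding was silenced. Only the `?` placeholders are interpolated into the query string; the values themselves are always bound by the driver.

```python
import sqlite3


def fetch_chunks_by_ids(conn: sqlite3.Connection, ids: list) -> list:
    """Fetch rows whose id is in `ids` using a parameterized IN clause."""
    if not ids:
        return []
    # Only "?" markers are formatted into the SQL text, one per value.
    placeholders = ",".join("?" for _ in ids)
    query = (
        "SELECT id, path FROM chunks "
        f"WHERE id IN ({placeholders}) ORDER BY id"  # nosec B608 (assumed check ID)
    )
    # The values travel separately as bound parameters, never as SQL text.
    return conn.execute(query, list(ids)).fetchall()


# Hypothetical schema and data, for demonstration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE chunks (id INTEGER PRIMARY KEY, path TEXT)")
conn.executemany(
    "INSERT INTO chunks VALUES (?, ?)",
    [(1, "a.py"), (2, "b.ts"), (3, "c.go")],
)
rows = fetch_chunks_by_ids(conn, [1, 3])
```

A scanner that flags any f-string containing SQL keywords cannot tell that the formatted part is a fixed run of `?` markers, which is the kind of case a targeted `# nosec` annotation is meant for.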

…t hang indexing

The Atheris fuzz target (added in this PR) found a ~180-byte TypeScript input (mostly newlines with a few stray tokens) that drove the tree-sitter grammar's GLR error-recovery super-linear: a single parse ran for minutes and ballooned RSS past 2 GB. Indexing arbitrary repos must never let one hostile/garbled file hang or OOM the indexer.

Fix: set a per-parse time budget (timeout_micros, 2s) on the cached tree-sitter parsers. A parse that blows the budget raises, and chunk_file falls back to line windows, its existing graceful-degradation contract. Real source parses in single-digit milliseconds, so the guard never trips on legitimate code.

Regression tests in tests/test_chunking.py: assert every tree-sitter parser carries the budget, and that the exact fuzzer-found input degrades to windows (SIGALRM-bounded so a future regression fails fast instead of hanging).

https://claude.ai/code/session_01VgY3wMWzuBw6QFNivhXZYy
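
The budget-then-fallback pattern from this commit message can be sketched without tree-sitter. Everything below is a stand-in written for illustration: `budgeted_parse` is a toy parser that checks a deadline cooperatively, whereas the actual fix relies on the tree-sitter parser's own `timeout_micros` setting. The return shape of `chunk_file` and the 40-line window size are also assumptions.

```python
import time

PARSE_BUDGET_S = 2.0  # mirrors the 2s per-parse budget described above


class ParseTimeout(Exception):
    """Raised by the stand-in parser when it exceeds its time budget."""


def line_windows(source: str, size: int = 40) -> list:
    """Fallback chunking: fixed-size (start_line, end_line) windows, 1-based."""
    total = len(source.splitlines())
    return [(i + 1, min(i + size, total)) for i in range(0, total, size)]


def budgeted_parse(source: str, budget_s: float = PARSE_BUDGET_S) -> list:
    """Toy parser: one span per non-blank line, aborting once the budget is spent."""
    deadline = time.monotonic() + budget_s
    spans = []
    for number, line in enumerate(source.splitlines(), start=1):
        if time.monotonic() > deadline:
            raise ParseTimeout(f"parse exceeded {budget_s}s budget")
        if line.strip():
            spans.append((number, number))
    return spans


def chunk_file(source: str, parse=budgeted_parse) -> tuple:
    """Return ("syntax", spans) when parsing succeeds, else ("windows", spans)."""
    try:
        spans = parse(source)
        if spans:
            return "syntax", spans
    except Exception:
        # Any parser failure, including a blown budget, degrades gracefully.
        pass
    return "windows", line_windows(source)


# A negative budget forces the timeout path deterministically.
degraded = chunk_file("a\nb", parse=lambda s: budgeted_parse(s, budget_s=-1.0))
normal = chunk_file("a\n\nb")
```

The point of the design is that a pathological file costs at most the budget plus a cheap line split, instead of minutes of CPU and gigabytes of memory.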

claude added 2 commits June 16, 2026 19:13

Neverdecel merged commit 0144a0a into master Jun 16, 2026
13 checks passed

Neverdecel deleted the claude/kind-albattani-lrt3vf branch June 18, 2026 08:01

Neverdecel merged 2 commits into master from claude/kind-albattani-lrt3vf
