ci: add Atheris fuzz targets and ClusterFuzzLite #11482

Draft
julian-risch wants to merge 3 commits into main from ci/add-fuzzing

@julian-risch (Member) commented Jun 2, 2026

Related Issues

Proposed Changes:

Adds fuzzing of Haystack's untrusted-input entry points, wired into CI. Scorecard detects this via two independent signals: the harnesses that `import atheris` (fuzzedWithPythonAtheris) and the `.clusterfuzzlite/` deployment (fuzzedWithClusterFuzzLite).

Fuzz harnesses (test/fuzz/), using Atheris:

| Harness | Target | Why |
|---|---|---|
| fuzz_pipeline_loads.py | Pipeline.loads | Deserializing a serialized pipeline (YAML) is a documented attack surface (see SECURITY.md). |
| fuzz_document_from_dict.py | Document.from_dict | Reconstructing a Document from an untrusted dict. |
| fuzz_filters.py | document_matches_filter | Evaluating an untrusted filter expression. |

Each harness catches the exceptions that are a normal reaction to malformed input (DeserializationError, FilterError, ValueError, …) so only genuine crashes, unbounded recursion, hangs, or unexpected exception types are reported. The "expected" lists can be tightened over time to surface subtler bugs.

ClusterFuzzLite (.clusterfuzzlite/): Dockerfile + build.sh + project.yaml build the harnesses with the OSS-Fuzz Python toolchain (compile_python_fuzzer).

CI (.github/workflows/cflite_pr.yml): a short, code-change-scoped run (180s) on PRs that touch fuzzed code or the fuzzing setup. It uses a least-privilege contents: read token and SHA-pinned ClusterFuzzLite actions, with SARIF upload disabled (so no security-events: write is needed). Crashes fail the job and are uploaded as artifacts.

licenserc.toml: excludes .clusterfuzzlite from the license-header check (consistent with docker/.github).

How did you test it?

  • pre-commit run --files on all changed files — all hooks pass (ruff, actionlint, codespell, yaml).
  • Verified the harnesses against the project's ruff config (ruff check test/fuzz/ clean).
  • Ran the license-header check (hawkeye check) locally: "No missing header file has been found."
  • Confirmed pytest does not collect fuzz_*.py (only test_*.py), and mypy's type-check scope does not include test/fuzz/.
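The collection claim in the last bullet can be illustrated with a small sketch. It assumes the project keeps pytest's default `python_files` patterns (`test_*.py` and `*_test.py`) and approximates pytest's matching with `fnmatch` on the file name; `would_collect` is a hypothetical helper, not part of pytest.

```python
from fnmatch import fnmatch

# pytest's default `python_files` patterns (assumed unchanged in this project).
DEFAULT_PYTHON_FILES = ("test_*.py", "*_test.py")


def would_collect(filename: str) -> bool:
    """Approximate whether pytest would treat a file name as a test module."""
    return any(fnmatch(filename, pattern) for pattern in DEFAULT_PYTHON_FILES)


harnesses = [
    "fuzz_pipeline_loads.py",
    "fuzz_document_from_dict.py",
    "fuzz_filters.py",
]
# Names of harness files that the default patterns would pick up, if any.
collected = [name for name in harnesses if would_collect(name)]
```

Under these assumptions `collected` stays empty, since none of the three harness names match either pattern.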

Notes for the reviewer

  • The .clusterfuzzlite/Dockerfile base image (gcr.io/oss-fuzz-base/base-builder-python) is pinned by digest for supply-chain integrity. Dependabot tracks it via a docker ecosystem entry for /.clusterfuzzlite, so digest bumps are automated; the Dockerfile also documents how to resolve a fresh digest manually if needed.
  • End-to-end CI validation only works after merge: the cflite_pr workflow needs to exist on the default branch before it runs on PRs, so its first real exercise will be the next PR after this lands. I'd suggest watching that run.
  • Atheris is intentionally not added to the dev dependencies (it builds a native extension); the README documents on-demand local install.
  • Optional future upgrade: submit to OSS-Fuzz for continuous fuzzing on Google's infra — the harnesses and build.sh here are already compatible.
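As an illustration of the digest-pinning note above, the following sketch flags `FROM` lines that are not pinned by a `sha256` digest. It is a hypothetical check, not something this PR adds: `unpinned_from_lines` is an invented helper name, and the regular expression covers only the common `FROM [--flag] image@sha256:<digest> [AS name]` shape.

```python
import re

# Matches a FROM instruction whose image reference ends in a sha256 digest.
_DIGEST_PINNED = re.compile(
    r"^FROM\s+(--\S+\s+)*\S+@sha256:[0-9a-f]{64}(\s+AS\s+\S+)?$",
    re.IGNORECASE,
)


def unpinned_from_lines(dockerfile_text: str) -> list[str]:
    """Return the FROM lines that reference an image without a digest pin."""
    unpinned = []
    for raw_line in dockerfile_text.splitlines():
        line = raw_line.strip()
        if line.upper().startswith("FROM ") and not _DIGEST_PINNED.match(line):
            unpinned.append(line)
    return unpinned
```

A line such as `FROM gcr.io/oss-fuzz-base/base-builder-python:latest` would be reported, while the same image followed by `@sha256:` and a 64-character hex digest would pass.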

Checklist

🤖 Generated with Claude Code

Address the OpenSSF Scorecard Fuzzing check (0/10) and add real fuzzing of the
project's untrusted-input entry points.

- test/fuzz/: three Atheris harnesses — Pipeline.loads (serialized pipeline
  deserialization), Document.from_dict, and document_matches_filter (filter
  expressions). Each catches the exceptions that are a normal reaction to
  malformed input so only genuine crashes/hangs/unexpected errors are reported.
- .clusterfuzzlite/: Dockerfile + build.sh + project.yaml to build the harnesses
  with the OSS-Fuzz Python toolchain.
- .github/workflows/cflite_pr.yml: short, code-change-scoped ClusterFuzzLite run
  on PRs that touch fuzzed code, least-privilege token, SHA-pinned actions.
- licenserc.toml: exclude .clusterfuzzlite from the license-header check.

Scorecard detects this via both the `import atheris` harnesses and the
.clusterfuzzlite deployment. pytest does not collect fuzz_*.py.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Comment thread test/fuzz/fuzz_filters.py
        return
    try:
        document_matches_filter(filters, _DOCUMENT)
    except _EXPECTED:

    """Feed one fuzzer-generated input to ``Pipeline.loads`` as a YAML document."""
    try:
        Pipeline.loads(data)
    except _EXPECTED:

        return
    try:
        Document.from_dict(obj)
    except _EXPECTED:

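The early `return` in the excerpts above suggests the harnesses discard inputs that cannot be shaped into the structure the target expects. One way that step might look is sketched below. This is an assumption about the approach, not the PR's code: `bytes_to_dict` is a hypothetical helper, and JSON is used here only as a convenient way to turn raw fuzzer bytes into a dict.

```python
import json
from typing import Optional


def bytes_to_dict(data: bytes) -> Optional[dict]:
    """Turn raw fuzzer bytes into a dict, or None if that is not possible.

    Errors raised here belong to the harness's own decoding step, not to the
    target under test, so they are swallowed instead of being reported.
    """
    try:
        obj = json.loads(data)
    except (ValueError, RecursionError):
        # Invalid JSON, invalid UTF-8, or nesting too deep for the decoder.
        return None
    return obj if isinstance(obj, dict) else None
```

A harness could then return early when `bytes_to_dict(data)` is `None` and pass only real dicts to a target such as `Document.from_dict`, so fuzzing time is spent on the target instead of on the decoder.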
Pin gcr.io/oss-fuzz-base/base-builder-python to its current digest instead of
the rolling latest tag, for supply-chain integrity. The OSS-Fuzz base-builder
is updated frequently, so the comment documents how to refresh the digest.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@github-actions Bot (Contributor) commented Jun 2, 2026

Coverage report

This PR does not seem to contain any modification to coverable code.

Add a docker ecosystem entry for /.clusterfuzzlite so Dependabot keeps the
digest-pinned gcr.io/oss-fuzz-base/base-builder-python in Dockerfile up to date.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>