ci: add Atheris fuzz targets and ClusterFuzzLite #11482
Draft
julian-risch wants to merge 3 commits into
Address the OpenSSF Scorecard Fuzzing check (0/10) and add real fuzzing of the project's untrusted-input entry points.

- test/fuzz/: three Atheris harnesses for Pipeline.loads (serialized pipeline deserialization), Document.from_dict, and document_matches_filter (filter expressions). Each catches the exceptions that are a normal reaction to malformed input so only genuine crashes/hangs/unexpected errors are reported.
- .clusterfuzzlite/: Dockerfile + build.sh + project.yaml to build the harnesses with the OSS-Fuzz Python toolchain.
- .github/workflows/cflite_pr.yml: short, code-change-scoped ClusterFuzzLite run on PRs that touch fuzzed code, least-privilege token, SHA-pinned actions.
- licenserc.toml: exclude .clusterfuzzlite from the license-header check.

Scorecard detects this via both the `import atheris` harnesses and the .clusterfuzzlite deployment. pytest does not collect fuzz_*.py.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
        return
    try:
        document_matches_filter(filters, _DOCUMENT)
    except _EXPECTED:

    """Feed one fuzzer-generated input to ``Pipeline.loads`` as a YAML document."""
    try:
        Pipeline.loads(data)
    except _EXPECTED:

        return
    try:
        Document.from_dict(obj)
    except _EXPECTED:
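The three excerpts share one pattern: call the target, swallow the exceptions that are a normal reaction to malformed input, and let everything else propagate so the fuzzer reports it. A minimal sketch of that pattern follows. It is not the PR's harness: `parse_record` is a hypothetical stand-in for the real target, the contents of `_EXPECTED` are assumed for illustration, and the Atheris import is optional so the sketch runs without the fuzzing engine installed.

```python
import json
import sys

try:
    import atheris  # fuzzing engine; optional so this sketch runs without it
except ImportError:
    atheris = None

# Hypothetical list of exceptions treated as a normal reaction to bad input.
_EXPECTED = (ValueError, TypeError, KeyError)


def parse_record(data: bytes) -> dict:
    """Hypothetical stand-in for a real target such as ``Document.from_dict``."""
    # Both decode() and json.loads() raise ValueError subclasses on bad input.
    obj = json.loads(data.decode("utf-8"))
    if not isinstance(obj, dict):
        raise TypeError("expected a JSON object")
    return {"id": str(obj["id"])}  # KeyError when "id" is missing


def TestOneInput(data: bytes) -> None:
    """Feed one fuzzer-generated input to the target."""
    try:
        parse_record(data)
    except _EXPECTED:
        # Expected rejection of malformed input: not a finding.
        return
    # Any other exception type propagates and is reported as a crash.


def main() -> None:
    """Register the entry point and start fuzzing (requires Atheris)."""
    if atheris is None:
        raise SystemExit("atheris is not installed")
    atheris.Setup(sys.argv, TestOneInput)
    atheris.Fuzz()
```

A real harness would call `main()` under `if __name__ == "__main__":`. The narrower the `_EXPECTED` tuple, the more behavior counts as a finding, which is the trade-off behind tightening such lists over time.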
Pin gcr.io/oss-fuzz-base/base-builder-python to its current digest instead of the rolling latest tag, for supply-chain integrity. The OSS-Fuzz base-builder is updated frequently, so the comment documents how to refresh the digest. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
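To illustrate what "pinned by digest" means in practice, the sketch below flags `FROM` lines that reference an image by tag instead of by `@sha256:` digest. It is an illustrative helper, not part of this PR: the function name `unpinned_from_lines` is hypothetical, the digest in the usage lines is an all-zero placeholder, and the check does not special-case `FROM scratch` or references to earlier build stages.

```python
import re

# "FROM [--flag ...] <image>@sha256:<64 hex chars> [AS <name>]"
_PINNED_FROM = re.compile(
    r"^FROM\s+(?:--\S+\s+)*\S+@sha256:[0-9a-f]{64}(?:\s+AS\s+\S+)?\s*$",
    re.IGNORECASE,
)


def unpinned_from_lines(dockerfile_text: str) -> list[str]:
    """Return the FROM lines that are not pinned to a sha256 digest."""
    unpinned = []
    for line in dockerfile_text.splitlines():
        stripped = line.strip()
        if stripped.upper().startswith("FROM ") and not _PINNED_FROM.match(stripped):
            unpinned.append(stripped)
    return unpinned


# Placeholder digest for illustration only; a real one comes from the registry.
pinned = "FROM gcr.io/oss-fuzz-base/base-builder-python@sha256:" + "0" * 64
rolling = "FROM gcr.io/oss-fuzz-base/base-builder-python:latest"

print(unpinned_from_lines(pinned))   # no unpinned lines
print(unpinned_from_lines(rolling))  # the rolling-tag line is flagged
```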
Add a docker ecosystem entry for /.clusterfuzzlite so Dependabot keeps the digest-pinned gcr.io/oss-fuzz-base/base-builder-python in Dockerfile up to date. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Related Issues
Proposed Changes:
Adds fuzzing of Haystack's untrusted-input entry points, wired into CI. Scorecard detects this via two independent signals: the `import atheris` harnesses (`fuzzedWithPythonAtheris`) and the `.clusterfuzzlite/` deployment (`fuzzedWithClusterFuzzLite`).

Fuzz harnesses in `test/fuzz/`, using Atheris:

| Harness | Target | Exercises |
| --- | --- | --- |
| `fuzz_pipeline_loads.py` | `Pipeline.loads` | Serialized pipeline deserialization. |
| `fuzz_document_from_dict.py` | `Document.from_dict` | `Document` from an untrusted dict. |
| `fuzz_filters.py` | `document_matches_filter` | Filter expressions. |

Each harness catches the exceptions that are a normal reaction to malformed input (`DeserializationError`, `FilterError`, `ValueError`, …) so only genuine crashes, unbounded recursion, hangs, or unexpected exception types are reported. The "expected" lists can be tightened over time to surface subtler bugs.

ClusterFuzzLite in `.clusterfuzzlite/`: `Dockerfile` + `build.sh` + `project.yaml` build the harnesses with the OSS-Fuzz Python toolchain (`compile_python_fuzzer`).

CI in `.github/workflows/cflite_pr.yml`: a short, `code-change`-scoped run (180s) on PRs that touch fuzzed code or the fuzzing setup. Least-privilege `contents: read` token, SHA-pinned ClusterFuzzLite actions, SARIF upload disabled (so no `security-events: write` needed). Crashes fail the job and upload as artifacts.

`licenserc.toml`: excludes `.clusterfuzzlite` from the license-header check (consistent with `docker`/`.github`).

How did you test it?

- `pre-commit run --files` on all changed files: all hooks pass (ruff, actionlint, codespell, yaml).
- `ruff check test/fuzz/`: clean.
- `hawkeye check` locally: "No missing header file has been found."
- `pytest` does not collect `fuzz_*.py` (only `test_*.py`), and mypy's type-check scope does not include `test/fuzz/`.

Notes for the reviewer

- The `.clusterfuzzlite/Dockerfile` base image (`gcr.io/oss-fuzz-base/base-builder-python`) is pinned by digest for supply-chain integrity. Dependabot tracks it via a `docker` ecosystem entry for `/.clusterfuzzlite`, so digest bumps are automated; the Dockerfile also documents how to resolve a fresh digest manually if needed.
- The `cflite_pr` workflow needs to exist on the default branch before it runs on PRs, so its first real exercise will be the next PR after this lands. I'd suggest watching that run.
- [...] `build.sh` here are already compatible.

Checklist

- [...] (`ci:`).

🤖 Generated with Claude Code
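The collection claim in the testing notes can be sanity-checked with a small sketch. It assumes pytest's default `python_files` patterns (`test_*.py` and `*_test.py`) and that the project does not override them; `is_collected_by_default` is a hypothetical helper name, not a pytest API.

```python
from fnmatch import fnmatchcase

# pytest's default ``python_files`` patterns; a project may override these
# in its configuration, which this sketch does not read.
DEFAULT_PYTHON_FILES = ("test_*.py", "*_test.py")


def is_collected_by_default(filename: str) -> bool:
    """Return True if the file name matches a default test-module pattern."""
    return any(fnmatchcase(filename, pattern) for pattern in DEFAULT_PYTHON_FILES)


for name in (
    "fuzz_pipeline_loads.py",
    "fuzz_document_from_dict.py",
    "fuzz_filters.py",
    "test_pipeline.py",
):
    print(name, is_collected_by_default(name))
```

Under that assumption, none of the three `fuzz_*.py` names match, so the harnesses stay out of the regular test run without extra ignore rules.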