[v0.5.0] benchmark — public token/correctness/latency harness vs docs MCPs (human-led)#70
Refs #63
What changed
- `docs/benchmarks/PUBLIC-BENCHMARK-METHODOLOGY.md` as the human-led methodology foundation for the v0.5.0 public benchmark.
- `Refs #63` work packages so agents can later implement plumbing without owning methodology judgment.
- `pyjwt` transitive dependency from `2.12.1` to `2.13.0` after GitHub's dependency audit flagged `PYSEC-2026-175`, `PYSEC-2026-177`, `PYSEC-2026-178`, and `PYSEC-2026-179` on the initial PR run.

Acceptance notes
- `pyjwt` transitive dependency.

Validation
- `uv run ruff check src/ tests/` -> passed
- `uv run pyright src/` -> passed, with upstream pyright update warning only
- `uv run pytest --tb=short -q` -> 307 passed
- `uv run python-docs-mcp-server doctor` -> all checks passed
- `uv lock --check` -> passed
- `uv export --locked --format requirements-txt --all-groups --all-extras --no-emit-project --no-hashes --output-file /tmp/requirements-audit-63.txt && uvx pip-audit --requirement /tmp/requirements-audit-63.txt --no-deps --disable-pip --progress-spinner off` -> no known vulnerabilities found
- `uv pip compile --quiet pyproject.toml -o /tmp/requirements-check-63.txt` -> passed

Why this approach
Issue #63 explicitly says the benchmark is human-led because methodology and corpus selection are maintainer judgment calls. This PR handles that judgment first and leaves the runnable harness as follow-up implementation work.
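For anyone re-running the checks listed under Validation, here is a minimal sketch of how they could be chained so the run stops at the first failure. It is not part of this PR: the names `CHECKS`, `run_checks`, and `_run` are hypothetical, and the export plus `pip-audit` step is left out for brevity. It assumes `uv` is on `PATH` and that it runs from the repository root.

```python
import shlex
import subprocess
from typing import Callable, Sequence

# Commands copied from the Validation list in this PR description.
# The list name and the helpers below are illustrative, not repository code.
CHECKS: list[str] = [
    "uv run ruff check src/ tests/",
    "uv run pyright src/",
    "uv run pytest --tb=short -q",
    "uv run python-docs-mcp-server doctor",
    "uv lock --check",
]


def _run(argv: Sequence[str]) -> int:
    """Run one command and return its exit code."""
    return subprocess.run(list(argv)).returncode


def run_checks(
    checks: Sequence[str] = CHECKS,
    runner: Callable[[Sequence[str]], int] = _run,
) -> list[tuple[str, int]]:
    """Run each check in order and stop after the first non-zero exit code.

    Returns (command, exit_code) pairs for the commands that actually ran.
    """
    results: list[tuple[str, int]] = []
    for command in checks:
        code = runner(shlex.split(command))
        results.append((command, code))
        if code != 0:
            break
    return results
```

Calling `run_checks()` returns the commands that ran with their exit codes. Passing a different `runner` makes it possible to dry-run the sequence without invoking `uv`.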