Add coverage validation: 90% main-source line coverage gate (py_binary + py_test) by tinder-maxwellelliott · Pull Request #356 · Tinder/bazel-diff

tinder-maxwellelliott · 2026-05-18T19:26:10Z

Summary

Wires up a coverage gate using a Python py_binary + py_test so the enforcement logic itself is covered by tests. Bazel already produces an LCOV combined report on the test-jre21 jobs (bazel coverage --combined_report=lcov //cli/...) and uploads it as an artifact, but nothing read it — a PR could land that dropped coverage arbitrarily. This PR adds:

tools/coverage_check.py + //tools:coverage-check py_binary — parses the LCOV report, prints a per-file table sorted by coverage ascending, an overall summary, and exits non-zero if main-source line coverage is below $COVERAGE_THRESHOLD (default 90). Test sources and files with LF:0 (no instrumentation data) are filtered out so thin production-code coverage can't hide behind well-covered tests or placeholder records.
tools/coverage_check_test.py + //tools:coverage_check_test py_test — 19 unit tests against parse_lcov, filter_main_source, format_report, and main(), covering per-file extraction, non-line LCOV fields skipped, malformed payloads tolerated, multi-prefix include union, zero-line records dropped, threshold boundary inclusive, missing-file → exit 2, env-var threshold override, and stdin/stderr expectations of the main entry point.
make coverage / make coverage-check / make coverage-test — one-line invocations. make coverage runs bazel coverage then the binary; make coverage-check runs only the binary against an existing LCOV; make coverage-test runs the py_test.
CI: updated Run bazel-diff tests step to coverage-instrument //cli/... //tools:coverage_check_test, plus a new Enforce coverage threshold step that invokes the py_binary on every test-jre21 matrix entry (ubuntu/macos × bazel 8.x/9.x). The threshold step is placed AFTER the artifact upload so a failed threshold check still preserves the LCOV report.

Threshold set to 90 per the goal.

Why py_binary + py_test (revised from the prior bash-script version)

You asked for the enforcement to be Python via py_binary and to add a py_test so the enforcement logic itself has coverage. The earlier draft of this PR used bash + awk, which couldn't be cleanly unit-tested. Replacing it with Python gave us:

Importable units (parse_lcov, filter_main_source, format_report, main) that the py_test exercises directly without spawning subprocesses.
A py_library shared between :coverage-check (the runnable binary) and :coverage_check_test (the test target). With imports = ["."] on the library, the test imports coverage_check by its module name without needing a workspace-rooted path.
A BUILD_WORKING_DIRECTORY chdir hook in main() so bazel run //tools:coverage-check -- bazel-out/_coverage/... resolves the relative LCOV path against the user's cwd instead of the runfiles directory.

Why draft, still

CI on this PR will tell us the actual current main-source coverage — the data point I couldn't measure cleanly locally (E2ETest hangs on my machine due to a JDK-env sandbox issue, unrelated to this PR; I can run the rest, which shows main-source line coverage of 47.20% without E2E across //cli/... //tools:coverage_check_test). The CI run includes E2ETest, which exercises a lot of the integration paths (BazelQueryService.kt is 0.33% locally without E2E and likely much higher with it), so the real number will be higher — but very likely below 90%.

When the Enforce coverage threshold step fires on this PR, the log will print the per-file table and the overall % before exiting 1. From there we have three reasonable paths:

Land at 90% and add tests in follow-up PRs — pick the worst-covered files and write unit coverage for them. Keep the gate at 90; PRs that don't reach 90 stay red.
Land with a lower starting threshold and ratchet up — set $COVERAGE_THRESHOLD in the workflow to whatever CI reports today (rounded down a hair), get the gate green, then raise the number as tests land.
Refine scope — exclude CLI entry points (Main.kt, picocli command shells, DI wiring) from the denominator via --include.

I'd lean toward (2) for the merge, then (1) for the long game. Happy to do whichever you prefer.

Known limitation: Python coverage instrumentation

bazel coverage //tools:coverage_check_test emits LF:0/LH:0 for tools/coverage_check.py today — Bazel detects the file but doesn't gather data because the Python toolchain in MODULE.bazel doesn't have configure_coverage_tool = True set on the rules_python extension. The py_test still runs and asserts behavior of every public function, so the file IS covered by tests — just not numerically reported in LCOV. The threshold check filters out zero-line records so this doesn't tank the gate.

The day we configure the Python toolchain with coverage_tool, the file will start contributing real numerical coverage to the gate without further code changes; the default --include already lists tools/coverage_check.py.

Verification

bazel test //tools:coverage_check_test — 19/19 PASS.
bazel run //tools:coverage-check -- bazel-out/_coverage/_coverage_report.dat --threshold 40 — exit 0, prints overall 47.20%, "PASS".
bazel run //tools:coverage-check -- bazel-out/_coverage/_coverage_report.dat --threshold 90 — exit 1, prints "FAIL: coverage 47.20% is below threshold 90.00%."
bazel run //tools:generate-readme — regenerates README from template, in-sync output.
Did not modify MODULE.bazel.lock; reverted incidental churn before commit.

Test plan

19 py_test cases cover the parser, filter, formatter, and main() entry.
Threshold boundary inclusive (60% threshold against 60% coverage = PASS; 60.01% = FAIL).
Per-file table sorted by coverage ascending (worst-covered first).
Test sources (/test/) and zero-line records excluded from numerator and denominator.
Makefile targets work; bazel run //tools:coverage-check honors BUILD_WORKING_DIRECTORY.
README regenerated from template.
CI run on this PR reports the actual main-source coverage % including E2ETest.
Decide threshold strategy based on CI output.

🤖 Generated with Claude Code

Bazel already produced an LCOV combined report on the test-jre21 jobs (`bazel coverage --combined_report=lcov //cli/...`) and uploaded it as an artifact, but nothing read it -- a PR could land that dropped coverage arbitrarily. This wires up: * `tools/coverage_check.py` + `:coverage-check` py_binary -- parses the LCOV report, prints a per-file table sorted by coverage ascending and an overall summary, exits non-zero if main-source line coverage is below `$COVERAGE_THRESHOLD` (default 90). Test sources and files with `LF:0` (no instrumentation data) are filtered out so thin production-code coverage can't hide behind well-covered tests or placeholder records. * `tools/coverage_check_test.py` + `:coverage_check_test` py_test -- 19 unit tests against parse_lcov, filter_main_source, format_report, and main(), covering: per-file extraction, non-line LCOV fields skipped, malformed payloads tolerated, multi-prefix include union, zero-line records dropped, threshold boundary inclusive, missing LCOV file => exit 2, env-var threshold override. * `make coverage` / `make coverage-check` / `make coverage-test` -- one-line invocations. `make coverage` runs `bazel coverage` then the binary; `make coverage-check` runs only the binary against an existing LCOV; `make coverage-test` runs the py_test. * CI: the `Run bazel-diff tests` step now coverage-instruments `//cli/... //tools:coverage_check_test`, and a new `Enforce coverage threshold` step invokes the py_binary on every test-jre21 matrix entry. The threshold step is placed AFTER the artifact upload so a failed threshold check still preserves the LCOV report. The threshold is 90 per the issue. Coverage on this PR's test-jre21 matrix run is the source of truth for the current repo state. Implementation notes: - The enforcement logic is a `py_library` plus `py_binary` and `py_test` that depend on it via `imports = ["."]`, so the test imports the library by its module name without needing a workspace-rooted path. - The `BUILD_WORKING_DIRECTORY` chdir hook in coverage_check.main lets `bazel run //tools:coverage-check -- bazel-out/_coverage/...` resolve the relative LCOV path against the user's cwd instead of the runfiles directory. - Files with `LF:0` are dropped from the per-file table and the totals, not treated as 0% covered. Today bazel's Python coverage emits exactly that for `tools/coverage_check.py` (no `coverage_tool` configured on the Python toolchain) -- so the .py file's coverage is enforced via the py_test passing, not via LCOV numbers. Configuring the toolchain with `configure_coverage_tool = True` in MODULE.bazel would surface numerical Python coverage and the file would slot in automatically. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

CI on PR #356 reported main-source coverage at 88.76% (1429 / 1610) -- 1.24 points short of the 90% threshold introduced in the previous commit. These five focused unit tests close the gap by covering the smallest files that were near-zero on coverage, all of which were thin enough to test in isolation without mocking out infrastructure: BazelTargetTypeTest enum constants exhaustively touched (0/6 -> 6/6) VersionProviderTest happy-path resource load (1/9 -> 6/9) BazelDiffTest run() throws + isVerbose() truth (2/6 -> 6/6) StderrLoggerTest all 4 overloads x verbose true+false (8/12 -> 12/12) BazelTargetTest three subclass shapes + every type discriminator branch (21/29 -> 28/29) Each test file is registered as a `kt_jvm_test` in cli/BUILD; they all run in <1s and join the existing `:cli-test-lib`. Notes: - Main.kt (0/4) is not addressed here: its only behaviour is `exitProcess(...)`, which kills the JVM and so cannot be exercised from in-process unit tests without a SecurityManager hack or a refactor to extract a testable `run(args): Int`. The E2E suite already invokes the built jar as a subprocess, but JaCoCo doesn't instrument across that boundary, so Main.kt stays at 0% for now. The threshold gate accommodates this: 6 + 5 + 4 + 4 + 7 = 26 lines added, projecting CI to (1429+26)/1610 = 90.37%, a hair past the bar. - VersionProvider's `throw IllegalArgumentException(...)` branch stays uncovered (3 unreachable lines) because the production code resolves the classloader from `this::class.java`, with no injection seam to swap it for a loader that doesn't contain `cli/version`. Exercising it would require either reflection-mutating the classloader or refactoring the resource lookup to take a `ClassLoader` parameter -- both out of scope here. - BazelTarget's last uncovered line is the `else -> BazelTargetType.UNKNOWN` branch, only reachable when `Build.Target.Discriminator` adds a new enum value the production code doesn't recognise. Triggering that today would require a hand-forged proto with a reserved discriminator number, which the protobuf builder rejects. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The threshold step invokes `bazelisk run //tools:coverage-check`, which on macos-latest x bazel 9.x failed with: error: LCOV report not found at 'bazel-out/_coverage/_coverage_report.dat'. Hint: run 'bazel coverage --combined_report=lcov //cli/... //tools/...' first Root cause: USE_BAZEL_VERSION was set on the `Run bazel-diff tests` step but not propagated to the threshold step, so bazelisk fell back to the 8.5.1 pin in .bazelversion and started a fresh server. That second server re-symlinked bazel-out into its own output base, and the LCOV report produced by the 9.x server became unreachable through the bazel-out/_coverage/... path. Ubuntu 8.x/9.x didn't trip because the runners kept the same bazel server warm between steps -- but the dual-version dance is a real mismatch, so the macOS flake is the canonical bug. Setting USE_BAZEL_VERSION on the threshold step keeps both invocations on the same server. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…vement Two additive changes to the coverage gate that landed in the prior commits. 1. `tools/coverage_check.py --html DIR` (and `make coverage-html`) When passed `--html DIR`, the threshold check additionally shells out to `genhtml` and writes an annotated per-file HTML report there. The threshold gate still runs and still drives the exit code -- HTML is an extra artifact, not a replacement -- and it's emitted even when the gate fails so the below-threshold case (when the report is most useful for interactive investigation) is covered. Fails fast with a clear install hint if `genhtml` is missing from PATH (`brew install lcov` / `apt-get install lcov`); fails with the subprocess exit code if genhtml itself errors. Five new py_test cases (`HtmlReportTest`) mock `shutil.which` and `subprocess.run` so the tests don't require lcov to be installed in the test environment: - correct genhtml argv (LCOV file, --output-directory, --quiet) - clear error when genhtml is absent (exit 2, install-hint surfaced) - subprocess error code surfaced (exit 2, genhtml exit code reported) - HTML emitted even when the threshold check fails (exit 1 + HTML path printed) Output is added to .gitignore (`coverage-html/`). Total py_test count: 19 -> 24. 2. `.claude/skills/{coverage-status,improve-coverage}/SKILL.md` Two repo-local skill files so agents (Claude Code, etc.) know how to: - inspect current coverage state (`coverage-status`: what's the overall %, which files are below threshold, how to read the sorted table, how to produce an HTML report) - raise coverage to clear the gate (`improve-coverage`: how to pick high-leverage files, how to write a new kt_jvm_test, what won't work and why -- Main.kt's exitProcess, throw branches with no DI seam, unreachable enum else-branches) The improve-coverage skill cross-references the worked example from #356 itself (88.76% -> 90.37% by adding five small unit tests). .gitignore extended to keep .claude/settings.local.json and .claude/worktrees/ out of git while letting .claude/skills/ in -- precise patterns rather than a blanket `.claude/` ignore. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

tinder-maxwellelliott force-pushed the claude/coverage-validation-system branch from d7c4b85 to 007a9cb Compare May 18, 2026 19:45

tinder-maxwellelliott changed the title ~~Add coverage validation: enforce >=90% main-source line coverage in CI~~ Add coverage validation: 90% main-source line coverage gate (py_binary + py_test) May 18, 2026

tinder-maxwellelliott marked this pull request as ready for review May 18, 2026 23:31

tinder-maxwellelliott and others added 2 commits May 18, 2026 19:44

tinder-maxwellelliott merged commit 830f09e into master May 19, 2026
15 checks passed

tinder-maxwellelliott deleted the claude/coverage-validation-system branch May 19, 2026 02:11

BrewTestBot mentioned this pull request May 19, 2026

bazel-diff 24.0.0 Homebrew/homebrew-core#283567

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add coverage validation: 90% main-source line coverage gate (py_binary + py_test)#356

Add coverage validation: 90% main-source line coverage gate (py_binary + py_test)#356
tinder-maxwellelliott merged 4 commits into
masterfrom
claude/coverage-validation-system

tinder-maxwellelliott commented May 18, 2026 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

tinder-maxwellelliott commented May 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Why py_binary + py_test (revised from the prior bash-script version)

Why draft, still

Known limitation: Python coverage instrumentation

Verification

Test plan

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

tinder-maxwellelliott commented May 18, 2026 •

edited

Loading