Skip to content

Add coverage validation: 90% main-source line coverage gate (py_binary + py_test)#356

Merged
tinder-maxwellelliott merged 4 commits into
masterfrom
claude/coverage-validation-system
May 19, 2026
Merged

Add coverage validation: 90% main-source line coverage gate (py_binary + py_test)#356
tinder-maxwellelliott merged 4 commits into
masterfrom
claude/coverage-validation-system

Conversation

@tinder-maxwellelliott
Copy link
Copy Markdown
Collaborator

@tinder-maxwellelliott tinder-maxwellelliott commented May 18, 2026

Summary

Wires up a coverage gate using a Python py_binary + py_test so the enforcement logic itself is covered by tests. Bazel already produces an LCOV combined report on the test-jre21 jobs (bazel coverage --combined_report=lcov //cli/...) and uploads it as an artifact, but nothing read it — a PR could land that dropped coverage arbitrarily. This PR adds:

  • tools/coverage_check.py + //tools:coverage-check py_binary — parses the LCOV report, prints a per-file table sorted by coverage ascending, an overall summary, and exits non-zero if main-source line coverage is below $COVERAGE_THRESHOLD (default 90). Test sources and files with LF:0 (no instrumentation data) are filtered out so thin production-code coverage can't hide behind well-covered tests or placeholder records.
  • tools/coverage_check_test.py + //tools:coverage_check_test py_test — 19 unit tests against parse_lcov, filter_main_source, format_report, and main(), covering per-file extraction, non-line LCOV fields skipped, malformed payloads tolerated, multi-prefix include union, zero-line records dropped, threshold boundary inclusive, missing-file → exit 2, env-var threshold override, and stdin/stderr expectations of the main entry point.
  • make coverage / make coverage-check / make coverage-test — one-line invocations. make coverage runs bazel coverage then the binary; make coverage-check runs only the binary against an existing LCOV; make coverage-test runs the py_test.
  • CI: updated Run bazel-diff tests step to coverage-instrument //cli/... //tools:coverage_check_test, plus a new Enforce coverage threshold step that invokes the py_binary on every test-jre21 matrix entry (ubuntu/macos × bazel 8.x/9.x). The threshold step is placed AFTER the artifact upload so a failed threshold check still preserves the LCOV report.

Threshold set to 90 per the goal.

Why py_binary + py_test (revised from the prior bash-script version)

You asked for the enforcement to be Python via py_binary and to add a py_test so the enforcement logic itself has coverage. The earlier draft of this PR used bash + awk, which couldn't be cleanly unit-tested. Replacing it with Python gave us:

  • Importable units (parse_lcov, filter_main_source, format_report, main) that the py_test exercises directly without spawning subprocesses.
  • A py_library shared between :coverage-check (the runnable binary) and :coverage_check_test (the test target). With imports = ["."] on the library, the test imports coverage_check by its module name without needing a workspace-rooted path.
  • A BUILD_WORKING_DIRECTORY chdir hook in main() so bazel run //tools:coverage-check -- bazel-out/_coverage/... resolves the relative LCOV path against the user's cwd instead of the runfiles directory.

Why draft, still

CI on this PR will tell us the actual current main-source coverage — the data point I couldn't measure cleanly locally (E2ETest hangs on my machine due to a JDK-env sandbox issue, unrelated to this PR; I can run the rest, which shows main-source line coverage of 47.20% without E2E across //cli/... //tools:coverage_check_test). The CI run includes E2ETest, which exercises a lot of the integration paths (BazelQueryService.kt is 0.33% locally without E2E and likely much higher with it), so the real number will be higher — but very likely below 90%.

When the Enforce coverage threshold step fires on this PR, the log will print the per-file table and the overall % before exiting 1. From there we have three reasonable paths:

  1. Land at 90% and add tests in follow-up PRs — pick the worst-covered files and write unit coverage for them. Keep the gate at 90; PRs that don't reach 90 stay red.
  2. Land with a lower starting threshold and ratchet up — set $COVERAGE_THRESHOLD in the workflow to whatever CI reports today (rounded down a hair), get the gate green, then raise the number as tests land.
  3. Refine scope — exclude CLI entry points (Main.kt, picocli command shells, DI wiring) from the denominator via --include.

I'd lean toward (2) for the merge, then (1) for the long game. Happy to do whichever you prefer.

Known limitation: Python coverage instrumentation

bazel coverage //tools:coverage_check_test emits LF:0/LH:0 for tools/coverage_check.py today — Bazel detects the file but doesn't gather data because the Python toolchain in MODULE.bazel doesn't have configure_coverage_tool = True set on the rules_python extension. The py_test still runs and asserts behavior of every public function, so the file IS covered by tests — just not numerically reported in LCOV. The threshold check filters out zero-line records so this doesn't tank the gate.

The day we configure the Python toolchain with coverage_tool, the file will start contributing real numerical coverage to the gate without further code changes; the default --include already lists tools/coverage_check.py.

Verification

  • bazel test //tools:coverage_check_test19/19 PASS.
  • bazel run //tools:coverage-check -- bazel-out/_coverage/_coverage_report.dat --threshold 40 — exit 0, prints overall 47.20%, "PASS".
  • bazel run //tools:coverage-check -- bazel-out/_coverage/_coverage_report.dat --threshold 90 — exit 1, prints "FAIL: coverage 47.20% is below threshold 90.00%."
  • bazel run //tools:generate-readme — regenerates README from template, in-sync output.
  • Did not modify MODULE.bazel.lock; reverted incidental churn before commit.

Test plan

  • 19 py_test cases cover the parser, filter, formatter, and main() entry.
  • Threshold boundary inclusive (60% threshold against 60% coverage = PASS; 60.01% = FAIL).
  • Per-file table sorted by coverage ascending (worst-covered first).
  • Test sources (/test/) and zero-line records excluded from numerator and denominator.
  • Makefile targets work; bazel run //tools:coverage-check honors BUILD_WORKING_DIRECTORY.
  • README regenerated from template.
  • CI run on this PR reports the actual main-source coverage % including E2ETest.
  • Decide threshold strategy based on CI output.

🤖 Generated with Claude Code

Bazel already produced an LCOV combined report on the test-jre21 jobs
(`bazel coverage --combined_report=lcov //cli/...`) and uploaded it as an
artifact, but nothing read it -- a PR could land that dropped coverage
arbitrarily. This wires up:

  * `tools/coverage_check.py` + `:coverage-check` py_binary -- parses the
    LCOV report, prints a per-file table sorted by coverage ascending
    and an overall summary, exits non-zero if main-source line coverage
    is below `$COVERAGE_THRESHOLD` (default 90). Test sources and files
    with `LF:0` (no instrumentation data) are filtered out so thin
    production-code coverage can't hide behind well-covered tests or
    placeholder records.
  * `tools/coverage_check_test.py` + `:coverage_check_test` py_test --
    19 unit tests against parse_lcov, filter_main_source, format_report,
    and main(), covering: per-file extraction, non-line LCOV fields
    skipped, malformed payloads tolerated, multi-prefix include union,
    zero-line records dropped, threshold boundary inclusive, missing
    LCOV file => exit 2, env-var threshold override.
  * `make coverage` / `make coverage-check` / `make coverage-test` --
    one-line invocations. `make coverage` runs `bazel coverage` then the
    binary; `make coverage-check` runs only the binary against an
    existing LCOV; `make coverage-test` runs the py_test.
  * CI: the `Run bazel-diff tests` step now coverage-instruments
    `//cli/... //tools:coverage_check_test`, and a new `Enforce coverage
    threshold` step invokes the py_binary on every test-jre21 matrix
    entry. The threshold step is placed AFTER the artifact upload so a
    failed threshold check still preserves the LCOV report.

The threshold is 90 per the issue. Coverage on this PR's test-jre21
matrix run is the source of truth for the current repo state.

Implementation notes:
- The enforcement logic is a `py_library` plus `py_binary` and `py_test`
  that depend on it via `imports = ["."]`, so the test imports the
  library by its module name without needing a workspace-rooted path.
- The `BUILD_WORKING_DIRECTORY` chdir hook in coverage_check.main lets
  `bazel run //tools:coverage-check -- bazel-out/_coverage/...` resolve
  the relative LCOV path against the user's cwd instead of the runfiles
  directory.
- Files with `LF:0` are dropped from the per-file table and the totals,
  not treated as 0% covered. Today bazel's Python coverage emits exactly
  that for `tools/coverage_check.py` (no `coverage_tool` configured on
  the Python toolchain) -- so the .py file's coverage is enforced via
  the py_test passing, not via LCOV numbers. Configuring the toolchain
  with `configure_coverage_tool = True` in MODULE.bazel would surface
  numerical Python coverage and the file would slot in automatically.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@tinder-maxwellelliott tinder-maxwellelliott force-pushed the claude/coverage-validation-system branch from d7c4b85 to 007a9cb Compare May 18, 2026 19:45
@tinder-maxwellelliott tinder-maxwellelliott changed the title Add coverage validation: enforce >=90% main-source line coverage in CI Add coverage validation: 90% main-source line coverage gate (py_binary + py_test) May 18, 2026
CI on PR #356 reported main-source coverage at 88.76% (1429 / 1610) -- 1.24
points short of the 90% threshold introduced in the previous commit. These
five focused unit tests close the gap by covering the smallest files that
were near-zero on coverage, all of which were thin enough to test in
isolation without mocking out infrastructure:

  BazelTargetTypeTest      enum constants exhaustively touched (0/6 -> 6/6)
  VersionProviderTest      happy-path resource load           (1/9 -> 6/9)
  BazelDiffTest            run() throws + isVerbose() truth   (2/6 -> 6/6)
  StderrLoggerTest         all 4 overloads x verbose true+false (8/12 -> 12/12)
  BazelTargetTest          three subclass shapes + every type
                           discriminator branch              (21/29 -> 28/29)

Each test file is registered as a `kt_jvm_test` in cli/BUILD; they all run
in <1s and join the existing `:cli-test-lib`.

Notes:
- Main.kt (0/4) is not addressed here: its only behaviour is
  `exitProcess(...)`, which kills the JVM and so cannot be exercised from
  in-process unit tests without a SecurityManager hack or a refactor to
  extract a testable `run(args): Int`. The E2E suite already invokes the
  built jar as a subprocess, but JaCoCo doesn't instrument across that
  boundary, so Main.kt stays at 0% for now. The threshold gate accommodates
  this: 6 + 5 + 4 + 4 + 7 = 26 lines added, projecting CI to
  (1429+26)/1610 = 90.37%, a hair past the bar.
- VersionProvider's `throw IllegalArgumentException(...)` branch stays
  uncovered (3 unreachable lines) because the production code resolves the
  classloader from `this::class.java`, with no injection seam to swap it
  for a loader that doesn't contain `cli/version`. Exercising it would
  require either reflection-mutating the classloader or refactoring the
  resource lookup to take a `ClassLoader` parameter -- both out of scope
  here.
- BazelTarget's last uncovered line is the `else -> BazelTargetType.UNKNOWN`
  branch, only reachable when `Build.Target.Discriminator` adds a new
  enum value the production code doesn't recognise. Triggering that today
  would require a hand-forged proto with a reserved discriminator number,
  which the protobuf builder rejects.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@tinder-maxwellelliott tinder-maxwellelliott marked this pull request as ready for review May 18, 2026 23:31
tinder-maxwellelliott and others added 2 commits May 18, 2026 19:44
The threshold step invokes `bazelisk run //tools:coverage-check`, which on
macos-latest x bazel 9.x failed with:

    error: LCOV report not found at 'bazel-out/_coverage/_coverage_report.dat'.
    Hint: run 'bazel coverage --combined_report=lcov //cli/... //tools/...' first

Root cause: USE_BAZEL_VERSION was set on the `Run bazel-diff tests` step
but not propagated to the threshold step, so bazelisk fell back to the
8.5.1 pin in .bazelversion and started a fresh server. That second server
re-symlinked bazel-out into its own output base, and the LCOV report
produced by the 9.x server became unreachable through the
bazel-out/_coverage/... path.

Ubuntu 8.x/9.x didn't trip because the runners kept the same bazel
server warm between steps -- but the dual-version dance is a real
mismatch, so the macOS flake is the canonical bug. Setting
USE_BAZEL_VERSION on the threshold step keeps both invocations on the
same server.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…vement

Two additive changes to the coverage gate that landed in the prior commits.

1. `tools/coverage_check.py --html DIR` (and `make coverage-html`)

   When passed `--html DIR`, the threshold check additionally shells out to
   `genhtml` and writes an annotated per-file HTML report there. The
   threshold gate still runs and still drives the exit code -- HTML is an
   extra artifact, not a replacement -- and it's emitted even when the
   gate fails so the below-threshold case (when the report is most useful
   for interactive investigation) is covered.

   Fails fast with a clear install hint if `genhtml` is missing from PATH
   (`brew install lcov` / `apt-get install lcov`); fails with the
   subprocess exit code if genhtml itself errors. Five new py_test cases
   (`HtmlReportTest`) mock `shutil.which` and `subprocess.run` so the
   tests don't require lcov to be installed in the test environment:
     - correct genhtml argv (LCOV file, --output-directory, --quiet)
     - clear error when genhtml is absent (exit 2, install-hint surfaced)
     - subprocess error code surfaced (exit 2, genhtml exit code reported)
     - HTML emitted even when the threshold check fails (exit 1 + HTML
       path printed)

   Output is added to .gitignore (`coverage-html/`).

   Total py_test count: 19 -> 24.

2. `.claude/skills/{coverage-status,improve-coverage}/SKILL.md`

   Two repo-local skill files so agents (Claude Code, etc.) know how to:
     - inspect current coverage state (`coverage-status`: what's the
       overall %, which files are below threshold, how to read the
       sorted table, how to produce an HTML report)
     - raise coverage to clear the gate (`improve-coverage`: how to
       pick high-leverage files, how to write a new kt_jvm_test, what
       won't work and why -- Main.kt's exitProcess, throw branches with
       no DI seam, unreachable enum else-branches)

   The improve-coverage skill cross-references the worked example from
   #356 itself (88.76% -> 90.37% by adding five small unit tests).

   .gitignore extended to keep .claude/settings.local.json and
   .claude/worktrees/ out of git while letting .claude/skills/ in --
   precise patterns rather than a blanket `.claude/` ignore.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@tinder-maxwellelliott tinder-maxwellelliott merged commit 830f09e into master May 19, 2026
15 checks passed
@tinder-maxwellelliott tinder-maxwellelliott deleted the claude/coverage-validation-system branch May 19, 2026 02:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant