Summary
We upgraded bazel-diff 18.0.0 → 18.1.0. Our CI uses a base graph produced by a prior main commit (still on 18.0.0) and a head graph produced by the upgrading commit (now on 18.1.0). get-impacted-targets then spawned ~5,000 serial bazel query rdeps(//..., @@<repo>//...) subprocesses and took ~3 hours to complete. It was called twice per workflow, so CI took ~7 hours end-to-end.
The cause is the interaction between #330's stderr redirect fix and #314's module-change detection - not #331's exec-bit hash or the SourceFileHasher format change as I suggested earlier.
Root cause
#330 (shipped in v18.1.0) changed BazelModService:
- stderr = Redirect.CAPTURE,
+ stderr = Redirect.SILENT,
That is correct — the old CAPTURE+CAPTURE combination hit captureAll in Process.kt, which calls redirectErrorStream(true), merging stderr into the captured stdout. So getModuleGraphJson() on 18.0.x returned "<INFO: Checking for file changes...>\n...<actual JSON>..." whenever bazel emitted anything on stderr during the bazel mod graph --output=json call. The new version returns clean JSON.
The #330 commit message notes this explicitly: "Occasionally I see INFO: Checking for file changes...\n in the metadata when generating hashes, which is extra useless info that causes bazel-diff to think the whole graph has changed."
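The merged-stream behaviour is easy to reproduce outside bazel-diff. A minimal sketch, assuming a POSIX sh on PATH (the command and strings are illustrative, not bazel-diff's actual code):

```kotlin
// Demonstrates how redirectErrorStream(true) — the effect of the old
// CAPTURE+CAPTURE combination — interleaves stderr into the captured stdout.
fun capturedWithMergedStderr(): String {
    val pb = ProcessBuilder("sh", "-c", "echo 'INFO: noise' >&2; echo '{\"ok\":true}'")
    pb.redirectErrorStream(true) // stderr is merged into the stdout stream
    val proc = pb.start()
    val out = proc.inputStream.bufferedReader().readText()
    proc.waitFor()
    return out
}
```

The captured string contains both the INFO line and the JSON payload, so a byte-level comparison against a clean capture can never be equal.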
The fix shipped without a migration path for existing base graphs produced by the old version. In CalculateImpactedTargetsInteractor.execute():
- fromModuleGraphJson != toModuleGraphJson on line 44 is always true across the 18.0.x → 18.1.0 boundary (polluted string ≠ clean string).
- ModuleGraphParser.parseModuleGraph(fromGraph) fails because the stderr prefix makes the whole string invalid JSON; the surrounding try { ... } catch (e: Exception) { return emptyMap() } swallows the error, so the base graph degrades to an empty module map.
- parseModuleGraph(toGraph) parses cleanly to the full module map (~90 modules for us).
- findChangedModules(emptyMap, fullMap) reports every module in the new graph as "added".
- queryTargetsDependingOnModules(90 moduleKeys, allTargets) then substring-matches each module name against allTargets.keys at CalculateImpactedTargetsInteractor.kt:293. In a workspace with many module extensions, this is a very loose match: "aspect_bazel_lib".contains(...) matches aspect_bazel_lib+, aspect_bazel_lib++toolchains+bats_toolchains, aspect_bazel_lib++toolchains+..., etc. For us that produces ~5,000 matching repo patterns from ~4,500 materialised external repos.
- Each match spawns one bazel query rdeps(//..., @@<repo>//...) subprocess. JVM startup + bazel client connect ≈ 2s each, run serially: ~5,000 × 2s ≈ 3h per get-impacted-targets call.
Most of those spawns log Loading: 0 packages loaded; the repo pattern doesn't resolve and the query fails instantly.
Potential fixes
In rough priority order:
- Make ModuleGraphParser.parseModuleGraph robust to non-JSON prefix/suffix. Strip or locate the JSON object within the captured string instead of failing the whole parse. This would have let 18.1.0 handle a polluted 18.0.x base graph correctly.
- Fall back to computeSimpleImpactedTargets when parse asymmetry is detected. If fromModuleGraphJson != toModuleGraphJson but one side parses to an empty map, treating every module as "added" is far more expensive than the per-target hash diff would have been; computeSimpleImpactedTargets on a near-total invalidation finished in 8 min in our workspace.
- Tighten the module → repo match at CalculateImpactedTargetsInteractor.kt:292-295. The current allTargets.keys.filter { it.contains(moduleName) } is O(modules × targets) with a very loose substring predicate. On workspaces with many module extensions (ours has ~4,500 canonical external repos for ~90 declared modules), a single "changed" module name can match thousands of canonical repo names. Match by canonical repo key or apparent name derived from bazel mod dump_repo_mapping instead; BazelQueryService.discoverRepoMapping already knows how to produce this.
- Union rdeps queries. If N modules need rdeps computed, one rdeps(//..., @@a//... + @@b//... + ... + @@n//...) costs roughly the same as one query; N separate subprocesses cost N × startup overhead.
- Add a formatVersion field to the hashes JSON metadata. Independent of the above, this would let future serialisation changes either fail fast with a clear error or fall back to computeSimpleImpactedTargets, instead of silently triggering pathological code paths.
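The first fix could be as small as locating the JSON payload inside the captured string. A sketch, assuming the payload is a single top-level JSON object (extractJsonObject is a hypothetical helper, not existing bazel-diff API):

```kotlin
// Returns the substring from the first '{' to the last '}'. When stderr
// noise merely wraps a single top-level JSON object, that span is the
// whole object; returns null when no object is present.
fun extractJsonObject(raw: String): String? {
    val start = raw.indexOf('{')
    val end = raw.lastIndexOf('}')
    return if (start != -1 && start < end) raw.substring(start, end + 1) else null
}
```

With this, extractJsonObject("INFO: Checking for file changes...\n{...}") recovers the clean payload, so a polluted 18.0.x base graph would parse instead of degrading to emptyMap().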
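The union-query fix amounts to string-building a single query expression instead of N. A sketch (the repo names passed in are whatever canonical keys survive the module match; the function name is hypothetical):

```kotlin
// Collapses N per-repo rdeps queries into one bazel invocation,
// amortising the ~2s JVM + client-connect startup cost across all modules.
fun unionRdepsQuery(repoKeys: List<String>): String {
    val universe = repoKeys.joinToString(" + ") { "@@$it//..." }
    return "rdeps(//..., $universe)"
}
```

For example, unionRdepsQuery(listOf("aspect_bazel_lib+", "rules_go+")) produces rdeps(//..., @@aspect_bazel_lib+//... + @@rules_go+//...), which bazel evaluates in a single query.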
Environment
- bazel-diff 18.0.0 → 18.1.0 (only change)
- Bazel 8.6.0, bzlmod, RBE with --remote_download_minimal
- ~95k targets in the graph, 90 declared bzlmod modules, 4,500+ canonical external repos materialised under $output_base/external/
- Base graphs stored in cloud object storage between runs