feat: add CI-gated py_binary startup benchmark by xangcastle · Pull Request #1029 · aspect-build/rules_py

xangcastle · 2026-05-18T14:13:02Z

Summary

Introduces a reproducible, CI-gated benchmark that measures py_binary cold-start (launcher + Python interpreter) overhead. The workflow runs on every PR that touches py/private/** or benchmark/startup/**, compares the PR against HEAD main, and posts a sticky comment with results. If the PR regresses startup time by more than 10 % vs main, the check fails.

Motivation

rules_py performance is heavily dependent on the Bash launcher emitted by py_binary. Small changes to environment-setup logic, runfiles resolution, or interpreter flags can have outsized impact on user-visible startup time. Until now we had no automated way to detect these regressions before merge.

How it works

Three isolated slots are measured in the same CI job:
- BCR (aspect_rules_py from the Bazel Central Registry, pinned to 1.11.5) — shown as a historical baseline only.
- HEAD main — current main branch checked out side-by-side.
- This PR — the PR merge commit.
Build isolation — each slot uses its own --output_base (/tmp/bazel-{bcr,main,pr}) and an explicit empty --disk_cache= so there is zero cross-slot cache contamination.
Measurement — hyperfine --warmup 5 --runs 50 runs a no-op py_binary (main.py is just pass). Wall-clock time captures launcher overhead + Python startup; no custom instrumentation is injected into the launcher, which keeps the measurement representative of real user binaries.
Comparison — compare.py reads the three hyperfine JSON outputs plus optional *-build.json files (cold bazel build time) and emits a Markdown table.
Regression gate — the only gate that can block a PR is PR vs HEAD main (threshold: 10 %). Comparing against BCR is intentionally not used as a gate because transitive dependency versions drift between releases, which would attribute upstream changes to this project.
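The regression gate described above can be sketched as follows. This is a minimal illustration, not the PR's actual compare.py: the names `load_mean_ms`, `percent_delta`, and `gate` are hypothetical, and the sketch assumes hyperfine's `--export-json` layout, where `results[0]["mean"]` is reported in seconds.

```python
import json

THRESHOLD_PCT = 10.0  # gate threshold stated in the PR description


def load_mean_ms(path):
    """Return the mean wall-clock time in milliseconds from a hyperfine JSON export."""
    with open(path) as f:
        data = json.load(f)
    # hyperfine reports times in seconds, one "results" entry per benchmarked command.
    return data["results"][0]["mean"] * 1000.0


def percent_delta(candidate_ms, baseline_ms):
    """Relative change of candidate vs baseline in percent (positive means slower)."""
    return (candidate_ms - baseline_ms) / baseline_ms * 100.0


def gate(pr_ms, main_ms, threshold_pct=THRESHOLD_PCT):
    """Return True when the PR does not regress by more than threshold_pct vs main."""
    return percent_delta(pr_ms, main_ms) <= threshold_pct
```

A caller would load the main and PR exports with `load_mean_ms`, print `percent_delta` for the table, and exit non-zero when `gate` returns False. Only the PR-vs-main pair feeds the gate; the BCR figure would be reported but never checked.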

Files added

benchmark/startup/MODULE.bazel.template — template for generating the benchmark workspace.
benchmark/startup/generate_module.py — script that produces MODULE.bazel for either BCR or local_path_override mode.
benchmark/startup/BUILD.bazel / main.py — minimal no-op py_binary target.
benchmark/startup/compare.py — parses hyperfine JSON, computes deltas, prints Markdown table, gates on regression, and writes GITHUB_OUTPUT.
.github/workflows/startup-benchmark.yml — CI workflow (installs hyperfine, runs the three slots, posts PR comment).
docs/startup-benchmark.md — design doc and local run instructions.
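As a rough illustration of the two modes generate_module.py supports, the sketch below renders a MODULE.bazel for either a BCR release or a local checkout. It is an assumption-laden stand-in, not the PR's script: the function name `render_module`, the module name `startup_benchmark`, and the placeholder version `0.0.0` are hypothetical, and the real script fills in MODULE.bazel.template rather than building the text inline.

```python
def render_module(mode, version=None, path=None):
    """Return MODULE.bazel text resolving aspect_rules_py from BCR or a local path."""
    lines = ['module(name = "startup_benchmark")', ""]  # hypothetical module name
    if mode == "bcr":
        if not version:
            raise ValueError("bcr mode requires a version")
        # Resolve the released module from the Bazel Central Registry.
        lines.append(f'bazel_dep(name = "aspect_rules_py", version = "{version}")')
    elif mode == "local":
        if not path:
            raise ValueError("local mode requires a path")
        # Placeholder version (assumption); the override below decides what is used.
        lines.append('bazel_dep(name = "aspect_rules_py", version = "0.0.0")')
        lines.append(
            f'local_path_override(module_name = "aspect_rules_py", path = "{path}")'
        )
    else:
        raise ValueError(f"unknown mode: {mode}")
    return "\n".join(lines) + "\n"
```

For example, `render_module("local", path="../..")` would point the benchmark workspace at the repository root, mirroring the `local --path ../..` invocation used for local runs.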

Running locally

cd benchmark/startup

# BCR baseline
python3 generate_module.py bcr --version 1.11.5
bazel build //:bench
BIN=$(bazel cquery //:bench --output=starlark --starlark:expr='target.files_to_run.executable.path' | tail -n1)
hyperfine --warmup 5 --runs 50 --export-json /tmp/bcr.json "$BIN"

# Local checkout (current tree)
python3 generate_module.py local --path ../..
bazel build //:bench
BIN=$(bazel cquery //:bench --output=starlark --starlark:expr='target.files_to_run.executable.path' | tail -n1)
hyperfine --warmup 5 --runs 50 --export-json /tmp/local.json "$BIN"

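# compare.py takes three results; here the local result is passed twice, standing in for both main and the PR
python3 compare.py /tmp/bcr.json /tmp/local.json /tmp/local.json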

Design decisions

No _BENCH_T0_NS injection: we intentionally avoided modifying the launcher to capture an internal timestamp. hyperfine wall-clock on a no-op binary is simpler, survives refactoring, and measures exactly what users feel.
PR vs main gating: BCR is displayed for context but never blocks, because BCR and main resolve different transitive dependency graphs (e.g. rules_python@1.0.0 vs @1.7.0), making BCR an invalid regression reference.
Isolated output bases: guarantees that the build graph and action cache of one slot cannot influence another.
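The isolation scheme can be sketched as a pair of command builders. This is an illustrative sketch, not the workflow's actual steps: `build_command` and `hyperfine_command` are hypothetical helper names, while the flags and paths are the ones quoted in the PR description.

```python
SLOTS = ("bcr", "main", "pr")


def build_command(slot, target="//:bench"):
    """Return the argv for a cold build of one slot in its own output base."""
    if slot not in SLOTS:
        raise ValueError(f"unknown slot: {slot}")
    return [
        "bazel",
        # Startup option, so it must come before the "build" command.
        f"--output_base=/tmp/bazel-{slot}",
        "build",
        # Explicitly empty value: no disk cache is shared between slots.
        "--disk_cache=",
        target,
    ]


def hyperfine_command(binary, json_out, warmup=5, runs=50):
    """Return the argv that benchmarks a built binary and exports JSON results."""
    return [
        "hyperfine",
        "--warmup", str(warmup),
        "--runs", str(runs),
        "--export-json", json_out,
        binary,
    ]
```

Each list could be handed to `subprocess.run(cmd, check=True)` from the slot's workspace directory. Because every slot gets a distinct output base, a build in one slot cannot reuse analysis or action results produced by another.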

aspect-workflows · 2026-05-18T14:26:47Z

Bazel 8 (Test)

All tests were cache hits

181 tests (100.0%) were fully cached saving 1m 5s.

Bazel 9 (Test)

All tests were cache hits

180 tests (100.0%) were fully cached saving 1m 7s.

Bazel 8 (Test)

e2e

All tests were cache hits

113 tests (100.0%) were fully cached saving 52s.

Bazel 9 (Test)

e2e

All tests were cache hits

107 tests (100.0%) were fully cached saving 58s.

Bazel 8 (Test)

examples/uv_pip_compile

All tests were cache hits

1 test (100.0%) was fully cached saving 444ms.

Buildifier Gazelle

github-actions · 2026-05-18T16:47:32Z

py_binary startup benchmark

| Version | Mean (ms) | Median (ms) | ± stddev (ms) | vs BCR | vs main | Build (s) |
|---|---|---|---|---|---|---|
| BCR 1.11.5 (baseline) | 306.746 | 305.620 | ±6.201 | n/a | n/a | 41.09 |
| HEAD main | 128.797 | 127.705 | ±3.767 | -58.0% | n/a | 15.02 |
| This PR | 128.938 | 128.446 | ±1.667 | -58.0% | +0.1% | 11.74 |

Measured with hyperfine --warmup 5 --runs 50 on Linux
Gate: PR vs HEAD main (threshold: 10%). BCR is shown only as a historical baseline.
Build time: cold bazel build //:bench with isolated output base, no disk cache.

jbedard · 2026-05-19T18:41:57Z

+
+    steps:
+      - name: Checkout PR
+        uses: actions/checkout@v4

jbedard: Isn't there a more modern version?

xangcastle: I don't think so.

jbedard: I see actions/checkout@v6 all over the place...

github-actions · 2026-05-20T17:01:29Z

py_binary startup benchmark

| Version | Mean (ms) | Median (ms) | ± stddev (ms) | vs BCR | vs main | Build (s) |
|---|---|---|---|---|---|---|
| BCR 1.11.5 (baseline) | 330.399 | 328.437 | ±8.244 | n/a | n/a | 38.70 |
| HEAD main | 138.087 | 137.869 | ±2.121 | -58.2% | n/a | 13.62 |
| This PR | 138.597 | 138.476 | ±2.880 | -58.1% | +0.4% | 11.20 |

Measured with hyperfine --warmup 5 --runs 50 on Linux
Gate: PR vs HEAD main (threshold: 10%). BCR is shown only as a historical baseline.
Build time: cold bazel build //:bench with isolated output base, no disk cache.

jbedard · 2026-05-20T17:46:39Z

Should we make this GitHub Action manually triggered instead of running it on 100% of PRs? I suspect that most of the time the results will just be noise that confuses people, and that we'll only want the benchmark when we are specifically interested in startup time.

xangcastle force-pushed the xangcastle/performance-benchmarks branch from b85404a to da35f85 Compare May 18, 2026 14:26

xangcastle force-pushed the xangcastle/performance-benchmarks branch 4 times, most recently from 8e060e5 to 5b6543e Compare May 18, 2026 16:45

xangcastle changed the title from "feat: performance benchmarks" to "feat: add CI-gated py_binary startup benchmark" May 19, 2026

xangcastle marked this pull request as ready for review May 19, 2026 16:11

jbedard reviewed May 19, 2026


Comment thread benchmark/startup/MODULE.bazel Outdated

xangcastle force-pushed the xangcastle/performance-benchmarks branch from 5b6543e to 06429c3 Compare May 20, 2026 15:50

performance benchmarks

fe38de1

xangcastle force-pushed the xangcastle/performance-benchmarks branch from 06429c3 to fe38de1 Compare May 20, 2026 16:59

xangcastle wants to merge 1 commit into main from xangcastle/performance-benchmarks


2 participants
