
feat: merge-train/barretenberg#22147

Open
AztecBot wants to merge 6 commits into `next` from `merge-train/barretenberg`

@AztecBot AztecBot commented Mar 30, 2026

BEGIN_COMMIT_OVERRIDE
chore: bench phase breakdown + thread sweep for MSM reduction (#21885)
END_COMMIT_OVERRIDE

## Context

This PR addresses `AztecProtocol/barretenberg#1656` by making the
`MSM::batch_multi_scalar_mul(...)` phase breakdown measurable and by
adding a benchmark case that targets the exact regime called out in the
issue (`2^16` points with `256` threads).

The goal is to answer: is the single-threaded final reduction
(`accumulate_results`) actually a bottleneck relative to the full MSM?

## What Changed

### 1) Phase Breakdown (`BB_BENCH`) inside `batch_multi_scalar_mul`

Added `BB_BENCH` scopes for:
- `MSM::batch_multi_scalar_mul/evaluate_work_units`
- `MSM::batch_multi_scalar_mul/accumulate_results`
- `MSM::batch_multi_scalar_mul/batch_normalize`
- `MSM::batch_multi_scalar_mul/scalars_to_montgomery`

These are surfaced in google-benchmark output via
`GOOGLE_BB_BENCH_REPORTER(state)`.
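
For readers unfamiliar with scope-based timing, the general pattern can be sketched as below. This is a minimal illustration of an RAII phase timer, not barretenberg's `BB_BENCH` implementation; `PhaseTimings` and `ScopedPhase` are hypothetical names introduced only for this example.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <map>
#include <string>
#include <utility>

// Hypothetical registry of per-phase totals (NOT the real BB_BENCH machinery).
class PhaseTimings {
  public:
    // Accumulate elapsed nanoseconds under a phase name.
    void add(const std::string& name, std::int64_t ns)
    {
        assert(ns >= 0);
        totals_ns_[name] += ns;
    }

    // Total nanoseconds recorded for a phase, or 0 if it was never recorded.
    std::int64_t total_ns(const std::string& name) const
    {
        auto it = totals_ns_.find(name);
        return it == totals_ns_.end() ? 0 : it->second;
    }

  private:
    std::map<std::string, std::int64_t> totals_ns_;
};

// Hypothetical RAII scope: records wall-clock time for one phase on destruction.
class ScopedPhase {
  public:
    ScopedPhase(PhaseTimings& timings, std::string name)
        : timings_(timings)
        , name_(std::move(name))
        , start_(std::chrono::steady_clock::now())
    {}

    ~ScopedPhase()
    {
        const auto elapsed = std::chrono::steady_clock::now() - start_;
        timings_.add(name_, std::chrono::duration_cast<std::chrono::nanoseconds>(elapsed).count());
    }

    ScopedPhase(const ScopedPhase&) = delete;
    ScopedPhase& operator=(const ScopedPhase&) = delete;

  private:
    PhaseTimings& timings_;
    std::string name_;
    std::chrono::steady_clock::time_point start_;
};
```

Wrapping each phase in its own block, for example `{ ScopedPhase p(timings, "accumulate_results"); /* phase body */ }`, makes the per-phase totals additive across repeated calls, which is the property a phase breakdown relies on.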

### 2) New Benchmark Case: `BatchMSM_1656`

Added a dedicated benchmark:
- `PippengerBench/BatchMSM_1656/256/{msm_size}`
- Single MSM (`num_polys = 1`)
- Sizes:
  - `msm_size ∈ {2^16, 2^20}`

This uses `bb::set_parallel_for_concurrency(256)` to force the intended
partitioning and restores the original value after the benchmark
finishes.
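
One way to make the set-then-restore step robust against early returns or exceptions is an RAII guard. The sketch below is an assumption about how that could look, not the code in this PR: `sketch::get_concurrency` and `sketch::set_concurrency` are hypothetical stand-ins for the real concurrency knob (`bb::set_parallel_for_concurrency` and whatever getter accompanies it), and the default of 4 is arbitrary.

```cpp
#include <cassert>
#include <cstddef>

namespace sketch {

// Hypothetical stand-in for the process-wide concurrency setting.
inline std::size_t& concurrency_ref()
{
    static std::size_t value = 4; // arbitrary default, for illustration only
    return value;
}

inline std::size_t get_concurrency()
{
    return concurrency_ref();
}

inline void set_concurrency(std::size_t n)
{
    assert(n > 0);
    concurrency_ref() = n;
}

// Overrides the concurrency for the lifetime of the guard, then restores
// whatever value was in effect when the guard was constructed.
class ConcurrencyOverride {
  public:
    explicit ConcurrencyOverride(std::size_t n)
        : previous_(get_concurrency())
    {
        set_concurrency(n);
    }

    ~ConcurrencyOverride() { set_concurrency(previous_); }

    ConcurrencyOverride(const ConcurrencyOverride&) = delete;
    ConcurrencyOverride& operator=(const ConcurrencyOverride&) = delete;

  private:
    std::size_t previous_;
};

} // namespace sketch
```

With a guard like this, a benchmark body would declare `sketch::ConcurrencyOverride guard(256);` at the top, and the previous value would come back automatically when the function exits.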

## Results (Local)

Machine: 4 vCPU laptop (so `256 threads` is intentionally
oversubscribed; the key point is the *absolute* reduction overhead).

Key takeaways (from `BB_BENCH` counters; time counters are nanoseconds):
- `2^16, 256 threads`:
  - `accumulate_results ≈ 139k ns` (~0.139 ms)
  - total `≈ 521 ms`
  - reduction fraction `~0.027%`
- `2^20, 256 threads`:
  - `accumulate_results ≈ 141k ns` (~0.141 ms)
  - total `≈ 3457 ms`
  - reduction fraction `~0.004%`
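
The reduction fractions above follow directly from the two reported numbers once the units are aligned (phase time in nanoseconds, total in milliseconds). A small helper makes the arithmetic explicit; `phase_percent` is a name introduced here for illustration and is not part of the benchmark code.

```cpp
#include <cassert>

// Percentage of the total MSM time spent in a single phase.
// phase_ns is in nanoseconds; total_ms is in milliseconds.
inline double phase_percent(double phase_ns, double total_ms)
{
    assert(total_ms > 0.0);
    const double total_ns = total_ms * 1e6; // 1 ms = 1e6 ns
    return 100.0 * phase_ns / total_ns;
}
```

For the `2^16` case, `phase_percent(139000.0, 521.0)` is about 0.027, and for the `2^20` case, `phase_percent(141000.0, 3457.0)` is about 0.004, matching the figures listed.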

So the final reduction appears negligible in these regimes on my setup,
and the benchmark now makes it easy to validate on a 64+ core machine
where the original concern is most relevant.

## Notes / Background

- `MSM` here is a Pippenger-style MSM. For background reading on
  multiexponentiation / multi-product methods:
  - Nicholas Pippenger, *On the evaluation of powers and related problems* (1976).
  - Jurjen Bos and Matthijs Coster, *Addition Chain Heuristics* (CRYPTO 1989).
  - Ryan Henry, *Pippenger's Multiproduct and Multiexponentiation Algorithms* (2010).

## How To Run

```
cmake --build <build-dir> --target pippenger_bench
CRS_PATH=<path-to-bn254_g1.dat> <build-dir>/bin/pippenger_bench \
  --benchmark_filter='BatchMSM_1656' \
  --benchmark_min_time=0.1s \
  --benchmark_counters_tabular=true
```

Fixes AztecProtocol/barretenberg#1656

---------

Co-authored-by: Peter <peter@users.noreply.github.com>
Co-authored-by: Jonathan Hao <jonathan@aztec-labs.com>

@ludamad ludamad left a comment

🤖 Auto-approved

@AztecBot

🤖 Auto-merge enabled after 4 hours of inactivity. This PR will be merged automatically once all checks pass.

@AztecBot AztecBot added this pull request to the merge queue Mar 30, 2026
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Mar 30, 2026