Draft
18 commits
f30f54c
feat: add ECFFT2 domain precomputation script for BaseFold
Mar 27, 2026
46134a2
feat: group-valued BaseFold native prover + verifier (eprint 2025/132…
Mar 27, 2026
f23b521
refactor: move BaseFold from commitment_schemes/ to stdlib/msm_verifi…
Mar 27, 2026
9056847
docs: thorough documentation for BaseFold + circuit cost benchmark
Mar 27, 2026
8d7d515
docs: add OPTIMIZATIONS.md with recursive verifier cost analysis
Mar 27, 2026
2224fb4
docs: add NATIVE_OPTIMIZATIONS.md with proof size reduction analysis
Mar 27, 2026
0f20f72
refactor: benchmark α,β fold reformulation, document why it's slower
Mar 28, 2026
0184ca0
refactor: template BaseFold verifier on Curve for native + recursive …
Mar 28, 2026
d56c7c1
feat: recursive BaseFold verifier with concrete gate count measurement
Mar 28, 2026
56f31bb
feat: full-size recursive verifier benchmark (3.47M gates for 2^15 MSM)
Mar 28, 2026
178e8f4
refactor: remove binary domain from git, auto-generate on first test run
Mar 28, 2026
ab1f123
docs: correct optimization analysis against concrete 3.47M gate measu…
Mar 28, 2026
00b5e22
refactor: move msm_verification from stdlib/ to top-level barretenberg/
Mar 28, 2026
ea0a199
fix: check basefold_final matches last fold result (soundness fix)
Mar 28, 2026
cb313e9
fix: use conditional_assign for Merkle branch direction (fixed circuit)
Mar 28, 2026
9dcc301
docs: update gate counts after conditional_assign Merkle fix
Mar 28, 2026
394747a
feat: measure blowup 16 and 32 — 3.25M gates at blowup 32
Mar 28, 2026
9dd4114
docs: update PROBLEMS.md with accurate gate counts and ROM cost analysis
Mar 28, 2026
1 change: 1 addition & 0 deletions barretenberg/cpp/src/CMakeLists.txt
Original file line number Diff line number Diff line change
@@ -98,6 +98,7 @@ add_subdirectory(barretenberg/flavor)
add_subdirectory(barretenberg/goblin)
add_subdirectory(barretenberg/goblin_avm)
add_subdirectory(barretenberg/grumpkin_srs_gen)
add_subdirectory(barretenberg/msm_verification)
add_subdirectory(barretenberg/multilinear_batching)
add_subdirectory(barretenberg/numeric)
add_subdirectory(barretenberg/op_queue)
@@ -0,0 +1 @@
barretenberg_module(msm_verification common transcript ecc numeric crypto_merkle_tree)
@@ -0,0 +1 @@
ecfft_domain_2_*.bin
@@ -0,0 +1,206 @@
# BaseFold Native Proof Size: Optimization Analysis

This document analyzes the native proof size of the BaseFold protocol and
identifies concrete optimizations to reduce it.

## Baseline

Parameters: 2^15 MSM over Grumpkin, blowup factor 8, domain size 2^18,
18 fold rounds, 43 queries (~128-bit security).

### Where the bytes go

```
Fixed:
- 18 Merkle roots: 18 × 32 = 576 bytes
- 1 final group element (x, y): 2 × 32 = 64 bytes
Fixed total: 640 bytes

Per query, per round r (oracle size 2^{18-r}, Merkle depth = 18-r):
- 2 group element openings (P_r, Q_r): 2 × 64 = 128 bytes
- 2 Merkle sibling paths: 2 × (18-r) × 32 bytes
- 1 fold result (F_r): 64 = 64 bytes
Per round subtotal: 192 + 64·(18-r) bytes

Per query total = Σ_{r=0}^{17} [192 + 64·(18-r)]
= 18 × 192 + 64 × (18 + 17 + ... + 1)
= 3,456 + 64 × 171
= 14,400 bytes

43 queries × 14,400 = 619,200 bytes
Grand total: 640 + 619,200 = 619,840 bytes ≈ 605 KiB
```

### Breakdown by category

| Category            | Size        | Share |
|---------------------|-------------|-------|
| Merkle paths        | 460 KiB     | 76%   |
| Group elements      | 145 KiB     | 24%   |
| Fixed (roots+final) | 0.6 KiB     | <1%   |
| **Total**           | **605 KiB** | 100%  |

Merkle paths dominate at 76% of the proof (43 × 10,944 = 470,592 of 619,840 bytes).
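
The byte accounting above can be reproduced with a short script. This is a minimal sketch, not code from the repository: the function name `basefold_proof_bytes` and its parameters are hypothetical, and it assumes 32-byte hashes, 64-byte affine points, and one fold round per bit of the domain size, as in the baseline.

```python
def basefold_proof_bytes(log_domain=18, queries=43, hash_bytes=32, point_bytes=64):
    """Split the baseline proof size into the three categories used in the table."""
    rounds = log_domain  # assumption: one fold round per halving of the domain
    fixed = rounds * hash_bytes + point_bytes  # Merkle roots + final group element
    merkle_per_query = 0
    group_per_query = 0
    for r in range(rounds):
        depth = log_domain - r
        merkle_per_query += 2 * depth * hash_bytes  # two independent sibling paths
        group_per_query += 3 * point_bytes          # P_r, Q_r and the fold result F_r
    return {
        "fixed": fixed,
        "merkle": queries * merkle_per_query,
        "group": queries * group_per_query,
    }


sizes = basefold_proof_bytes()
total = sum(sizes.values())
for name, value in sizes.items():
    print(f"{name:>6}: {value:>7} bytes  {value / 1024:6.1f} KiB  {100 * value / total:5.1f}%")
print(f" total: {total:>7} bytes  {total / 1024:6.1f} KiB")
```

Changing `queries` or `log_domain` gives a quick first estimate for other parameter sets before any of the optimizations below are applied.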

---

## Optimization 1: eliminate redundant fold results (-48 KiB)

The fold result F_r at pair index j in round r becomes oracle[r+1][j], which the
verifier already opens as one of the pair elements at round r+1. Specifically,
when the query traces from round r to round r+1, the fold output at index j is
opened as P_{r+1} or Q_{r+1} in the next round's Merkle opening.

So F_r need not be sent separately — it is already present in the proof as an
opened element of round r+1. The verifier just needs to know which element of
the next pair corresponds to the fold result (determined by the query index trace).

```
Savings: 43 queries × 18 rounds × 64 bytes = 49,536 bytes ≈ 48 KiB
```

**Result: ~557 KiB.**

This is a pure protocol simplification with no security impact.
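
The index trace can be sketched as follows. This assumes the layout described in this document: round r opens `oracle[r][j]` and `oracle[r][j + half]`, and the fold of that pair is stored at `oracle[r+1][j]`. The function name `fold_result_positions` is hypothetical. The last round is excluded because its fold result is compared against the final group element rather than a later opening.

```python
def fold_result_positions(j, log_domain):
    """For each round r, report which opened element of round r+1 equals F_r.

    j is the pair index in round 0, so 0 <= j < 2**(log_domain - 1).
    Returns a list of (round, "P" or "Q") entries.
    """
    if not 0 <= j < 1 << (log_domain - 1):
        raise ValueError("pair index out of range")
    positions = []
    for r in range(log_domain - 1):
        next_half = 1 << (log_domain - r - 2)  # half the size of the round r+1 oracle
        # F_r sits at index j of oracle[r+1]: the low half is opened as P, the high half as Q.
        positions.append((r, "P" if j < next_half else "Q"))
        j %= next_half  # pair index used in round r+1
    return positions


print(fold_result_positions(3, 3))
```

Because the trace depends only on the query index, the verifier can derive it without any extra data from the prover.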

---

## Optimization 2: paired Merkle paths (-254 KiB)

The two openings per round are at indices j and j + half, which are **siblings
at the bottom level** of the Merkle tree (they share the same parent). Their
Merkle paths of depth d therefore share d-1 sibling nodes.

The verifier knows both leaf values (P_r, Q_r), so it can:
1. Hash both leaves: h_left = hash(P_r), h_right = hash(Q_r)
2. Compute the parent: parent = hash(h_left, h_right)
3. Walk the common path from parent to root using d-1 siblings

Instead of sending 2 × d siblings, the prover sends only d-1 siblings.

```
Current Merkle per query: Σ_{r=0}^{17} 2·(18-r)·32 = 10,944 bytes
Optimized per query: Σ_{r=0}^{17} (18-r-1)·32 = 32 × (17+16+...+0) = 4,896 bytes
Savings per query: 10,944 - 4,896 = 6,048 bytes
Total savings: 43 × 6,048 = 260,064 bytes ≈ 254 KiB
```

**Result: ~303 KiB** (after Opt 1+2).

This works because the protocol always opens pairs — it never opens a single
element in isolation.
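
A paired opening can be sketched as below. Two things are assumptions for illustration only: SHA-256 stands in for Poseidon2 so the example runs with the standard library, and the two elements of a pair are stored as adjacent leaves under one parent. All function names are hypothetical.

```python
import hashlib


def h(*parts: bytes) -> bytes:
    """Stand-in hash; the real tree would use Poseidon2."""
    return hashlib.sha256(b"".join(parts)).digest()


def build_tree(leaves):
    """Return every level of the tree; level 0 holds the hashed leaves."""
    levels = [[h(leaf) for leaf in leaves]]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([h(prev[i], prev[i + 1]) for i in range(0, len(prev), 2)])
    return levels


def open_pair(levels, pair_index):
    """Siblings on the path from the pair's parent to the root (d-1 nodes)."""
    idx, path = pair_index, []
    for level in levels[1:-1]:
        path.append(level[idx ^ 1])
        idx >>= 1
    return path


def verify_pair(root, pair_index, left_leaf, right_leaf, path):
    """Recompute the root from both leaves and the shared path."""
    node = h(h(left_leaf), h(right_leaf))  # steps 1 and 2: hash leaves, then the parent
    idx = pair_index
    for sibling in path:                   # step 3: walk the common path
        node = h(node, sibling) if idx % 2 == 0 else h(sibling, node)
        idx >>= 1
    return node == root


leaves = [bytes([i]) for i in range(8)]    # depth-3 toy tree
levels = build_tree(leaves)
path = open_pair(levels, 2)
print(len(path), verify_pair(levels[-1][0], 2, leaves[4], leaves[5], path))
```

For a depth-3 tree the opening carries 2 siblings instead of the 6 needed by two independent paths, matching the d-1 versus 2 × d count above.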

Note: this also saves circuit gates in the recursive verifier (roughly 10-15%,
per the estimate in OPTIMIZATIONS.md).

---

## Optimization 3: x-only group elements (-47 KiB)

Each Grumpkin affine point is (x, y) = 64 bytes. If the Merkle tree commits
to hash(x) instead of hash(x, y), the prover only needs to send x (32 bytes)
plus a sign bit for y. The verifier recovers y from the curve equation:

```
y² = x³ + b (b is the Grumpkin curve constant)
y = ±√(x³ + b) (sign bit disambiguates)
```

In the native verifier this is a field square root + comparison (cheap). In the
recursive verifier, it's a constraint y² = x³ + b (3 gates — essentially free).

After Opt 1 (eliminating F_r), each query opens 2 group elements per round:

```
Current: 43 × 18 × 2 × 64 = 99,072 bytes ≈ 97 KiB
X-only: 43 × 18 × 2 × 33 = 51,084 bytes ≈ 50 KiB (32 bytes + 1 sign bit, rounded up)
Savings: 47,988 bytes ≈ 47 KiB
```

**Result: ~256 KiB** (after Opt 1+2+3).

Gate count impact: negligible. The on-curve check (y² = x³ + b) costs 3 gates,
and Poseidon2(x) costs the same 73 gates as Poseidon2(x, y) (both fit in one
permutation). The circuit cost benchmark confirmed hash(x,y) and hash(x-only)
give essentially identical gate counts.
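
Native recovery of y from x can be sketched as follows. The constants are assumptions stated here rather than taken from the code base: the Grumpkin base field is taken to be the BN254 scalar field and the curve constant b = -17. The sign-bit convention (parity of y) and the function names are hypothetical. Since this modulus is 1 mod 4, the square root uses Tonelli-Shanks.

```python
# Assumed Grumpkin parameters: base field = BN254 scalar field, y^2 = x^3 - 17.
P = 21888242871839275222246405745257275088548364400416034343698204186575808495617
B = -17 % P


def sqrt_mod(a, p):
    """Tonelli-Shanks square root modulo an odd prime p; None if a is a non-residue."""
    a %= p
    if a == 0:
        return 0
    if pow(a, (p - 1) // 2, p) != 1:
        return None
    q, s = p - 1, 0
    while q % 2 == 0:  # write p - 1 = q * 2^s with q odd
        q //= 2
        s += 1
    z = 2
    while pow(z, (p - 1) // 2, p) != p - 1:  # find any quadratic non-residue
        z += 1
    m, c, t, r = s, pow(z, q, p), pow(a, q, p), pow(a, (q + 1) // 2, p)
    while t != 1:
        i, t2 = 0, t
        while t2 != 1:  # least i with t^(2^i) == 1
            t2 = t2 * t2 % p
            i += 1
        b = pow(c, 1 << (m - i - 1), p)
        m, c, t, r = i, b * b % p, t * b * b % p, r * b % p
    return r


def decompress(x, sign_bit, p=P, b=B):
    """Recover y from x and a sign bit, here taken to be the parity of y."""
    y = sqrt_mod(x * x * x + b, p)
    if y is None:
        raise ValueError("x is not the x-coordinate of a curve point")
    if y != 0 and (y & 1) != sign_bit:
        y = p - y  # p is odd, so y and p - y have opposite parity
    return y


y = decompress(1, 0)
print((y * y - (1 + B)) % P == 0)
```

In the recursive verifier no square root is computed in-circuit: y is supplied as a witness and only the curve equation is constrained, which is why the gate cost stays small.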

---

## Optimization 4: batch Merkle opening (FRI-style, ~-70 KiB)

When multiple queries open the same Merkle tree, their paths share upper-level
siblings. A batch opening proof (as used in Plonky2 and standard FRI
implementations) deduplicates these shared nodes.

For q queries opening 2q leaves in a depth-d tree, the batch proof sends only
the minimal set of tree nodes needed to recompute the single root from all 2q leaves:

```
Worst case (no sharing): 2q × d nodes
Best case (full sharing): 2q + d nodes
Expected for random queries: roughly 2q × d - (overlap savings)
```

For 86 leaves (43 queries × 2) in a depth-18 tree, the expected overlap at
level k is:

```
Probability two paths share a level-k node ≈ 86 / 2^{18-k}
Significant sharing begins at level k ≈ 18 - log2(86) ≈ 11
Levels 0-10: essentially no sharing (all 86 nodes unique)
Levels 11-18: increasing sharing, saving ~1 node per level per collision
```

Estimated savings: ~30% of Merkle path data in early rounds (large trees),
less in later rounds (trees are already small).

```
Estimated total Merkle after batch opening: ~140 KiB (vs ~210 KiB with Opt 2)
Additional savings: ~70 KiB
```

**Result: ~190 KiB** (after Opt 1+2+3+4).

This is more complex to implement (requires a batch Merkle proof structure)
but is well-understood from FRI implementations.
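
The amount of sharing can be estimated empirically by counting the sibling nodes a batch opening has to include. This is a sketch under the assumption that a sibling is sent only when it cannot be recomputed from other opened leaves. The function name `batch_proof_nodes` is hypothetical, and the random indices stand in for the real query indices.

```python
import random


def batch_proof_nodes(leaf_indices, depth):
    """Count sibling hashes needed to recompute the root from the given leaves."""
    known = set(leaf_indices)
    sent = 0
    for _ in range(depth):
        # A sibling must be sent only if the verifier cannot derive it itself.
        sent += sum(1 for i in known if (i ^ 1) not in known)
        known = {i >> 1 for i in known}  # move up one level
    return sent


depth = 18
rng = random.Random(0)  # fixed seed so the estimate is repeatable
leaves = rng.sample(range(1 << depth), 86)
independent = len(leaves) * depth
batched = batch_proof_nodes(leaves, depth)
print(independent, batched, f"{1 - batched / independent:.1%} fewer nodes")
```

Running the same count for each round's smaller tree, and for the paired openings of Opt 2, would refine the ~30% figure quoted above.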

---

## Optimization 5: increase blowup factor

Blowup 16 (= 2^4) gives 4 bits of security per query, requiring only
128/4 = 32 queries instead of 43. Trade-offs:

- Domain grows from 2^18 to 2^19 (one more fold round)
- Prover does 2× more work (larger oracle to commit)
- Query-proportional proof data shrinks to 32/43 ≈ 74% of its blowup-8 size

```
Proof size with blowup 16 (after Opt 1+2+3): 32/43 × query_data + fixed
≈ 0.74 × (256 - 0.6) + 0.6 ≈ 190 KiB
```

**Result: ~190 KiB** (after Opt 1+2+3+5, comparable to Opt 4).

Blowup 32 (= 2^5): 26 queries, domain 2^20. Further reduction to ~155 KiB
but prover overhead grows 4× and the domain precomputation becomes heavier.
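
The query counts follow from the rule of thumb used in this document, log2(blowup) bits of security per query. A minimal sketch with the hypothetical helper `query_plan`, assuming a power-of-two blowup and a 2^15 MSM:

```python
def query_plan(blowup, msm_log_size=15, security_bits=128):
    """Return (queries, log2 of the domain size) for a power-of-two blowup factor."""
    bits_per_query = blowup.bit_length() - 1
    if bits_per_query == 0 or blowup != 1 << bits_per_query:
        raise ValueError("blowup must be a power of two greater than 1")
    queries = -(-security_bits // bits_per_query)  # ceiling division
    return queries, msm_log_size + bits_per_query


for blowup in (8, 16, 32):
    queries, log_domain = query_plan(blowup)
    print(f"blowup {blowup:>2}: {queries} queries, domain 2^{log_domain}")
```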

---

## Summary

| Configuration                    | Proof size  | Cumulative savings from baseline |
|----------------------------------|-------------|----------------------------------|
| Baseline (blowup 8, 43 queries)  | **605 KiB** | 0 KiB                            |
| + Opt 1: remove redundant F_r    | 557 KiB     | 48 KiB                           |
| + Opt 2: paired Merkle paths     | 303 KiB     | 302 KiB                          |
| + Opt 3: x-only group elements   | 256 KiB     | 349 KiB                          |
| + Opt 4: batch Merkle opening    | ~190 KiB    | ~415 KiB                         |
| + Opt 5: blowup 16 (32 queries)  | ~150 KiB    | ~455 KiB                         |

Opts 1 and 2 are easy to implement and give the largest reduction (605 → 303 KiB).
Opt 3 is straightforward. Opt 4 requires more implementation effort but is
standard FRI machinery. Opt 5 is a parameter choice with prover cost trade-offs.

All optimizations are compatible with each other and with the recursive verifier
circuit optimizations described in OPTIMIZATIONS.md.
@@ -0,0 +1,167 @@
# BaseFold Recursive Verifier: Cost Analysis

This document records the circuit cost of the BaseFold recursive verifier
and analyzes potential optimizations.

## Concrete measurement

Parameters: 2^15 MSM over Grumpkin, blowup factor 8 = 2^3, domain size 2^18,
18 fold rounds, 43 queries (~128-bit security).

Measured by `basefold_circuit_cost.test.cpp::FullSizeRecursiveVerifier`, which
builds the actual `RecursiveBaseFoldVerifier<UltraCircuitBuilder>` circuit:

| Metric | Value |
|-------------------------------|---------------|
| **Total gates** | **4,600,813** |
| **Log2(gates)** | **22.13** |
| Gates per query | 106,995 |
| Gates per query per round | 5,944 |
| Native proof size | 605 KiB |
| Raw batch_mul MSM (comparison)| ~12M gates |
| **Improvement over raw MSM** | **~2.6×** |

Note: the Merkle path verification uses `conditional_assign` to compute BOTH
hash orderings at each level (needed for a fixed circuit — see PROBLEMS.md).
This doubles the Merkle hashing cost compared to a branching implementation.

### Cost breakdown per query per round

Each round for each query does:
1. **Fold check**: 4 group operations (3 constant-scalar muls + 1 witness-scalar mul)
2. **Merkle verification**: 2 paths, each of depth = (18 - round), with 2×
Poseidon2 hashes per level (both orderings + conditional_assign)

Average Merkle depth across 18 rounds: (18 + 17 + ... + 1) / 18 = 9.5.
Average Merkle cost per round: 2 paths × (1 leaf hash + 9.5 × 2 path hashes) × ~74 gates ≈ 2,960 gates.
Average fold cost per round: 5,944 - 2,960 ≈ **2,984 gates**.
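
The split between Merkle and fold cost can be re-derived from the measured total. This is a sketch of the arithmetic only. The function name is hypothetical, and the ~74 gates per Poseidon2 call is the estimate used in this section.

```python
def per_round_breakdown(total_gates=4_600_813, queries=43, rounds=18, hash_gates=74):
    """Return (gates per query per round, Merkle estimate, fold remainder)."""
    per_query_round = total_gates // queries // rounds
    avg_depth = sum(range(1, rounds + 1)) / rounds  # (18 + 17 + ... + 1) / 18
    # Two paths, each with 1 leaf hash plus 2 hashes per level (both orderings).
    merkle = 2 * (1 + 2 * avg_depth) * hash_gates
    return per_query_round, merkle, per_query_round - merkle


total, merkle, fold = per_round_breakdown()
print(f"per query per round: {total}, Merkle ≈ {merkle:.0f}, fold ≈ {fold:.0f}")
```

The fold figure is a remainder, so any error in the assumed hash cost shifts gates between the two categories.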

### Isolated per-operation costs (for reference only)

These were measured by constructing each operation in its own fresh circuit.
They significantly overestimate the cost in a real circuit:

| Component | Gates (isolated) |
|---------------------------------|------------------|
| Fold check, e > 0 (4 ops) | 6,513 |
| Fold check, e = 0 (2 ops) | 5,111 |
| Merkle path, depth 18 | 1,407 |
| Merkle path, depth 1 | 149 |

---

## Potential optimizations (circuit cost)

### Optimization 1: reduce number of queries (increase blowup)

**Impact: linear reduction in gates and proof size. Easy to implement.**

| Blowup | Bits/query | Queries | Rounds | Estimated gates | Proof size |
|--------|-----------|---------|--------|----------------|------------|
| 8 | 3 | 43 | 18 | 4.60M (measured)| 605 KiB |
| 16 | 4 | 32 | 19 | ~3.6M | ~470 KiB |
| 32 | 5 | 26 | 20 | ~3.0M | ~400 KiB |

Trade-off: larger blowup → fewer queries (cheaper verifier) but larger initial
domain (more prover work, bigger precomputed domain data, one more fold round).
Since prover work is native and one-time, this favors the recursive setting.

### Optimization 2: single-hash Merkle paths (if witness-dependent topology is OK)

**Impact: ~1.1M gate savings (~24% of total).**

The current implementation computes BOTH Poseidon2 hash orderings at each Merkle
level and selects with `conditional_assign`. If the domain lookup issue (see
PROBLEMS.md) is resolved via ROM tables, the Merkle index bits would be proper
circuit witnesses and we could use a single conditional hash instead of two.

This would bring the gate count back to ~3.5M.
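
The difference between the two approaches can be modelled natively. This is an illustrative sketch, not the barretenberg API: SHA-256 stands in for Poseidon2, the arithmetic select stands in for `conditional_assign`, and reading "single conditional hash" as "swap the inputs, then hash once" is an interpretation. All names are hypothetical.

```python
import hashlib


def h2(a: int, b: int) -> int:
    """Two-to-one stand-in hash over 256-bit integers."""
    data = a.to_bytes(32, "big") + b.to_bytes(32, "big")
    return int.from_bytes(hashlib.sha256(data).digest(), "big")


def root_double_hash(leaf, index, siblings):
    """Current approach: hash both orderings at every level, then select."""
    node = leaf
    for sib in siblings:
        bit = index & 1
        as_left = h2(node, sib)   # ordering if the node is a left child
        as_right = h2(sib, node)  # ordering if the node is a right child
        node = as_left + bit * (as_right - as_left)  # branch-free select
        index >>= 1
    return node


def root_single_hash(leaf, index, siblings):
    """Alternative: select the input order first, then hash once per level."""
    node = leaf
    for sib in siblings:
        bit = index & 1
        left = node + bit * (sib - node)
        right = sib + bit * (node - sib)
        node = h2(left, right)
        index >>= 1
    return node


siblings = [11, 22, 33]
print(all(root_double_hash(7, i, siblings) == root_single_hash(7, i, siblings) for i in range(8)))
```

Both variants have the same shape for every index; the second spends one hash per level instead of two, which is where the estimated savings come from.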

### Optimization 3: paired Merkle paths

**Impact: reduces BOTH proof size (-254 KiB) AND gate count (~10-15% savings).**

The two openings per round are siblings at the bottom level of the Merkle tree.
Instead of 2 independent paths, send 1 common path from the parent to the root.
The verifier hashes both leaves to get the parent, then walks one shared path.

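Saves: d Poseidon2 calls per round per query. Two independent paths cost
2 leaf hashes + 2d level hashes; the paired path costs 2 leaf hashes + 1 parent
hash + (d-1) level hashes. With the current double-hash approach the saving is
roughly 2d calls.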

### Optimization 4: eliminate redundant fold result openings

**Impact: reduces proof size by ~48 KiB. Small gate savings.**

The fold result F_r at round r is already opened as one of the pair elements
at round r+1. Removing the redundant send saves 43 × 18 × 64 bytes ≈ 48 KiB.

---

## Potential optimizations (proof size only)

See NATIVE_OPTIMIZATIONS.md for detailed analysis. Summary:

| Optimization | Proof size savings |
|---------------------------------------|-------------------|
| Remove redundant F_r | 48 KiB |
| Paired Merkle paths | 254 KiB |
| X-only group elements                 | 47 KiB            |
| Batch Merkle opening (FRI-style) | ~70 KiB |
| Increase blowup (fewer queries) | proportional |

---

## What does NOT help (surprising finding)

**The α,β reformulation makes things WORSE, not better.**

The fold formula can be algebraically rewritten as:

```
result = G_0 · α + G_1 · β
where α = s_0^{-e} · (s_1 - z) / (s_1 - s_0)
β = s_1^{-e} · (z - s_0) / (s_1 - s_0)
```

This looks like it should halve the cost: 2 scalar muls instead of 4.
We benchmarked this (isolated) and found:

| Formulation        | Scalar-mul gates | Field-arith gates | Total gates |
|--------------------|------------------|-------------------|-------------|
| Original (4 ops)   | 6,513            | 0                 | **6,513**   |
| α,β (2 muls)       | 5,111            | 5,122             | **10,233**  |

The α,β version is **57% more expensive**. The reasons:

1. **Non-native field arithmetic is expensive.** α and β live in Fq (BN254
base field), which is non-native in a BN254 circuit. Computing them requires
`bigfield` multiplication (CRT reduction + range checks) at ~2,500 gates per
mul.

2. **Constant-scalar muls are cheap.** In the original formulation, 3 of the 4
group operations multiply a witness point by a **constant** scalar.
`cycle_group`'s Straus implementation bakes constant scalars directly into
ROM table entries.

3. **Witness-scalar muls are expensive.** In the α,β version, both muls use
witness scalars, forcing full variable-base Straus.

**Takeaway**: in the Grumpkin-in-BN254 circuit, optimizing the number of group
operations at the expense of introducing non-native field arithmetic is a bad
trade. The bottleneck is the non-native field, not the number of group ops.

---

## Notes for integration

- **SRS generators**: In production, the group elements will come from the Aztec
Grumpkin SRS, loaded via `srs::init_file_crs_factory` / `CommitmentKey<curve::Grumpkin>`.

- **ECFFT domain binary**: The log_n=18 domain data (~25 MB) is NOT checked into
git. The test generates it on first run via `ecfft_precompute.py` (~2 min).

- **Origin tags**: The recursive verifier uses a "native hint" approach to avoid
origin tag conflicts. Needs refactoring for production.

- **Fixed circuit**: See PROBLEMS.md for remaining witness-dependent topology
issues and their estimated cost to fix (~300K additional gates).