AztecProtocol · notnotraju · Mar 27, 2026
diff --git a/barretenberg/cpp/src/CMakeLists.txt b/barretenberg/cpp/src/CMakeLists.txt
@@ -98,6 +98,7 @@ add_subdirectory(barretenberg/flavor)
 add_subdirectory(barretenberg/goblin)
 add_subdirectory(barretenberg/goblin_avm)
 add_subdirectory(barretenberg/grumpkin_srs_gen)
+add_subdirectory(barretenberg/msm_verification)
 add_subdirectory(barretenberg/multilinear_batching)
 add_subdirectory(barretenberg/numeric)
 add_subdirectory(barretenberg/op_queue)

diff --git a/barretenberg/cpp/src/barretenberg/msm_verification/CMakeLists.txt b/barretenberg/cpp/src/barretenberg/msm_verification/CMakeLists.txt
@@ -0,0 +1 @@
+barretenberg_module(msm_verification common transcript ecc numeric crypto_merkle_tree)
diff --git a/barretenberg/cpp/src/barretenberg/msm_verification/basefold/.gitignore b/barretenberg/cpp/src/barretenberg/msm_verification/basefold/.gitignore
@@ -0,0 +1 @@
+ecfft_domain_2_*.bin
diff --git a/...etenberg/cpp/src/barretenberg/msm_verification/basefold/NATIVE_OPTIMIZATIONS.md b/...etenberg/cpp/src/barretenberg/msm_verification/basefold/NATIVE_OPTIMIZATIONS.md
@@ -0,0 +1,206 @@
+# BaseFold Native Proof Size: Optimization Analysis
+
+This document analyzes the native proof size of the BaseFold protocol and
+identifies concrete optimizations to reduce it.
+
+## Baseline
+
+Parameters: 2^15 MSM over Grumpkin, blowup factor 8, domain size 2^18,
+18 fold rounds, 43 queries (~128-bit security).
+
+### Where the bytes go
+
+```
+Fixed:
+  - 18 Merkle roots:                      18 × 32 =      576 bytes
+  - 1 final group element (x, y):          2 × 32 =       64 bytes
+  Fixed total:                                           640 bytes
+
+Per query, per round r (oracle size 2^{18-r}, Merkle depth = 18-r):
+  - 2 group element openings (P_r, Q_r):  2 × 64 =      128 bytes
+  - 2 Merkle sibling paths:               2 × (18-r) × 32 bytes
+  - 1 fold result (F_r):                      64 =        64 bytes
+  Per round subtotal:                      192 + 64·(18-r) bytes
+
+Per query total = Σ_{r=0}^{17} [192 + 64·(18-r)]
+               = 18 × 192  +  64 × (18 + 17 + ... + 1)
+               = 3,456  +  64 × 171
+               = 14,400 bytes
+
+43 queries × 14,400 = 619,200 bytes
+Grand total: 640 + 619,200 = 619,840 bytes ≈ 605 KiB
+```
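The byte accounting above can be reproduced with a short script. This is a minimal sketch, assuming exactly the layout listed in the block (32-byte hashes, 64-byte affine points, one Merkle root per round); the function name `baseline_proof_size` and its keys are illustrative and do not appear in the PR.

```python
HASH_BYTES = 32    # one Merkle node or root
POINT_BYTES = 64   # affine Grumpkin point (x, y)


def baseline_proof_size(log_domain: int = 18, queries: int = 43) -> dict:
    """Byte counts for the baseline layout: one fold round per domain bit."""
    rounds = log_domain
    fixed = rounds * HASH_BYTES + POINT_BYTES        # roots + final element
    group_per_query = 0
    merkle_per_query = 0
    for r in range(rounds):
        depth = log_domain - r                       # oracle size 2^(log_domain - r)
        group_per_query += 3 * POINT_BYTES           # P_r, Q_r and F_r
        merkle_per_query += 2 * depth * HASH_BYTES   # two independent sibling paths
    return {
        "fixed": fixed,
        "group": queries * group_per_query,
        "merkle": queries * merkle_per_query,
        "total": fixed + queries * (group_per_query + merkle_per_query),
    }


sizes = baseline_proof_size()
for name, nbytes in sizes.items():
    print(f"{name:>6}: {nbytes:>7} bytes = {nbytes / 1024:6.1f} KiB")
```

Splitting the total into `fixed`, `group` and `merkle` makes it easy to cross-check the per-category table that follows.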
+
+### Breakdown by category
+
+| Category            | Size        | Share |
+|---------------------|-------------|-------|
+| Merkle paths        | 460 KiB     |  76%  |
+| Group elements      | 145 KiB     |  24%  |
+| Fixed (roots+final) | 0.6 KiB     |  <1%  |
+| **Total**           | **605 KiB** | 100%  |
+
+Merkle paths dominate at 76% of the proof.
+
+---
+
+## Optimization 1: eliminate redundant fold results (-48 KiB)
+
+The fold result F_r at pair index j in round r becomes oracle[r+1][j], which the
+verifier already opens as one of the pair elements at round r+1.  Specifically,
+when the query traces from round r to round r+1, the fold output at index j is
+opened as P_{r+1} or Q_{r+1} in the next round's Merkle opening.
+
+So F_r need not be sent separately — it is already present in the proof as an
+opened element of round r+1.  The verifier just needs to know which element of
+the next pair corresponds to the fold result (determined by the query index trace).
+
+```
+Savings: 43 queries × 18 rounds × 64 bytes = 49,536 bytes ≈ 48 KiB
+```
+
+**Result: ~557 KiB.**
+
+This is a pure protocol simplification with no security impact.
+
+---
+
+## Optimization 2: paired Merkle paths (-254 KiB)
+
+The two openings per round are at indices j and j + half, which are **siblings
+at the bottom level** of the Merkle tree (they share the same parent).  Their
+Merkle paths of depth d therefore share d-1 sibling nodes.
+
+The verifier knows both leaf values (P_r, Q_r), so it can:
+1. Hash both leaves: h_left = hash(P_r), h_right = hash(Q_r)
+2. Compute the parent: parent = hash(h_left, h_right)
+3. Walk the common path from parent to root using d-1 siblings
+
+Instead of sending 2 × d siblings, the prover sends only d-1 siblings.
+
+```
+Current Merkle per query:    Σ_{r=0}^{17} 2·(18-r)·32  =  10,944 bytes
+Optimized per query:         Σ_{r=0}^{17} (18-r-1)·32   =  32 × (17+16+...+0) = 4,896 bytes
+Savings per query: 10,944 - 4,896 = 6,048 bytes
+Total savings: 43 × 6,048 = 260,064 bytes ≈ 254 KiB
+```
+
+**Result: ~303 KiB** (after Opt 1+2).
+
+This works because the protocol always opens pairs — it never opens a single
+element in isolation.
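The three verifier steps above can be sketched as follows. This is only an illustration under stated assumptions: SHA-256 stands in for Poseidon2, leaves are raw byte strings rather than group elements, and the tree layout (leaf j and leaf j + half hashed together at the bottom level, adjacent pairing above) is a guess at what "siblings at the bottom level" means here. The helpers `build_levels`, `open_pair` and `verify_pair` are hypothetical names, not functions from this PR.

```python
import hashlib


def H(*parts: bytes) -> bytes:
    """Stand-in hash (SHA-256); the real tree would use Poseidon2."""
    return hashlib.sha256(b"".join(parts)).digest()


def build_levels(leaves):
    """Return the node lists from the pair-parent level up to the root."""
    half = len(leaves) // 2
    hashed = [H(leaf) for leaf in leaves]
    # Bottom level: leaf j and leaf j + half share parent j.
    level = [H(hashed[j], hashed[j + half]) for j in range(half)]
    levels = [level]
    while len(level) > 1:
        level = [H(level[i], level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def open_pair(levels, j):
    """Prover side: the d-1 siblings on the shared path of pair j."""
    path, idx = [], j
    for level in levels[:-1]:
        path.append(level[idx ^ 1])
        idx >>= 1
    return path


def verify_pair(root, j, p, q, path):
    """Verifier side: hash both leaves, form the parent, walk to the root."""
    node = H(H(p), H(q))
    idx = j
    for sibling in path:
        node = H(node, sibling) if idx % 2 == 0 else H(sibling, node)
        idx >>= 1
    return node == root


leaves = [bytes([i]) * 64 for i in range(16)]      # depth d = 4
levels = build_levels(leaves)
root = levels[-1][0]
path = open_pair(levels, 5)
print(len(path), verify_pair(root, 5, leaves[5], leaves[13], path))
```

For a depth-4 tree the opening carries 3 sibling hashes instead of 2 × 4, matching the d-1 versus 2d count in the text.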
+
+Note: this also saves circuit gates in the recursive verifier (OPTIMIZATIONS.md
+estimates ~10-15%).
+
+---
+
+## Optimization 3: x-only group elements (-47 KiB)
+
+Each Grumpkin affine point is (x, y) = 64 bytes.  If the Merkle tree commits
+to hash(x) instead of hash(x, y), the prover only needs to send x (32 bytes)
+plus a sign bit for y.  The verifier recovers y from the curve equation:
+
+```
+y² = x³ + b      (b is the Grumpkin curve constant)
+y = ±√(x³ + b)   (sign bit disambiguates)
+```
+
+In the native verifier this is a field square root + comparison (cheap).  In the
+recursive verifier, it's a constraint y² = x³ + b (3 gates — essentially free).
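The native-side recovery of y can be sketched on a toy curve. Everything concrete here is an assumption made for illustration: the prime `P = 10007` and constant `B = 3` are not Grumpkin parameters, the "sign bit" is taken to be the parity of y, and the shortcut square root only works because this toy prime satisfies P ≡ 3 (mod 4). A real implementation would use the field's own square-root routine.

```python
P = 10007   # toy prime with P % 4 == 3 (NOT the Grumpkin base field)
B = 3       # toy curve constant (NOT the Grumpkin constant)


def rhs(x: int) -> int:
    """Right-hand side of the curve equation y^2 = x^3 + B."""
    return (x * x * x + B) % P


def compress(x: int, y: int):
    """Keep x and one bit of y (its parity) as the sign."""
    return x, y & 1


def decompress(x: int, sign: int):
    """Recover y from x and the sign bit, rejecting x values off the curve."""
    a = rhs(x)
    y = pow(a, (P + 1) // 4, P)          # square root, valid since P % 4 == 3
    if y * y % P != a:
        raise ValueError("x is not the abscissa of a curve point")
    if y & 1 != sign:
        y = (P - y) % P                  # the other root has opposite parity
    return x, y


# Find some x on the toy curve (Euler criterion) and round-trip both roots.
x = next(x for x in range(P) if pow(rhs(x), (P - 1) // 2, P) == 1)
y = pow(rhs(x), (P + 1) // 4, P)
print(decompress(*compress(x, y)) == (x, y),
      decompress(*compress(x, P - y)) == (x, P - y))
```

The explicit `y * y % P != a` check plays the role of the on-curve constraint: an x with no matching y is rejected rather than silently accepted.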
+
+After Opt 1 (eliminating F_r), each query opens 2 group elements per round:
+
+```
+Current: 43 × 18 × 2 × 64 = 99,072 bytes ≈ 97 KiB
+X-only:  43 × 18 × 2 × 33 = 51,084 bytes ≈ 50 KiB  (32 bytes + 1 sign bit, rounded up)
+Savings: 99,072 - 51,084 = 47,988 bytes ≈ 47 KiB
+```
+
+**Result: ~256 KiB** (after Opt 1+2+3).
+
+Gate count impact: negligible.  The on-curve check (y² = x³ + b) costs 3 gates,
+and Poseidon2(x) costs the same 73 gates as Poseidon2(x, y) (both fit in one
+permutation).  The circuit cost benchmark confirmed hash(x,y) and hash(x-only)
+give essentially identical gate counts.
+
+---
+
+## Optimization 4: batch Merkle opening (FRI-style, about -70 KiB)
+
+When multiple queries open the same Merkle tree, their paths share upper-level
+siblings.  A batch opening proof (as used in Plonky2 and standard FRI
+implementations) deduplicates these shared nodes.
+
+For q queries opening 2q leaves in a depth-d tree, the batch proof sends only
+the minimal set of tree nodes needed to recompute the tree's root from all 2q leaves:
+
+```
+Worst case (no sharing): 2q × d nodes
+Best case (full sharing): 2q + d nodes
+Expected for random queries: roughly 2q × d - (overlap savings)
+```
+
+For 86 leaves (43 queries × 2) in a depth-18 tree, the expected overlap at
+level k is:
+
+```
+Probability a given path shares its level-k node with another ≈ 86 / 2^{18-k}  (k = 0 at the leaves)
+Significant sharing begins at level k ≈ 18 - log2(86) ≈ 11
+Levels 0-10: essentially no sharing (all 86 nodes unique)
+Levels 11-18: increasing sharing, saving ~1 node per level per collision
+```
+
+Estimated savings: ~30% of Merkle path data in early rounds (large trees),
+less in later rounds (trees are already small).
+
+```
+Estimated total Merkle after batch opening: ~140 KiB (vs ~210 KiB with Opt 2)
+Additional savings: ~70 KiB
+```
+
+**Result: ~190 KiB** (after Opt 1+2+3+4).
+
+This is more complex to implement (requires a batch Merkle proof structure)
+but is well-understood from FRI implementations.
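One way to sanity-check the batch-opening estimate is a small expectation model. This is a rough sketch under assumptions that are not stated in the PR: pair indices are treated as independent and uniform in every round's tree, openings use the paired layout from Opt 2, and a sibling hash is sent only when a node on some query path has a sibling that lies on no query path. The function names are illustrative.

```python
HASH_BYTES = 32


def expected_batch_siblings(depth: int, queries: int) -> float:
    """Expected sibling hashes in a deduplicated opening of `queries` pairs."""
    total = 0.0
    for level in range(1, depth):               # levels below the root
        m = 2 ** (depth - level)                # nodes at this level
        sibling_untouched = (1 - 1 / m) ** queries
        both_untouched = (1 - 2 / m) ** queries
        # A node needs its sibling sent iff it is touched and the sibling is not.
        total += m * (sibling_untouched - both_untouched)
    return total


def merkle_bytes(log_domain: int = 18, queries: int = 43, batched: bool = False) -> float:
    """Total Merkle sibling bytes over all fold rounds."""
    total = 0.0
    for r in range(log_domain):
        depth = log_domain - r
        if batched:
            nodes = expected_batch_siblings(depth, queries)
        else:
            nodes = queries * (depth - 1)       # Opt 2: one shared path per query
        total += HASH_BYTES * nodes
    return total


unbatched = merkle_bytes()
batched = merkle_bytes(batched=True)
print(f"paired paths:  {unbatched / 1024:.1f} KiB")
print(f"batch opening: {batched / 1024:.1f} KiB (model estimate)")
```

The model ignores encoding overhead for the batch-proof structure, so its output is best read as a lower bound to compare against the ~140 KiB figure rather than as a replacement for it.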
+
+---
+
+## Optimization 5: increase blowup factor
+
+Blowup 16 (= 2^4) gives 4 bits of security per query, requiring only
+128/4 = 32 queries instead of 43.  Trade-offs:
+
+- Domain grows from 2^18 to 2^19 (one more fold round)
+- Prover does 2× more work (larger oracle to commit)
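+- Query-proportional proof data shrinks to 32/43 ≈ 74% of its previous size
+
+```
+Proof size with blowup 16 (after Opt 1+2+3): 32/43 × query_data + fixed
+  ≈ 0.74 × (257 - 0.6) + 0.6 ≈ 190 KiB
+```
+
+This first-order estimate ignores the extra fold round.  Redoing the per-query
+sum with 19 rounds (one more pair opening per query, every path one level
+deeper, one more root) gives 32 × 6,726 + 672 = 215,904 bytes ≈ 211 KiB.
+
+**Result: ~190 KiB** by proportional scaling, ~211 KiB with the extra round
+counted (after Opt 1+2+3+5, comparable to Opt 4).
+
+Blowup 32 (= 2^5): 26 queries, domain 2^20.  Proportional scaling gives ~155 KiB
+(~189 KiB with the two extra rounds counted), but prover overhead grows 4× and
+the domain precomputation becomes heavier.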
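The blowup trade-off can be explored with a small calculator. This is a sketch assuming the Opt 1+2+3 layout (no F_r, one shared path of depth-1 siblings per round, 33-byte x-only openings) and assuming one fold round per domain bit, so it counts the extra rounds that proportional scaling leaves out and will report somewhat larger figures. `optimized_proof_size` is an illustrative name, not part of the PR.

```python
import math

HASH_BYTES = 32
X_ONLY_BYTES = 33      # x coordinate plus a sign bit, rounded up to a byte
FINAL_POINT_BYTES = 64


def optimized_proof_size(log_msm: int = 15, log_blowup: int = 3,
                         security_bits: int = 128):
    """Return (queries, proof bytes) after Opt 1+2+3 for a given blowup."""
    log_domain = log_msm + log_blowup
    queries = math.ceil(security_bits / log_blowup)   # log_blowup bits per query
    fixed = log_domain * HASH_BYTES + FINAL_POINT_BYTES
    per_query = 0
    for r in range(log_domain):
        depth = log_domain - r
        per_query += 2 * X_ONLY_BYTES                 # P_r and Q_r, x-only
        per_query += (depth - 1) * HASH_BYTES         # one shared sibling path
    return queries, fixed + queries * per_query


for log_blowup in (3, 4, 5):
    q, nbytes = optimized_proof_size(log_blowup=log_blowup)
    print(f"blowup {2 ** log_blowup:>2}: {q} queries, {nbytes / 1024:.1f} KiB")
```

Because the Merkle term grows quadratically in the number of rounds, the gain from fewer queries is partly offset as the domain grows.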
+
+---
+
+## Summary
+
+| Configuration                                 | Proof size | Savings from baseline |
+|-----------------------------------------------|------------|----------------------|
+| Baseline (blowup 8, 43 queries)              | **605 KiB** | —                    |
+| + Opt 1: remove redundant F_r                | 557 KiB    | 48 KiB               |
+| + Opt 2: paired Merkle paths                  | 303 KiB    | 302 KiB (cumulative) |
+| + Opt 3: x-only group elements                | 256 KiB    | 349 KiB              |
+| + Opt 4: batch Merkle opening                  | ~190 KiB   | ~415 KiB             |
+| + Opt 5: blowup 16 (32 queries)               | ~150 KiB   | ~455 KiB             |
+
+Opts 1 and 2 are easy to implement and give the biggest bang (605 → 303 KiB).
+Opt 3 is straightforward.  Opt 4 requires more implementation effort but is
+standard FRI machinery.  Opt 5 is a parameter choice with prover cost trade-offs.
+
+All optimizations are compatible with each other and with the recursive verifier
+circuit optimizations described in OPTIMIZATIONS.md.
diff --git a/barretenberg/cpp/src/barretenberg/msm_verification/basefold/OPTIMIZATIONS.md b/barretenberg/cpp/src/barretenberg/msm_verification/basefold/OPTIMIZATIONS.md
@@ -0,0 +1,167 @@
+# BaseFold Recursive Verifier: Cost Analysis
+
+This document records the circuit cost of the BaseFold recursive verifier
+and analyzes potential optimizations.
+
+## Concrete measurement
+
+Parameters: 2^15 MSM over Grumpkin, blowup factor 8 = 2^3, domain size 2^18,
+18 fold rounds, 43 queries (~128-bit security).
+
+Measured by `basefold_circuit_cost.test.cpp::FullSizeRecursiveVerifier`, which
+builds the actual `RecursiveBaseFoldVerifier<UltraCircuitBuilder>` circuit:
+
+| Metric                        | Value         |
+|-------------------------------|---------------|
+| **Total gates**               | **4,600,813** |
+| **Log2(gates)**               | **22.13**     |
+| Gates per query               | 106,995       |
+| Gates per query per round     | 5,944         |
+| Native proof size             | 605 KiB       |
+| Raw batch_mul MSM (comparison)| ~12M gates    |
+| **Improvement over raw MSM**  | **~2.6×**     |
+
+Note: the Merkle path verification uses `conditional_assign` to compute BOTH
+hash orderings at each level (needed for a fixed circuit — see PROBLEMS.md).
+This doubles the Merkle hashing cost compared to a branching implementation.
+
+### Cost breakdown per query per round
+
+Each round for each query does:
+1. **Fold check**: 4 group operations (3 constant-scalar muls + 1 witness-scalar mul)
+2. **Merkle verification**: 2 paths, each of depth = (18 - round), with 2×
+   Poseidon2 hashes per level (both orderings + conditional_assign)
+
+Average Merkle depth across 18 rounds: (18 + 17 + ... + 1) / 18 = 9.5.
+Average Merkle cost per round: 2 paths × (1 leaf hash + 9.5 × 2 path hashes) × ~74 gates ≈ 2,960 gates.
+Average fold cost per round: 5,944 - 2,960 ≈ **2,984 gates**.
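The split between Merkle and fold cost follows directly from the measured total. The arithmetic is spelled out below as a minimal sketch; the ~74 gates per Poseidon2 call is the approximate figure used in the text, and the variable names are illustrative.

```python
TOTAL_GATES = 4_600_813      # measured, from the table above
QUERIES, ROUNDS = 43, 18
POSEIDON2_GATES = 74         # approximate cost of one Poseidon2 call

gates_per_query = TOTAL_GATES // QUERIES
gates_per_round = gates_per_query // ROUNDS

# Depths run 18, 17, ..., 1 across the rounds.
avg_depth = sum(range(1, ROUNDS + 1)) / ROUNDS

# Two paths; each has one leaf hash plus two hashes (both orderings) per level.
merkle_gates = 2 * (1 + 2 * avg_depth) * POSEIDON2_GATES
fold_gates = gates_per_round - merkle_gates

print(gates_per_query, gates_per_round, avg_depth, merkle_gates, fold_gates)
```

Since the fold cost is obtained by subtraction, any error in the per-hash figure shifts gates between the two buckets without changing the total.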
+
+### Isolated per-operation costs (for reference only)
+
+These were measured by constructing each operation in its own fresh circuit.
+They significantly overestimate the cost in a real circuit:
+
+| Component                       | Gates (isolated) |
+|---------------------------------|------------------|
+| Fold check, e > 0 (4 ops)      | 6,513            |
+| Fold check, e = 0 (2 ops)      | 5,111            |
+| Merkle path, depth 18           | 1,407            |
+| Merkle path, depth 1            | 149              |
+
+---
+
+## Potential optimizations (circuit cost)
+
+### Optimization 1: reduce number of queries (increase blowup)
+
+**Impact: linear reduction in gates and proof size.  Easy to implement.**
+
+| Blowup | Bits/query | Queries | Rounds | Estimated gates | Proof size |
+|--------|-----------|---------|--------|----------------|------------|
+| 8      | 3         | 43      | 18     | 4.60M (measured)| 605 KiB    |
+| 16     | 4         | 32      | 19     | ~3.6M           | ~470 KiB   |
+| 32     | 5         | 26      | 20     | ~3.0M           | ~400 KiB   |
+
+Trade-off: larger blowup → fewer queries (cheaper verifier) but larger initial
+domain (more prover work, bigger precomputed domain data, one more fold round).
+Since prover work is native and one-time, this favors the recursive setting.
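The query and round counts in the table follow from the blowup factor, and the gate estimates appear consistent with scaling the measured count linearly in both. The sketch below assumes that scaling model (it is a guess at how the table was produced, not a measurement) and assumes one fold round per domain bit; `estimate` is an illustrative name.

```python
import math

BASE_QUERIES, BASE_ROUNDS, BASE_GATES = 43, 18, 4_600_813   # measured at blowup 8


def estimate(log_blowup: int, log_msm: int = 15, security_bits: int = 128):
    """Return (queries, rounds, linearly scaled gate estimate)."""
    queries = math.ceil(security_bits / log_blowup)   # log_blowup bits per query
    rounds = log_msm + log_blowup                     # one round per domain bit
    gates = BASE_GATES * (queries / BASE_QUERIES) * (rounds / BASE_ROUNDS)
    return queries, rounds, gates


for log_blowup in (3, 4, 5):
    q, rnds, g = estimate(log_blowup)
    print(f"blowup {2 ** log_blowup:>2}: {q} queries, {rnds} rounds, ~{g / 1e6:.2f}M gates")
```

Linear scaling is only a first approximation: Merkle depth also grows with the domain, so measured counts at larger blowups may come out higher.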
+
+### Optimization 2: single-hash Merkle paths (if witness-dependent topology is OK)
+
+**Impact: ~1.1M gate savings (~24% of total).**
+
+The current implementation computes BOTH Poseidon2 hash orderings at each Merkle
+level and selects with `conditional_assign`.  If the domain lookup issue (see
+PROBLEMS.md) is resolved via ROM tables, the Merkle index bits would be proper
+circuit witnesses and we could use a single conditional hash instead of two.
+
+This would bring the gate count back to ~3.5M.
+
+### Optimization 3: paired Merkle paths
+
+**Impact: reduces BOTH proof size (-254 KiB) AND gate count (~10-15% savings).**
+
+The two openings per round are siblings at the bottom level of the Merkle tree.
+Instead of 2 independent paths, send 1 common path from the parent to the root.
+The verifier hashes both leaves to get the parent, then walks one shared path.
+
+Saves: d Poseidon2 calls per round per query for paths of depth d (2 leaf hashes
+plus d level hashes instead of 2 + 2d), or about 2d with the current double-hash
+approach.
+
+### Optimization 4: eliminate redundant fold result openings
+
+**Impact: reduces proof size by ~48 KiB.  Small gate savings.**
+
+The fold result F_r at round r is already opened as one of the pair elements
+at round r+1.  Removing the redundant send saves 43 × 18 × 64 bytes ≈ 48 KiB.
+
+---
+
+## Potential optimizations (proof size only)
+
+See NATIVE_OPTIMIZATIONS.md for detailed analysis.  Summary:
+
+| Optimization                          | Proof size savings |
+|---------------------------------------|-------------------|
+| Remove redundant F_r                  | 48 KiB            |
+| Paired Merkle paths                    | 254 KiB           |
+| X-only group elements                  | 47 KiB            |
+| Batch Merkle opening (FRI-style)       | ~70 KiB           |
+| Increase blowup (fewer queries)        | proportional      |
+
+---
+
+## What does NOT help (surprising finding)
+
+**The α,β reformulation makes things WORSE, not better.**
+
+The fold formula can be algebraically rewritten as:
+
+```
+result = G_0 · α  +  G_1 · β
+where  α = s_0^{-e} · (s_1 - z) / (s_1 - s_0)
+       β = s_1^{-e} · (z - s_0) / (s_1 - s_0)
+```
+
+This looks like it should halve the cost: 2 scalar muls instead of 4.
+We benchmarked this (isolated) and found:
+
+| Formulation        | Scalar-mul gates | Field-arith gates | Total gates |
+|--------------------|------------------|-------------------|-------------|
+| Original (4 ops)   | 6,513            | 0                 | **6,513**   |
+| α,β (2 muls)       | 5,111            | 5,122             | **10,233**  |
+
+The α,β version is **57% more expensive**.  The reasons:
+
+1. **Non-native field arithmetic is expensive.**  α and β live in Fq (BN254
+   base field), which is non-native in a BN254 circuit.  Computing them requires
+   `bigfield` multiplication (CRT reduction + range checks) at ~2,500 gates per
+   mul.
+
+2. **Constant-scalar muls are cheap.**  In the original formulation, 3 of the 4
+   group operations multiply a witness point by a **constant** scalar.
+   `cycle_group`'s Straus implementation bakes constant scalars directly into
+   ROM table entries.
+
+3. **Witness-scalar muls are expensive.**  In the α,β version, both muls use
+   witness scalars, forcing full variable-base Straus.
+
+**Takeaway**: in the Grumpkin-in-BN254 circuit, optimizing the number of group
+operations at the expense of introducing non-native field arithmetic is a bad
+trade.  The bottleneck is the non-native field, not the number of group ops.
+
+---
+
+## Notes for integration
+
+- **SRS generators**: In production, the group elements will come from the Aztec
+  Grumpkin SRS, loaded via `srs::init_file_crs_factory` / `CommitmentKey<curve::Grumpkin>`.
+
+- **ECFFT domain binary**: The log_n=18 domain data (~25 MB) is NOT checked into
+  git.  The test generates it on first run via `ecfft_precompute.py` (~2 min).
+
+- **Origin tags**: The recursive verifier uses a "native hint" approach to avoid
+  origin tag conflicts.  Needs refactoring for production.
+
+- **Fixed circuit**: See PROBLEMS.md for remaining witness-dependent topology
+  issues and their estimated cost to fix (~300K additional gates).