Skip to content

Process tables sequentially to reduce peak memory#299

Closed
gabrielbosio wants to merge 13 commits into
mainfrom
seq-proving-memory-opt
Closed

Process tables sequentially to reduce peak memory#299
gabrielbosio wants to merge 13 commits into
mainfrom
seq-proving-memory-opt

Conversation

@gabrielbosio

Copy link
Copy Markdown
Collaborator

Instead of collecting all LDE evaluations upfront across all tables, process each table sequentially: recompute LDE evaluations from polynomials on demand and free them after the table's proof is generated. Verifier ordering updated to match.

@github-actions

github-actions Bot commented Feb 6, 2026

Copy link
Copy Markdown

Kimi Code Review

Security Vulnerabilities

  1. Potential Memory Safety Issue (Medium):

    • In prover.rs, the main_commitments vector was changed from storing MainCommitment<Field> to Round1CommitmentData<Field>. The previous MainCommitment<Field> type stored an additional Vec<Vec<FieldElement<Field>>> which was used for evaluations. This change might lead to memory safety issues if the evaluations are accessed later in the code without being recomputed or stored elsewhere. This could potentially lead to use-after-free or dereferencing uninitialized memory if the evaluations are accessed after being dropped.

    Line: prover.rs:1211-1250

Potential Bugs

  1. Logic Error (Medium):

    • The change from returning (main, evaluations) to just main in round_1_commit_main_trace and round_1_commit_preprocessed_trace functions might cause issues in the logic where the evaluations are expected to be used later in the code. Since the evaluations are not returned, they will be dropped, and any subsequent use will result in undefined behavior.

    Line: prover.rs:458-489

  2. Logic Error (Medium):

    • The comment // _evaluations dropped here — will be recomputed from trace_polys later suggests that evaluations will be recomputed later. However, there is no code in the provided diff that shows the re-computation of evaluations. This could lead to a logic error if the evaluations are needed later in the process.

    Line: prover.rs:458

Performance Issues

  1. Potential Performance Regression (Low):

    • The change from storing Vec<Vec<FieldElement<Field>>> in main_commitments to not storing it at all might lead to a performance regression if the evaluations need to be recomputed multiple times. Re-computation could be more expensive than just storing the evaluations, depending on the complexity of the computation.

    Line: prover.rs:1211-1250

Simplicity

  1. Code Simplification (High):

    • The removal of the Vec<Vec<FieldElement<Field>>> from MainCommitment<Field> and main_commitments simplifies the code structure. However, this change should be accompanied by a clear explanation or documentation of why this change is safe and does not affect the correctness of the protocol.

    Line: prover.rs:43-489

Conclusion

The changes in the PR diff introduce potential memory safety issues and logic errors due to the removal of the evaluations from the main commitment data structure. It's crucial to ensure that the evaluations are either recomputed when needed or stored safely to avoid undefined behavior. Additionally, the performance implications of recomputing evaluations should be considered. The code simplification is a positive aspect, but it should be justified with clear documentation or comments.


Automated review by Kimi (Moonshot AI)

@gabrielbosio gabrielbosio force-pushed the seq-proving-memory-opt branch from 6b78585 to 5ec422f Compare February 6, 2026 22:19
@github-actions

github-actions Bot commented Feb 6, 2026

Copy link
Copy Markdown

Codex Code Review

Findings

  • Medium — Crypto/Transcript binding risk: The verifier and prover now interleave Phase C and Rounds 2–4 per table. This changes the Fiat–Shamir transcript ordering so that challenges for table N no longer depend on auxiliary roots of later tables. If the security model expects a single transcript that binds all tables before any round‑2 challenges, this weakens cross‑table binding and could enable malleability across tables. Please confirm the protocol’s intended ordering for multi‑table proofs. crypto/stark/src/prover.rs crypto/stark/src/verifier.rs
  • Medium — Performance regression: LDE evaluations are computed in interpolate_and_commit_main(...) and then recomputed later from trace_polys for every table. That doubles the LDE cost on the hot path (O(n log n) per column) and may negate the intended memory savings for typical proof sizes. Consider reusing the original evaluations when memory allows or making this configurable. crypto/stark/src/prover.rs

Notes

  • No new unsafe blocks, panics, or memory‑safety issues spotted in this diff.

@gabrielbosio gabrielbosio added bench-prove Run prove benchmarks on this PR and removed bench-prove Run prove benchmarks on this PR labels Feb 6, 2026
@claude

claude Bot commented Feb 6, 2026

Copy link
Copy Markdown

Security & Correctness Review

Reviewed PR #299 focusing on security vulnerabilities, bugs, performance, and simplicity.


✅ No Critical Issues in Changes

The refactoring is cryptographically sound and correct:

1. Transcript ordering preserved

  • Both prover and verifier now process tables sequentially in the same order
  • Phase C (auxiliary commitment) + Rounds 2-4 are interleaved per table
  • Fiat-Shamir transcript remains consistent between prover/verifier

2. LogUp challenges still secure

  • Challenges sampled once after ALL main commitments (Phase B)
  • Sequential processing happens AFTER challenge sampling
  • Maintains security property: challenges depend on all tables before any table is fully processed

3. Memory optimization achieved

  • LDE evaluations dropped immediately after main commitment
  • Recomputed on-demand from polynomials when needed (line 1253-1254)
  • Peak memory: O(all_tables) → O(single_table) ✓

4. No timing attacks

  • Recomputation is deterministic (same input → same output)
  • No data-dependent branching in the changed code

Performance ✓

Reasonable tradeoff:

  • Win: Significant peak memory reduction
  • Cost: Recompute LDE via FFT once per table (O(n log n))
  • Net: Good for memory-constrained environments; compute cost is acceptable

Code Quality ✓

  • Removed unused MainCommitment type alias
  • Sequential flow easier to follow than split phases
  • Comments accurately describe behavior

Pre-existing Note (not blocking)

compute_lde_trace_evaluations (line 453) has an .unwrap() on FFT operations that could panic. This isn't introduced by this PR, but worth noting for future hardening. Consider propagating the error in a follow-up.


✅ Approved - Changes are correct and achieve the stated goal.

@gabrielbosio

Copy link
Copy Markdown
Collaborator Author

we should merge #296 to see hyperfine's results

@github-actions

github-actions Bot commented Feb 6, 2026

Copy link
Copy Markdown

📊 Memory Profile Results

Program: loop_4096 (4097 instructions)

Metric Value
Peak Memory 198.24 MB
Total Allocated 4327.08 MB

Peak Allocation Stack Trace:

0x563d8e13d972: <dhat::Alloc as core::alloc::global::GlobalAlloc>::alloc (???:0:0)
0x563d8dfcbda0: lambda_vm_prover::tables::bitwise::generate_bitwise_trace (???:0:0)
0x563d8dfd2412: lambda_vm_prover::tables::trace_builder::Traces::from_logs (???:0:0)
0x563d8dfd358e: lambda_vm_prover::tables::trace_builder::Traces::from_logs_minimal (???:0:0)
0x563d8dfe783e: lambda_vm_prover::tests::prove_elfs_tests::test_dhat_memory_profile (???:0:0)

Comparison with main

Metric main This PR Change
Peak Memory 231.71 MB 198.24 MB ✅ -33.47 MB (-14.44%)
Total Allocated 4251.33 MB 4327.08 MB ⚠️ +75.75 MB (+1.78%)
View detailed profile
  1. Download the artifact
  2. Extract dhat-heap.json from the zip
  3. Upload to dh_view

@yetanotherco yetanotherco deleted a comment from github-actions Bot Feb 9, 2026
@github-actions

github-actions Bot commented Feb 9, 2026

Copy link
Copy Markdown

Benchmark Results for unmodified programs 🚀

Command Mean [ms] Min [ms] Max [ms] Relative
base binary_search 81.3 ± 12.6 63.0 92.7 1.16 ± 0.25
head binary_search 70.4 ± 11.0 63.7 92.0 1.00
Command Mean [ms] Min [ms] Max [ms] Relative
base bitwise_ops 64.9 ± 3.1 62.8 70.5 1.00 ± 0.06
head bitwise_ops 64.8 ± 2.9 63.1 70.3 1.00
Command Mean [ms] Min [ms] Max [ms] Relative
base fibonacci_26 69.5 ± 3.8 66.1 79.0 1.00 ± 0.07
head fibonacci_26 69.4 ± 2.5 66.6 73.0 1.00
Command Mean [ms] Min [ms] Max [ms] Relative
base hashmap 142.1 ± 1.7 138.9 143.8 1.00
head hashmap 146.1 ± 2.2 143.6 150.7 1.03 ± 0.02
Command Mean [ms] Min [ms] Max [ms] Relative
base keccak 139.1 ± 4.9 133.0 146.5 1.00
head keccak 145.0 ± 4.0 139.8 151.1 1.04 ± 0.05
Command Mean [ms] Min [ms] Max [ms] Relative
base matrix_multiply 68.8 ± 1.4 67.3 72.3 1.00
head matrix_multiply 70.4 ± 2.1 67.8 74.9 1.02 ± 0.04
Command Mean [ms] Min [ms] Max [ms] Relative
base modular_exp 64.2 ± 2.2 62.7 70.3 1.00
head modular_exp 65.6 ± 3.0 63.1 70.3 1.02 ± 0.06
Command Mean [ms] Min [ms] Max [ms] Relative
base quicksort 71.8 ± 7.0 66.2 88.5 1.07 ± 0.11
head quicksort 67.3 ± 1.5 66.0 71.2 1.00
Command Mean [ms] Min [ms] Max [ms] Relative
base sieve 70.7 ± 3.1 67.8 74.4 1.00
head sieve 71.2 ± 4.0 68.1 81.2 1.01 ± 0.07
Command Mean [ms] Min [ms] Max [ms] Relative
base sum_array 80.4 ± 2.4 78.2 84.5 1.00
head sum_array 100.8 ± 13.9 78.2 119.7 1.25 ± 0.18
Command Mean [ms] Min [ms] Max [ms] Relative
base syscall_commit 103.9 ± 5.5 98.1 111.1 1.02 ± 0.07
head syscall_commit 101.5 ± 3.7 98.2 108.4 1.00

@yetanotherco yetanotherco deleted a comment from greptile-apps Bot Feb 10, 2026
@gabrielbosio

Copy link
Copy Markdown
Collaborator Author

Method: Poll /proc/PID/status VmHWM (kernel-tracked peak RSS) every 5 seconds while the CLI prover runs.

Machine: admin@vm-tests-avx512 — Debian 12, x86_64, 62 GB RAM, 16 cores, AVX-512.

Results (2^19 instructions, bench_512k):

  • main: 50.4 GB peak RSS, ~6 min
  • seq-proving-memory-opt: 40.0 GB peak RSS, ~5.4 min

seq-proving-memory-opt saves ~10.4 GB (20% reduction). 9 AIR tables, 247 total trace columns.

@github-actions

Copy link
Copy Markdown

Prover Profile (arith_8.elf)

Metric main This PR Change
Peak heap 3253 MB 3254 MB +1 MB (+0.0%)
Prove time 58.760s 61.251s +4.2%

✅ No significant regression detected.

@MauroToscano

Copy link
Copy Markdown
Contributor

With bigger programs there is an improvement:

image

@MauroToscano MauroToscano left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This extra computation needs to be further measured, it seems like with big programs it will get a big penalty.

@gabrielbosio

Copy link
Copy Markdown
Collaborator Author

/bench

@github-actions

Copy link
Copy Markdown

Benchmark — fib_iterative_372k (median of 3)

Metric main PR Δ
Peak heap 17698 MB 13570 MB -4128 MB (-23.3%) 🟢
Prove time 157.155s 158.193s +1.038s (+0.7%) ⚪

🎉 Improvement detected — heap or time decreased by more than 5%.

✅ Low variance (time: 0.5%, heap: 1.0%)

Commit: a352047 · Baseline: built from main · Runner: self-hosted bench

@gabrielbosio

Copy link
Copy Markdown
Collaborator Author

superseded by #349

@diegokingston diegokingston deleted the seq-proving-memory-opt branch March 4, 2026 12:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants