🚀 [Performance] Bottlenecks in call_kernel: Repetitive Hashing & Redundant Evaluation
Problem Description
Profiling of the call_kernel execution path reveals two significant performance bottlenecks that together account for the majority of execution time during circuit building.
1. Pool::add(kernel) rehashes entire KernelPrimitive on every call
call_kernel calls self.kernel_primitives.add(kernel), which performs a HashMap::get(v) lookup. This requires hashing the entire KernelPrimitive struct—including both ir_for_later_compilation and ir_for_calling RootCircuit IRs with all their instructions—on every invocation, even when the kernel is already registered.
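To make the cost concrete, here is a minimal sketch, not the project's code. `ToyKernel`, `CountingHasher`, `CountingBuild`, and `bytes_hashed_per_lookup` are hypothetical names, and the two IRs are modelled as plain `Vec<u64>` instruction lists. It counts how many bytes a single `HashMap::get` feeds to the hasher when the key is the whole struct:

```rust
use std::collections::HashMap;
use std::hash::{BuildHasher, Hasher};
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Hypothetical stand-in for KernelPrimitive: two IRs as instruction lists.
#[derive(Clone, PartialEq, Eq, Hash)]
pub struct ToyKernel {
    pub ir_for_later_compilation: Vec<u64>,
    pub ir_for_calling: Vec<u64>,
}

/// FNV-1a style hasher that also counts every byte it is asked to hash.
pub struct CountingHasher {
    state: u64,
    bytes: Arc<AtomicUsize>,
}

impl Hasher for CountingHasher {
    fn write(&mut self, data: &[u8]) {
        self.bytes.fetch_add(data.len(), Ordering::Relaxed);
        for &b in data {
            self.state = (self.state ^ b as u64).wrapping_mul(0x100000001b3);
        }
    }

    fn finish(&self) -> u64 {
        self.state
    }
}

/// Builds CountingHasher instances that share one byte counter.
#[derive(Clone)]
pub struct CountingBuild {
    pub bytes: Arc<AtomicUsize>,
}

impl BuildHasher for CountingBuild {
    type Hasher = CountingHasher;

    fn build_hasher(&self) -> CountingHasher {
        CountingHasher { state: 0xcbf29ce484222325, bytes: self.bytes.clone() }
    }
}

/// Returns the number of bytes hashed by one lookup of an already-registered kernel.
pub fn bytes_hashed_per_lookup(n_instructions: usize) -> usize {
    let counter = Arc::new(AtomicUsize::new(0));
    let mut map: HashMap<ToyKernel, usize, CountingBuild> =
        HashMap::with_hasher(CountingBuild { bytes: counter.clone() });
    let kernel = ToyKernel {
        ir_for_later_compilation: vec![7; n_instructions],
        ir_for_calling: vec![9; n_instructions],
    };
    map.insert(kernel.clone(), 0);

    // Measure only the lookup, which is what happens on every repeated call.
    let before = counter.load(Ordering::Relaxed);
    let _ = map.get(&kernel);
    counter.load(Ordering::Relaxed) - before
}
```

In this toy model the bytes hashed per lookup grow linearly with the instruction count, even though the kernel is already in the map.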
Profiling Data
| Kernel | kernel_primitives.add() | Total call_kernel |
|---|---|---|
| pre_attn_sln | 1.17s | 1.27s |
| freivalds_shared_x_qkv | 3.38s | 4.22s |
Proposed Fix
- Short-term: Cache the kernel ID on the caller side or use pointer-based identity for fast-path lookup to avoid repeated full-struct hashing.
- Long-term: Introduce a register_kernel method that returns a reusable ID, and a call_kernel_by_id variant that skips the Pool::add overhead.
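One possible shape for the long-term fix is sketched below. `ToyKernel`, `KernelId`, and `ToyContext` are hypothetical stand-ins rather than existing types in the codebase, and the real `call_kernel` does far more than record a call. The sketch only shows that the full-struct hash is paid once at registration, while each later call is an index lookup:

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for KernelPrimitive.
#[derive(Clone, PartialEq, Eq, Hash)]
pub struct ToyKernel {
    pub instructions: Vec<u64>,
}

/// Cheap, copyable handle returned by registration (hypothetical).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct KernelId(pub usize);

/// Hypothetical context holding the kernel pool and the recorded calls.
#[derive(Default)]
pub struct ToyContext {
    kernels: Vec<ToyKernel>,
    index: HashMap<ToyKernel, usize>,
    pub calls: Vec<KernelId>,
}

impl ToyContext {
    /// Hashes the full kernel once and returns a reusable ID.
    /// Registering the same kernel again returns the same ID.
    pub fn register_kernel(&mut self, kernel: &ToyKernel) -> KernelId {
        if let Some(&id) = self.index.get(kernel) {
            return KernelId(id);
        }
        let id = self.kernels.len();
        self.kernels.push(kernel.clone());
        self.index.insert(kernel.clone(), id);
        KernelId(id)
    }

    /// Records a call using only the ID: a bounds-checked index, no hashing.
    /// Returns the kernel's instruction count, or None for an unknown ID.
    pub fn call_kernel_by_id(&mut self, id: KernelId) -> Option<usize> {
        let n = self.kernels.get(id.0)?.instructions.len();
        self.calls.push(id);
        Some(n)
    }
}
```

A caller would register each kernel once before its hot loop and pass the returned `KernelId` on every iteration. The existing `call_kernel` could remain as a convenience wrapper that registers and then calls by ID.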
2. eval_safe_simd runs unconditionally for pure-constraint kernels
call_kernel always evaluates kernel.ir_for_calling().eval_safe_simd(...) for every parallel instance. This is often unnecessary:
- Pure-constraint kernels: For kernels where all IO specs are inputs (no outputs), ir_for_calling has its constraints stripped and produces no meaningful output.
- Known outputs: In many use cases, output values are already computed externally (e.g., during model inference). call_kernel is invoked only to register the kernel call for later proving, making the SIMD re-computation redundant.
Profiling Data (After fixing issue 1)
| Kernel | eval_safe_simd | Total call_kernel |
|---|---|---|
| freivalds_shared_x_qkv | 74ms | ~100ms |
| freivalds_shared_x_qkv | 579ms | ~840ms |
Proposed Fix
- Optimization: Skip eval_safe_simd for kernels with no output specs: !kernel.io_specs().iter().any(|s| s.is_output)
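The guard could look roughly like the sketch below. `IoSpec` here is a hypothetical two-field struct that mirrors only the `is_output` flag used above, and `evaluate_if_needed` is an illustrative helper whose closure stands in for the `eval_safe_simd` call. Neither is an existing API:

```rust
/// Hypothetical IO spec; only `is_output` matters for the guard.
pub struct IoSpec {
    pub is_input: bool,
    pub is_output: bool,
}

/// True if at least one spec is an output, i.e. evaluation can produce a value.
pub fn has_outputs(io_specs: &[IoSpec]) -> bool {
    io_specs.iter().any(|s| s.is_output)
}

/// Runs the (expensive) evaluation only when the kernel has outputs.
/// `eval` stands in for the eval_safe_simd call; it is never invoked
/// for pure-constraint kernels.
pub fn evaluate_if_needed<F>(io_specs: &[IoSpec], eval: F) -> Option<Vec<u64>>
where
    F: FnOnce() -> Vec<u64>,
{
    if has_outputs(io_specs) {
        Some(eval())
    } else {
        None
    }
}
```

This covers only the pure-constraint case. Skipping evaluation when outputs are already known externally would need an explicit opt-in from the caller, since the kernel's IO specs alone cannot tell the two situations apart.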