
call_kernel performance: Pool hashing and unnecessary eval_safe_simd dominate runtime #202

@hczphn


🚀 [Performance] Bottlenecks in call_kernel: Repetitive Hashing & Redundant Evaluation

Problem Description

Profiling of the call_kernel execution path reveals two significant performance bottlenecks that together account for the majority of execution time during circuit building.


1. Pool::add(kernel) rehashes entire KernelPrimitive on every call

call_kernel calls self.kernel_primitives.add(kernel), which performs a HashMap::get(v) lookup. That lookup hashes the entire KernelPrimitive struct on every invocation, including both the ir_for_later_compilation and ir_for_calling RootCircuit IRs with all their instructions, even when the kernel is already registered.
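To make the cost concrete, here is a minimal sketch that counts how many bytes a single hash of a struct has to process. `ToyKernel`, `CountingHasher`, and `hashed_bytes` are hypothetical stand-ins written for illustration and are not taken from the codebase; the real KernelPrimitive is assumed to behave similarly because it holds two full IRs.

```rust
use std::hash::{Hash, Hasher};

/// Hasher that does no real hashing; it only counts the bytes fed into it.
#[derive(Default)]
struct CountingHasher {
    bytes: usize,
}

impl Hasher for CountingHasher {
    fn write(&mut self, bytes: &[u8]) {
        self.bytes += bytes.len();
    }

    fn finish(&self) -> u64 {
        self.bytes as u64
    }
}

/// Hypothetical stand-in for KernelPrimitive: each IR is modelled as a
/// flat list of instruction words.
#[derive(Hash)]
struct ToyKernel {
    ir_for_later_compilation: Vec<u64>,
    ir_for_calling: Vec<u64>,
}

/// Number of bytes one hash of `value` has to walk through.
fn hashed_bytes<T: Hash>(value: &T) -> usize {
    let mut hasher = CountingHasher::default();
    value.hash(&mut hasher);
    hasher.bytes
}
```

Under these assumptions, the byte count grows linearly with the number of instructions in both IRs, and a HashMap lookup keyed on the struct pays that cost on every call, whether or not the key is already present.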

Profiling Data

| Kernel | kernel_primitives.add() | Total call_kernel |
|---|---|---|
| pre_attn_sln | 1.17s | 1.27s |
| freivalds_shared_x_qkv | 3.38s | 4.22s |

Proposed Fix

  • Short-term: Cache the kernel ID on the caller side or use pointer-based identity for fast-path lookup to avoid repeated full-struct hashing.
  • Long-term: Introduce a register_kernel method that returns a reusable ID, and a call_kernel_by_id variant that skips the Pool::add overhead.
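The long-term proposal could look roughly like the sketch below. Only the names register_kernel and call_kernel_by_id come from this issue; `KernelId`, the toy `Pool`, and `Context` are assumptions made for illustration, and the real call_kernel does far more than record the call.

```rust
use std::collections::HashMap;
use std::hash::Hash;

/// Hypothetical handle returned by registration; cheap to copy and compare.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct KernelId(usize);

/// Toy deduplicating pool, assumed to resemble the Pool described above.
struct Pool<T> {
    items: Vec<T>,
    index: HashMap<T, usize>,
}

impl<T: Hash + Eq + Clone> Pool<T> {
    fn new() -> Self {
        Pool { items: Vec::new(), index: HashMap::new() }
    }

    /// Hashes the whole item; this is the expensive step.
    fn add(&mut self, item: &T) -> usize {
        if let Some(&id) = self.index.get(item) {
            return id;
        }
        let id = self.items.len();
        self.items.push(item.clone());
        self.index.insert(item.clone(), id);
        id
    }

    /// Index lookup only; no hashing involved.
    fn get(&self, id: usize) -> &T {
        &self.items[id]
    }
}

/// Hypothetical context that records kernel calls by ID.
struct Context<K> {
    kernel_primitives: Pool<K>,
    calls: Vec<KernelId>,
}

impl<K: Hash + Eq + Clone> Context<K> {
    fn new() -> Self {
        Context { kernel_primitives: Pool::new(), calls: Vec::new() }
    }

    /// Pays the full-struct hash once and returns a reusable ID.
    fn register_kernel(&mut self, kernel: &K) -> KernelId {
        KernelId(self.kernel_primitives.add(kernel))
    }

    /// Records a call using only the ID, skipping Pool::add entirely.
    fn call_kernel_by_id(&mut self, id: KernelId) -> &K {
        self.calls.push(id);
        self.kernel_primitives.get(id.0)
    }
}
```

With this shape, a caller registers each kernel once before a loop and passes the returned ID on every iteration, so the per-call cost becomes a vector index instead of a hash over both IRs.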

2. eval_safe_simd runs unconditionally for pure-constraint kernels

call_kernel always evaluates kernel.ir_for_calling().eval_safe_simd(...) for every parallel instance. This is often unnecessary:

  1. Pure-constraint kernels: For kernels where all IO specs are inputs (no outputs), ir_for_calling has its constraints stripped and produces no meaningful output.
  2. Known outputs: In many use cases, output values are already computed externally (e.g., during model inference). call_kernel is invoked only to register the kernel call for later proving, making the SIMD re-computation redundant.

Profiling Data (After fixing issue 1)

| Kernel | eval_safe_simd | Total call_kernel |
|---|---|---|
| freivalds_shared_x_qkv | 74ms | ~100ms |
| freivalds_shared_x_qkv | 579ms | ~840ms |

Proposed Fix

  • Optimization: Skip eval_safe_simd for kernels with no output specs:
    !kernel.io_specs().iter().any(|s| s.is_output)
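A minimal sketch of how that check might gate the evaluation. `IoSpec`, `needs_eval`, and `call_kernel_outputs` are hypothetical names; only the `is_output` flag and the predicate itself come from this issue, and the closure stands in for the real eval_safe_simd call.

```rust
/// Hypothetical IO spec; only the `is_output` flag is modelled here.
struct IoSpec {
    is_output: bool,
}

/// True when at least one IO spec is an output, i.e. evaluation can
/// produce something the caller needs.
fn needs_eval(io_specs: &[IoSpec]) -> bool {
    io_specs.iter().any(|s| s.is_output)
}

/// Runs `eval` only when the kernel has outputs. The closure is a
/// placeholder for the expensive evaluation step.
fn call_kernel_outputs<F>(io_specs: &[IoSpec], eval: F) -> Option<Vec<u64>>
where
    F: FnOnce() -> Vec<u64>,
{
    if needs_eval(io_specs) {
        Some(eval())
    } else {
        // Pure-constraint kernel: nothing to compute, so skip evaluation.
        None
    }
}
```

Taking the evaluation as a closure keeps the skip decision in one place and guarantees the expensive path is never entered for input-only kernels.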
