Skip to content

Multi-vector MaxSim benchmark with BYOTE factory#1027

Open
suri-kumkaran wants to merge 17 commits into
mainfrom
users/suryangupta/multi-vector-benchmark
Open

Multi-vector MaxSim benchmark with BYOTE factory#1027
suri-kumkaran wants to merge 17 commits into
mainfrom
users/suryangupta/multi-vector-benchmark

Conversation

@suri-kumkaran
Copy link
Copy Markdown
Contributor

@suri-kumkaran suri-kumkaran commented May 6, 2026

Adds the multi-vector MaxSim distance benchmark to diskann-benchmark behind
a multi-vector Cargo feature (off by default). Public library API stays
minimal: kernel internals (Kernel<A>, tiled_reduce, layouts, TileBudget)
remain private; the only external surface is the generic build_max_sim<T>
factory (gated by the sealed MaxSimElement trait) plus the
MaxSimKernel<T> / Erase<T> traits.

The factory follows the BYOTE ("Bring your own type erasure") pattern from
RFC #1068: the library hands the caller a concrete kernel; the caller's
Erase visitor decides how to package it.

JSON shape

{
  "type": "multi-vector-op",
  "content": {
    "element_type": "float32",
    "isa": "x86-64-v3",
    "runs": [ { "num_query_vectors": 8, "num_doc_vectors": 32, "dim": 128, "loops_per_measurement": 200, "num_measurements": 50 } ]
  }
}

Suggested review order

  1. diskann-quantization/src/multi_vector/distance/isa.rsMaxSimIsa
    enum + NotSupported. The public selector.
  2. diskann-quantization/src/multi_vector/distance/kernel.rs
    MaxSimKernel<T> (object-safe boundary) + Erase<T> (BYOTE visitor) +
    BoxErase (default).
  3. diskann-quantization/src/multi_vector/distance/factory.rs
    sealed MaxSimElement trait (f32 + f16 impls carry the per-T dispatch
    bodies) and the generic build_max_sim<T> wrapper; Prepared (SIMD
    path) and ReferenceKernel (fallback path) with their MaxSimKernel<T>
    impls; BuildAndErase (internal Target1 carrier shared by Auto and
    the pinned-ISA arms); factory↔fallback parity tests at the bottom.
  4. diskann-quantization/src/multi_vector/{mod,distance/mod}.rs — public
    re-exports (including MaxSimElement and build_max_sim); verify
    QueryComputer is gone and no new types leak from inside kernels.
  5. diskann-benchmark/src/inputs/multi_vector.rsMultiVectorOp +
    BenchIsa shadow enum + From<BenchIsa> for MaxSimIsa. The library's
    MaxSimIsa is deliberately serde-free; the shadow owns the JSON shape.
  6. diskann-benchmark/src/backend/multi_vector/{mod,driver,kernels}.rs
    module-level docs at the top of mod.rs describe the three-step
    workflow for adding an in-tree experimental kernel; cfg_if! gate;
    MultiVectorTolerance + manual Input impl; one generic Kernel<T>
    carrier whose impl<T> Benchmark / impl<T> Regression blocks are
    registered for f32 and f16 — gated by the library's MaxSimElement so
    adding a new element type Just Works.
  7. Glue + testsbackend/mod.rs, inputs/mod.rs, Cargo.toml
    feature, main.rs integration tests, JSON fixtures.

@suri-kumkaran suri-kumkaran self-assigned this May 6, 2026
@suri-kumkaran suri-kumkaran requested review from a team and Copilot May 6, 2026 21:17
@suri-kumkaran suri-kumkaran linked an issue May 6, 2026 that may be closed by this pull request
4 tasks
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Adds a new diskann-benchmark-multi-vector crate to benchmark and regression-check multi-vector distance ops (Chamfer, MaxSim) across f32/f16 and optimized/reference implementations, integrated with diskann-benchmark-runner.

Changes:

  • Introduces benchmark library with dispatcher registration, benchmark execution, and regression checking logic.
  • Adds benchmark-multi-vector CLI + integration tests and example job/tolerance configs used by the runner workflow.
  • Updates workspace to include the new crate.

Reviewed changes

Copilot reviewed 8 out of 9 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
diskann-benchmark-multi-vector/src/lib.rs Core benchmark/regression logic, dispatch rules, fixtures, and unit tests.
diskann-benchmark-multi-vector/src/bin.rs CLI entrypoint wiring registry + basic integration tests.
diskann-benchmark-multi-vector/examples/tolerance.json Default tolerance config consumed by check verify/run.
diskann-benchmark-multi-vector/examples/test.json Small smoke input for integration tests.
diskann-benchmark-multi-vector/examples/multi-vector.json Full benchmark matrix input example.
diskann-benchmark-multi-vector/README.md Documents kernels, metric normalization, and runner workflows.
diskann-benchmark-multi-vector/Cargo.toml New crate manifest + deps/bin target.
Cargo.toml Adds new crate to workspace members.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread diskann-benchmark-multi-vector/src/lib.rs Outdated
Comment thread diskann-benchmark-multi-vector/src/lib.rs Outdated
Comment thread diskann-benchmark-multi-vector/src/lib.rs Outdated
Comment thread diskann-benchmark-multi-vector/src/lib.rs Outdated
Comment thread diskann-benchmark-multi-vector/src/lib.rs Outdated
Comment thread diskann-benchmark-multi-vector/src/bin.rs Outdated
@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented May 7, 2026

Codecov Report

❌ Patch coverage is 54.87805% with 148 lines in your changes missing coverage. Please review.
✅ Project coverage is 89.33%. Comparing base (12d1a15) to head (03d6119).

Files with missing lines Patch % Lines
...-quantization/src/multi_vector/distance/factory.rs 62.03% 82 Missing ⚠️
diskann-benchmark/src/inputs/multi_vector.rs 19.67% 49 Missing ⚠️
...kann-quantization/src/multi_vector/distance/isa.rs 0.00% 13 Missing ⚠️
diskann-quantization/src/multi_vector/matrix.rs 0.00% 4 Missing ⚠️

❌ Your patch status has failed because the patch coverage (54.87%) is below the target coverage (90.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #1027      +/-   ##
==========================================
- Coverage   89.45%   89.33%   -0.13%     
==========================================
  Files         458      460       +2     
  Lines       85398    85534     +136     
==========================================
+ Hits        76395    76409      +14     
- Misses       9003     9125     +122     
Flag Coverage Δ
miri 89.33% <54.87%> (-0.13%) ⬇️
unittests 88.95% <54.87%> (-0.13%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
diskann-benchmark/src/backend/mod.rs 100.00% <100.00%> (ø)
diskann-benchmark/src/backend/multi_vector/mod.rs 100.00% <100.00%> (ø)
diskann-benchmark/src/inputs/mod.rs 81.25% <ø> (ø)
diskann-benchmark/src/main.rs 91.89% <100.00%> (+0.57%) ⬆️
...n-quantization/src/multi_vector/distance/kernel.rs 100.00% <100.00%> (ø)
...ntization/src/multi_vector/distance/kernels/mod.rs 100.00% <ø> (ø)
diskann-quantization/src/multi_vector/matrix.rs 95.08% <0.00%> (-0.51%) ⬇️
...kann-quantization/src/multi_vector/distance/isa.rs 0.00% <0.00%> (ø)
diskann-benchmark/src/inputs/multi_vector.rs 19.67% <19.67%> (ø)
...-quantization/src/multi_vector/distance/factory.rs 62.03% <62.03%> (ø)
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Copy Markdown
Contributor

@hildebrandmw hildebrandmw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Suryansh, this will be useful for performance optimization. One big question I had is whether or not this needs to be its own crate, or whether it can be folded into diskann-benchmark (which will need some work to put the normal indexing code behind a feature gate to keep compile times low) or as a binary in diskann-quantization directly.

The pattern established by diskann-benchmark-simd is a little unfortunate so now is the time to think about our approach before we have to deal with a crate proliferation. We also might be able to bury all benchmark related binary crates in a subdirectory to keep the top level organized.

In either case, I don't think we should publish this to crates.io - and we should probably stop publishing diskann-benchmark-simd as well.

Comment thread diskann-benchmark-multi-vector/src/lib.rs Outdated
Comment thread diskann-benchmark-multi-vector/src/lib.rs Outdated
Comment thread diskann-benchmark-multi-vector/src/lib.rs Outdated
Comment thread diskann-benchmark-multi-vector/src/lib.rs Outdated
Comment thread diskann-benchmark-multi-vector/src/lib.rs Outdated
Comment thread diskann-benchmark-multi-vector/src/lib.rs Outdated
Comment thread diskann-benchmark-multi-vector/src/lib.rs Outdated
Comment thread diskann-benchmark-multi-vector/README.md Outdated
Comment thread diskann-benchmark-multi-vector/README.md Outdated
Comment thread diskann-benchmark-multi-vector/README.md Outdated
@harsha-simhadri harsha-simhadri requested a review from suhasjs May 8, 2026 19:33
Comment thread Cargo.toml Outdated
Comment thread diskann-benchmark-multi-vector/README.md Outdated
@suri-kumkaran suri-kumkaran changed the title Add diskann-benchmark-multi-vector crate Multi-Vector Distance Benchmarks May 12, 2026
Copy link
Copy Markdown
Contributor

@arkrishn94 arkrishn94 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Suryansh, I left some relatively minor comments. Can we run this benchmark and post some example numbers in the PR description?

Comment thread diskann-benchmark-multi-vector/README.md Outdated
Comment thread diskann-benchmark-multi-vector/src/lib.rs Outdated
Comment thread diskann-benchmark-multi-vector/src/lib.rs Outdated
Comment thread diskann-benchmark-multi-vector/src/lib.rs Outdated
Comment thread diskann-quantization/src/multi_vector/matrix.rs Outdated
Comment thread diskann-benchmark/src/inputs/multi_vector.rs Outdated
@suri-kumkaran suri-kumkaran changed the title Multi-Vector Distance Benchmarks Multi-vector MaxSim benchmark + public kernel-research seam May 14, 2026
@suri-kumkaran suri-kumkaran changed the title Multi-vector MaxSim benchmark + public kernel-research seam Multi-vector MaxSim benchmark with BYOTE factory May 18, 2026
Copy link
Copy Markdown
Contributor

@hildebrandmw hildebrandmw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took a quick pass.

Comment thread diskann-quantization/src/multi_vector/distance/factory.rs Outdated
Comment thread diskann-benchmark/src/backend/multi_vector/driver.rs Outdated
Comment thread diskann-benchmark/src/backend/multi_vector/driver.rs Outdated
Comment thread diskann-benchmark/src/backend/multi_vector/README.md Outdated
Copy link
Copy Markdown
Contributor

@hildebrandmw hildebrandmw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Suryansh - this is headed in the right direction, but I'm finding little bits of impedance mismatch in places, for example the Distance trait additing additional assertions that should probably go in the MaxSimKernel already.

One other comment I had is that I feel a large number of module and struct level documentation adds noise rather than value. For example, explicitly listing structs in a module is something that rustdoc already does far better than manual maintenance ever could. I encourage you to try building the docs and see what rustdoc will generate. For this PR, I experimented with stripping down the documentation, particularly when is simply restates the type-signature, and found the result much easier to understand and review. This isn't a blocker for this PR, but I encourage you to consider a "less is more" approach. Terser and more direct documentation means the reader spends less time filtering and more time understanding. It also lowers the chance of documentation going out of date.

let b: Box<[T]> = (0..repr.num_elements()).map(|_| f()).collect();
// SAFETY: `b` has length `repr.num_elements()` by construction.
unsafe { repr.box_to_mat(b) }
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This definitely needs tests.

///
/// # Contract
///
/// - `scores.len() == self.nrows()` (caller's precondition).
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should document what happens if this doesn't hold. Preferably return an error. Right now, the SIMD kernels panic - the reference kernels silently don't do anything and leave whatever the initial values of scores happens to be. Behavior should be consistent.

/// - The implementation must populate **all** `nrows()` entries of `scores`.
/// Callers that derive quantities from the full score vector (e.g. sums)
/// would silently corrupt their result if any trailing entry were left
/// unwritten.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this needed? Hopefully it's obvious that if the sizing contract is upheld from the caller, the implementation really should populate the whole thing.

let query: QueryMatRef<'_, Standard<T>> = self.query.as_view().into();
let Ok(mut max_sim) = MaxSim::new(scores) else {
return;
};
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not move this higher and have it pull double duty for the empty scores check?

let Ok(mut max_sim) = MaxSim::new(scores) else {
return;
};
let _ = max_sim.evaluate(query, doc);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This silently doesn't do anything if there is a size mismatch between query and doc, which is worse than panicking.

let repr = *query.repr();
let src = query.as_slice();
let mut idx = 0usize;
let owned = Mat::<Standard<T>>::from_fn(repr, || {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MatRef::to_owned exists.

/// generically through the timing harness.
pub(super) trait Distance<T: Copy> {
fn max_sim(&self, doc: MatRef<'_, Standard<T>>, scores: &mut [f32]);
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two comments to make here:

  1. This is the first user of the BYOTE pattern introduced in this PR, and it immediately doesn't use the BYOTE pattern and instead double dispatches 😉. Would probably be a good stress test of how ergonomic it is in practice (and set a good example).
  2. What additional assertions are needed here that MaxSimKernel is not doing. Why are additional assertions needed, and if benchmarking is already having to work around limitations in the underlying implementation introduced in this PR, should this not be fixed at the source?

pub(super) fn new(run: &Run) -> Self {
let mut rng = StdRng::seed_from_u64(0x12345);
let queries = Mat::from_fn(
Standard::new(run.num_query_vectors.get(), run.dim.get()).unwrap(),
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should propagate an error.

"type": "multi-vector-op",
"content": {
"element_type": "float32",
"isa": "x86-64-v4",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Including AVX-512 in the perf test will run into an issue I constantly run into in diskann-benchmark-simd and haven't yet fixed and that is including an option to skip if the arch is unsupported rather than failing. Mainly as an ergonomics thing.

Also, running this on a non-AVX-512 supported machine fails at at runtime, i.e.:

######################
# Running Job 4 of 8 #
######################

Multi-Vector Operation

               tag: multi-vector-op
      element type: float32
               isa: x86-64-v4
    number of runs: 9

Error: x86-64-v4 not supported: AVX-512 unavailable on this CPU

instead of at load or match time. Please bake the check into Benchmark::try_match as much as possible so we error out as early as possible.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: No status

Development

Successfully merging this pull request may close these issues.

Add benchmarking suite for multi-vector operations

7 participants