Multi-vector MaxSim benchmark with BYOTE factory by suri-kumkaran · Pull Request #1027 · microsoft/DiskANN

suri-kumkaran · 2026-05-06T21:17:18Z

Adds the multi-vector MaxSim distance benchmark to diskann-benchmark behind
a multi-vector Cargo feature (off by default). Public library API stays
minimal: kernel internals (Kernel<A>, tiled_reduce, layouts, TileBudget)
remain private; the only external surface is the generic build_max_sim<T>
factory (gated by the sealed MaxSimElement trait) plus the
MaxSimKernel<T> / Erase<T> traits.

The factory follows the BYOTE ("Bring your own type erasure") pattern from
RFC #1068: the library hands the caller a concrete kernel; the caller's
Erase visitor decides how to package it.

JSON shape

{
  "type": "multi-vector-op",
  "content": {
    "element_type": "float32",
    "isa": "x86-64-v3",
    "runs": [ { "num_query_vectors": 8, "num_doc_vectors": 32, "dim": 128, "loops_per_measurement": 200, "num_measurements": 50 } ]
  }
}

Suggested review order

diskann-quantization/src/multi_vector/distance/isa.rs — MaxSimIsa
enum + NotSupported. The public selector.
diskann-quantization/src/multi_vector/distance/kernel.rs —
MaxSimKernel<T> (object-safe boundary) + Erase<T> (BYOTE visitor) +
BoxErase (default).
diskann-quantization/src/multi_vector/distance/factory.rs —
sealed MaxSimElement trait (f32 + f16 impls carry the per-T dispatch
bodies) and the generic build_max_sim<T> wrapper; Prepared (SIMD
path) and ReferenceKernel (fallback path) with their MaxSimKernel<T>
impls; BuildAndErase (internal Target1 carrier shared by Auto and
the pinned-ISA arms); factory↔fallback parity tests at the bottom.
diskann-quantization/src/multi_vector/{mod,distance/mod}.rs — public
re-exports (including MaxSimElement and build_max_sim); verify
QueryComputer is gone and no new types leak from inside kernels.
diskann-benchmark/src/inputs/multi_vector.rs — MultiVectorOp +
BenchIsa shadow enum + From<BenchIsa> for MaxSimIsa. The library's
MaxSimIsa is deliberately serde-free; the shadow owns the JSON shape.
diskann-benchmark/src/backend/multi_vector/{mod,driver,kernels}.rs —
module-level docs at the top of mod.rs describe the three-step
workflow for adding an in-tree experimental kernel; cfg_if! gate;
MultiVectorTolerance + manual Input impl; one generic Kernel<T>
carrier whose impl<T> Benchmark / impl<T> Regression blocks are
registered for f32 and f16 — gated by the library's MaxSimElement so
adding a new element type Just Works.
Glue + tests — backend/mod.rs, inputs/mod.rs, Cargo.toml
feature, main.rs integration tests, JSON fixtures.

Copilot

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Adds a new diskann-benchmark-multi-vector crate to benchmark and regression-check multi-vector distance ops (Chamfer, MaxSim) across f32/f16 and optimized/reference implementations, integrated with diskann-benchmark-runner.

Changes:

Introduces benchmark library with dispatcher registration, benchmark execution, and regression checking logic.
Adds benchmark-multi-vector CLI + integration tests and example job/tolerance configs used by the runner workflow.
Updates workspace to include the new crate.

Reviewed changes

Copilot reviewed 8 out of 9 changed files in this pull request and generated 6 comments.

Show a summary per file

File	Description
diskann-benchmark-multi-vector/src/lib.rs	Core benchmark/regression logic, dispatch rules, fixtures, and unit tests.
diskann-benchmark-multi-vector/src/bin.rs	CLI entrypoint wiring registry + basic integration tests.
diskann-benchmark-multi-vector/examples/tolerance.json	Default tolerance config consumed by `check verify/run`.
diskann-benchmark-multi-vector/examples/test.json	Small smoke input for integration tests.
diskann-benchmark-multi-vector/examples/multi-vector.json	Full benchmark matrix input example.
diskann-benchmark-multi-vector/README.md	Documents kernels, metric normalization, and runner workflows.
diskann-benchmark-multi-vector/Cargo.toml	New crate manifest + deps/bin target.
Cargo.toml	Adds new crate to workspace members.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

codecov-commenter · 2026-05-07T06:57:02Z

Codecov Report

❌ Patch coverage is 54.87805% with 148 lines in your changes missing coverage. Please review.
✅ Project coverage is 89.33%. Comparing base (12d1a15) to head (03d6119).

Files with missing lines	Patch %	Lines
...-quantization/src/multi_vector/distance/factory.rs	62.03%	82 Missing ⚠️
diskann-benchmark/src/inputs/multi_vector.rs	19.67%	49 Missing ⚠️
...kann-quantization/src/multi_vector/distance/isa.rs	0.00%	13 Missing ⚠️
diskann-quantization/src/multi_vector/matrix.rs	0.00%	4 Missing ⚠️

❌ Your patch status has failed because the patch coverage (54.87%) is below the target coverage (90.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1027      +/-   ##
==========================================
- Coverage   89.45%   89.33%   -0.13%     
==========================================
  Files         458      460       +2     
  Lines       85398    85534     +136     
==========================================
+ Hits        76395    76409      +14     
- Misses       9003     9125     +122

Flag	Coverage Δ
miri	`89.33% <54.87%> (-0.13%)`	⬇️
unittests	`88.95% <54.87%> (-0.13%)`	⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines	Coverage Δ
diskann-benchmark/src/backend/mod.rs	`100.00% <100.00%> (ø)`
diskann-benchmark/src/backend/multi_vector/mod.rs	`100.00% <100.00%> (ø)`
diskann-benchmark/src/inputs/mod.rs	`81.25% <ø> (ø)`
diskann-benchmark/src/main.rs	`91.89% <100.00%> (+0.57%)`	⬆️
...n-quantization/src/multi_vector/distance/kernel.rs	`100.00% <100.00%> (ø)`
...ntization/src/multi_vector/distance/kernels/mod.rs	`100.00% <ø> (ø)`
diskann-quantization/src/multi_vector/matrix.rs	`95.08% <0.00%> (-0.51%)`	⬇️
...kann-quantization/src/multi_vector/distance/isa.rs	`0.00% <0.00%> (ø)`
diskann-benchmark/src/inputs/multi_vector.rs	`19.67% <19.67%> (ø)`
...-quantization/src/multi_vector/distance/factory.rs	`62.03% <62.03%> (ø)`

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

This reverts commit 96d17b3.

hildebrandmw

Thanks Suryansh, this will be useful for performance optimization. One big question I had is whether or not this needs to be its own crate, or whether it can be folded into diskann-benchmark (which will need some work to put the normal indexing code behind a feature gate to keep compile times low) or as a binary in diskann-quantization directly.

The pattern established by diskann-benchmark-simd is a little unfortunate so now is the time to think about our approach before we have to deal with a crate proliferation. We also might be able to bury all benchmark related binary crates in a subdirectory to keep the top level organized.

In either case, I don't think we should publish this to crates.io - and we should probably stop publishing diskann-benchmark-simd as well.

arkrishn94

Thanks Suryansh, I left some relatively minor comments. Can we run this benchmark and post some example numbers in the PR description?

hildebrandmw

I took a quick pass.

…ti-vector-benchmark

hildebrandmw

Thanks Suryansh - this is headed in the right direction, but I'm finding little bits of impedance mismatch in places, for example the Distance trait additing additional assertions that should probably go in the MaxSimKernel already.

One other comment I had is that I feel a large number of module and struct level documentation adds noise rather than value. For example, explicitly listing structs in a module is something that rustdoc already does far better than manual maintenance ever could. I encourage you to try building the docs and see what rustdoc will generate. For this PR, I experimented with stripping down the documentation, particularly when is simply restates the type-signature, and found the result much easier to understand and review. This isn't a blocker for this PR, but I encourage you to consider a "less is more" approach. Terser and more direct documentation means the reader spends less time filtering and more time understanding. It also lowers the chance of documentation going out of date.

hildebrandmw · 2026-05-22T15:32:12Z

+        let b: Box<[T]> = (0..repr.num_elements()).map(|_| f()).collect();
+        // SAFETY: `b` has length `repr.num_elements()` by construction.
+        unsafe { repr.box_to_mat(b) }
+    }


This definitely needs tests.

hildebrandmw · 2026-05-22T15:34:37Z

+///
+/// # Contract
+///
+/// - `scores.len() == self.nrows()` (caller's precondition).


We should document what happens if this doesn't hold. Preferably return an error. Right now, the SIMD kernels panic - the reference kernels silently don't do anything and leave whatever the initial values of scores happens to be. Behavior should be consistent.

hildebrandmw · 2026-05-22T15:36:33Z

+/// - The implementation must populate **all** `nrows()` entries of `scores`.
+///   Callers that derive quantities from the full score vector (e.g. sums)
+///   would silently corrupt their result if any trailing entry were left
+///   unwritten.


Is this needed? Hopefully it's obvious that if the sizing contract is upheld from the caller, the implementation really should populate the whole thing.

hildebrandmw · 2026-05-22T15:37:18Z

+        let query: QueryMatRef<'_, Standard<T>> = self.query.as_view().into();
+        let Ok(mut max_sim) = MaxSim::new(scores) else {
+            return;
+        };


Why not move this higher and have it pull double duty for the empty scores check?

hildebrandmw · 2026-05-22T15:37:47Z

+        let Ok(mut max_sim) = MaxSim::new(scores) else {
+            return;
+        };
+        let _ = max_sim.evaluate(query, doc);


This silently doesn't do anything if there is a size mismatch between query and doc, which is worse than panicking.

hildebrandmw · 2026-05-22T15:40:08Z

+        let repr = *query.repr();
+        let src = query.as_slice();
+        let mut idx = 0usize;
+        let owned = Mat::<Standard<T>>::from_fn(repr, || {


MatRef::to_owned exists.

hildebrandmw · 2026-05-22T15:46:41Z

+/// generically through the timing harness.
+pub(super) trait Distance<T: Copy> {
+    fn max_sim(&self, doc: MatRef<'_, Standard<T>>, scores: &mut [f32]);
+}


Two comments to make here:

This is the first user of the BYOTE pattern introduced in this PR, and it immediately doesn't use the BYOTE pattern and instead double dispatches 😉. Would probably be a good stress test of how ergonomic it is in practice (and set a good example).

What additional assertions are needed here that MaxSimKernel is not doing. Why are additional assertions needed, and if benchmarking is already having to work around limitations in the underlying implementation introduced in this PR, should this not be fixed at the source?

hildebrandmw · 2026-05-22T15:48:07Z

+    pub(super) fn new(run: &Run) -> Self {
+        let mut rng = StdRng::seed_from_u64(0x12345);
+        let queries = Mat::from_fn(
+            Standard::new(run.num_query_vectors.get(), run.dim.get()).unwrap(),


This should propagate an error.

hildebrandmw · 2026-05-22T15:58:59Z

+      "type": "multi-vector-op",
+      "content": {
+        "element_type": "float32",
+        "isa": "x86-64-v4",


Including AVX-512 in the perf test will run into an issue I constantly run into in diskann-benchmark-simd and haven't yet fixed and that is including an option to skip if the arch is unsupported rather than failing. Mainly as an ergonomics thing.

Also, running this on a non-AVX-512 supported machine fails at at runtime, i.e.:

###################### # Running Job 4 of 8 # ###################### Multi-Vector Operation tag: multi-vector-op element type: float32 isa: x86-64-v4 number of runs: 9 Error: x86-64-v4 not supported: AVX-512 unavailable on this CPU

instead of at load or match time. Please bake the check into Benchmark::try_match as much as possible so we error out as early as possible.

suri-kumkaran and others added 4 commits May 7, 2026 01:18

Add benchmark crate for multi-vector

77cc936

Move some repetetive code to macros and add more benchmark cases

54a21ec

Move some repetetive code to macros and add more benchmark cases

f3a5d9f

Move some repetetive code to macros and add more benchmark cases

8efdbcd

suri-kumkaran self-assigned this May 6, 2026

suri-kumkaran requested review from a team and Copilot May 6, 2026 21:17

suri-kumkaran added this to DiskANN backlog May 6, 2026

suri-kumkaran linked an issue May 6, 2026 that may be closed by this pull request

Add benchmarking suite for multi-vector operations #993

Open

4 tasks

Copilot AI reviewed May 6, 2026

View reviewed changes

Copilot started reviewing on behalf of suri-kumkaran May 6, 2026 21:57 View session

suri-kumkaran added 2 commits May 7, 2026 11:54

Merge branch 'main' into users/suryangupta/multi-vector-benchmark

74d1cae

Add Cargo.lock

3a89c37

suri-kumkaran added 2 commits May 8, 2026 01:48

Remove unused scalar benchmark config file

96d17b3

Revert "Remove unused scalar benchmark config file"

6b33719

This reverts commit 96d17b3.

hildebrandmw reviewed May 8, 2026

View reviewed changes

harsha-simhadri requested a review from suhasjs May 8, 2026 19:33

suhasjs reviewed May 11, 2026

View reviewed changes

Comment thread Cargo.toml Outdated

Comment thread diskann-benchmark-multi-vector/README.md Outdated

suri-kumkaran added 2 commits May 12, 2026 20:47

Fold the new crate to existing diskann-benchmark crate

d06df7e

Merge branch 'main' into users/suryangupta/multi-vector-benchmark

41ee380

suri-kumkaran changed the title ~~Add diskann-benchmark-multi-vector crate~~ Multi-Vector Distance Benchmarks May 12, 2026

arkrishn94 reviewed May 12, 2026

View reviewed changes

Revamp the benchmark to be also kernel-research seam

a64279e

suri-kumkaran changed the title ~~Multi-Vector Distance Benchmarks~~ Multi-vector MaxSim benchmark + public kernel-research seam May 14, 2026

Suryansh Gupta and others added 3 commits May 14, 2026 21:23

Minor doc fix

94fd8de

Merge branch 'main' into users/suryangupta/multi-vector-benchmark

9624da4

MaxSim benchmark with BYOTE factory

f51bc2a

suri-kumkaran changed the title ~~Multi-vector MaxSim benchmark + public kernel-research seam~~ Multi-vector MaxSim benchmark with BYOTE factory May 18, 2026

hildebrandmw reviewed May 18, 2026

View reviewed changes

Address review comments

597330a

suri-kumkaran requested a review from hildebrandmw May 19, 2026 17:44

Suryansh Gupta added 2 commits May 19, 2026 23:27

Merge remote-tracking branch 'origin/main' into users/suryangupta/mul…

725ad08

…ti-vector-benchmark

Fix after main merge

03d6119

hildebrandmw reviewed May 22, 2026

View reviewed changes

Conversation

suri-kumkaran commented May 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

JSON shape

Suggested review order

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

codecov-commenter commented May 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

hildebrandmw left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

arkrishn94 left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

hildebrandmw left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

hildebrandmw left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

suri-kumkaran commented May 6, 2026 •

edited

Loading

codecov-commenter commented May 7, 2026 •

edited

Loading