Multi-vector MaxSim benchmark with BYOTE factory#1027
Conversation
There was a problem hiding this comment.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Adds a new diskann-benchmark-multi-vector crate to benchmark and regression-check multi-vector distance ops (Chamfer, MaxSim) across f32/f16 and optimized/reference implementations, integrated with diskann-benchmark-runner.
Changes:
- Introduces benchmark library with dispatcher registration, benchmark execution, and regression checking logic.
- Adds
benchmark-multi-vectorCLI + integration tests and example job/tolerance configs used by the runner workflow. - Updates workspace to include the new crate.
Reviewed changes
Copilot reviewed 8 out of 9 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| diskann-benchmark-multi-vector/src/lib.rs | Core benchmark/regression logic, dispatch rules, fixtures, and unit tests. |
| diskann-benchmark-multi-vector/src/bin.rs | CLI entrypoint wiring registry + basic integration tests. |
| diskann-benchmark-multi-vector/examples/tolerance.json | Default tolerance config consumed by check verify/run. |
| diskann-benchmark-multi-vector/examples/test.json | Small smoke input for integration tests. |
| diskann-benchmark-multi-vector/examples/multi-vector.json | Full benchmark matrix input example. |
| diskann-benchmark-multi-vector/README.md | Documents kernels, metric normalization, and runner workflows. |
| diskann-benchmark-multi-vector/Cargo.toml | New crate manifest + deps/bin target. |
| Cargo.toml | Adds new crate to workspace members. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Codecov Report❌ Patch coverage is ❌ Your patch status has failed because the patch coverage (54.87%) is below the target coverage (90.00%). You can increase the patch coverage or adjust the target coverage. Additional details and impacted files@@ Coverage Diff @@
## main #1027 +/- ##
==========================================
- Coverage 89.45% 89.33% -0.13%
==========================================
Files 458 460 +2
Lines 85398 85534 +136
==========================================
+ Hits 76395 76409 +14
- Misses 9003 9125 +122
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
hildebrandmw
left a comment
There was a problem hiding this comment.
Thanks Suryansh, this will be useful for performance optimization. One big question I had is whether or not this needs to be its own crate, or whether it can be folded into diskann-benchmark (which will need some work to put the normal indexing code behind a feature gate to keep compile times low) or as a binary in diskann-quantization directly.
The pattern established by diskann-benchmark-simd is a little unfortunate so now is the time to think about our approach before we have to deal with a crate proliferation. We also might be able to bury all benchmark related binary crates in a subdirectory to keep the top level organized.
In either case, I don't think we should publish this to crates.io - and we should probably stop publishing diskann-benchmark-simd as well.
diskann-benchmark-multi-vector crate
arkrishn94
left a comment
There was a problem hiding this comment.
Thanks Suryansh, I left some relatively minor comments. Can we run this benchmark and post some example numbers in the PR description?
hildebrandmw
left a comment
There was a problem hiding this comment.
I took a quick pass.
hildebrandmw
left a comment
There was a problem hiding this comment.
Thanks Suryansh - this is headed in the right direction, but I'm finding little bits of impedance mismatch in places, for example the Distance trait additing additional assertions that should probably go in the MaxSimKernel already.
One other comment I had is that I feel a large number of module and struct level documentation adds noise rather than value. For example, explicitly listing structs in a module is something that rustdoc already does far better than manual maintenance ever could. I encourage you to try building the docs and see what rustdoc will generate. For this PR, I experimented with stripping down the documentation, particularly when is simply restates the type-signature, and found the result much easier to understand and review. This isn't a blocker for this PR, but I encourage you to consider a "less is more" approach. Terser and more direct documentation means the reader spends less time filtering and more time understanding. It also lowers the chance of documentation going out of date.
| let b: Box<[T]> = (0..repr.num_elements()).map(|_| f()).collect(); | ||
| // SAFETY: `b` has length `repr.num_elements()` by construction. | ||
| unsafe { repr.box_to_mat(b) } | ||
| } |
There was a problem hiding this comment.
This definitely needs tests.
| /// | ||
| /// # Contract | ||
| /// | ||
| /// - `scores.len() == self.nrows()` (caller's precondition). |
There was a problem hiding this comment.
We should document what happens if this doesn't hold. Preferably return an error. Right now, the SIMD kernels panic - the reference kernels silently don't do anything and leave whatever the initial values of scores happens to be. Behavior should be consistent.
| /// - The implementation must populate **all** `nrows()` entries of `scores`. | ||
| /// Callers that derive quantities from the full score vector (e.g. sums) | ||
| /// would silently corrupt their result if any trailing entry were left | ||
| /// unwritten. |
There was a problem hiding this comment.
Is this needed? Hopefully it's obvious that if the sizing contract is upheld from the caller, the implementation really should populate the whole thing.
| let query: QueryMatRef<'_, Standard<T>> = self.query.as_view().into(); | ||
| let Ok(mut max_sim) = MaxSim::new(scores) else { | ||
| return; | ||
| }; |
There was a problem hiding this comment.
Why not move this higher and have it pull double duty for the empty scores check?
| let Ok(mut max_sim) = MaxSim::new(scores) else { | ||
| return; | ||
| }; | ||
| let _ = max_sim.evaluate(query, doc); |
There was a problem hiding this comment.
This silently doesn't do anything if there is a size mismatch between query and doc, which is worse than panicking.
| let repr = *query.repr(); | ||
| let src = query.as_slice(); | ||
| let mut idx = 0usize; | ||
| let owned = Mat::<Standard<T>>::from_fn(repr, || { |
There was a problem hiding this comment.
MatRef::to_owned exists.
| /// generically through the timing harness. | ||
| pub(super) trait Distance<T: Copy> { | ||
| fn max_sim(&self, doc: MatRef<'_, Standard<T>>, scores: &mut [f32]); | ||
| } |
There was a problem hiding this comment.
Two comments to make here:
- This is the first user of the BYOTE pattern introduced in this PR, and it immediately doesn't use the BYOTE pattern and instead double dispatches 😉. Would probably be a good stress test of how ergonomic it is in practice (and set a good example).
- What additional assertions are needed here that
MaxSimKernelis not doing. Why are additional assertions needed, and if benchmarking is already having to work around limitations in the underlying implementation introduced in this PR, should this not be fixed at the source?
| pub(super) fn new(run: &Run) -> Self { | ||
| let mut rng = StdRng::seed_from_u64(0x12345); | ||
| let queries = Mat::from_fn( | ||
| Standard::new(run.num_query_vectors.get(), run.dim.get()).unwrap(), |
There was a problem hiding this comment.
This should propagate an error.
| "type": "multi-vector-op", | ||
| "content": { | ||
| "element_type": "float32", | ||
| "isa": "x86-64-v4", |
There was a problem hiding this comment.
Including AVX-512 in the perf test will run into an issue I constantly run into in diskann-benchmark-simd and haven't yet fixed and that is including an option to skip if the arch is unsupported rather than failing. Mainly as an ergonomics thing.
Also, running this on a non-AVX-512 supported machine fails at at runtime, i.e.:
######################
# Running Job 4 of 8 #
######################
Multi-Vector Operation
tag: multi-vector-op
element type: float32
isa: x86-64-v4
number of runs: 9
Error: x86-64-v4 not supported: AVX-512 unavailable on this CPU
instead of at load or match time. Please bake the check into Benchmark::try_match as much as possible so we error out as early as possible.
Adds the multi-vector MaxSim distance benchmark to
diskann-benchmarkbehinda
multi-vectorCargo feature (off by default). Public library API staysminimal: kernel internals (
Kernel<A>,tiled_reduce, layouts,TileBudget)remain private; the only external surface is the generic
build_max_sim<T>factory (gated by the sealed
MaxSimElementtrait) plus theMaxSimKernel<T>/Erase<T>traits.The factory follows the BYOTE ("Bring your own type erasure") pattern from
RFC #1068: the library hands the caller a concrete kernel; the caller's
Erasevisitor decides how to package it.JSON shape
{ "type": "multi-vector-op", "content": { "element_type": "float32", "isa": "x86-64-v3", "runs": [ { "num_query_vectors": 8, "num_doc_vectors": 32, "dim": 128, "loops_per_measurement": 200, "num_measurements": 50 } ] } }Suggested review order
diskann-quantization/src/multi_vector/distance/isa.rs—MaxSimIsaenum +
NotSupported. The public selector.diskann-quantization/src/multi_vector/distance/kernel.rs—MaxSimKernel<T>(object-safe boundary) +Erase<T>(BYOTE visitor) +BoxErase(default).diskann-quantization/src/multi_vector/distance/factory.rs—sealed
MaxSimElementtrait (f32 + f16 impls carry the per-T dispatchbodies) and the generic
build_max_sim<T>wrapper;Prepared(SIMDpath) and
ReferenceKernel(fallback path) with theirMaxSimKernel<T>impls;
BuildAndErase(internalTarget1carrier shared by Auto andthe pinned-ISA arms); factory↔fallback parity tests at the bottom.
diskann-quantization/src/multi_vector/{mod,distance/mod}.rs— publicre-exports (including
MaxSimElementandbuild_max_sim); verifyQueryComputeris gone and no new types leak from insidekernels.diskann-benchmark/src/inputs/multi_vector.rs—MultiVectorOp+BenchIsashadow enum +From<BenchIsa> for MaxSimIsa. The library'sMaxSimIsais deliberately serde-free; the shadow owns the JSON shape.diskann-benchmark/src/backend/multi_vector/{mod,driver,kernels}.rs—module-level docs at the top of
mod.rsdescribe the three-stepworkflow for adding an in-tree experimental kernel;
cfg_if!gate;MultiVectorTolerance+ manualInputimpl; one genericKernel<T>carrier whose
impl<T> Benchmark/impl<T> Regressionblocks areregistered for f32 and f16 — gated by the library's
MaxSimElementsoadding a new element type Just Works.
backend/mod.rs,inputs/mod.rs,Cargo.tomlfeature,
main.rsintegration tests, JSON fixtures.