Cleanup `diskann-benchmark-runner` and friends. by hildebrandmw · Pull Request #865 · microsoft/DiskANN

hildebrandmw · 2026-03-26T18:56:46Z

Simplify benchmark running and registration by:

Moving the Input away from a dyn-compatible trait to a simple trait with methods that don't receive by &self. That is, instead of
```
trait Input {
    fn tag(&self) -> &'static str;
    fn try_deserialize(&self, ...);
    fn example(&self);
}
```
we get
```
trait Input {
    fn tag() -> &'static str;
    fn try_deserialize(...);
    fn example();
}
```
The original pattern required an awkward proxy type for input registration (see the Input type in diskann-benchmark). This new pattern allows the actual input structs to be used directly.

Ditching the whole overly engineered dispatcher (which is cool, but inappropriate for this task) and instead introducing a Benchmark trait with the signature (see the code for doc-comments)

pub trait Benchmark {
    type Input: Input + 'static;
    type Output: Serialize;

    fn try_match(input: &Self::Input) -> Result<MatchScore, FailureScore>;

    fn description(
        f: &mut std::fmt::Formatter<'_>,
        input: Option<&Self::Input>,
    ) -> std::fmt::Result;

    fn run(
        input: &Self::Input,
        checkpoint: Checkpoint<'_>,
        output: &mut dyn Output,
    ) -> anyhow::Result<Self::Output>;
}

The functions try_match and description match the original DispatchRule technique, but instead of relying on a convert method and a corresponding closure - we just run the benchmark directly.

This almost entirely removes the need for the various register! macros as registration is significantly more straightforward. Additionally, the input and output types are concrete!

These changes remove a lot of boilerplate from diskann-benchmark. No need to match on the Any type anymore as this happens automatically. While updating diskann-benchmark, I was able to replace some of the macros for stamping out benchmarks with generics. Scalar and spherically-quantized index builds are still reliant on macros.

Finally, the backend of diskann-benchmark-runner (namely registry::Benchmarks) is cleaned up to not rely on the complicated dispatcher machinery for matching inputs and providing mismatch diagnostics. Instead, the logic is greatly simplified and just implemented inline.

Why

While reviewing #857, I was pondering what it would take to add A/B comparison logic directly to our benchmark framework. My thinking is that it would be nice to have something like

trait Compare: Benchmark {
    type Tolerance: Input;
    fn compare(
        tolerances: &Self::Tolerance,
        before: &Self::Output,
        after: &Self::Output,
    ) -> SomeResult;
}

Benchmarks that want A/B comparisons could implement Compare and use a registry::Benchmarks::register_compared method to register a benchmark that participates in A/B comparison. The backend of diskann-benchmark-runner could then use this to inspect the benchmarks that support comparison and this becomes an engine for A/B testing with concretely typed outputs!

The problem, of course, is the current way of benchmark registration is a little overly complicated and makes this kind of extension way too messy. So view this PR as serving two purposes. First, we clean up an overly complicated portion of benchmarking and second, we enable evolution of the benchmarking framework.

Suggested Reviewing Order

I would actually recommend looking at the changes in diskann-benchmark before diskann-benchmark-runner. The amount of code that got chopped is quite satisfying. The changes here aren't perfect (the use of the 'static lifetime for benchmarks is a remnant of the DispatchRule approach and fully cleaning that up would be a larger change).

Then, inside diskann-benchmark-runner, I suggest:

input.rs: New home of the Input trait plus the machinery for dynamically dispatching to inputs.
benchmark.rs: The new benchmark trait (plus associated dynamic dispatch machinery).
registry.rs: Updates to the registry to reflect the new traits. In particular, Benchmarks is updated so the matching logic is much clear (and not hidden behind a dispatcher). I expanded the tests a little and added a few more baselines to exercise the matching logic more.
dispatcher/: This directory has largely been made obsolete. The DispatchRule trait is preserved because it still has some utility in matching data types and composes somewhat with the Benchmark trait. We could probably get rid of this directory entirely in the future, but for now I'm preserving it for the sake of a less insane diff.

Feel free to skip diskann-benchmark-simd.

Copilot

Pull request overview

Refactors the benchmark framework to simplify input/benchmark registration and remove the previously generic multi-dispatch “dispatcher” machinery from the runner, replacing it with a Benchmark trait + type-erased wrappers and a simpler registry selection/mismatch diagnostic flow.

Changes:

Convert Input to a static-method trait and update input registration to be type-driven (register::<T>()).
Introduce Benchmark + DynBenchmark wrapper and simplify benchmark selection/debugging in registry::Benchmarks.
Migrate existing benchmarks (DiskANN benchmark crates + simd) and update/add golden tests for matching/mismatch behavior.

Reviewed changes

Copilot reviewed 41 out of 41 changed files in this pull request and generated 5 comments.

Show a summary per file

File	Description
diskann-benchmark/src/utils/mod.rs	Updates `stub_impl!` to register stub benchmarks via the new `Benchmark` trait API.
diskann-benchmark/src/inputs/mod.rs	Removes proxy `Input<T>` helper and implements runner `Input` directly for input structs via macro.
diskann-benchmark/src/inputs/filters.rs	Updates input registration to `registry.register::<T>()`.
diskann-benchmark/src/inputs/exhaustive.rs	Updates input registration to `registry.register::<T>()`.
diskann-benchmark/src/inputs/disk.rs	Updates input registration to `registry.register::<T>()`.
diskann-benchmark/src/inputs/async_.rs	Updates input registration to `registry.register::<T>()`.
diskann-benchmark/src/backend/index/spherical.rs	Migrates spherical-quant benchmarks to `Benchmark` trait and simplifies registration.
diskann-benchmark/src/backend/index/scalar.rs	Migrates scalar-quant benchmarks to `Benchmark` trait and simplifies registration.
diskann-benchmark/src/backend/index/product.rs	Migrates product-quant benchmarks to `Benchmark` trait and reduces macro usage.
diskann-benchmark/src/backend/index/benchmarks.rs	Removes old registration macros and migrates full-precision/dynamic benchmarks to `Benchmark`.
diskann-benchmark/src/backend/filters/benchmark.rs	Migrates metadata-index filter benchmark to `Benchmark`.
diskann-benchmark/src/backend/exhaustive/spherical.rs	Migrates exhaustive spherical benchmark to `Benchmark`.
diskann-benchmark/src/backend/exhaustive/product.rs	Migrates exhaustive PQ benchmark to `Benchmark`.
diskann-benchmark/src/backend/exhaustive/minmax.rs	Migrates exhaustive minmax benchmark to `Benchmark`.
diskann-benchmark/src/backend/disk_index/benchmarks.rs	Migrates disk-index benchmarks to `Benchmark` and simplifies registration.
diskann-benchmark-simd/src/lib.rs	Migrates simd benchmarks and input to new `Input`/`Benchmark` APIs; removes old adapter types.
diskann-benchmark-simd/src/bin.rs	Updates simd binary to register `SimdOp` input via `register::<T>()`.
diskann-benchmark-runner/tests/test-overload-0/stdout.txt	Adds golden output for overload resolution (best `MatchScore` wins).
diskann-benchmark-runner/tests/test-overload-0/stdin.txt	Adds runner invocation fixture for overload resolution test.
diskann-benchmark-runner/tests/test-overload-0/output.json	Adds expected JSON output for overload resolution test.
diskann-benchmark-runner/tests/test-overload-0/input.json	Adds input file fixture for overload resolution test.
diskann-benchmark-runner/tests/test-overload-0/README.md	Documents overload-resolution expectations for the new matching logic.
diskann-benchmark-runner/tests/test-mismatch-1/stdout.txt	Adds golden output for mismatch diagnostics (new description paths).
diskann-benchmark-runner/tests/test-mismatch-1/stdin.txt	Adds runner invocation fixture for mismatch test.
diskann-benchmark-runner/tests/test-mismatch-1/input.json	Adds input file fixture for mismatch test.
diskann-benchmark-runner/tests/test-mismatch-1/README.md	Documents mismatch diagnostic expectations.
diskann-benchmark-runner/tests/test-mismatch-0/stdout.txt	Updates mismatch diagnostics ordering/content to match new implementation.
diskann-benchmark-runner/tests/test-4/stdout.txt	Updates benchmark listing output to reflect new description-based listing.
diskann-benchmark-runner/src/utils/datatype.rs	Removes `dispatcher::Map` usage; keeps dispatch-rule-based matching utilities.
diskann-benchmark-runner/src/test.rs	Updates test inputs/benchmarks to new `Input`/`Benchmark` APIs; adds exact-match benchmark.
diskann-benchmark-runner/src/result.rs	Removes `Checkpoint::empty()` and its test now that registry no longer needs it.
diskann-benchmark-runner/src/registry.rs	Replaces dispatcher-backed benchmark registry with a simpler vector of type-erased benchmarks + scoring logic.
diskann-benchmark-runner/src/lib.rs	Moves `Input` into its own module and re-exports `Benchmark`/`Input`.
diskann-benchmark-runner/src/jobs.rs	Switches from `&dyn Input` to `input::Registered` wrapper for formatting/examples.
diskann-benchmark-runner/src/input.rs	New static-method `Input` trait plus type-erased wrapper for dynamic registry storage.
diskann-benchmark-runner/src/dispatcher/mod.rs	Shrinks dispatcher module to “dispatch rules” only; removes dynamic dispatcher API/docs/tests.
diskann-benchmark-runner/src/dispatcher/examples.rs	Deletes now-obsolete examples used by the removed dispatcher docs.
diskann-benchmark-runner/src/dispatcher/dispatch.rs	Deletes the generic multi-dispatch implementation.
diskann-benchmark-runner/src/dispatcher/api.rs	Makes score types `Eq/Ord` and removes `Map`/dispatcher-related APIs.
diskann-benchmark-runner/src/benchmark.rs	New `Benchmark` trait plus `DynBenchmark` wrapper for type-erased storage and JSON serialization.
diskann-benchmark-runner/src/app.rs	Updates benchmark listing and mismatch reporting to use the new registry APIs.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-03-26T19:01:57Z

diskann-benchmark-runner/tests/test-mismatch-1/stdout.txt

+    2. "type-bench-i8": expected "int8" but found "float16"
+    3. "exact-type-bench-f32-1000": expected "float32" but found "float16"
+
+could not find find a benchmark for all inputs


The stdout fixture includes the message "could not find find a benchmark for all inputs" (duplicated "find"). If this is meant to be user-facing CLI output, consider fixing the typo in the application error message and updating this golden output accordingly.

Suggested change

could not find find a benchmark for all inputs

could not find a benchmark for all inputs

Copilot · 2026-03-26T19:01:58Z

diskann-benchmark-runner/src/app.rs

-                for (name, method) in benchmarks.methods() {
-                    writeln!(output, "    {}: {}", name, method.signatures()[0])?;
+                for (name, description) in benchmarks.names() {
+                    writeln!(output, "    {}: {}", name, description)?;


description can be multi-line (e.g., DynBenchmark::description writes a newline after the tag). Printing it inline with " {}: {}" causes subsequent lines to lose indentation and makes benchmarks output hard to read. Consider either making Benchmarks::names() return a single-line description, or splitting/indenting all lines when printing.

Suggested change

writeln!(output, " {}: {}", name, description)?;

let mut lines = description.lines();

if let Some(first) = lines.next() {

writeln!(output, " {}: {}", name, first)?;

for line in lines {

writeln!(output, " {}", line)?;

}

}

Copilot · 2026-03-26T19:01:58Z

diskann-benchmark-runner/src/benchmark.rs

+                ),
+            },
+            None => {
+                writeln!(f, "tag \"{}\"", <T::Input as Input>::tag())?;


DynBenchmark::description uses writeln! for the tag line, which bakes a newline into the captured description string used by Benchmarks::names(). This leads to multi-line descriptions without consistent indentation in the CLI. Consider using write! (and letting T::description manage newlines) or otherwise ensuring callers can format multi-line descriptions cleanly.

Suggested change

writeln!(f, "tag \"{}\"", <T::Input as Input>::tag())?;

write!(f, "tag \"{}\"", <T::Input as Input>::tag())?;

Copilot · 2026-03-26T19:01:59Z

diskann-benchmark-runner/src/test.rs

-        Type::<T>::try_match(&from.data_type)
+    fn try_match(input: &TypeInput) -> Result<MatchScore, FailureScore> {
+        // Try to match based on data type.
+        // Add a small penalty to `ExactTypeBench` can be more specific if it hits.


The comment reads "Add a small penalty to ExactTypeBench can be more specific if it hits." This looks like a grammatical mix-up and also seems to describe the opposite of what the code does (the penalty is applied to TypeBench so ExactTypeBench wins when both match). Consider rewording for clarity.

Suggested change

// Add a small penalty to `ExactTypeBench` can be more specific if it hits.

// Add a small penalty to `TypeBench` so `ExactTypeBench` is preferred when both match.

Copilot · 2026-03-26T19:01:59Z

diskann-benchmark-runner/src/registry.rs

    /// Register `input` in the registry.
    ///
    /// Returns an error if any other input with the same [`Input::tag()`] has been registered
    /// while leaving the underlying registry unchanged.
-    pub fn register<T>(&mut self, input: T) -> anyhow::Result<()>
+    pub fn register<T>(&mut self) -> anyhow::Result<()>


Inputs::register no longer takes an input value, but the doc comment still says "Register input in the registry". This is misleading now that registration is type-driven; update the wording to reflect register::<T>() (or reintroduce a parameter if that was intended).

codecov-commenter · 2026-03-26T19:12:17Z

Codecov Report

❌ Patch coverage is 75.24752% with 100 lines in your changes missing coverage. Please review.
✅ Project coverage is 89.29%. Comparing base (b616aa3) to head (6d2ce10).

Files with missing lines	Patch %	Lines
diskann-benchmark/src/backend/index/benchmarks.rs	46.72%	65 Missing ⚠️
diskann-benchmark-runner/src/input.rs	78.37%	8 Missing ⚠️
diskann-benchmark-simd/src/lib.rs	73.07%	7 Missing ⚠️
diskann-benchmark/src/utils/mod.rs	45.45%	6 Missing ⚠️
diskann-benchmark-runner/src/dispatcher/mod.rs	33.33%	4 Missing ⚠️
diskann-benchmark-runner/src/registry.rs	93.54%	4 Missing ⚠️
diskann-benchmark-runner/src/test.rs	96.15%	3 Missing ⚠️
diskann-benchmark-runner/src/benchmark.rs	97.22%	1 Missing ⚠️
diskann-benchmark/src/backend/filters/benchmark.rs	88.88%	1 Missing ⚠️
diskann-benchmark/src/inputs/mod.rs	50.00%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #865      +/-   ##
==========================================
- Coverage   90.45%   89.29%   -1.16%     
==========================================
  Files         442      442              
  Lines       83248    82914     -334     
==========================================
- Hits        75301    74039    -1262     
- Misses       7947     8875     +928

Flag	Coverage Δ
miri	`89.29% <75.24%> (-1.16%)`	⬇️
unittests	`89.13% <75.24%> (-1.28%)`	⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines	Coverage Δ
diskann-benchmark-runner/src/app.rs	`79.32% <100.00%> (+1.28%)`	⬆️
diskann-benchmark-runner/src/dispatcher/api.rs	`100.00% <ø> (ø)`
diskann-benchmark-runner/src/jobs.rs	`96.49% <100.00%> (ø)`
diskann-benchmark-runner/src/result.rs	`97.81% <ø> (-1.14%)`	⬇️
diskann-benchmark-runner/src/utils/datatype.rs	`98.98% <ø> (ø)`
diskann-benchmark-simd/src/bin.rs	`81.57% <100.00%> (ø)`
diskann-benchmark/src/backend/exhaustive/minmax.rs	`100.00% <ø> (ø)`
...iskann-benchmark/src/backend/exhaustive/product.rs	`100.00% <ø> (ø)`
...kann-benchmark/src/backend/exhaustive/spherical.rs	`100.00% <ø> (ø)`
diskann-benchmark/src/backend/index/product.rs	`100.00% <ø> (ø)`
... and 16 more

... and 38 files with indirect coverage changes

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Cleanup diskann-benchmark-runner and friends.

6d2ce10

hildebrandmw requested review from a team and Copilot March 26, 2026 18:56

Copilot started reviewing on behalf of hildebrandmw March 26, 2026 18:57 View session

Copilot AI reviewed Mar 26, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Cleanup `diskann-benchmark-runner` and friends.#865

Cleanup `diskann-benchmark-runner` and friends.#865
hildebrandmw wants to merge 1 commit intomainfrom
mhildebr/simplify-benchmark

hildebrandmw commented Mar 26, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Mar 26, 2026

Uh oh!

Copilot AI Mar 26, 2026

Uh oh!

Copilot AI Mar 26, 2026

Uh oh!

Copilot AI Mar 26, 2026

Uh oh!

Copilot AI Mar 26, 2026

Uh oh!

codecov-commenter commented Mar 26, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

	could not find find a benchmark for all inputs
	could not find a benchmark for all inputs

-                    writeln!(output, "    {}: {}", name, description)?;
+                    let mut lines = description.lines();
+                    if let Some(first) = lines.next() {
+                        writeln!(output, "    {}: {}", name, first)?;
+                        for line in lines {
+                            writeln!(output, "        {}", line)?;
+                        }
+                    }

	writeln!(f, "tag \"{}\"", <T::Input as Input>::tag())?;
	write!(f, "tag \"{}\"", <T::Input as Input>::tag())?;

	// Add a small penalty to `ExactTypeBench` can be more specific if it hits.
	// Add a small penalty to `TypeBench` so `ExactTypeBench` is preferred when both match.

Conversation

hildebrandmw commented Mar 26, 2026

Why

Suggested Reviewing Order

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

Uh oh!

codecov-commenter commented Mar 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

codecov-commenter commented Mar 26, 2026 •

edited

Loading