Skip to content

Cleanup diskann-benchmark-runner and friends.#865

Open
hildebrandmw wants to merge 1 commit intomainfrom
mhildebr/simplify-benchmark
Open

Cleanup diskann-benchmark-runner and friends.#865
hildebrandmw wants to merge 1 commit intomainfrom
mhildebr/simplify-benchmark

Conversation

@hildebrandmw
Copy link
Contributor

Simplify benchmark running and registration by:

  • Moving the Input away from a dyn-compatible trait to a simple trait with methods that don't receive by &self. That is, instead of

    trait Input {
        fn tag(&self) -> &'static str;
        fn try_deserialize(&self, ...);
        fn example(&self);
    }

    we get

    trait Input {
        fn tag() -> &'static str;
        fn try_deserialize(...);
        fn example();
    }

    The original pattern required an awkward proxy type for input registration (see the Input type in diskann-benchmark). This new pattern allows the actual input structs to be used directly.

  • Ditching the whole overly engineered dispatcher (which is cool, but inappropriate for this task) and instead introducing a Benchmark trait with the signature (see the code for doc-comments)

    pub trait Benchmark {
        type Input: Input + 'static;
        type Output: Serialize;
    
        fn try_match(input: &Self::Input) -> Result<MatchScore, FailureScore>;
    
        fn description(
            f: &mut std::fmt::Formatter<'_>,
            input: Option<&Self::Input>,
        ) -> std::fmt::Result;
    
        fn run(
            input: &Self::Input,
            checkpoint: Checkpoint<'_>,
            output: &mut dyn Output,
        ) -> anyhow::Result<Self::Output>;
    }

    The functions try_match and description match the original DispatchRule technique, but instead of relying on a convert method and a corresponding closure - we just run the benchmark directly.

    This almost entirely removes the need for the various register! macros as registration is significantly more straightforward. Additionally, the input and output types are concrete!

These changes remove a lot of boilerplate from diskann-benchmark. No need to match on the Any type anymore as this happens automatically. While updating diskann-benchmark, I was able to replace some of the macros for stamping out benchmarks with generics. Scalar and spherically-quantized index builds are still reliant on macros.

Finally, the backend of diskann-benchmark-runner (namely registry::Benchmarks) is cleaned up to not rely on the complicated dispatcher machinery for matching inputs and providing mismatch diagnostics. Instead, the logic is greatly simplified and just implemented inline.

Why

While reviewing #857, I was pondering what it would take to add A/B comparison logic directly to our benchmark framework. My thinking is that it would be nice to have something like

trait Compare: Benchmark {
    type Tolerance: Input;
    fn compare(
        tolerances: &Self::Tolerance,
        before: &Self::Output,
        after: &Self::Output,
    ) -> SomeResult;
}

Benchmarks that want A/B comparisons could implement Compare and use a registry::Benchmarks::register_compared method to register a benchmark that participates in A/B comparison. The backend of diskann-benchmark-runner could then use this to inspect the benchmarks that support comparison and this becomes an engine for A/B testing with concretely typed outputs!

The problem, of course, is the current way of benchmark registration is a little overly complicated and makes this kind of extension way too messy. So view this PR as serving two purposes. First, we clean up an overly complicated portion of benchmarking and second, we enable evolution of the benchmarking framework.

Suggested Reviewing Order

I would actually recommend looking at the changes in diskann-benchmark before diskann-benchmark-runner. The amount of code that got chopped is quite satisfying. The changes here aren't perfect (the use of the 'static lifetime for benchmarks is a remnant of the DispatchRule approach and fully cleaning that up would be a larger change).

Then, inside diskann-benchmark-runner, I suggest:

  • input.rs: New home of the Input trait plus the machinery for dynamically dispatching to inputs.
  • benchmark.rs: The new benchmark trait (plus associated dynamic dispatch machinery).
  • registry.rs: Updates to the registry to reflect the new traits. In particular, Benchmarks is updated so the matching logic is much clear (and not hidden behind a dispatcher). I expanded the tests a little and added a few more baselines to exercise the matching logic more.
  • dispatcher/: This directory has largely been made obsolete. The DispatchRule trait is preserved because it still has some utility in matching data types and composes somewhat with the Benchmark trait. We could probably get rid of this directory entirely in the future, but for now I'm preserving it for the sake of a less insane diff.

Feel free to skip diskann-benchmark-simd.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Refactors the benchmark framework to simplify input/benchmark registration and remove the previously generic multi-dispatch “dispatcher” machinery from the runner, replacing it with a Benchmark trait + type-erased wrappers and a simpler registry selection/mismatch diagnostic flow.

Changes:

  • Convert Input to a static-method trait and update input registration to be type-driven (register::<T>()).
  • Introduce Benchmark + DynBenchmark wrapper and simplify benchmark selection/debugging in registry::Benchmarks.
  • Migrate existing benchmarks (DiskANN benchmark crates + simd) and update/add golden tests for matching/mismatch behavior.

Reviewed changes

Copilot reviewed 41 out of 41 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
diskann-benchmark/src/utils/mod.rs Updates stub_impl! to register stub benchmarks via the new Benchmark trait API.
diskann-benchmark/src/inputs/mod.rs Removes proxy Input<T> helper and implements runner Input directly for input structs via macro.
diskann-benchmark/src/inputs/filters.rs Updates input registration to registry.register::<T>().
diskann-benchmark/src/inputs/exhaustive.rs Updates input registration to registry.register::<T>().
diskann-benchmark/src/inputs/disk.rs Updates input registration to registry.register::<T>().
diskann-benchmark/src/inputs/async_.rs Updates input registration to registry.register::<T>().
diskann-benchmark/src/backend/index/spherical.rs Migrates spherical-quant benchmarks to Benchmark trait and simplifies registration.
diskann-benchmark/src/backend/index/scalar.rs Migrates scalar-quant benchmarks to Benchmark trait and simplifies registration.
diskann-benchmark/src/backend/index/product.rs Migrates product-quant benchmarks to Benchmark trait and reduces macro usage.
diskann-benchmark/src/backend/index/benchmarks.rs Removes old registration macros and migrates full-precision/dynamic benchmarks to Benchmark.
diskann-benchmark/src/backend/filters/benchmark.rs Migrates metadata-index filter benchmark to Benchmark.
diskann-benchmark/src/backend/exhaustive/spherical.rs Migrates exhaustive spherical benchmark to Benchmark.
diskann-benchmark/src/backend/exhaustive/product.rs Migrates exhaustive PQ benchmark to Benchmark.
diskann-benchmark/src/backend/exhaustive/minmax.rs Migrates exhaustive minmax benchmark to Benchmark.
diskann-benchmark/src/backend/disk_index/benchmarks.rs Migrates disk-index benchmarks to Benchmark and simplifies registration.
diskann-benchmark-simd/src/lib.rs Migrates simd benchmarks and input to new Input/Benchmark APIs; removes old adapter types.
diskann-benchmark-simd/src/bin.rs Updates simd binary to register SimdOp input via register::<T>().
diskann-benchmark-runner/tests/test-overload-0/stdout.txt Adds golden output for overload resolution (best MatchScore wins).
diskann-benchmark-runner/tests/test-overload-0/stdin.txt Adds runner invocation fixture for overload resolution test.
diskann-benchmark-runner/tests/test-overload-0/output.json Adds expected JSON output for overload resolution test.
diskann-benchmark-runner/tests/test-overload-0/input.json Adds input file fixture for overload resolution test.
diskann-benchmark-runner/tests/test-overload-0/README.md Documents overload-resolution expectations for the new matching logic.
diskann-benchmark-runner/tests/test-mismatch-1/stdout.txt Adds golden output for mismatch diagnostics (new description paths).
diskann-benchmark-runner/tests/test-mismatch-1/stdin.txt Adds runner invocation fixture for mismatch test.
diskann-benchmark-runner/tests/test-mismatch-1/input.json Adds input file fixture for mismatch test.
diskann-benchmark-runner/tests/test-mismatch-1/README.md Documents mismatch diagnostic expectations.
diskann-benchmark-runner/tests/test-mismatch-0/stdout.txt Updates mismatch diagnostics ordering/content to match new implementation.
diskann-benchmark-runner/tests/test-4/stdout.txt Updates benchmark listing output to reflect new description-based listing.
diskann-benchmark-runner/src/utils/datatype.rs Removes dispatcher::Map usage; keeps dispatch-rule-based matching utilities.
diskann-benchmark-runner/src/test.rs Updates test inputs/benchmarks to new Input/Benchmark APIs; adds exact-match benchmark.
diskann-benchmark-runner/src/result.rs Removes Checkpoint::empty() and its test now that registry no longer needs it.
diskann-benchmark-runner/src/registry.rs Replaces dispatcher-backed benchmark registry with a simpler vector of type-erased benchmarks + scoring logic.
diskann-benchmark-runner/src/lib.rs Moves Input into its own module and re-exports Benchmark/Input.
diskann-benchmark-runner/src/jobs.rs Switches from &dyn Input to input::Registered wrapper for formatting/examples.
diskann-benchmark-runner/src/input.rs New static-method Input trait plus type-erased wrapper for dynamic registry storage.
diskann-benchmark-runner/src/dispatcher/mod.rs Shrinks dispatcher module to “dispatch rules” only; removes dynamic dispatcher API/docs/tests.
diskann-benchmark-runner/src/dispatcher/examples.rs Deletes now-obsolete examples used by the removed dispatcher docs.
diskann-benchmark-runner/src/dispatcher/dispatch.rs Deletes the generic multi-dispatch implementation.
diskann-benchmark-runner/src/dispatcher/api.rs Makes score types Eq/Ord and removes Map/dispatcher-related APIs.
diskann-benchmark-runner/src/benchmark.rs New Benchmark trait plus DynBenchmark wrapper for type-erased storage and JSON serialization.
diskann-benchmark-runner/src/app.rs Updates benchmark listing and mismatch reporting to use the new registry APIs.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

2. "type-bench-i8": expected "int8" but found "float16"
3. "exact-type-bench-f32-1000": expected "float32" but found "float16"

could not find find a benchmark for all inputs No newline at end of file
Copy link

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The stdout fixture includes the message "could not find find a benchmark for all inputs" (duplicated "find"). If this is meant to be user-facing CLI output, consider fixing the typo in the application error message and updating this golden output accordingly.

Suggested change
could not find find a benchmark for all inputs
could not find a benchmark for all inputs

Copilot uses AI. Check for mistakes.
for (name, method) in benchmarks.methods() {
writeln!(output, " {}: {}", name, method.signatures()[0])?;
for (name, description) in benchmarks.names() {
writeln!(output, " {}: {}", name, description)?;
Copy link

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

description can be multi-line (e.g., DynBenchmark::description writes a newline after the tag). Printing it inline with " {}: {}" causes subsequent lines to lose indentation and makes benchmarks output hard to read. Consider either making Benchmarks::names() return a single-line description, or splitting/indenting all lines when printing.

Suggested change
writeln!(output, " {}: {}", name, description)?;
let mut lines = description.lines();
if let Some(first) = lines.next() {
writeln!(output, " {}: {}", name, first)?;
for line in lines {
writeln!(output, " {}", line)?;
}
}

Copilot uses AI. Check for mistakes.
),
},
None => {
writeln!(f, "tag \"{}\"", <T::Input as Input>::tag())?;
Copy link

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DynBenchmark::description uses writeln! for the tag line, which bakes a newline into the captured description string used by Benchmarks::names(). This leads to multi-line descriptions without consistent indentation in the CLI. Consider using write! (and letting T::description manage newlines) or otherwise ensuring callers can format multi-line descriptions cleanly.

Suggested change
writeln!(f, "tag \"{}\"", <T::Input as Input>::tag())?;
write!(f, "tag \"{}\"", <T::Input as Input>::tag())?;

Copilot uses AI. Check for mistakes.
Type::<T>::try_match(&from.data_type)
fn try_match(input: &TypeInput) -> Result<MatchScore, FailureScore> {
// Try to match based on data type.
// Add a small penalty to `ExactTypeBench` can be more specific if it hits.
Copy link

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment reads "Add a small penalty to ExactTypeBench can be more specific if it hits." This looks like a grammatical mix-up and also seems to describe the opposite of what the code does (the penalty is applied to TypeBench so ExactTypeBench wins when both match). Consider rewording for clarity.

Suggested change
// Add a small penalty to `ExactTypeBench` can be more specific if it hits.
// Add a small penalty to `TypeBench` so `ExactTypeBench` is preferred when both match.

Copilot uses AI. Check for mistakes.
Comment on lines 35 to +39
/// Register `input` in the registry.
///
/// Returns an error if any other input with the same [`Input::tag()`] has been registered
/// while leaving the underlying registry unchanged.
pub fn register<T>(&mut self, input: T) -> anyhow::Result<()>
pub fn register<T>(&mut self) -> anyhow::Result<()>
Copy link

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inputs::register no longer takes an input value, but the doc comment still says "Register input in the registry". This is misleading now that registration is type-driven; update the wording to reflect register::<T>() (or reintroduce a parameter if that was intended).

Copilot uses AI. Check for mistakes.
@codecov-commenter
Copy link

codecov-commenter commented Mar 26, 2026

Codecov Report

❌ Patch coverage is 75.24752% with 100 lines in your changes missing coverage. Please review.
✅ Project coverage is 89.29%. Comparing base (b616aa3) to head (6d2ce10).

Files with missing lines Patch % Lines
diskann-benchmark/src/backend/index/benchmarks.rs 46.72% 65 Missing ⚠️
diskann-benchmark-runner/src/input.rs 78.37% 8 Missing ⚠️
diskann-benchmark-simd/src/lib.rs 73.07% 7 Missing ⚠️
diskann-benchmark/src/utils/mod.rs 45.45% 6 Missing ⚠️
diskann-benchmark-runner/src/dispatcher/mod.rs 33.33% 4 Missing ⚠️
diskann-benchmark-runner/src/registry.rs 93.54% 4 Missing ⚠️
diskann-benchmark-runner/src/test.rs 96.15% 3 Missing ⚠️
diskann-benchmark-runner/src/benchmark.rs 97.22% 1 Missing ⚠️
diskann-benchmark/src/backend/filters/benchmark.rs 88.88% 1 Missing ⚠️
diskann-benchmark/src/inputs/mod.rs 50.00% 1 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main     #865      +/-   ##
==========================================
- Coverage   90.45%   89.29%   -1.16%     
==========================================
  Files         442      442              
  Lines       83248    82914     -334     
==========================================
- Hits        75301    74039    -1262     
- Misses       7947     8875     +928     
Flag Coverage Δ
miri 89.29% <75.24%> (-1.16%) ⬇️
unittests 89.13% <75.24%> (-1.28%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
diskann-benchmark-runner/src/app.rs 79.32% <100.00%> (+1.28%) ⬆️
diskann-benchmark-runner/src/dispatcher/api.rs 100.00% <ø> (ø)
diskann-benchmark-runner/src/jobs.rs 96.49% <100.00%> (ø)
diskann-benchmark-runner/src/result.rs 97.81% <ø> (-1.14%) ⬇️
diskann-benchmark-runner/src/utils/datatype.rs 98.98% <ø> (ø)
diskann-benchmark-simd/src/bin.rs 81.57% <100.00%> (ø)
diskann-benchmark/src/backend/exhaustive/minmax.rs 100.00% <ø> (ø)
...iskann-benchmark/src/backend/exhaustive/product.rs 100.00% <ø> (ø)
...kann-benchmark/src/backend/exhaustive/spherical.rs 100.00% <ø> (ø)
diskann-benchmark/src/backend/index/product.rs 100.00% <ø> (ø)
... and 16 more

... and 38 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants