Cleanup diskann-benchmark-runner and friends.#865
Conversation
There was a problem hiding this comment.
Pull request overview
Refactors the benchmark framework to simplify input/benchmark registration and remove the previously generic multi-dispatch “dispatcher” machinery from the runner, replacing it with a Benchmark trait + type-erased wrappers and a simpler registry selection/mismatch diagnostic flow.
Changes:
- Convert
Inputto a static-method trait and update input registration to be type-driven (register::<T>()). - Introduce
Benchmark+DynBenchmarkwrapper and simplify benchmark selection/debugging inregistry::Benchmarks. - Migrate existing benchmarks (DiskANN benchmark crates + simd) and update/add golden tests for matching/mismatch behavior.
Reviewed changes
Copilot reviewed 41 out of 41 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| diskann-benchmark/src/utils/mod.rs | Updates stub_impl! to register stub benchmarks via the new Benchmark trait API. |
| diskann-benchmark/src/inputs/mod.rs | Removes proxy Input<T> helper and implements runner Input directly for input structs via macro. |
| diskann-benchmark/src/inputs/filters.rs | Updates input registration to registry.register::<T>(). |
| diskann-benchmark/src/inputs/exhaustive.rs | Updates input registration to registry.register::<T>(). |
| diskann-benchmark/src/inputs/disk.rs | Updates input registration to registry.register::<T>(). |
| diskann-benchmark/src/inputs/async_.rs | Updates input registration to registry.register::<T>(). |
| diskann-benchmark/src/backend/index/spherical.rs | Migrates spherical-quant benchmarks to Benchmark trait and simplifies registration. |
| diskann-benchmark/src/backend/index/scalar.rs | Migrates scalar-quant benchmarks to Benchmark trait and simplifies registration. |
| diskann-benchmark/src/backend/index/product.rs | Migrates product-quant benchmarks to Benchmark trait and reduces macro usage. |
| diskann-benchmark/src/backend/index/benchmarks.rs | Removes old registration macros and migrates full-precision/dynamic benchmarks to Benchmark. |
| diskann-benchmark/src/backend/filters/benchmark.rs | Migrates metadata-index filter benchmark to Benchmark. |
| diskann-benchmark/src/backend/exhaustive/spherical.rs | Migrates exhaustive spherical benchmark to Benchmark. |
| diskann-benchmark/src/backend/exhaustive/product.rs | Migrates exhaustive PQ benchmark to Benchmark. |
| diskann-benchmark/src/backend/exhaustive/minmax.rs | Migrates exhaustive minmax benchmark to Benchmark. |
| diskann-benchmark/src/backend/disk_index/benchmarks.rs | Migrates disk-index benchmarks to Benchmark and simplifies registration. |
| diskann-benchmark-simd/src/lib.rs | Migrates simd benchmarks and input to new Input/Benchmark APIs; removes old adapter types. |
| diskann-benchmark-simd/src/bin.rs | Updates simd binary to register SimdOp input via register::<T>(). |
| diskann-benchmark-runner/tests/test-overload-0/stdout.txt | Adds golden output for overload resolution (best MatchScore wins). |
| diskann-benchmark-runner/tests/test-overload-0/stdin.txt | Adds runner invocation fixture for overload resolution test. |
| diskann-benchmark-runner/tests/test-overload-0/output.json | Adds expected JSON output for overload resolution test. |
| diskann-benchmark-runner/tests/test-overload-0/input.json | Adds input file fixture for overload resolution test. |
| diskann-benchmark-runner/tests/test-overload-0/README.md | Documents overload-resolution expectations for the new matching logic. |
| diskann-benchmark-runner/tests/test-mismatch-1/stdout.txt | Adds golden output for mismatch diagnostics (new description paths). |
| diskann-benchmark-runner/tests/test-mismatch-1/stdin.txt | Adds runner invocation fixture for mismatch test. |
| diskann-benchmark-runner/tests/test-mismatch-1/input.json | Adds input file fixture for mismatch test. |
| diskann-benchmark-runner/tests/test-mismatch-1/README.md | Documents mismatch diagnostic expectations. |
| diskann-benchmark-runner/tests/test-mismatch-0/stdout.txt | Updates mismatch diagnostics ordering/content to match new implementation. |
| diskann-benchmark-runner/tests/test-4/stdout.txt | Updates benchmark listing output to reflect new description-based listing. |
| diskann-benchmark-runner/src/utils/datatype.rs | Removes dispatcher::Map usage; keeps dispatch-rule-based matching utilities. |
| diskann-benchmark-runner/src/test.rs | Updates test inputs/benchmarks to new Input/Benchmark APIs; adds exact-match benchmark. |
| diskann-benchmark-runner/src/result.rs | Removes Checkpoint::empty() and its test now that registry no longer needs it. |
| diskann-benchmark-runner/src/registry.rs | Replaces dispatcher-backed benchmark registry with a simpler vector of type-erased benchmarks + scoring logic. |
| diskann-benchmark-runner/src/lib.rs | Moves Input into its own module and re-exports Benchmark/Input. |
| diskann-benchmark-runner/src/jobs.rs | Switches from &dyn Input to input::Registered wrapper for formatting/examples. |
| diskann-benchmark-runner/src/input.rs | New static-method Input trait plus type-erased wrapper for dynamic registry storage. |
| diskann-benchmark-runner/src/dispatcher/mod.rs | Shrinks dispatcher module to “dispatch rules” only; removes dynamic dispatcher API/docs/tests. |
| diskann-benchmark-runner/src/dispatcher/examples.rs | Deletes now-obsolete examples used by the removed dispatcher docs. |
| diskann-benchmark-runner/src/dispatcher/dispatch.rs | Deletes the generic multi-dispatch implementation. |
| diskann-benchmark-runner/src/dispatcher/api.rs | Makes score types Eq/Ord and removes Map/dispatcher-related APIs. |
| diskann-benchmark-runner/src/benchmark.rs | New Benchmark trait plus DynBenchmark wrapper for type-erased storage and JSON serialization. |
| diskann-benchmark-runner/src/app.rs | Updates benchmark listing and mismatch reporting to use the new registry APIs. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| 2. "type-bench-i8": expected "int8" but found "float16" | ||
| 3. "exact-type-bench-f32-1000": expected "float32" but found "float16" | ||
|
|
||
| could not find find a benchmark for all inputs No newline at end of file |
There was a problem hiding this comment.
The stdout fixture includes the message "could not find find a benchmark for all inputs" (duplicated "find"). If this is meant to be user-facing CLI output, consider fixing the typo in the application error message and updating this golden output accordingly.
| could not find find a benchmark for all inputs | |
| could not find a benchmark for all inputs |
| for (name, method) in benchmarks.methods() { | ||
| writeln!(output, " {}: {}", name, method.signatures()[0])?; | ||
| for (name, description) in benchmarks.names() { | ||
| writeln!(output, " {}: {}", name, description)?; |
There was a problem hiding this comment.
description can be multi-line (e.g., DynBenchmark::description writes a newline after the tag). Printing it inline with " {}: {}" causes subsequent lines to lose indentation and makes benchmarks output hard to read. Consider either making Benchmarks::names() return a single-line description, or splitting/indenting all lines when printing.
| writeln!(output, " {}: {}", name, description)?; | |
| let mut lines = description.lines(); | |
| if let Some(first) = lines.next() { | |
| writeln!(output, " {}: {}", name, first)?; | |
| for line in lines { | |
| writeln!(output, " {}", line)?; | |
| } | |
| } |
| ), | ||
| }, | ||
| None => { | ||
| writeln!(f, "tag \"{}\"", <T::Input as Input>::tag())?; |
There was a problem hiding this comment.
DynBenchmark::description uses writeln! for the tag line, which bakes a newline into the captured description string used by Benchmarks::names(). This leads to multi-line descriptions without consistent indentation in the CLI. Consider using write! (and letting T::description manage newlines) or otherwise ensuring callers can format multi-line descriptions cleanly.
| writeln!(f, "tag \"{}\"", <T::Input as Input>::tag())?; | |
| write!(f, "tag \"{}\"", <T::Input as Input>::tag())?; |
| Type::<T>::try_match(&from.data_type) | ||
| fn try_match(input: &TypeInput) -> Result<MatchScore, FailureScore> { | ||
| // Try to match based on data type. | ||
| // Add a small penalty to `ExactTypeBench` can be more specific if it hits. |
There was a problem hiding this comment.
The comment reads "Add a small penalty to ExactTypeBench can be more specific if it hits." This looks like a grammatical mix-up and also seems to describe the opposite of what the code does (the penalty is applied to TypeBench so ExactTypeBench wins when both match). Consider rewording for clarity.
| // Add a small penalty to `ExactTypeBench` can be more specific if it hits. | |
| // Add a small penalty to `TypeBench` so `ExactTypeBench` is preferred when both match. |
| /// Register `input` in the registry. | ||
| /// | ||
| /// Returns an error if any other input with the same [`Input::tag()`] has been registered | ||
| /// while leaving the underlying registry unchanged. | ||
| pub fn register<T>(&mut self, input: T) -> anyhow::Result<()> | ||
| pub fn register<T>(&mut self) -> anyhow::Result<()> |
There was a problem hiding this comment.
Inputs::register no longer takes an input value, but the doc comment still says "Register input in the registry". This is misleading now that registration is type-driven; update the wording to reflect register::<T>() (or reintroduce a parameter if that was intended).
Simplify benchmark running and registration by:
Moving the
Inputaway from adyn-compatible trait to a simple trait with methods that don't receive by&self. That is, instead ofwe get
The original pattern required an awkward proxy type for input registration (see the
Inputtype indiskann-benchmark). This new pattern allows the actual input structs to be used directly.Ditching the whole overly engineered
dispatcher(which is cool, but inappropriate for this task) and instead introducing aBenchmarktrait with the signature (see the code for doc-comments)The functions
try_matchanddescriptionmatch the originalDispatchRuletechnique, but instead of relying on aconvertmethod and a corresponding closure - we justrunthe benchmark directly.This almost entirely removes the need for the various
register!macros as registration is significantly more straightforward. Additionally, the input and output types are concrete!These changes remove a lot of boilerplate from
diskann-benchmark. No need to match on theAnytype anymore as this happens automatically. While updatingdiskann-benchmark, I was able to replace some of the macros for stamping out benchmarks with generics. Scalar and spherically-quantized index builds are still reliant on macros.Finally, the backend of
diskann-benchmark-runner(namelyregistry::Benchmarks) is cleaned up to not rely on the complicateddispatchermachinery for matching inputs and providing mismatch diagnostics. Instead, the logic is greatly simplified and just implemented inline.Why
While reviewing #857, I was pondering what it would take to add A/B comparison logic directly to our benchmark framework. My thinking is that it would be nice to have something like
Benchmarks that want A/B comparisons could implement
Compareand use aregistry::Benchmarks::register_comparedmethod to register a benchmark that participates in A/B comparison. The backend ofdiskann-benchmark-runnercould then use this to inspect the benchmarks that support comparison and this becomes an engine for A/B testing with concretely typed outputs!The problem, of course, is the current way of benchmark registration is a little overly complicated and makes this kind of extension way too messy. So view this PR as serving two purposes. First, we clean up an overly complicated portion of benchmarking and second, we enable evolution of the benchmarking framework.
Suggested Reviewing Order
I would actually recommend looking at the changes in
diskann-benchmarkbeforediskann-benchmark-runner. The amount of code that got chopped is quite satisfying. The changes here aren't perfect (the use of the'staticlifetime for benchmarks is a remnant of theDispatchRuleapproach and fully cleaning that up would be a larger change).Then, inside
diskann-benchmark-runner, I suggest:input.rs: New home of theInputtrait plus the machinery for dynamically dispatching to inputs.benchmark.rs: The newbenchmarktrait (plus associated dynamic dispatch machinery).registry.rs: Updates to the registry to reflect the new traits. In particular,Benchmarksis updated so the matching logic is much clear (and not hidden behind a dispatcher). I expanded the tests a little and added a few more baselines to exercise the matching logic more.dispatcher/: This directory has largely been made obsolete. TheDispatchRuletrait is preserved because it still has some utility in matching data types and composes somewhat with theBenchmarktrait. We could probably get rid of this directory entirely in the future, but for now I'm preserving it for the sake of a less insane diff.Feel free to skip
diskann-benchmark-simd.