Skip to content

feat(beir-bench): within-query-threaded sign baseline + DistL2 HNSW#244

Merged
project-navi-bot merged 5 commits into
mainfrom
feat/beir-threaded-sign-distl2
Jun 16, 2026
Merged

feat(beir-bench): within-query-threaded sign baseline + DistL2 HNSW#244
project-navi-bot merged 5 commits into
mainfrom
feat/beir-threaded-sign-distl2

Conversation

@Fieldnote-Echo

Copy link
Copy Markdown
Owner

Summary

Two benchmark-harness improvements, both scoped to benchmarks/ (the core ordvec crate is untouched):

1. New method sign-rq2-threaded

A within-query-threaded variant of the sign-rq2 two-stage baseline. Doc-major sign codes are scanned in parallel over doc-stripes with hardware VPOPCNTDQ (parallel agreement histogram → global top-M threshold), then an exact-m candidate selection (count desc, id asc tie-break) reproduces the serial sign-rq2 candidate set byte-for-byte.

This isolates the within-query threading speedup from any quality change — the two methods rerank the identical candidate set, so nDCG/recall are unchanged and only latency differs. The serial sign-rq2 scans single-threaded per query; this version spreads one query across the rayon pool.

Verified candidate-equivalence on trec-covid: top-100 overlap = 1.0000 (min 1.0000) vs sign-rq2.

2. Harden HNSW distance to DistL2

run_hnsw previously used DistDot, whose anndists implementation asserts on 1 - dot and panics on near-duplicate pairs whose unit-vector float dot product rounds just past 1.0 — rare at 171K docs, frequent approaching ~1M, where it corrupts the graph build.

Embeddings are unit-normalized, so min-L2 ≡ max-dot ≡ max-cosine (identical neighbors); DistL2 gives the same ranking without the fragile assertion. Score is now -distance (nearer = smaller L2 = higher score), preserving nearest-first ordering in the eval. HNSW nDCG@10 is unchanged within noise (0.7593 vs prior 0.7584).

Test plan

  • cargo build --manifest-path benchmarks/beir-bench/Cargo.toml --release
  • cargo clippy --manifest-path benchmarks/beir-bench/Cargo.toml --release --all-targets -- -D warnings (clean)
  • cargo fmt --manifest-path benchmarks/beir-bench/Cargo.toml --check (clean)
  • Ran --methods sign-rq2,sign-rq2-threaded on trec-covid → candidate sets byte-identical (top-100 overlap 1.0000)
  • HNSW DistL2 build no longer panics; nDCG@10 unchanged within bootstrap noise
  • Reviewer: re-run make benchmark-beir end-to-end on a clean machine

@chatgpt-codex-connector

Copy link
Copy Markdown

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.

@qodo-code-review

qodo-code-review Bot commented Jun 16, 2026

Copy link
Copy Markdown

Code Review by Qodo

🐞 Bugs (1) 📘 Rule violations (0) 📎 Requirement gaps (0) 🎨 UX issues (0) 🔗 Cross-repo conflicts (0) 📜 Skill insights (0)

Grey Divider


Remediation recommended

1. Unchecked size overflow ✓ Resolved 🐞 Bug ☼ Reliability
Description
run_sign_threaded sizes hists_buf using unchecked usize multiplication (`threads *
(wpd*64+1)); in release builds this can wrap on extreme threads/dim` values and allocate a
too-small buffer. sign_scan_topm_par then computes used = stripes * hlen and slices
hists_buf[..used], which can panic if the allocation wrapped smaller than used.
Code

benchmarks/beir-bench/src/main.rs[R1359-1361]

+    let mut agree = vec![0u32; n_docs];
+    let mut hists_buf = vec![0u32; threads.max(1) * (wpd * 64 + 1)];
+    let mut poolv: Vec<(u32, u32)> = Vec::with_capacity(m * 2);
Evidence
The allocation uses unchecked multiplication, while later code derives a required prefix length
(used) and slices the buffer; if the allocation wraps smaller than used, the slice panics.

benchmarks/beir-bench/src/main.rs[1356-1361]
benchmarks/beir-bench/src/main.rs[1479-1485]
benchmarks/beir-bench/src/main.rs[1509-1514]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`hists_buf` is allocated with unchecked `usize` arithmetic (`threads * (wpd*64+1)`). In release builds, `usize` overflow wraps, which can create an undersized buffer and later panic when `sign_scan_topm_par` slices `hists_buf[..used]`.

## Issue Context
This is a benchmark harness, but `threads` and `dim` ultimately come from config/data; failing loudly with a clear message (instead of wrapping then panicking later) makes the method more robust.

## Fix Focus Areas
- benchmarks/beir-bench/src/main.rs[1356-1361]
- benchmarks/beir-bench/src/main.rs[1479-1514]

## Suggested fix
- Compute `hlen` once as `wpd.checked_mul(64).and_then(|d| d.checked_add(1)).expect("...")`.
- Allocate `hists_buf` with `threads.max(1).checked_mul(hlen).expect("...")`.
- In `sign_scan_topm_par`, compute `used` via `stripes.checked_mul(hlen).expect("...")` (or assert `used <= hists_buf.len()` before slicing) so failures are deterministic and explanatory.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


2. Rerank buffer-size mismatch ✓ Resolved 🐞 Bug ☼ Reliability
Description
run_sign_threaded allocates out_scores/out_indices for out_k = min(top_k, m, n_docs) but calls
RankQuant::search_asymmetric_subset_batched_serial_into with k = top_k, which asserts the
buffers are sized for top_k.min(n_docs). If m < top_k, this will panic at runtime on the
buffer-length assertion.
Code

benchmarks/beir-bench/src/main.rs[R1347-1386]

+    let out_k = top_k.min(m).min(n_docs);
+    let warmup = 5.min(n_queries);
+    let mut scratch = SubsetScratch::new();
+    let mut out_scores = vec![f32::NEG_INFINITY; batch * out_k];
+    let mut out_indices = vec![-1i64; batch * out_k];
+    let mut cand: Vec<u32> = Vec::new();
+    let mut agree = vec![0u16; n_docs];
+
+    let (samples, preds) = pool.install(|| {
+        time_and_collect(n_queries, batch, warmup, write_topk, |bs, be| {
+            let nq_batch = be - bs;
+            let needed = nq_batch * out_k;
+            if out_scores.len() != needed {
+                out_scores.resize(needed, f32::NEG_INFINITY);
+                out_indices.resize(needed, -1);
+            }
+            // Stage 1: per-query within-query-threaded sign scan -> exact-m CSR.
+            let mut offsets = Vec::with_capacity(nq_batch + 1);
+            let mut cand_flat: Vec<u32> = Vec::new();
+            offsets.push(0usize);
+            for qi in bs..be {
+                let q = &queries[qi * dim..(qi + 1) * dim];
+                let qcode = build_query_sign(q, wpd);
+                sign_scan_topm_par(
+                    &codes, wpd, n_docs, &qcode, m, threads, &mut agree, &mut cand,
+                );
+                cand_flat.extend_from_slice(&cand);
+                offsets.push(cand_flat.len());
+            }
+            // Stage 2: pooled subset rerank.
+            let batch_q = &queries[bs * dim..be * dim];
+            rq.search_asymmetric_subset_batched_serial_into(
+                batch_q,
+                &offsets,
+                &cand_flat,
+                top_k,
+                &mut scratch,
+                &mut out_scores,
+                &mut out_indices,
+            );
Evidence
The threaded baseline sizes its rerank output buffers using out_k = top_k.min(m).min(n_docs) but
calls the rerank API with k=top_k. The rerank API asserts output buffers are sized for
k.min(n_docs) (not min(k, m)), so when m < top_k the assertion fails and the benchmark panics.

benchmarks/beir-bench/src/main.rs[1347-1386]
src/quant.rs[1288-1302]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
`run_sign_threaded` sizes `out_scores`/`out_indices` based on `out_k = top_k.min(m).min(n_docs)` but passes `top_k` as `k` into `RankQuant::search_asymmetric_subset_batched_serial_into`. That function computes its own `out_k = k.min(self.n_vectors)` and asserts the output buffers are exactly `nq * out_k` long; when `m < top_k`, the buffers are too short and the benchmark panics.

### Issue Context
This is a benchmark-harness runtime failure triggered by configs where candidate budget `m` is smaller than `top_k`.

### Fix Focus Areas
- benchmarks/beir-bench/src/main.rs[1347-1386]

### Proposed fix
Keep the current buffer sizing, but pass `out_k` (not `top_k`) as the `k` argument to `search_asymmetric_subset_batched_serial_into`. The surrounding padding logic already pads results up to `top_k`, so this preserves the output shape while avoiding the assertion.

(Alternative: allocate buffers for `top_k.min(n_docs)` instead, but passing `out_k` is the minimal change.)

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


3. u16 agreement truncation 🐞 Bug ≡ Correctness
Description
sign_scan_topm_par stores per-doc agreement counts in u16 and casts (dim - hamming) into
u16, which silently wraps for dim > 65535 and can produce wrong tau and wrong candidate sets.
The same function also allocates Vecs of length dim+1 for histograms, which becomes a
reliability/perf hazard as dim grows.
Code

benchmarks/beir-bench/src/main.rs[R1442-1485]

+fn sign_scan_topm_par(
+    codes: &[u64],
+    wpd: usize,
+    n: usize,
+    qcode: &[u64],
+    m: usize,
+    threads: usize,
+    agree: &mut [u16],
+    out: &mut Vec<u32>,
+) {
+    use rayon::prelude::*;
+    let dim = wpd * 64;
+    let t = threads.max(1).min(n.max(1));
+    let chunk = n.div_ceil(t).max(1);
+    // Phase A: ONE parallel pass over the codes -> per-doc agreement.
+    agree[..n]
+        .par_chunks_mut(chunk)
+        .enumerate()
+        .for_each(|(ci, slot)| {
+            let d0 = ci * chunk;
+            #[cfg(target_arch = "x86_64")]
+            {
+                if std::is_x86_feature_detected!("avx512vpopcntdq") {
+                    unsafe { scan_agree_avx512(codes, wpd, d0, qcode, slot) };
+                    return;
+                }
+            }
+            for (li, a) in slot.iter_mut().enumerate() {
+                let base = (d0 + li) * wpd;
+                let mut ham = 0u32;
+                for w in 0..wpd {
+                    ham += (codes[base + w] ^ qcode[w]).count_ones();
+                }
+                *a = (dim as u32 - ham) as u16;
+            }
+        });
+    // Parallel per-stripe histogram + merge -> global top-M threshold tau.
+    let hists: Vec<Vec<u32>> = agree[..n]
+        .par_chunks(chunk)
+        .map(|slot| {
+            let mut h = vec![0u32; dim + 1];
+            for &a in slot {
+                h[a as usize] += 1;
+            }
Evidence
The new benchmark path uses a u16 agreement scratch and casts computed agreement into u16, which
cannot represent the full range of agreement counts when dim is large. In contrast, the core
SignBitmap implementation and persistence bounds explicitly allow much larger dims and use
u32-based scoring, demonstrating this benchmark implementation is not dimension-safe as written.

benchmarks/beir-bench/src/main.rs[1343-1350]
benchmarks/beir-bench/src/main.rs[1442-1493]
src/rank_io.rs[121-137]
src/sign_bitmap.rs[205-241]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
`sign_scan_topm_par` computes agreement counts as `dim - popcount(xor)` but stores them in a `u16` scratch and casts the computed value to `u16`. For dimensions above `u16::MAX` this truncates/wraps and can yield incorrect thresholds and candidate selection. Additionally, the current histogram strategy allocates arrays of length `dim + 1`, which scales linearly with `dim` and can cause excessive memory usage for large dimensions.

### Issue Context
The core crate supports sign-bitmap dimensions up to `MAX_SIGN_BITMAP_DIM` (16,777,216) and uses `u32` for sign scores, so the benchmark implementation should either (a) enforce a small-dimension precondition explicitly, or (b) use types/algorithms that remain correct for larger dims.

### Fix Focus Areas
- benchmarks/beir-bench/src/main.rs[1442-1526]
- src/rank_io.rs[121-137]
- src/sign_bitmap.rs[205-241]

### Suggested fix approach
- Change the agreement scratch from `Vec<u16>` / `&mut [u16]` to `Vec<u32>` / `&mut [u32]` and remove the `as u16` casts.
- Replace the `dim+1` histogram approach with a selection-based `tau` computation that does not allocate `O(dim)` memory (e.g., compute `tau` via `select_nth_unstable` on a reusable scratch copy of the agreement values), **or** add an explicit `assert!(dim <= u16::MAX as usize)` + `assert!(dim is reasonably small)` if this benchmark is intentionally limited to small embedding widths.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


Grey Divider

Qodo Logo

@qodo-code-review

Copy link
Copy Markdown

PR Summary by Qodo

BEIR bench: threaded sign→rq2 baseline and safer DistL2 HNSW
✨ Enhancement 🐞 Bug fix ⚙️ Configuration changes 🕐 40+ Minutes

Grey Divider

Walkthroughs

Description
• Add sign-rq2-threaded method: within-query parallel sign scan with deterministic exact-m
  candidates.
• Switch HNSW distance to DistL2 and rank by -distance to prevent panics.
• Register ordvec-sign-rq2-threaded in beir_eval family metadata for reporting.
Diagram
graph TD
  A["beir-bench main"] --> B["Method switch"] --> C["run_sign_threaded"] --> D["sign_scan_topm_par"] --> E["RankQuant rerank"] --> F["finalize outputs"] --> H["beir_eval.py family_meta"]
  B --> G["run_hnsw (DistL2)"] --> F
Loading
High-Level Assessment

The following are alternative approaches to this PR:

1. Keep DistDot but clamp distance inputs
  • ➕ Preserves dot-product distance semantics exactly
  • ➕ Minimal behavior change (score mapping can stay as 1-dot)
  • ➖ Requires patching/overriding dependency behavior or forking
  • ➖ Still leaves numerical edge cases (NaN/Inf handling) to define explicitly
2. Use cosine distance implementation
  • ➕ Explicitly matches the unit-normalized embedding assumption
  • ➕ Commonly used in ANN libraries and literature
  • ➖ Often implemented via dot anyway; may still face precision edge cases
  • ➖ May not be available in the current HNSW backend without custom plumbing
3. Hybrid parallelism (query-parallel + within-query stripes)
  • ➕ Can improve CPU utilization across a wider range of batch sizes
  • ➕ Allows tuning for different doc/query regimes
  • ➖ More tuning knobs and scheduling complexity
  • ➖ Harder to guarantee byte-identical candidate sets vs the serial baseline

Recommendation: The PR’s approach is appropriate for the stated goals: switching HNSW to DistL2 is the simplest way to avoid DistDot’s near-duplicate precision panic while preserving neighbor ordering for unit-normalized vectors, and scoring as -distance keeps the evaluator’s “higher-is-better” convention. For the new baseline, the deterministic exact-m selection (count desc, id asc) is the right design choice to isolate latency improvements from quality changes; alternative parallelization strategies are possible, but they increase complexity and can compromise candidate equivalence.

Grey Divider

File Changes

Enhancement (1)
main.rs Add sign-rq2-threaded baseline and harden HNSW with DistL2 scoring +303/-6

Add sign-rq2-threaded baseline and harden HNSW with DistL2 scoring

• Adds a new method slug (sign-rq2-threaded) that performs within-query parallel sign-code scanning, computes a global top-M threshold via histogramming, and applies deterministic EXACT-m trimming to match the serial sign-rq2 candidate set. Introduces AVX-512 VPOPCNTDQ-assisted agreement scanning with a portable fallback. Switches HNSW from DistDot to DistL2 and updates score mapping to -distance to avoid precision-triggered panics on near-duplicate vectors.

benchmarks/beir-bench/src/main.rs


Other (1)
beir_eval.py Register ordvec-sign-rq2-threaded in method family metadata +6/-0

Register ordvec-sign-rq2-threaded in method family metadata

• Adds a family_meta entry so the new ordvec-sign-rq2-threaded method is labeled correctly in evaluation summaries and comparison matrices.

benchmarks/beir/beir_eval.py


Grey Divider

Qodo Logo

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a within-query-threaded baseline for the sign->rq2 search method in the BEIR benchmarks, and switches the HNSW distance metric from DistDot to DistL2 to prevent panics on near-duplicate vector pairs. The review feedback suggests two key improvements: parallelizing the sequential initialization of codes using Rayon to speed up index build times, and refactoring the manual chunking and merging of filtered candidates in sign_scan_topm_par to use a cleaner, more idiomatic Rayon parallel iterator pipeline.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment thread benchmarks/beir-bench/src/main.rs
Comment thread benchmarks/beir-bench/src/main.rs Outdated
Add `sign-rq2-threaded`: a within-query-threaded SignBitmap -> RankQuant b=2
method. Doc-major sign codes are scanned in parallel over doc-stripes with
hardware VPOPCNTDQ, then an exact-m candidate selection (count desc, id asc
tie-break) reproduces the serial `sign-rq2` set byte-for-byte (top-100 overlap
1.0000) -- isolating the within-query threading speedup from any quality change.

Harden HNSW to DistL2 (was DistDot). Embeddings are unit-normalized, so
min-L2 == max-dot == max-cosine (identical neighbors), but DistL2 avoids
anndists' DistDot `1-dot` distance assertion that panics on near-duplicate
pairs whose float dot rounds just past 1.0 -- rare at 171K docs, frequent near
1M. Score becomes `-distance` (nearer = higher), preserving nearest-first order.

beir_eval: add the family_meta entry for ordvec-sign-rq2-threaded.

Signed-off-by: Nelson Spence <nelson@projectnavi.ai>
@Fieldnote-Echo Fieldnote-Echo force-pushed the feat/beir-threaded-sign-distl2 branch from 97386e3 to 8091e26 Compare June 16, 2026 15:37
Address PR #244 bot review (gemini/qodo):
- qodo: sign_scan_topm_par stored agreement (= dim - hamming) as u16 and
  cast silently, which would wrap for dim > 65535 into a wrong tau /
  candidate set. Add a fail-loud assert on the precondition (BEIR /
  embedding dims are <= ~4096, far below this) covering both the scalar
  path and the AVX-512 kernel reached only from this fn.
- gemini: parallelize the previously sequential doc-major sign-code build
  loop with rayon (one wpd-word stripe per doc), matching the adjacent
  rq.add which is already parallel.

Benchmark harness only; core ordvec crate untouched.

Signed-off-by: Nelson Spence <nelson@projectnavi.ai>
@Fieldnote-Echo

Copy link
Copy Markdown
Owner Author

Remediation wave for the bot review, all scoped to `benchmarks/` (core crate untouched), pushed in 42ae55d:

  • qodo — `u16` agreement truncation: `sign_scan_topm_par` now asserts `dim <= u16::MAX` (fail-loud) before the `(dim - hamming) as u16` cast, covering both the scalar path and the AVX-512 kernel reached only from that function. BEIR/embedding dims are ≤ ~4096, far below 65535, so this is a precondition guard rather than a reachable wrap; the u16 agreement buffer + `dim+1` histogram are retained for the bandwidth-bound scan.
  • gemini — sequential `codes` build: parallelized with `codes.par_chunks_mut(wpd).enumerate().for_each(...)` (one `wpd`-word stripe per doc), matching the adjacent `rq.add`. (resolved inline)
  • gemini — Phase-B chunk/merge → `par_iter` pipeline: kept explicit for stripe-consistency with Phase A + the histogram phase. (resolved inline with rationale)

Gate: `cargo fmt --check` + `cargo clippy --release --all-targets -- -D warnings` + `cargo build --release` all clean on `benchmarks/beir-bench`.

@Fieldnote-Echo

Copy link
Copy Markdown
Owner Author

/gemini review

@Fieldnote-Echo

Copy link
Copy Markdown
Owner Author

/review

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a within-query-threaded sign->rq2 baseline (sign-rq2-threaded) to the BEIR benchmark suite, leveraging parallelized sign-agreement scans with hardware VPOPCNTDQ and AVX-512 optimizations. It also updates the HNSW implementation to use DistL2 instead of DistDot to prevent panics on near-duplicate vector pairs. The review feedback focuses on optimizing the newly added sign_scan_topm_par function to eliminate memory allocation overhead in the benchmark hot path by pre-allocating and reusing buffers for histograms and candidate pools, as well as optimizing Phase B to use a sequential scan.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment thread benchmarks/beir-bench/src/main.rs Outdated
Comment thread benchmarks/beir-bench/src/main.rs
Comment thread benchmarks/beir-bench/src/main.rs
@qodo-code-review

Copy link
Copy Markdown

Code review by qodo was updated up to the latest commit 42ae55d

Address PR #244 re-review (gemini HIGH + qodo) on 42ae55d:

- qodo (u16 truncation): widen the agreement buffer + AVX-512 kernel from
  u16 to u32, eliminating the (dim - hamming) cast entirely (the prior
  assert didn't satisfy the static check). u32 is ~free here -- the
  agreement buffer is a few % of the codes traffic the scan streams.
- gemini (per-query allocs in the timed hot path): make sign_scan_topm_par
  allocation-free by threading caller-owned reusable scratch -- a per-stripe
  histogram buffer (hists_buf) and the candidate pool (poolv). Histogram tau
  now sums per-stripe columns directly (no merge buffer); Phase B stays
  PARALLEL via par_extend into the reused poolv (rejected gemini's
  sequential-Phase-B suggestion: it would serialize an O(n) pass and hurt
  the threaded baseline's latency at 1M+ docs).
- qodo (rerank buffer-size mismatch): the rerank was called with top_k but
  buffers are sized out_k = min(top_k, m/candidates, n_docs); pass out_k
  instead in BOTH the threaded and the serial sign-rq2 baselines, so the
  length assert can't panic when the candidate budget is below top_k.

Candidate-equivalence preserved (select_exact_m imposes a strict total
order, so the parallel extract order is irrelevant). Verified on scifact
full corpus: sign-rq2 vs sign-rq2-threaded top-100 overlap = 1.0000
(min 1.0000), 300/300 exact-identical ranked lists.

Benchmark harness only; core ordvec crate untouched.

Signed-off-by: Nelson Spence <nelson@projectnavi.ai>
@Fieldnote-Echo

Copy link
Copy Markdown
Owner Author

Second remediation wave (commit 750e97c), still benchmarks/-only (core crate untouched):

qodo

  • u16 agreement truncation: widened the agreement buffer + AVX-512 kernel u16u32, removing the (dim - hamming) cast entirely (the prior assert! did not satisfy the static check). u32 is ~free here — the agreement scratch is a few % of the codes traffic the scan streams.
  • Rerank buffer-size mismatch: the rerank was called with top_k while the score/index buffers are sized out_k = min(top_k, budget, n_docs); now passes out_k in both the threaded and the pre-existing serial sign-rq2 baselines, so the length assert cannot panic when the candidate budget is below top_k.

gemini (HIGH, per-query allocations)

  • sign_scan_topm_par is now allocation-free in the timed path: caller-owned reusable hists_buf + poolv threaded in, tau computed by summing per-stripe histogram columns (no merge buffer). Phase B kept parallel (par_extend) — the sequential-scan suggestion was declined because it would serialize an O(n) pass and raise latency at 1M+ docs. (resolved inline)

Equivalence preserved & verified: select_exact_m imposes a strict total order, so the parallel extract order is irrelevant. scifact full corpus, sign-rq2 vs sign-rq2-threaded: top-100 overlap = 1.0000 (min 1.0000), 300/300 exact-identical ranked lists.

Gate: cargo fmt --check + cargo clippy --release --all-targets -- -D warnings + cargo build --release all clean.

@Fieldnote-Echo

Copy link
Copy Markdown
Owner Author

/gemini review

@Fieldnote-Echo

Copy link
Copy Markdown
Owner Author

/review

@gemini-code-assist

Copy link
Copy Markdown

Warning

Gemini encountered an error creating the review. You can try again by commenting /gemini review.

@qodo-code-review

Copy link
Copy Markdown

Code review by qodo was updated up to the latest commit 750e97c

qodo re-review (750e97c): guard the hists_buf allocation length with
checked_mul so a pathological threads/dim can't wrap usize into a
too-small buffer in release (unreachable on 64-bit native, but matches
the core crate's util::checked_* convention and fails loud).

Benchmark harness only; core ordvec crate untouched.

Signed-off-by: Nelson Spence <nelson@projectnavi.ai>
@Fieldnote-Echo

Copy link
Copy Markdown
Owner Author

Final wave (commit f95f294), benchmarks/-only:

  • qodo "Unchecked size overflow" (⭐ new): the hists_buf length is now built with checked_mul(...).expect(...), so a pathological threads/dim fails loud instead of wrapping usize into a too-small allocation in release. (Unreachable on 64-bit native — ≈256×4097 vs usize::MAX — but it matches the core crate's util::checked_* convention and is a one-liner.)
  • qodo "u16 agreement truncation": this is stale from the 42ae55d iteration — the threaded sign path was widened to u32 in 750e97c and there is no u16 left in it (the only u16 in the file is the NumPy header reader). No action needed.
  • qodo "Rerank buffer-size mismatch": ✓ resolved in 750e97c (passes out_k in both baselines).

Gate clean (fmt / clippy -D warnings / release build). Candidate-equivalence re-verified earlier on scifact (top-100 overlap 1.0000, 300/300 exact). Closing the bot-review loop here.

Codex stop-gate: the dispatch only checked is_x86_feature_detected!(
"avx512vpopcntdq") but scan_agree_avx512 is #[target_feature(enable =
"avx512f,avx512vpopcntdq")]. Calling a target_feature fn is only sound
when the caller verifies EVERY enabled feature at runtime, so the
avx512f check was missing. Guard both, matching the core crate's
dispatch convention (lib.rs / multi_bucket.rs).

Re-verified candidate-equivalence on scifact (top-100 overlap 1.0000,
300/300 exact). Benchmark harness only; core crate untouched.

Signed-off-by: Nelson Spence <nelson@projectnavi.ai>
@project-navi-bot project-navi-bot merged commit f865d19 into main Jun 16, 2026
30 checks passed
@project-navi-bot project-navi-bot deleted the feat/beir-threaded-sign-distl2 branch June 16, 2026 19:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants