BET 5 (SepRAG #534): PQ/IVFADC within-list pruning vs tuned IVF nprobe — scale-gated WIN (ADR-206) by shaal · Pull Request #542 · ruvnet/RuVector

shaal · 2026-06-05T15:33:17Z

BET 5 (SepRAG #534) — PQ/IVFADC within-list pruning vs tuned plain IVF `nprobe` — scale-gated WIN

A research finding, run under the thread's prove-not-hype protocol (one claim+number, beat a
tuned in-repo incumbent + a steelman, public data, pre-registered frozen gate, adversarial check,
honest caveats). No merge urgency — it adds a self-contained benchmark crate and an ADR; it does
not change any shipping code.

Stacked on #540 (BET 4): reuses the `crates/ruvector-bet4-ivf-bench` harness, so this PR's diff
includes #540's commits. The new work is the BET-5 commits (1d920b3a→87bf2d5e): `src/pq.rs`,
`examples/pq_pruning_sweep.rs`, `tests/pq_gate.rs`, the pre-registration, and ADR-206.

The lever

ADR-205 killed cluster-level triangle-inequality pruning vs tuned nprobe (the bound was
redundant with nprobe's centroid cutoff — 1.00× in every cell) and named one open lever:
within-list pruning via product-quantized / IVFADC asymmetric distance — a different mechanism.
This is that head-to-head.

Claim (frozen before any run, `docs/plans/bet5-ivf-pq/PRE-REGISTRATION.md`)

PQ/IVFADC within-list pruning (cheap ADC scan of the nprobe lists + exact-L2 re-rank of the
top-R) reaches matched recall@10 = 0.95 at ≥ 2× fewer full-L2-equivalent member-evals than the
strongest PQ-free incumbent, and wins wall-clock, across nclusters ∈ {64,256,1024} at ≥ one N ≥ 50k.
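
To make the contender in the claim concrete, here is a minimal sketch of an ADC scan followed by an exact re-rank. It is not the crate's `PqIvf` code: every function name, the nested-`Vec` layout, and the assumption that codes and codebooks are already trained are illustrative only.

```rust
// Hypothetical sketch; names and data layout are assumptions, not the crate's API.

/// Squared Euclidean distance between two equal-length slices.
fn l2_sq(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Per-query lookup table: lut[j][c] is the squared distance from the query's
/// j-th sub-vector to centroid c of sub-quantizer j.
/// Each codebooks[j][c] has length D / m.
fn build_lut(query: &[f32], codebooks: &[Vec<Vec<f32>>]) -> Vec<Vec<f32>> {
    let sub = query.len() / codebooks.len();
    codebooks
        .iter()
        .enumerate()
        .map(|(j, book)| {
            let q_sub = &query[j * sub..(j + 1) * sub];
            book.iter().map(|c| l2_sq(q_sub, c)).collect()
        })
        .collect()
}

/// Asymmetric distance: m table lookups instead of D multiply-adds.
fn adc(lut: &[Vec<f32>], code: &[u8]) -> f32 {
    code.iter().zip(lut).map(|(&c, row)| row[c as usize]).sum()
}

/// Score every candidate id with ADC, keep the best `r`, re-rank those with
/// exact L2, and return the ids of the best `k`.
fn adc_rerank(
    query: &[f32],
    candidates: &[usize],
    codes: &[Vec<u8>],
    vectors: &[Vec<f32>],
    codebooks: &[Vec<Vec<f32>>],
    r: usize,
    k: usize,
) -> Vec<usize> {
    let lut = build_lut(query, codebooks);
    let mut approx: Vec<(f32, usize)> = candidates
        .iter()
        .map(|&id| (adc(&lut, &codes[id]), id))
        .collect();
    approx.sort_by(|a, b| a.0.total_cmp(&b.0));
    approx.truncate(r);
    let mut exact: Vec<(f32, usize)> = approx
        .iter()
        .map(|&(_, id)| (l2_sq(query, &vectors[id]), id))
        .collect();
    exact.sort_by(|a, b| a.0.total_cmp(&b.0));
    exact.into_iter().take(k).map(|(_, id)| id).collect()
}
```

In this sketch `candidates` would be the members of the probed lists, so only `r` vectors ever pay for a full-dimension distance; the rest are scored from the table.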

Result — WIN at N=100k (all three nclusters); ≥2× at nclusters∈{64,256} from n≈20–50k

Total full-L2-equivalent ratio (routing charged to both; best m per cell; wall-clock wins everywhere):

| N | nc=64 | nc=256 | nc=1024 |
|---|---|---|---|
| 20k | 2.51× | 1.95× | 1.33× |
| 50k | 3.20× | 2.50× | 1.65× |
| 100k | 3.38× | 2.80× | 2.03× |

The win grows with N and the crossover n* rises with nclusters — a clean amortization
signature (routing ∝ nclusters, working set ∝ n/nclusters). Unlike ADR-205, the mechanism is
orthogonal to nprobe (cheapens the per-member distance, not the list selection).
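
The amortization argument can be written as a rough cost model in full-L2-equivalents. This is a sketch under assumptions the PR does not state explicitly: lists are balanced (about $n/n_c$ members each), $P$ is the tuned nprobe shared by both sides, $\alpha$ is the fraction of dimensions the early-abandon incumbent touches, and $R$ is the re-rank depth.

$$
C_{\text{inc}} \approx n_c + \alpha\,P\,\frac{n}{n_c},
\qquad
C_{\text{PQ}} \approx n_c + 256 + \frac{m}{D}\,P\,\frac{n}{n_c} + R
$$

$$
\frac{C_{\text{inc}}}{C_{\text{PQ}}} \;\longrightarrow\; \frac{\alpha D}{m}
\quad\text{as } P\,\frac{n}{n_c} \text{ grows while } n_c,\ R \text{ stay fixed}
$$

Under these assumptions the fixed terms ($n_c$ routing, the 256-eval LUT, and $R$) dominate at small $n$ or large $n_c$ and pull the ratio toward 1, which is consistent with the crossover rising with nclusters. The limit is an illustrative ceiling of the model, not a measured figure.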

Honesty (none buried — see ADR-206)

- Win rides on the exact re-rank, not pure ADC (ADC ceiling ~0.48–0.52 at m=16). This is
  IVFADC + refine (FAISS's standard design) validated to beat ruvector-rairs::IvfFlat, not a
  new algorithm. The honest takeaway: add an IVFPQ + rerank path to IvfFlat.
- Scale-gated: the full "all three nclusters" sweep wins only at n=100k; nc=1024/100k is a knife-edge 2.03×.
- Steelman mattered: the early-abandon exact-L2 incumbent prunes 40–53% of dims and was the
  cheaper baseline in every cell. It roughly halved the naive-L2 ratio (which would have read ~6×).
- A routing-charge bug in my own harness (omitting nclusters centroid evals) was caught by the
  pre-registered "no free routing" check. It had inflated nc=1024/50k from a true 1.65× to 2.24×.
  Fixed; the table above charges routing throughout.
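
For readers unfamiliar with the steelman, the early-abandon incumbent can be sketched as below. This is an assumed minimal form, not the crate's `search_nprobe_abandon`; the function names and the return shape are hypothetical. It stays exact because partial sums of squares never decrease, so a candidate whose partial sum already reaches the current k-th best distance cannot enter the top-k.

```rust
// Hypothetical sketch; names and return shape are illustrative assumptions.

/// Exact squared L2 with early abandon. Stops as soon as the running sum
/// reaches `bound` (the current k-th best squared distance).
/// Returns (distance if completed, number of dimensions touched).
fn l2_sq_early_abandon(q: &[f32], v: &[f32], bound: f32) -> (Option<f32>, usize) {
    let mut acc = 0.0f32;
    for (i, (x, y)) in q.iter().zip(v).enumerate() {
        let d = x - y;
        acc += d * d;
        if acc >= bound {
            // Abandoned: this candidate cannot beat the current k-th best.
            return (None, i + 1);
        }
    }
    (Some(acc), q.len())
}

/// Cost of one evaluation in full-L2-equivalents: dims_touched / D.
fn l2_equivalents(dims_touched: usize, d: usize) -> f64 {
    dims_touched as f64 / d as f64
}
```

Charging `dims_touched / D` rather than 1 per member is what makes this baseline cheaper than naive full L2 in the accounting.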

Correctness gates (before any claim)

PQ@full-rerank exact (recall ≥ 0.999); early-abandon steelman exact vs full-L2; PQ shares the
k-means index with the incumbent by construction. 6 tests green, clippy clean.

Scoreboard: 3 WINS (ADR-200/202, ADR-204, ADR-206) / 4 KILLS (ADR-199, ADR-201, ADR-203, ADR-205).

Closes the ADR-205 "within-list / PQ" open lever. Reproduce: `cargo run --release -p ruvector-bet4-ivf-bench --example pq_pruning_sweep -- 20000 50000 100000` (public ogbn-arxiv 128-d).

Closes the BET 4 caveat left open by ADR-201: the region-pruning IVF kernel was only run against ACORN (BET 2), never against its natural incumbent, plain IVF nprobe, on unfiltered ANN. Frozen gate: WIN = >=2x member-scan reduction at matched recall@10 (R=0.95) AND wall-clock win across nclusters in {64,256,1024}; KILL = <1.5x or wall-clock reverses. Two controls: exact-vs-exact pruning-fraction probe + low-d (PCA-8) soundness control. Honest prior: NO-GO lean (128-d concentration makes the triangle-inequality bound loose) — the IVF-level companion to ADR-199. Branch off clean main; B&B kernel rebuilt self-contained (BET 2's lives only on ruvnet#536).

…s certified) New crate ruvector-bet4-ivf-bench (deps: ruvector-rairs, rand). - data.rs: aligned arxiv 128-d feature CSV loader. - kernel.rs: BnBIvf — IVF probed in ascending lower-bound order with B&B early termination (break when LB >= kth-best); LB(q,c)=max(0,|q-mu_c|-r_c), r_c=max member radius. Full budget = exact; max_probe cap = nprobe analogue. Built on ruvector-rairs kmeans so it shares centroids with the IvfFlat incumbent (shared-index pre-reg requirement). - oracle.rs: brute-force exact kNN + recall@k + shared true-L2 helper. - M0 gate test PASSES on real arxiv slice: full-budget B&B == oracle (recall@10 >= 0.999) → B&B invariant certified. clippy clean. Frozen gate: docs/plans/bet4-ivf-pruning/PRE-REGISTRATION.md. Off clean main.
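
The cluster lower bound quoted in this commit, LB(q,c)=max(0,|q-mu_c|-r_c), can be sketched as follows. The function names are hypothetical, not taken from `kernel.rs`. Note that the triangle inequality needs true (non-squared) distances.

```rust
// Hypothetical sketch; function names are illustrative assumptions.

/// Euclidean (non-squared) distance.
fn l2(a: &[f32], b: &[f32]) -> f32 {
    a.iter()
        .zip(b)
        .map(|(x, y)| (x - y) * (x - y))
        .sum::<f32>()
        .sqrt()
}

/// Lower bound on the distance from `q` to any member of a cluster whose
/// members all lie within `radius` of `centroid`.
fn cluster_lower_bound(q: &[f32], centroid: &[f32], radius: f32) -> f32 {
    (l2(q, centroid) - radius).max(0.0)
}

/// A cluster can be skipped when even its closest possible member is no
/// better than the current k-th best distance.
fn can_skip(lower_bound: f32, kth_best: f32) -> bool {
    lower_bound >= kth_best
}
```

When the query falls inside the cluster's radius the bound collapses to 0 and the cluster can never be skipped.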

…aithfulness gate BnBIvf::search_nprobe: the plain-IVF incumbent strategy (nprobe nearest centroids, scan all members, no B&B) on the SAME centroids/lists as the B&B contender, with member-eval counting. Refactored top-k accumulation into shared consider()/finalize() so both strategies accumulate identically and only the probe loop differs (shared-index pre-reg requirement). New gate instrumented_nprobe_matches_rairs PASSES: recall matches ruvector-rairs::IvfFlat within 0.01 at matched params → the cost-measured incumbent is algorithmically the real one. 3 tests green.
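
A minimal sketch of the plain-IVF incumbent strategy with member-eval counting, as described in this commit. It is an assumed stand-in, not `BnBIvf::search_nprobe`: the name `nprobe_search_sketch`, the nested-`Vec` layout, and the sort-based top-k are all illustrative.

```rust
// Hypothetical sketch; not the crate's implementation.

/// Squared Euclidean distance between two equal-length slices.
fn l2_sq(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Probe the `nprobe` nearest centroids, scan every member of those lists,
/// and return (top-k ids, number of member distance evaluations).
fn nprobe_search_sketch(
    query: &[f32],
    centroids: &[Vec<f32>],
    lists: &[Vec<usize>],
    vectors: &[Vec<f32>],
    nprobe: usize,
    k: usize,
) -> (Vec<usize>, usize) {
    // Routing: one distance per centroid, then order clusters by proximity.
    let mut order: Vec<(f32, usize)> = centroids
        .iter()
        .enumerate()
        .map(|(c, mu)| (l2_sq(query, mu), c))
        .collect();
    order.sort_by(|a, b| a.0.total_cmp(&b.0));

    let mut hits: Vec<(f32, usize)> = Vec::new();
    let mut evals = 0usize;
    for &(_, c) in order.iter().take(nprobe) {
        for &id in &lists[c] {
            hits.push((l2_sq(query, &vectors[id]), id));
            evals += 1;
        }
    }
    hits.sort_by(|a, b| a.0.total_cmp(&b.0));
    (hits.into_iter().take(k).map(|(_, id)| id).collect(), evals)
}
```

The returned count covers member evaluations only; the `centroids.len()` routing distances are a separate charge.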

- kernel: search_bnb_skip — the STEELMAN. Centroid-distance order (the effective nprobe ordering) + per-cluster LB-skip (correctness-safe in any order, unlike the LB-order global break). The strongest cluster-level B&B: if it can't beat tuned nprobe, the bound doesn't pay. - pca: minimal power-iteration top-m PCA (no linalg dep) for the low-dim control — projects real arxiv features to 8-d where the bound is tight. - examples/ivf_pruning_sweep: 3 contenders share one index per nclusters (plain nprobe / B&B LB-order / B&B steelman) x 2 regimes (128-d, PCA-8), exact-regime pruning probe, matched-recall@0.95, frozen-gate verdict. RESULT (n=20k & n=50k both): steelman = 1.00x evals vs nprobe in EVERY cell, BOTH regimes. NO-GO. Mechanism is structural, not dimensional: the LB bound only prunes FAR clusters that tuned nprobe already skips, so it's redundant with nprobe's centroid-distance cutoff. Exact-prune fraction scales correctly with dim (0-13% @128-d, 8-87% @PCA-8) => kernel sound; the redundancy is fundamental. LB-ORDER (faithful BET-2 kernel) is strictly WORSE (0.18-0.25x) — LB-ordering probes far large-radius clusters early.

…l NO-GO Verdict: NO-GO (robust, structural). Steelman B&B (centroid order + LB-skip) ties tuned nprobe at exactly 1.00x member-evals in every cell, n=20k & n=50k, 128-d & PCA-8. Mechanism: the triangle-inequality bound only prunes FAR clusters that tuned nprobe already skips => redundant with nprobe's centroid-distance cutoff; win is structurally impossible, not just hard in high-d. LB-order (faithful BET-2 kernel) strictly worse (0.18-0.25x). Companion to ADR-199. Honest deviation recorded: the pre-registered PCA-8 control expected a B&B WIN (tight bound). It tied instead — the premise was false (tight bound beats full-scan, not tuned nprobe). Control still valid: exact-prune fraction scales correctly with dim (0-13% @128-d, 8-82% @PCA-8) => kernel sound; it revealed the structural redundancy. Scoreboard 2 WINS / 4 KILLS.

…s tuned nprobe Opens the one lever ADR-205 left explicitly open (within-list PQ asymmetric distance, orthogonal to the killed cluster-level bound). Frozen gate: PQ must beat the cheaper of {plain full-L2, early-abandon exact-L2} nprobe by >=2x full-L2-equivalent member-evals at recall@10=0.95 AND wall-clock, across nclusters{64,256,1024} at >=1 scale N>=50k. Honest prior: ~55% win-at-scale, named kill-paths = amortization crossover + concentration re-rank ceiling. Stacked on feat/seprag-bet4-ivf-pruning to reuse ruvector-bet4-ivf-bench. Thread ruvnet#534.

PqIvf trains m sub-quantizers on the shared ruvector-rairs k-means substrate (kmeans assignments ARE the PQ codes), encodes corpus to m-byte codes, and adds search_adc_rerank (cheap ADC scan of nprobe lists + exact L2 re-rank of top-R) plus search_adc_only (pure-ADC ceiling probe). AdcCost charges everything in one honest unit: 256 (LUT) + adc_members*m/D + rerank*1 full-L2-equivalents. BnBIvf gains search_nprobe_abandon = the early-abandon exact-L2 steelman incumbent (user-confirmed verdict-setter), charged in dims_touched/D. Gates (real 2k arxiv slice): PqIvf shares centroids w/ BnBIvf; PQ@full-rerank exact (recall>=0.999); early-abandon exact vs full L2 (<0.001). 6 tests green, clippy clean. Thread ruvnet#534, BET5 pre-reg frozen at 1d920b3.
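
The single cost unit named in this commit (256 for the LUT, `adc_members*m/D` for the scan, 1 per re-ranked vector) can be written out as a small function. The name `adc_cost` and its signature are hypothetical, not the crate's `AdcCost` type.

```rust
// Hypothetical sketch of the accounting; not the crate's AdcCost type.

/// Total PQ query cost in full-L2-equivalents (routing not included).
/// - 256: building the lookup table costs 256 centroids x m sub-quantizers
///   x D/m dims, i.e. 256 full-dimension distances.
/// - adc_members * m / d: each scanned member costs m lookups out of D dims.
/// - rerank: each re-ranked vector costs one exact full-L2 distance.
fn adc_cost(adc_members: usize, rerank: usize, m: usize, d: usize) -> f64 {
    256.0 + adc_members as f64 * m as f64 / d as f64 + rerank as f64
}
```

For example, with illustrative inputs of 8000 scanned members, a re-rank depth of 100, m=16 and D=128, the sketch gives 256 + 1000 + 100 = 1356 full-L2-equivalents.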

examples/pq_pruning_sweep.rs: shared index per nclusters; tune incumbent nprobe to min reaching recall@10>=0.95; PQ scans the SAME nprobe lists (cannot rerank an unscanned neighbour) and we tune the smallest re-rank R recovering >=0.95. Charges all PQ ops in full-L2-equivalents (256 LUT + adc*m/D + R rerank). Reports pure-ADC ceiling, R*, early-abandon dim-prune fraction, wall-clock, crossover n*, frozen gate. Thread ruvnet#534.
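
The tuning loop described here (smallest nprobe, then smallest R, that reaches recall@10 >= 0.95) can be sketched with two helpers. Both names are hypothetical, and the closure stands in for whatever the sweep uses to measure mean recall at a given setting.

```rust
// Hypothetical sketch; helper names are illustrative assumptions.

/// Fraction of the true nearest neighbours that appear in `found`.
fn recall_at_k(found: &[usize], truth: &[usize]) -> f64 {
    if truth.is_empty() {
        return 1.0;
    }
    let hits = truth.iter().filter(|&&t| found.contains(&t)).count();
    hits as f64 / truth.len() as f64
}

/// First value in `candidates` (assumed ascending) whose mean recall reaches
/// `target`, or None if no candidate does.
fn smallest_reaching<F: Fn(usize) -> f64>(
    candidates: &[usize],
    target: f64,
    mean_recall: F,
) -> Option<usize> {
    candidates.iter().copied().find(|&p| mean_recall(p) >= target)
}
```

The same helper would serve for both knobs: once over nprobe values for the incumbent, then over R values for PQ with nprobe held fixed.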

…ntender Extract build_ivf -> IvfParts; BnBIvf::from_parts + PqIvf::from_parts reuse one seeded k-means for the incumbent and every PQ(m). Cuts the worst cell (nc=1024 @100k) from 3x k-means to 1x while guaranteeing the shared-index property by construction. Behavior-preserving (N=5000 numbers identical). 6 tests green.

Pre-reg accounting + 'no free routing' adversarial check require the nclusters query-centroid routing evals charged equally to incumbent AND PQ. Harness omitted it, silently flattering PQ where routing dominates (high nclusters). Now prints member-only ratio (transparency) AND the gate-deciding TOTAL ratio with routing; verdict decided on total. Wall-clock already included routing (search computes centroid dists) so the wall guard was already honest. Re-run authoritative.
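
Why the omitted routing charge flattered PQ can be seen from the ratio itself: adding the same routing cost to both numerator and denominator pulls any ratio above 1 toward 1. A minimal sketch follows; the function name is hypothetical, and the figures in the note after it are made up for illustration, not the PR's measurements.

```rust
// Hypothetical sketch; the function name is an illustrative assumption.

/// Cost ratio (incumbent / PQ) with the same routing cost charged to both.
/// Passing routing = 0.0 reproduces the member-only ratio.
fn cost_ratio(incumbent_members: f64, pq_members: f64, routing: f64) -> f64 {
    (incumbent_members + routing) / (pq_members + routing)
}
```

With made-up costs of 3000 (incumbent) and 1000 (PQ), the member-only ratio is 3.0×; charging 1000 routing evals to both gives 4000 / 2000 = 2.0×. The effect is strongest where routing dominates, i.e. at high nclusters.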

Opens ADR-205's one open lever (within-list PQ asymmetric distance, orthogonal to the killed cluster-level bound). PQ (cheap ADC scan + exact top-R rerank) beats tuned plain nprobe AND the early-abandon exact-L2 steelman by >=2x full-L2-equivalent member-evals at recall@10=0.95 AND wall-clock, across all three nclusters{64,256,1024} at N=100k. Win GROWS with N, crossover n* RISES with nclusters (routing amortization) -> >=2x at nclusters~sqrt(n) from n~20-50k. Honest caveats (none buried): win rides on the exact rerank not pure ADC (ceiling ~0.5) = IVFADC+refine validated, not a new method; scale-gated (full sweep only at 100k); nc=1024/100k knife-edge 2.03x; m=16 tuned; recall-floor tunability flatters PQ modestly; steelman halved the naive-L2 ratio. Routing charge bug in my own harness caught by the pre-registered 'no free routing' check (nc=1024/50k 2.24x member -> 1.65x total). Scoreboard 3 WINS / 4 KILLS. Thread ruvnet#534, pre-reg frozen at 1d920b3.

shaal added 13 commits June 5, 2026 00:25

chore(bet4): lockfile for ruvector-bet4-ivf-bench workspace member

47ffaf4

style(bet5): clippy-clean PQ kernel + sweep (iterator idioms, type al…

3defcf4

…ias)

shaal mentioned this pull request Jun 5, 2026

SepRAG: CCH-inspired retrieval exploration + customizable re-weighting for self-learning ANN #534

shaal wants to merge 13 commits into ruvnet:main from shaal:feat/seprag-bet5-ivf-pq
