BET 5 (SepRAG #534): PQ/IVFADC within-list pruning vs tuned IVF nprobe — scale-gated WIN (ADR-206)#542

Open
shaal wants to merge 13 commits into
ruvnet:main from
shaal:feat/seprag-bet5-ivf-pq

@shaal shaal commented Jun 5, 2026

BET 5 (SepRAG #534) — PQ/IVFADC within-list pruning vs tuned plain IVF nprobe: scale-gated WIN

A research finding, run under the thread's prove-not-hype protocol (one claim+number, beat a
tuned in-repo incumbent + a steelman, public data, pre-registered frozen gate, adversarial check,
honest caveats). No merge urgency — it adds a self-contained benchmark crate and an ADR; it does
not change any shipping code.

Stacked on #540 (BET 4): reuses the crates/ruvector-bet4-ivf-bench harness, so this PR's diff
includes #540's commits. The new work is the BET-5 commits (1d920b3a87bf2d5e): src/pq.rs,
examples/pq_pruning_sweep.rs, tests/pq_gate.rs, the pre-registration, and ADR-206.

The lever

ADR-205 killed cluster-level triangle-inequality pruning vs tuned nprobe (the bound was
redundant with nprobe's centroid cutoff — 1.00× in every cell) and named one open lever:
within-list pruning via product-quantized / IVFADC asymmetric distance — a different mechanism.
This is that head-to-head.

Claim (frozen before any run, docs/plans/bet5-ivf-pq/PRE-REGISTRATION.md)

PQ/IVFADC within-list pruning (cheap ADC scan of the nprobe lists + exact-L2 re-rank of the
top-R) reaches matched recall@10 = 0.95 at ≥ 2× fewer full-L2-equivalent member-evals than the
strongest PQ-free incumbent, and wins wall-clock, across nclusters ∈ {64,256,1024} at ≥ one N ≥ 50k.
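
The mechanism in the claim can be sketched as follows. This is a minimal illustration of an ADC scan followed by an exact-L2 re-rank, not the code in src/pq.rs; the function names and the `codebooks[j][c]` layout are assumptions made for the example, and it takes the candidate ids (the members of the probed lists) as given.

```rust
/// Squared Euclidean distance between two equal-length slices.
fn l2_sq(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Per-query lookup table: lut[j][c] is the squared distance from the j-th
/// sub-vector of `q` to centroid `c` of sub-quantizer j.
/// Assumed layout: codebooks[j][c] is a centroid of length D / m.
fn build_lut(q: &[f32], codebooks: &[Vec<Vec<f32>>]) -> Vec<Vec<f32>> {
    let sub = q.len() / codebooks.len();
    codebooks
        .iter()
        .enumerate()
        .map(|(j, book)| {
            let qs = &q[j * sub..(j + 1) * sub];
            book.iter().map(|c| l2_sq(qs, c)).collect()
        })
        .collect()
}

/// Asymmetric distance: m table lookups instead of a D-dimensional distance.
fn adc(lut: &[Vec<f32>], code: &[u8]) -> f32 {
    lut.iter().zip(code).map(|(row, &c)| row[c as usize]).sum()
}

/// ADC-scan the candidate ids, keep the `r` best by approximate distance,
/// then re-rank those `r` with exact L2 and return the top `k` ids.
fn adc_rerank(
    q: &[f32],
    candidates: &[usize],
    codes: &[Vec<u8>],
    vectors: &[Vec<f32>],
    codebooks: &[Vec<Vec<f32>>],
    r: usize,
    k: usize,
) -> Vec<usize> {
    let lut = build_lut(q, codebooks);
    let mut approx: Vec<(f32, usize)> =
        candidates.iter().map(|&i| (adc(&lut, &codes[i]), i)).collect();
    approx.sort_by(|a, b| a.0.total_cmp(&b.0));
    approx.truncate(r);
    let mut exact: Vec<(f32, usize)> =
        approx.iter().map(|&(_, i)| (l2_sq(q, &vectors[i]), i)).collect();
    exact.sort_by(|a, b| a.0.total_cmp(&b.0));
    exact.into_iter().take(k).map(|(_, i)| i).collect()
}
```

Only the `r` survivors of the cheap scan pay for a full distance, which is where the member-eval saving would come from; a true neighbour that the ADC scan ranks below `r` is lost, so `r` has to be tuned against the recall target.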

Result — WIN at N=100k (all three nclusters); ≥2× at nclusters∈{64,256} from n≈20–50k

Total full-L2-equivalent ratio (routing charged to both; best m per cell; wall-clock wins everywhere):

| N | nc=64 | nc=256 | nc=1024 |
|---|---|---|---|
| 20k | 2.51× | 1.95× | 1.33× |
| 50k | 3.20× | 2.50× | 1.65× |
| 100k | 3.38× | 2.80× | 2.03× |

The win grows with N and the crossover n* rises with nclusters — a clean amortization
signature (routing ∝ nclusters, working set ∝ n/nclusters). Unlike ADR-205, the mechanism is
orthogonal to nprobe (cheapens the per-member distance, not the list selection).

Honesty (none buried — see ADR-206)

  • Win rides on the exact re-rank, not pure ADC (ADC ceiling ~0.48–0.52 at m=16). This is
    IVFADC + refine (FAISS's standard design) validated to beat ruvector-rairs::IvfFlat — not a
    new algorithm. The honest takeaway: add an IVFPQ + rerank path to IvfFlat.
  • Scale-gated: the gate is met across all three nclusters only at n=100k; nc=1024/100k is a knife-edge 2.03×.
  • Steelman mattered: the early-abandon exact-L2 incumbent prunes 40–53% of dims and was the
    cheaper baseline in every cell — it ~halved the naive-L2 ratio (would have read ~6×).
  • A routing-charge bug in my own harness (omitting nclusters centroid evals) was caught by the
    pre-registered "no free routing" check — it had inflated nc=1024/50k from a true 1.65× to 2.24×.
    Fixed; the table above charges routing throughout.
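
To make the routing-charge point concrete, here is a hedged sketch of the accounting in full-L2-equivalents, following the formula quoted in the commit messages (256 for the LUT, plus adc_members*m/D, plus one unit per re-ranked member, with nclusters routing evals charged to both sides). The struct and function names are illustrative and are not taken from the harness.

```rust
/// Per-query cost in full-L2-equivalents (one unit = one D-dimensional distance).
struct Cost {
    /// Query-to-centroid evals, charged identically to every contender.
    routing: f64,
    /// Work done inside the probed lists.
    members: f64,
}

impl Cost {
    fn total(&self) -> f64 {
        self.routing + self.members
    }
}

/// PQ side: 256 units to build the LUT, m/D per ADC-scanned member, 1 per re-rank.
fn pq_cost(nclusters: usize, adc_members: usize, m: usize, d: usize, rerank: usize) -> Cost {
    Cost {
        routing: nclusters as f64,
        members: 256.0 + adc_members as f64 * m as f64 / d as f64 + rerank as f64,
    }
}

/// Early-abandon incumbent: charged by the dimensions it actually touched.
fn incumbent_cost(nclusters: usize, dims_touched: u64, d: usize) -> Cost {
    Cost {
        routing: nclusters as f64,
        members: dims_touched as f64 / d as f64,
    }
}

/// Returns (member-only ratio, total ratio with routing charged to both).
fn ratios(incumbent: &Cost, pq: &Cost) -> (f64, f64) {
    (incumbent.members / pq.members, incumbent.total() / pq.total())
}
```

With made-up inputs (nclusters=1024, D=128, m=16, 4096 ADC-scanned members, R=256, and an incumbent that touches 393216 dimensions), the member-only ratio is 3.0 while the total ratio is 2.0. Adding the same routing term to both sides always pulls a ratio above 1 towards 1, which is why omitting it flatters PQ most where nclusters is large.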

Correctness gates (before any claim)

PQ@full-rerank exact (recall ≥ 0.999); early-abandon steelman exact vs full-L2; PQ shares the
k-means index with the incumbent by construction. 6 tests green, clippy clean.
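
For readers unfamiliar with the exactness gates, the check amounts to comparing a contender's top-k against a brute-force oracle. A minimal sketch, assuming ids are `usize` and vectors are plain `Vec<f32>` (the real oracle.rs may differ):

```rust
use std::collections::HashSet;

/// Squared Euclidean distance between two equal-length slices.
fn l2_sq(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Brute-force exact k nearest neighbours: score every vector, sort, take k.
fn exact_knn(q: &[f32], vectors: &[Vec<f32>], k: usize) -> Vec<usize> {
    let mut scored: Vec<(f32, usize)> = vectors
        .iter()
        .enumerate()
        .map(|(i, v)| (l2_sq(q, v), i))
        .collect();
    scored.sort_by(|a, b| a.0.total_cmp(&b.0));
    scored.into_iter().take(k).map(|(_, i)| i).collect()
}

/// recall@k: fraction of the true top-k ids that appear in the returned top-k.
fn recall_at_k(found: &[usize], truth: &[usize], k: usize) -> f64 {
    let truth_k: HashSet<usize> = truth.iter().take(k).copied().collect();
    if truth_k.is_empty() {
        return 0.0; // nothing to recall; avoids dividing by zero
    }
    let hits = found.iter().take(k).filter(|i| truth_k.contains(*i)).count();
    hits as f64 / truth_k.len() as f64
}
```

A gate such as "recall ≥ 0.999" would then average `recall_at_k` over the query set and compare the mean against the threshold.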

Scoreboard: 3 WINS (ADR-200/202, ADR-204, ADR-206) / 4 KILLS (ADR-199, ADR-201, ADR-203, ADR-205).

Closes the ADR-205 "within-list / PQ" open lever. Reproduce: cargo run --release -p ruvector-bet4-ivf-bench --example pq_pruning_sweep -- 20000 50000 100000 (public ogbn-arxiv 128-d).

shaal added 13 commits June 5, 2026 00:25
Closes the BET 4 caveat left open by ADR-201: the region-pruning IVF
kernel was only run against ACORN (BET 2), never against its natural
incumbent, plain IVF nprobe, on unfiltered ANN. Frozen gate: WIN = >=2x
member-scan reduction at matched recall@10 (R=0.95) AND wall-clock win
across nclusters in {64,256,1024}; KILL = <1.5x or wall-clock reverses.
Two controls: exact-vs-exact pruning-fraction probe + low-d (PCA-8)
soundness control. Honest prior: NO-GO lean (128-d concentration makes
the triangle-inequality bound loose) — the IVF-level companion to
ADR-199. Branch off clean main; B&B kernel rebuilt self-contained
(BET 2's lives only on ruvnet#536).
…s certified)

New crate ruvector-bet4-ivf-bench (deps: ruvector-rairs, rand).
- data.rs: aligned arxiv 128-d feature CSV loader.
- kernel.rs: BnBIvf — IVF probed in ascending lower-bound order with B&B
  early termination (break when LB >= kth-best); LB(q,c)=max(0,|q-mu_c|-r_c),
  r_c=max member radius. Full budget = exact; max_probe cap = nprobe analogue.
  Built on ruvector-rairs kmeans so it shares centroids with the IvfFlat
  incumbent (shared-index pre-reg requirement).
- oracle.rs: brute-force exact kNN + recall@k + shared true-L2 helper.
- M0 gate test PASSES on real arxiv slice: full-budget B&B == oracle
  (recall@10 >= 0.999) → B&B invariant certified. clippy clean.
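
The lower bound described for kernel.rs can be written out as below. This is an illustrative sketch of LB(q,c)=max(0,|q-mu_c|-r_c) and the skip test, using true (non-squared) L2 so the triangle inequality applies; the helper names are assumptions, not the crate's API.

```rust
/// True Euclidean distance (not squared), as the triangle inequality requires.
fn l2(a: &[f32], b: &[f32]) -> f32 {
    a.iter()
        .zip(b)
        .map(|(x, y)| (x - y) * (x - y))
        .sum::<f32>()
        .sqrt()
}

/// r_c: the largest distance from the centroid to any member of its list.
fn cluster_radius(centroid: &[f32], members: &[Vec<f32>]) -> f32 {
    members.iter().map(|m| l2(centroid, m)).fold(0.0, f32::max)
}

/// LB(q, c) = max(0, |q - mu_c| - r_c): no member of the cluster can be
/// closer to the query than this.
fn lower_bound(q: &[f32], centroid: &[f32], radius: f32) -> f32 {
    (l2(q, centroid) - radius).max(0.0)
}

/// A cluster can be skipped when even its best possible member cannot beat
/// the current k-th best distance.
fn can_skip(lb: f32, kth_best: f32) -> bool {
    lb >= kth_best
}
```

Because the bound subtracts the maximum member radius, a single far-out member loosens it for the whole cluster.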

Frozen gate: docs/plans/bet4-ivf-pruning/PRE-REGISTRATION.md. Off clean main.
…aithfulness gate

BnBIvf::search_nprobe: the plain-IVF incumbent strategy (nprobe nearest
centroids, scan all members, no B&B) on the SAME centroids/lists as the
B&B contender, with member-eval counting. Refactored top-k accumulation
into shared consider()/finalize() so both strategies accumulate
identically and only the probe loop differs (shared-index pre-reg
requirement). New gate instrumented_nprobe_matches_rairs PASSES: recall
matches ruvector-rairs::IvfFlat within 0.01 at matched params → the
cost-measured incumbent is algorithmically the real one. 3 tests green.

- kernel: search_bnb_skip — the STEELMAN. Centroid-distance order (the
  effective nprobe ordering) + per-cluster LB-skip (correctness-safe in
  any order, unlike the LB-order global break). The strongest cluster-level
  B&B: if it can't beat tuned nprobe, the bound doesn't pay.
- pca: minimal power-iteration top-m PCA (no linalg dep) for the low-dim
  control — projects real arxiv features to 8-d where the bound is tight.
- examples/ivf_pruning_sweep: 3 contenders share one index per nclusters
  (plain nprobe / B&B LB-order / B&B steelman) x 2 regimes (128-d, PCA-8),
  exact-regime pruning probe, matched-recall@0.95, frozen-gate verdict.

RESULT (n=20k & n=50k both): steelman = 1.00x evals vs nprobe in EVERY
cell, BOTH regimes. NO-GO. Mechanism is structural, not dimensional: the
LB bound only prunes FAR clusters that tuned nprobe already skips, so it's
redundant with nprobe's centroid-distance cutoff. Exact-prune fraction
scales correctly with dim (0-13% @128-d, 8-87% @PCA-8) => kernel sound;
the redundancy is fundamental. LB-ORDER (faithful BET-2 kernel) is strictly
WORSE (0.18-0.25x) — LB-ordering probes far large-radius clusters early.
…l NO-GO

Verdict: NO-GO (robust, structural). Steelman B&B (centroid order +
LB-skip) ties tuned nprobe at exactly 1.00x member-evals in every cell,
n=20k & n=50k, 128-d & PCA-8. Mechanism: the triangle-inequality bound
only prunes FAR clusters that tuned nprobe already skips => redundant with
nprobe's centroid-distance cutoff; win is structurally impossible, not
just hard in high-d. LB-order (faithful BET-2 kernel) strictly worse
(0.18-0.25x). Companion to ADR-199.

Honest deviation recorded: the pre-registered PCA-8 control expected a B&B
WIN (tight bound). It tied instead — the premise was false (tight bound
beats full-scan, not tuned nprobe). Control still valid: exact-prune
fraction scales correctly with dim (0-13% @128-d, 8-82% @PCA-8) => kernel
sound; it revealed the structural redundancy. Scoreboard 2 WINS / 4 KILLS.
…s tuned nprobe

Opens the one lever ADR-205 left explicitly open (within-list PQ asymmetric
distance, orthogonal to the killed cluster-level bound). Frozen gate: PQ must
beat the cheaper of {plain full-L2, early-abandon exact-L2} nprobe by >=2x
full-L2-equivalent member-evals at recall@10=0.95 AND wall-clock, across
nclusters{64,256,1024} at >=1 scale N>=50k. Honest prior: ~55% win-at-scale,
named kill-paths = amortization crossover + concentration re-rank ceiling.
Stacked on feat/seprag-bet4-ivf-pruning to reuse ruvector-bet4-ivf-bench.
Thread ruvnet#534.

PqIvf trains m sub-quantizers on the shared ruvector-rairs k-means substrate
(kmeans assignments ARE the PQ codes), encodes corpus to m-byte codes, and adds
search_adc_rerank (cheap ADC scan of nprobe lists + exact L2 re-rank of top-R)
plus search_adc_only (pure-ADC ceiling probe). AdcCost charges everything in one
honest unit: 256 (LUT) + adc_members*m/D + rerank*1 full-L2-equivalents.
BnBIvf gains search_nprobe_abandon = the early-abandon exact-L2 steelman
incumbent (user-confirmed verdict-setter), charged in dims_touched/D.
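
A minimal sketch of the early-abandon idea behind the steelman, assuming a per-dimension check against the current k-th best squared distance (a production kernel would more likely check every few dimensions to stay vectorisable). The function name and return shape are illustrative only.

```rust
/// Squared L2 with early abandon.
/// Returns (Some(distance), D) if the candidate stays below `bound`, or
/// (None, dims_touched) as soon as the running sum reaches `bound`.
/// Partial sums never decrease, so an abandoned candidate could not have
/// entered the top-k; results stay exact.
fn l2_sq_early_abandon(a: &[f32], b: &[f32], bound: f32) -> (Option<f32>, usize) {
    let mut acc = 0.0f32;
    for (i, (x, y)) in a.iter().zip(b).enumerate() {
        let d = x - y;
        acc += d * d;
        if acc >= bound {
            return (None, i + 1); // charged only for the dimensions touched
        }
    }
    (Some(acc), a.len())
}
```

Summing `dims_touched` over all scanned members and dividing by D gives the incumbent's member cost in full-L2-equivalents, the unit used for the comparison.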

Gates (real 2k arxiv slice): PqIvf shares centroids w/ BnBIvf; PQ@full-rerank
exact (recall>=0.999); early-abandon exact vs full L2 (<0.001). 6 tests green,
clippy clean. Thread ruvnet#534, BET5 pre-reg frozen at 1d920b3.

examples/pq_pruning_sweep.rs: shared index per nclusters; tune incumbent nprobe
to min reaching recall@10>=0.95; PQ scans the SAME nprobe lists (cannot rerank an
unscanned neighbour) and we tune the smallest re-rank R recovering >=0.95. Charges
all PQ ops in full-L2-equivalents (256 LUT + adc*m/D + R rerank). Reports pure-ADC
ceiling, R*, early-abandon dim-prune fraction, wall-clock, crossover n*, frozen gate.
Thread ruvnet#534.
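
The tuning rule used for both sides (smallest nprobe, then smallest R, that reaches the recall target) can be sketched as a scan over an ascending candidate grid. The helper below is hypothetical; `recall_at` stands in for "run the sweep at this parameter and measure recall@10".

```rust
/// Return the first (smallest) candidate whose measured recall reaches
/// `target`, or None if no candidate on the grid gets there.
/// `candidates` is assumed to be sorted ascending.
fn smallest_reaching<F>(candidates: &[usize], target: f64, mut recall_at: F) -> Option<usize>
where
    F: FnMut(usize) -> f64,
{
    candidates.iter().copied().find(|&p| recall_at(p) >= target)
}
```

Tuning the incumbent's nprobe and the contender's R with the same rule and the same target is what keeps the comparison at matched recall rather than at matched parameters.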
…ntender

Extract build_ivf -> IvfParts; BnBIvf::from_parts + PqIvf::from_parts reuse one
seeded k-means for the incumbent and every PQ(m). Cuts the worst cell (nc=1024
@100k) from 3x k-means to 1x while guaranteeing the shared-index property by
construction. Behavior-preserving (N=5000 numbers identical). 6 tests green.

Pre-reg accounting + 'no free routing' adversarial check require the nclusters
query-centroid routing evals charged equally to incumbent AND PQ. Harness omitted
it, silently flattering PQ where routing dominates (high nclusters). Now prints
member-only ratio (transparency) AND the gate-deciding TOTAL ratio with routing;
verdict decided on total. Wall-clock already included routing (search computes
centroid dists) so the wall guard was already honest. Re-run authoritative.

Opens ADR-205's one open lever (within-list PQ asymmetric distance, orthogonal
to the killed cluster-level bound). PQ (cheap ADC scan + exact top-R rerank)
beats tuned plain nprobe AND the early-abandon exact-L2 steelman by >=2x
full-L2-equivalent member-evals at recall@10=0.95 AND wall-clock, across all
three nclusters{64,256,1024} at N=100k. Win GROWS with N, crossover n* RISES
with nclusters (routing amortization) -> >=2x at nclusters~sqrt(n) from n~20-50k.

Honest caveats (none buried): win rides on the exact rerank not pure ADC
(ceiling ~0.5) = IVFADC+refine validated, not a new method; scale-gated (full
sweep only at 100k); nc=1024/100k knife-edge 2.03x; m=16 tuned; recall-floor
tunability flatters PQ modestly; steelman halved the naive-L2 ratio. Routing
charge bug in my own harness caught by the pre-registered 'no free routing'
check (nc=1024/50k 2.24x member -> 1.65x total). Scoreboard 3 WINS / 4 KILLS.
Thread ruvnet#534, pre-reg frozen at 1d920b3.