research(seprag): BET 2⊗4 filtered-ANN region-pruning — qualified NO-GO (ADR-201) [finding, not a feature] by shaal · Pull Request #536 · ruvnet/RuVector

shaal · 2026-06-04T20:48:02Z

⚠️ This is a research finding, not a feature. The verdict is a qualified NO-GO. Merging records an ADR + reusable benchmark tooling — it does not add a production code path or claim a win. No urgency to merge; opened for visibility and the record (same as the ADR-199 kill).

TL;DR

BET 2 ⊗ BET 4 of the SepRAG research line (#534): does region-pruned IVF search beat the ruvector-acorn incumbent on correlated filtered queries? Pre-registered ≥5× distance-eval gate.

Verdict: qualified NO-GO. Region-pruning beats vanilla ACORN 6–48× at selectivity ≤ 1% — but the win does not survive the mandatory adversarial check: giving ACORN a predicate-aware entry (a simple, standard enhancement) collapses the gap to ~2× at high correlation, below the bar. A real but narrow edge remains at moderate correlation (ρ≈0.7). Full reasoning: docs/adr/ADR-201.

What's in this PR (independent of #535)

New self-contained crate ruvector-filtered-bench: depends only on ruvector-acorn + ruvector-rairs; zero dependency on ruvector-seprag or PR #535 (SepRAG: CCH-inspired retrieval exploration + customizable re-weighting, ADRs 196-200).
ADR-201 + the pre-registration doc (gate frozen before the run).
Additive, result-preserving instrumentation of ruvector-acorn: acorn_search_counted, flat_filtered_search_counted, acorn_search_seeded_counted. Existing functions delegate unchanged; 13 acorn tests prove behavior is preserved. Useful tooling for anyone measuring ACORN's distance-eval cost.
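The "existing functions delegate unchanged" pattern from the last item can be sketched roughly as below. This is a minimal illustration on a brute-force filtered search, not ruvector-acorn's actual code: the function signatures, the `dist2` helper, and the `flat_filtered_impl` name are assumptions made for the example.

```rust
// Hypothetical sketch of additive, result-preserving instrumentation.
// Names and signatures are illustrative, not ruvector-acorn's real API.

/// Squared Euclidean distance between two equal-length vectors.
fn dist2(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Shared implementation: brute-force filtered k-NN that bumps `evals`
/// once per distance computation. The predicate test is not counted.
fn flat_filtered_impl(
    data: &[Vec<f32>],
    query: &[f32],
    k: usize,
    pred: &dyn Fn(usize) -> bool,
    evals: &mut usize,
) -> Vec<(usize, f32)> {
    let mut hits: Vec<(usize, f32)> = Vec::new();
    for (id, v) in data.iter().enumerate() {
        if !pred(id) {
            continue; // non-matching points cost no distance eval
        }
        *evals += 1;
        hits.push((id, dist2(query, v)));
    }
    // Ascending distance, ties broken by id for determinism.
    hits.sort_by(|a, b| a.1.total_cmp(&b.1).then(a.0.cmp(&b.0)));
    hits.truncate(k);
    hits
}

/// Pre-existing entry point: delegates and discards the counter,
/// so its results cannot drift from the counted variant.
pub fn flat_filtered_search(
    data: &[Vec<f32>],
    query: &[f32],
    k: usize,
    pred: &dyn Fn(usize) -> bool,
) -> Vec<(usize, f32)> {
    let mut ignored = 0;
    flat_filtered_impl(data, query, k, pred, &mut ignored)
}

/// Counted variant: same results plus the number of distance evals.
pub fn flat_filtered_search_counted(
    data: &[Vec<f32>],
    query: &[f32],
    k: usize,
    pred: &dyn Fn(usize) -> bool,
) -> (Vec<(usize, f32)>, usize) {
    let mut evals = 0;
    let hits = flat_filtered_impl(data, query, k, pred, &mut evals);
    (hits, evals)
}
```

Because both public functions share one implementation, the two invariants named in the commit log (counted == uncounted, flat evals == number of matches) hold by construction in this sketch.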

Why it may still be worth landing despite the NO-GO

ADR-201 is a documented, cited finding — kills are first-class in this thread (cf. ADR-199).
The ACORN eval-counting variants and the ρ-correlation-knob benchmark are reusable for the named follow-ups (multi-predicate conjunctions; large-n).

Honest verdict (cost at matched recall, n=20k arxiv)

| ρ | sel | A vs vanilla ACORN | A vs predicate-aware-entry ACORN |
|---|---|---|---|
| 1.0 | 0.1% | 25.9× | 2.4× (below the 5× bar) |
| 1.0 | 1% | 6.1× | 1.9× (below the bar) |
| 0.7 | 1% | 9.0× | 6.5× (holds) |

Central lesson: a filtered-ANN cost claim is meaningless without a predicate-aware-entry baseline.

Notes for review

Branches off main; does not affect PR #535 (SepRAG: CCH-inspired retrieval exploration + customizable re-weighting, ADRs 196-200, ruvector-seprag), which is BET 1, a separate WIN.
Data-dependent tests skip gracefully without the ogbn-arxiv download → CI is green without data.
ruvector-acorn is touched only additively (measurement); the core algorithm is unchanged.

Resolves the BET 2 ⊗ BET 4 item of #534 (qualified NO-GO). Follow-ups (conjunctions, large-n, BET 4 standalone) noted in ADR-201 and #534.

… (issue ruvnet#534)

Region-pruned filtered ANN vs tuned ACORN. New self-contained crate ruvector-filtered-bench, depending only on ruvector-acorn (incumbent + oracle) and ruvector-rairs (IVF); independent of ruvector-seprag/PR ruvnet#535.

Pre-registration (docs/plans/bet2-filtered-ann/PRE-REGISTRATION.md) freezes a selectivity-shaped win/kill gate before any contender runs: at correlation rho>=0.7, contender A within 2% filtered-recall@10 of tuned ACORN at >=5x fewer distance-evals/query at sel<=1% (>=2x at sel=5%), monotonic in selectivity; graceful-degradation and wall-clock honesty guards; rho=0 recall-collapse kill control.

M0 (plumbing, pre-freeze-safe):
- data.rs: aligned ogbn-arxiv feat/label/year loader.
- predicate.rs: rho-correlation knob holding selectivity exactly constant across rho, plus natural label/year predicate families.
- tests/oracle_gate.rs: exact_filtered_knn cross-checked against an independent brute force on a real arxiv slice (sel x rho grid).

5 tests green, clippy clean.
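One way a correlation knob can hold selectivity exactly constant is sketched below. This construction is an assumption for illustration and may differ from what predicate.rs does: a fixed number of matches m = round(sel * n) is split into a "correlated" share (the first round(rho * m) ids of a caller-supplied ranking, for example points nearest some anchor) and a "scattered" share sampled uniformly from the rest. The `rho_predicate` name, the `ranked` input, and the small xorshift generator are all hypothetical.

```rust
// Hypothetical sketch: a rho knob that keeps the match count fixed.
// Not the PR's predicate.rs; the construction is an assumed one.

/// Tiny xorshift64 generator so the sketch needs no external crates.
struct XorShift64(u64);

impl XorShift64 {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }

    /// Index in 0..bound (slight modulo bias, acceptable for a sketch).
    fn below(&mut self, bound: usize) -> usize {
        (self.next_u64() % bound as u64) as usize
    }
}

/// `ranked` must be a permutation of 0..n, ordered from most to least
/// "correlated" (e.g. by distance to an anchor point).
/// Returns a mask with exactly round(sel * n) true entries for every rho.
pub fn rho_predicate(ranked: &[usize], sel: f64, rho: f64, seed: u64) -> Vec<bool> {
    let n = ranked.len();
    let m = ((sel * n as f64).round() as usize).min(n);
    let c = ((rho.clamp(0.0, 1.0) * m as f64).round() as usize).min(m);
    let mut mask = vec![false; n];

    // Correlated share: the top-c ids of the ranking.
    for &id in &ranked[..c] {
        mask[id] = true;
    }

    // Scattered share: m - c ids drawn without replacement from the rest
    // via a partial Fisher-Yates shuffle.
    let mut rest: Vec<usize> = ranked[c..].to_vec();
    let mut rng = XorShift64(seed.max(1)); // xorshift state must be non-zero
    for i in 0..(m - c) {
        let j = i + rng.below(rest.len() - i);
        rest.swap(i, j);
        mask[rest[i]] = true;
    }
    mask
}
```

With this shape, rho=1 puts every match in the tightest region of the ranking and rho=0 scatters them uniformly, while the count of matching points never changes.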

… baseline

Instrument ruvector-acorn with additive, result-preserving counted-search variants (acorn_search_counted, flat_filtered_search_counted) so distance-evals, the pre-registered primary cost metric, are measured exactly on ACORN-as-shipped. 13 acorn tests pass incl. a counted==uncounted + flat-evals==#matches invariant.

filtered-bench contenders (src/contenders.rs):
- B: ACORN predicate-agnostic search (the incumbent), exact eval counts.
- C: classic post-filter (retrieve top-pool unfiltered, then filter), the floor.

M1 findings (n=20k arxiv, ρ=1, k=10):
- TEETH (examples/teeth.rs): at the gate-relevant low selectivity, post-filter collapses while ACORN holds (recall, ACORN vs post-filter): sel=0.1%: 73.7% vs 22.7%; sel=0.5%: 90.4% vs 59.7%; sel=1%: 92.6% vs 79.3%. At sel>=5% post-filter is fine (as theory predicts). Benchmark is demonstrably sensitive (50+ pt recall swing); this is the negative control.
- TUNED ACORN (examples/acorn_tune.rs): ACORN reaches ~92.6% recall at sel=1% with gamma=2, ef=512, at ~1622 evals/query; evals are ~flat in ef (early-termination bound), so "tuned" = crank ef for recall at near-constant cost. This is the fair incumbent baseline for the M3 gate, and it validates the >=5x bar: contender A must reach >=90.6% recall at <=~324 evals/query to win.
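The thresholds in the last bullet follow directly from the baseline numbers: 92.6% minus 2 points of recall slack gives 90.6%, and 1622 evals divided by the 5x bar gives about 324. A small sketch of that arithmetic, with hypothetical names (`Gate`, `gate_from_baseline`, `passes`) that do not come from the PR:

```rust
// Hypothetical helper reproducing the gate arithmetic; names are illustrative.

/// Thresholds a contender must meet to clear the gate.
pub struct Gate {
    pub min_recall: f64, // as a fraction, e.g. 0.906
    pub max_evals: f64,  // distance evals per query
}

/// Derive the gate from the tuned baseline's operating point.
/// `recall_slack` is absolute (0.02 = two recall points).
pub fn gate_from_baseline(
    baseline_recall: f64,
    baseline_evals: f64,
    recall_slack: f64,
    min_speedup: f64,
) -> Gate {
    Gate {
        min_recall: baseline_recall - recall_slack,
        max_evals: baseline_evals / min_speedup,
    }
}

/// True if a contender's measured (recall, evals) clears the gate.
pub fn passes(gate: &Gate, recall: f64, evals: f64) -> bool {
    recall >= gate.min_recall && evals <= gate.max_evals
}
```

Calling `gate_from_baseline(0.926, 1622.0, 0.02, 5.0)` yields a recall floor of about 0.906 and an eval ceiling of about 324.4, matching the figures quoted above.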

src/prune.rs: RegionPruneIvf, built on ruvector-rairs k-means (ADR-193 substrate). Two stacked prunings realizing the salvaged SepRAG kernel on the treewidth-immune IVF hierarchy:
1. predicate pruning: skip clusters with zero matching members (the BET-2 win).
2. branch-and-bound distance pruning: triangle-inequality lower bound (dist(q,centroid) - radius); once the top-k heap is full, clusters whose LB exceeds the worst result are skipped.

Probe in LB order so the bound lets us break, not just skip. This is a strict improvement over the M2-sketch's match-count ordering, and it yields EXACT filtered top-k.

Cost metric = nclusters (routing) + matching members scanned; the O(1) predicate gates the expensive distance, so non-matching points cost nothing (the asymmetry vs ACORN, which evaluates a distance per expanded node regardless of predicate).

max_probe knob: None = exact B&B (recall 1.0); Some(p) caps match-clusters probed (trades recall for fewer evals, mirroring ACORN's ef) for equal-recall comparison.

Tests: exact_bb_matches_oracle (recall 1.0 vs exact_filtered_knn on 20 queries) and zero_match_clusters_are_skipped (1% selectivity → <1000 evals vs 4000 full scan).

8 unit + 1 integration green, clippy clean.
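The two prunings and the cost metric described above can be sketched as follows. This is an illustrative reimplementation under assumptions, not the code in src/prune.rs: the `Cluster` layout and `region_prune_search` signature are hypothetical, the zero-match check scans members instead of using precomputed counts, and the max_probe cap is not modeled.

```rust
// Hypothetical sketch of region-pruned IVF search (exact branch-and-bound).
// Types and names are illustrative, not the PR's RegionPruneIvf.

pub struct Cluster {
    pub centroid: Vec<f32>,
    pub radius: f32,         // max distance from centroid to any member
    pub members: Vec<usize>, // point ids into `data`
}

/// True Euclidean distance (the triangle inequality needs a metric).
fn dist(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum::<f32>().sqrt()
}

/// Returns (filtered top-k as (id, distance), ascending; distance evals).
/// Evals = one per centroid (routing) + one per matching member scanned.
pub fn region_prune_search(
    data: &[Vec<f32>],
    clusters: &[Cluster],
    query: &[f32],
    k: usize,
    pred: &dyn Fn(usize) -> bool,
) -> (Vec<(usize, f32)>, usize) {
    if k == 0 {
        return (Vec::new(), 0);
    }
    let mut evals = 0usize;

    // Routing: lower-bound the distance to anything inside each cluster.
    let mut order: Vec<(f32, &Cluster)> = Vec::new();
    for c in clusters {
        evals += 1;
        let lb = (dist(query, &c.centroid) - c.radius).max(0.0);
        // Pruning 1: drop clusters with zero matching members.
        if c.members.iter().any(|&id| pred(id)) {
            order.push((lb, c));
        }
    }
    order.sort_by(|a, b| a.0.total_cmp(&b.0));

    let mut top: Vec<(usize, f32)> = Vec::new(); // kept sorted ascending
    for (lb, c) in order {
        // Pruning 2: clusters are visited in LB order, so once one bound
        // exceeds the current k-th best, every later one does too.
        if top.len() == k && lb > top[k - 1].1 {
            break;
        }
        for &id in &c.members {
            if !pred(id) {
                continue; // predicate gates the distance computation
            }
            evals += 1;
            let d = dist(query, &data[id]);
            let pos = top.partition_point(|t| t.1 <= d);
            if pos < k {
                top.insert(pos, (id, d));
                top.truncate(k);
            }
        }
    }
    (top, evals)
}
```

Exactness in this sketch depends on each `radius` really bounding its members; if radii are underestimated, the break can discard true neighbors.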


… sel<=1%)

examples/sweep.rs: full selectivity x rho grid, cost-at-matched-recall comparison (tune A's probe cap to ACORN's recall, then compare distance-evals), with the wall-clock honesty guard and the rho=0 kill control.

VERDICT vs the frozen gate (n=20k, ACORN gamma2 ef=512, IVF nclusters=64):
- WIN at sel<=1%, rho>=0.7: region-pruned IVF beats tuned ACORN by 6.1-48x evals and 4.7-26x wall-clock at equal-or-better recall (A's exact B&B recall >= ACORN). e.g. rho=1 sel=1%: ACORN 92.6%@1622 evals vs A 99.9%@264 evals = 6.1x (4.7x wall).
- MISS at sel=5%: best 1.5x (gate wanted >=2x). The win is a low-selectivity (<=1%) phenomenon: the dominant production metadata-filter regime, but a real boundary, not the full pre-registered claim.
- Mechanism partly refuted: A also wins at rho=0 (low sel), so the eval advantage is selectivity-driven (few matches -> cheap exact B&B) more than correlation-driven; correlation governs recall, not cost. Reported, not buried.
- rho=0 kill control: A does NOT collapse (recall-safe); high-sel (>=10%) A loses as expected (ACORN's regime).

Wall-clock guard: PASS (win survives the clock).

nclusters is A's tuning knob (parallel to ACORN's ef): 64 beats 128 in the win regime (cheaper routing); both confirm the same boundary.
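The cost-at-matched-recall comparison above can be expressed compactly: among the contender's measured operating points, take the cheapest one whose recall is at least the baseline's, then divide eval counts. The sketch below is an assumed formulation, not the logic in examples/sweep.rs; `OperatingPoint` and `matched_recall_speedup` are hypothetical names.

```rust
// Hypothetical sketch of a cost-at-matched-recall comparison.

#[derive(Clone, Copy, Debug, PartialEq)]
pub struct OperatingPoint {
    pub recall: f64, // fraction in [0, 1]
    pub evals: f64,  // mean distance evals per query
}

/// Picks the cheapest contender point with recall >= the baseline's and
/// returns it with the eval speedup (baseline evals / contender evals).
/// Returns None if the contender never reaches the baseline's recall.
pub fn matched_recall_speedup(
    baseline: OperatingPoint,
    contender_curve: &[OperatingPoint],
) -> Option<(OperatingPoint, f64)> {
    contender_curve
        .iter()
        .copied()
        .filter(|p| p.recall >= baseline.recall)
        .min_by(|a, b| a.evals.total_cmp(&b.evals))
        .map(|p| (p, baseline.evals / p.evals))
}
```

Using the rho=1, sel=1% figures quoted above (ACORN at 92.6% and 1622 evals, A at 99.9% and 264 evals), the ratio is 1622 / 264, roughly 6.1x.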

…y fails the gate

Adds predicate-aware-entry ACORN (the rule-ruvnet#5 "tune harder" adversary):
- ruvector-acorn: acorn_search_seeded_counted (beam starts from caller seeds instead of multi-probe entry); acorn_search_impl refactored to take Option<seeds>, existing fns pass None. 13 acorn tests still green (behavior preserved).
- contenders.rs: Acorn::search_predicate_entry: stride-sample probes, predicate-test free, distance-eval only matching probes, seed the beam from the nearest matches.
- examples/adversarial.rs: A vs best-of(vanilla-B, predicate-entry-D) at matched recall.

FINDING (rule ruvnet#5 changed the verdict): predicate-aware entry slashes ACORN's cost at HIGH correlation (rho=1 sel=0.1%: 3753 -> 203 evals), collapsing A's advantage from 44.7x (vs vanilla) to 2.4x, BELOW the pre-registered 5x bar.

A vs best ACORN:
- rho=1.0: 2.4x / 2.3x / 1.9x (sel .001/.005/.01): MISS at the 5x bar.
- rho=0.7: 38.8x / 14.6x / 6.5x: WIN (D's seeding is weak at moderate correlation, where matches are scattered so a seeded walk still wanders).

So A and predicate-entry-ACORN exploit the SAME structure and converge (~2x) at high correlation; A's clean win is NOT robust to a properly-tuned ACORN. Honest verdict: largely a KILL at the pre-registered bar, with a narrower conditional edge at rho~0.7.

Caveat favoring A: D's seeding leans on ~16k "free" predicate tests (the eval metric ignores the O(1) predicate scan); at scale that scan isn't free, restoring some edge.
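The seeding step of the predicate-aware entry (stride-sample probes, test the predicate for free, pay a distance eval only on matches, keep the nearest as seeds) might look roughly like this. It is a sketch under assumptions rather than the body of Acorn::search_predicate_entry: the function name, the `stride` and `n_seeds` parameters, and the use of squared distance are all illustrative, and the subsequent beam search is omitted.

```rust
// Hypothetical sketch of predicate-aware seed selection; not the PR's code.

/// Squared Euclidean distance (ordering-equivalent to Euclidean).
fn dist2(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Probes every `stride`-th id, distance-evaluates only probes that pass
/// the predicate, and returns the `n_seeds` nearest matches plus the
/// number of distance evals spent. Predicate tests are not counted,
/// mirroring the eval metric's treatment of the O(1) predicate.
pub fn predicate_entry_seeds(
    data: &[Vec<f32>],
    query: &[f32],
    pred: &dyn Fn(usize) -> bool,
    stride: usize,
    n_seeds: usize,
) -> (Vec<usize>, usize) {
    let mut evals = 0usize;
    let mut scored: Vec<(f32, usize)> = Vec::new();
    for id in (0..data.len()).step_by(stride.max(1)) {
        if !pred(id) {
            continue; // free in the eval metric
        }
        evals += 1;
        scored.push((dist2(query, &data[id]), id));
    }
    // Nearest matches first; ties broken by id for determinism.
    scored.sort_by(|a, b| a.0.total_cmp(&b.0).then(a.1.cmp(&b.1)));
    let seeds = scored.into_iter().take(n_seeds).map(|(_, id)| id).collect();
    (seeds, evals)
}
```

The returned ids would then be handed to a seeded search such as acorn_search_seeded_counted. The caveat in the text applies here too: the loop performs one predicate test per probe, which this metric treats as free.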

…O-GO (M4) Writes up the BET 2 ⊗ BET 4 outcome with ADR-199/200 honesty. Verdict: region-pruned IVF beats VANILLA ACORN 6-48x evals (4.7-26x wall-clock) at sel<=1%, but the pre-registered >=5x WIN does NOT survive the rule-ruvnet#5 adversarial check — giving ACORN a predicate-aware entry collapses the gap to ~2x at high correlation (rho=1), below the bar. A retains a narrow conditional edge at moderate correlation (rho~0.7, 6-39x) plus an at-scale caveat (D's seeding leans on a ~full predicate scan the eval metric treats as free). Net: the bet does not cleanly pay; the clean win was an artifact of an under-equipped incumbent. Central lesson: a filtered-ANN cost claim is meaningless without a predicate-aware-entry baseline. Also strips a stray tag from the pre-registration doc (non-semantic).

The experiment's own evidence points to two flip conditions (conjunctions where ACORN's predicate-seeding degrades but cluster-skip composes; large-n where the predicate scan stops being free) and the open BET 4 standalone baseline.

… hold) A conjunction is a single O(1) boolean predicate of selectivity = product; in the distance-eval metric it reduces to (selectivity, scatter) — both already swept. The 'exponentially-unlikely seed' reasoning was wrong (testing a conjunction is O(1)). Residual leads downgraded to narrow/speculative (predicate-eval cost, large-n). Recommend closing BET 2 ⊗ BET 4; thread value is BET 1 productionization + BET 3.

shaal added 7 commits June 4, 2026 14:44

shaal mentioned this pull request Jun 4, 2026

SepRAG: CCH-inspired retrieval exploration + customizable re-weighting for self-learning ANN #534

Open

5 tasks

shaal mentioned this pull request Jun 5, 2026

SepRAG BET 3 — curated-KG treewidth probe → NO-GO (finding, ADR-203) #538

Open

shaal wants to merge 8 commits into ruvnet:main from shaal:docs/seprag-bet2-filtered-ann
