SepRAG: CCH-inspired retrieval exploration + customizable re-weighting (ADRs 196-200, ruvector-seprag) by shaal · Pull Request #535 · ruvnet/RuVector

shaal · 2026-06-04T15:42:09Z

Part of #534.

A disciplined, evidence-gated exploration of CCH-inspired retrieval ("SepRAG") for hybrid
vector + KG memory. Produces one honest negative (CCH contraction ruled out for
embedding graphs) and one proven, shippable positive (customizable re-weighting of a
fixed ANN topology under metric drift). Docs + a reference crate + six reproducible
experiment harnesses; no changes to existing crates.

Measured impact (proven, with scope)

A new capability — adapt the ANN index to a changed relevance metric by re-weighting
the existing graph instead of rebuilding. Does not change query latency or absolute
recall of existing search; it removes the rebuild cost a metric change otherwise incurs.

Measurement	full rebuild (current)	re-weight (this work)
Update on metric change, n=100k (reference Vamana)	141.8s	0.035s (~4,000× lower)
Update on metric change, n=20k (production `ruvector-diskann`)	~7s	~0.01s; recall within 2%
Recall@10 vs rebuild (diag/rot/non-linear/region-local/compounding, ≤100k)	baseline	within 2%
Hybrid (re-weight + rebuild every ~4 steps)	99.1% @ 24 rebuilds	98.8% @ 6 rebuilds (25% cost)

Update-cost ratio ~1,000–4,000× across n=5k–100k (rebuild is super-linear). Not yet
proven: drift is synthetic (parametric), not a live ruvector-gnn learned-metric
trajectory; production-loop integration is backlog item #1.

What's here

ADR-196…200 (docs/adr/): full decision record.
docs/plans/seprag-cch-retrieval/: milestone plans + FUTURE-DIRECTIONS.md (backlog + "prove-not-hype" protocol).
crates/ruvector-seprag/: reference CCH nested-dissection + separator-tree k-NN, a shared Vamana-lite ANN engine (src/ann.rs), and 6 harnesses. Unit tests + clippy green.
Adds ruvector-seprag to the workspace; ruvector-diskann is used only as a dev-dependency (one harness confirms results on the production index).

Finding 1 — CCH full-contraction is NO-GO for embedding retrieval (measured, ADR-199)

Implementation validated (separator-tree k-NN == Dijkstra oracle on toy + real graphs), then:

Backbone (n=1500)	shortcut blowup	elim-tree height
roadNet-PA (control)	7.6×	~3.5·√n
ogbn-arxiv citation	23.8×	~0.6·n
ogbn-arxiv feature kNN	42.4×	~0.56·n

Road control proves the code is correct; embedding-derived backbones are intrinsically
high-treewidth → contraction blows up. (HNSW already handles embedding kNN.)
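
The two table columns can be made concrete with a minimal sketch of metric-free (symbolic) contraction. This is not the ruvector-seprag implementation; `symbolic_contraction` and `ContractionStats` are hypothetical names, and the sketch assumes "shortcut blowup" means upward edges after fill-in divided by original edges, with elimination-tree height counted in vertices.

```rust
use std::collections::BTreeSet;

/// Summary of a metric-free (symbolic) contraction. Hypothetical type.
pub struct ContractionStats {
    pub original_edges: usize,
    pub upward_edges: usize, // original edges plus fill-in shortcuts
    pub elim_height: usize,  // vertices on the longest leaf-to-root path
}

impl ContractionStats {
    /// Assumed definition of the blowup ratio: |G+| / |G|.
    pub fn blowup(&self) -> f64 {
        self.upward_edges as f64 / self.original_edges as f64
    }
}

/// Eliminates vertices in `order` (a permutation of 0..n, first entry is
/// contracted first) and records the fill-in this ordering causes.
pub fn symbolic_contraction(
    n: usize,
    edges: &[(usize, usize)],
    order: &[usize],
) -> ContractionStats {
    let mut rank = vec![0usize; n];
    for (r, &v) in order.iter().enumerate() {
        rank[v] = r;
    }
    // up[v] holds the neighbours of v that are eliminated after v.
    let mut up: Vec<BTreeSet<usize>> = vec![BTreeSet::new(); n];
    for &(a, b) in edges {
        if a != b {
            let (lo, hi) = if rank[a] < rank[b] { (a, b) } else { (b, a) };
            up[lo].insert(hi);
        }
    }
    let original_edges: usize = up.iter().map(|s| s.len()).sum();

    let mut height = vec![1usize; n]; // subtree height below each vertex
    let mut elim_height = 0;
    for &v in order {
        let nbrs: Vec<usize> = up[v].iter().copied().collect();
        // The elimination-tree parent is the earliest-eliminated upward neighbour.
        if let Some(&parent) = nbrs.iter().min_by_key(|&&u| rank[u]) {
            // Merging the remaining upward neighbours into the parent's set
            // creates exactly the shortcuts that contracting v requires.
            for &u in &nbrs {
                if u != parent {
                    up[parent].insert(u);
                }
            }
            height[parent] = height[parent].max(height[v] + 1);
        }
        elim_height = elim_height.max(height[v]);
    }
    let upward_edges = up.iter().map(|s| s.len()).sum();
    ContractionStats { original_edges, upward_edges, elim_height }
}
```

On a three-vertex path, contracting the middle vertex first adds one shortcut and yields height 3, while contracting it last (as the separator) adds none and yields height 2. The table above reports the same effect at n=1500.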

Finding 2 — Customizable re-weighting is a WIN (ADR-200)

In a self-learning store the metric drifts. Instead of rebuilding the ANN index, reuse the
topology and re-score under the new metric. Measured on the reference Vamana-lite engine
(src/ann.rs) and confirmed on the production ruvector-diskann Vamana at n=20k:

Recall within 2% of full rebuild across diagonal / rotational / non-linear /
region-local / compounding drift, up to n=100k, at ~1,000–4,000× lower update cost.
Hybrid policy (re-weight every step + rebuild every ~4): 98.8% recall vs 99.1% always /
94.4% never, at 25% of the rebuild cost.
Honest caveats kept in the ADR: gap widens with scale/churn (defer/batch rebuilds, not
never); the drift-triggered monitor underperformed simple periodic.
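
A minimal sketch of the re-weighting idea, assuming the index stores only adjacency and the search receives the vectors separately. `beam_search` and `reweight` are hypothetical names, not the ruvector-diskann or src/ann.rs API, and the sketch assumes `beam >= k >= 1`.

```rust
use std::cmp::Reverse;
use std::collections::{BinaryHeap, HashSet};

/// Squared Euclidean distance (always non-negative).
fn sq_dist(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).fold(0.0, |acc, (x, y)| acc + (x - y) * (x - y))
}

/// Applies a drift transform to every stored vector. The graph is untouched.
pub fn reweight<F: Fn(&[f32]) -> Vec<f32>>(vectors: &[Vec<f32>], f: F) -> Vec<Vec<f32>> {
    vectors.iter().map(|v| f(v)).collect()
}

/// Greedy beam search over a FIXED adjacency list. Because the vectors are an
/// argument, the same topology can be searched under a drifted metric.
pub fn beam_search(
    adj: &[Vec<usize>],
    vectors: &[Vec<f32>],
    query: &[f32],
    entry: usize,
    beam: usize,
    k: usize,
) -> Vec<usize> {
    // Bit patterns of non-negative f32 values sort like the values themselves,
    // which gives the heaps a total order without a wrapper type.
    let key = |v: usize| sq_dist(&vectors[v], query).to_bits();
    let mut visited = HashSet::from([entry]);
    let mut frontier = BinaryHeap::from([Reverse((key(entry), entry))]); // closest first
    let mut best = BinaryHeap::from([(key(entry), entry)]); // farthest on top
    while let Some(Reverse((d, v))) = frontier.pop() {
        // Stop once the closest unexpanded node cannot improve a full beam.
        if best.len() >= beam && d > best.peek().unwrap().0 {
            break;
        }
        for &u in &adj[v] {
            if visited.insert(u) {
                let du = key(u);
                if best.len() < beam || du < best.peek().unwrap().0 {
                    frontier.push(Reverse((du, u)));
                    best.push((du, u));
                    if best.len() > beam {
                        best.pop();
                    }
                }
            }
        }
    }
    let mut out = best.into_sorted_vec(); // ascending by distance
    out.truncate(k);
    out.into_iter().map(|(_, v)| v).collect()
}
```

To query after a metric change, pass `reweight(&vectors, transform)` and the transformed query to `beam_search` with the original `adj`. The update cost is then one pass over the vectors rather than a graph build.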

Testing

cargo test -p ruvector-seprag (8 + doctest green), cargo clippy -p ruvector-seprag clean.
Experiment harnesses run on public data (ogbn-arxiv, SNAP roadNet-PA); each prints its
pre-registered win/kill gate.

Scope / safety

Additive only — new crate + docs; no existing crate modified. ruvector-diskann is a
dev-dependency (examples only), not a runtime dependency.

Add design ADRs and milestone plans for adapting Customizable Contraction Hierarchies (nested dissection, separators, contraction shortcuts, elimination trees, separator-tree k-NN) to RuVector's hybrid vector + knowledge-graph memory.

ADRs:
- ADR-196: SepRAG keystone (separator-tree retrieval; complements HNSW/DiskANN)
- ADR-197: navigation-graph construction + metric-independent ND ordering
- ADR-198: customizable metric layer (CCH customization <-> GNN self-learning loop)
- ADR-199: public-corpus benchmark & evaluation harness

Plans (docs/plans/seprag-cch-retrieval/): M0 correctness gate -> M1 blowup go/no-go on ogbn-arxiv -> M2 customization -> M3 full hybrid -> M4 integration.

Maps decisions onto existing crates (ruvector-mincut/jtree, solver/bmssp, sparsifier, diskann, gnn, attn-mincut).

…aphs

New crate ruvector-seprag implementing the SepRAG M0 milestone (docs/plans/seprag-cch-retrieval/M0-correctness-gate.md):
- graph: undirected weighted graph + Dijkstra brute-force k-NN oracle
- order: nested-dissection ordering via BFS-layer separators + separator tree
- contraction: metric-free symbolic contraction -> chordal upward graph + elim tree
- customize: bottom-up triangle-sweep shortcut weighting (re-runnable per metric)
- query: upward search, exhaustive CCH k-NN, bucket-based branch-and-bound k-NN with admissible early-stop and search-space accounting
- gen: deterministic SBM/grid/path/clique generators (SplitMix64)
- examples/blowup_report: M0->M1 diagnostic (blowup ratio, elim height, pruning)

Tests (8 + doctest, all green): SepRAG k-NN == Dijkstra oracle on SBM/grid/path/clique; pruned == unpruned (pruning sound); pruning reduces search space; determinism; bounded blowup. cargo clippy clean.

Finding: query pruning eliminates 95-100% of scans regardless of structure; blowup/elim-height are dominated by separator quality, and the naive BFS separator degenerates on low-diameter dense graphs (SBM 18.6x), motivating the ruvector-mincut separator swap planned for M1 (ADR-197).
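
The brute-force oracle that the tests compare against can be sketched as follows. This is a minimal illustration, not the crate's graph module: `dijkstra_knn` is a hypothetical name, and integer edge weights are assumed so that the heap needs no float ordering.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Returns the k nodes closest to `src` by shortest-path distance, nearest
/// first. `adj[v]` lists `(neighbour, weight)` pairs of an undirected graph.
pub fn dijkstra_knn(adj: &[Vec<(usize, u64)>], src: usize, k: usize) -> Vec<(usize, u64)> {
    let mut out = Vec::with_capacity(k);
    if k == 0 {
        return out;
    }
    let mut dist = vec![u64::MAX; adj.len()];
    let mut heap = BinaryHeap::new();
    dist[src] = 0;
    heap.push(Reverse((0u64, src)));
    while let Some(Reverse((d, v))) = heap.pop() {
        if d > dist[v] {
            continue; // stale heap entry, a shorter path was already settled
        }
        if v != src {
            out.push((v, d)); // nodes settle in non-decreasing distance order
            if out.len() == k {
                break;
            }
        }
        for &(u, w) in &adj[v] {
            let nd = d + w;
            if nd < dist[u] {
                dist[u] = nd;
                heap.push(Reverse((nd, u)));
            }
        }
    }
    out
}
```

Because Dijkstra settles nodes in distance order, stopping after k settled nodes gives an exact answer, which is what makes it usable as a correctness reference for a pruned search.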

…ion subgraph

examples/m1_arxiv: ingest the real ogbn-arxiv edge list (169K nodes / 1.16M edges), induce a connected BFS-ball subgraph, build the SepRAG hierarchy, and report the ADR-199 go/no-go metrics (blowup ratio, elim-tree height, build time) plus a sampled Dijkstra-oracle recall check.

First-pass result (N=1500 citation-only subgraph, M0 BFS separator):
- recall 50/50 vs Dijkstra oracle -> CORRECTNESS HOLDS on real data
- query pruning saves ~100% of scans (364 vs 418555 bucket scans/query)
- BUT blowup 56.9x and elim height ~= n -> the BFS separator degenerates completely on small-world citation graphs (picks a giant BFS layer as the separator), and the raw citation ball is dense (avg degree ~24).

Verdict (ADR-199 fallback ladder): NO-GO for the naive separator + dense backbone. Trustworthy precisely because M0 proved the algorithm correct, so 57x is a separator-quality/backbone artifact, not a SepRAG refutation.

Next: ruvector-mincut balanced separators + alpha-pruned sparse backbone (ADR-197).

…tion

order.rs: add SeparatorKind::Balanced (grow half-size region, take only its boundary) alongside the M0 BfsLayer strategy; make Balanced the default.
lib.rs: SepRag::build_with(graph, kind).
m1_arxiv: add max_degree backbone sparsification + separator-kind args for A/B attribution.

M1 attribution (ogbn-arxiv N=1500 citation BFS-ball):
  raw + layer       blowup 56.9x  elim_h 1443  build 56s
  raw + balanced    blowup 23.8x  elim_h 941   build 13s  <- best
  deg<=10 + layer   blowup 89.6x  elim_h 1295  build 38s
  deg<=10 + bal     blowup 60.1x  elim_h 1035  build 18s
(recall 50/50 and ~100% query pruning in all configs)

Findings:
(1) balanced separator is a real win (2.4x less fill, 4x faster);
(2) hub-dampening degree-bound BACKFIRES (shrinks denominator faster than |G+|, destroys good cuts), so discard it;
(3) even best config leaves 23.8x / elim_h~0.6n: the dense small-world citation ball is intrinsically high-treewidth (ADR-197 expander risk, measured).

Next: feature-manifold/hyperbolic backbone, not the citation topology.
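
One way to read "grow half-size region, take only its boundary" is sketched below. It is an interpretation rather than the code in order.rs: `balanced_separator` is a hypothetical function, the region is grown by plain BFS, and the graph is assumed connected.

```rust
use std::collections::VecDeque;

/// Grows a BFS region to about half the vertices, then returns
/// `(separator, inside, outside)`, where the separator is the set of region
/// vertices that still touch the outside.
pub fn balanced_separator(
    adj: &[Vec<usize>],
    start: usize,
) -> (Vec<usize>, Vec<usize>, Vec<usize>) {
    let n = adj.len();
    let target = n / 2;
    let mut in_region = vec![false; n];
    let mut queue = VecDeque::from([start]);
    in_region[start] = true;
    let mut size = 1;
    'grow: while let Some(v) = queue.pop_front() {
        for &u in &adj[v] {
            if size >= target {
                break 'grow; // region is large enough, stop mid-layer
            }
            if !in_region[u] {
                in_region[u] = true;
                size += 1;
                queue.push_back(u);
            }
        }
    }
    let mut sep: Vec<usize> = Vec::new();
    let mut inside: Vec<usize> = Vec::new();
    let mut outside: Vec<usize> = Vec::new();
    for v in 0..n {
        if !in_region[v] {
            outside.push(v);
        } else if adj[v].iter().any(|&u| !in_region[u]) {
            sep.push(v); // boundary vertex: removing these disconnects the parts
        } else {
            inside.push(v);
        }
    }
    (sep, inside, outside)
}
```

Unlike a whole-BFS-layer separator, stopping at half size keeps the two sides comparable even when a single layer would swallow most of a small-world graph, which is the failure mode the attribution table isolates.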

m1_arxiv: robust edge reader (skip # comments, comma/tab/space separators) so SNAP road networks load through the same harness.
m1_manifold: alpha-pruned kNN backbone over real ogbn-arxiv 128-d node features (Vamana RobustPrune), the decisive ADR-197 thesis test.

Results (N=1500, balanced separator, recall 50/50 everywhere):
  roadNet-PA            blowup 7.6x   elim_h 136 (~3.5 sqrt n)  <- CCH works
  citation (arxiv)      blowup 23.8x  elim_h 941 (~0.6 n)
  feature-manifold k10  blowup 42.4x  elim_h 837 (~0.56 n)      <- worse
  feature-manifold k6   blowup 45.1x  elim_h 699 (~0.47 n)

Conclusion: the road control proves the implementation is sound (planar sqrt(n) separators -> 7.6x, instant build). Both embedding-derived backbones (citation small-world AND Euclidean feature kNN) are intrinsically high-treewidth (elim_h ~ 0.5n), so CCH contraction blows up regardless of separator quality or degree. The expander risk (ADR-197) is confirmed across two independent backbones. Query pruning stays ~100% effective; the cost is preprocessing.

Last untested rung: hyperbolic backbone (needs real hyperbolic embeddings).
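
A minimal sketch of an edge reader with the behaviour described above (skip `#` comments, accept comma, tab, or space separators). `parse_edge` and `read_edges` are hypothetical names; the actual m1_arxiv reader may differ, for example in how it treats malformed lines.

```rust
/// Parses one edge-list line. Returns None for blank lines, `#` comments,
/// and lines whose first two fields are not unsigned integers.
pub fn parse_edge(line: &str) -> Option<(u32, u32)> {
    let line = line.trim();
    if line.is_empty() || line.starts_with('#') {
        return None;
    }
    // A char array is a valid split pattern; empty tokens come from
    // repeated separators and are dropped.
    let mut fields = line.split([',', '\t', ' ']).filter(|t| !t.is_empty());
    let a = fields.next()?.parse().ok()?;
    let b = fields.next()?.parse().ok()?;
    Some((a, b))
}

/// Reads every parseable edge from a whole file's text.
pub fn read_edges(text: &str) -> Vec<(u32, u32)> {
    text.lines().filter_map(parse_edge).collect()
}
```

Silently skipping malformed lines is a design choice made here for brevity; a harness that must not lose edges would more likely return an error with the line number.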

The M1 go/no-go gate ran on real public data and returned NO-GO for CCH full contraction on embedding/citation retrieval graphs. Recorded the measured evidence in ADR-199 (Empirical Outcome section), updated ADR-196/197 status, and marked the milestone tracker (M0 done, M1 NO-GO, M2-M4 not pursued).

Evidence (N=1500, recall 50/50 everywhere):
  roadNet-PA control      blowup 7.6x   elim_h ~3.5 sqrt n  (CCH works)
  ogbn-arxiv citation     blowup 23.8x  elim_h ~0.6 n
  ogbn-arxiv feature kNN  blowup 42.4x  elim_h ~0.56 n

Implementation is sound (road control + exact recall); embedding backbones are intrinsically high-treewidth. Query pruning works (~100%); preprocessing fill-in is the blocker. ruvector-seprag retained as a validated reference.

… on linear drift)

Salvage ADR-198's customizable-metric idea, decoupled from CCH, as a rigorous pre-registered head-to-head: does a fixed ANN topology + recomputed distances absorb metric drift as well as a full rebuild?

examples/reweight_vs_rebuild.rs: self-contained Vamana-lite (RobustPrune + greedy beam search); drift modelled as a vector-space transform M=A^T A; sweeps diagonal AND adversarial dense-Mahalanobis (rotational) drift; A=reuse topology, B=rebuild, C=stale control.

Result (n=2000 ogbn-arxiv embeddings, recall@10, pre-registered gate): A (re-weight, 0 rebuild) within 0.2% of B (full rebuild) up to 36% relevant-set churn, under both drift modes. C (stale) loses up to 29 points -> benchmark has teeth, A's parity is genuine adaptation. WIN.

Honest claim: COST win at equal quality (rebuilds become free under LINEAR drift). Boundaries (ADR-200): non-linear/region-local drift + scale untested. Next: non-linear learned metric (decisive adversarial test).

Also adds docs/plans/.../FUTURE-DIRECTIONS.md (4-bet backlog + prove-not-hype protocol) and ADR-200.
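
Why a metric M=A^T A can be modelled as a vector-space transform: assuming the drifted distance is the usual quadratic form (the commit message does not spell this out), it equals the Euclidean distance between the transformed vectors.

```latex
d_M(x, y)^2 = (x - y)^\top M \,(x - y)
            = (x - y)^\top A^\top A \,(x - y)
            = \lVert A x - A y \rVert_2^2
```

So searching under M is the same as an ordinary Euclidean search over the vectors Ax, which is what lets the harness express diagonal and rotational drift as a choice of A.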

…otal WIN)

Extend the re-weight-vs-rebuild harness with the decisive adversarial cases and close the proof:
- non-linear drift mode (residual tanh warp v + s*tanh(Wv)): removes the 'linear only' caveat; A still matches B within 0.2% up to 35% churn.
- per-query distance-eval columns: A and B match within ~1%, disproving any hidden query-cost trade. Reuse is equal recall AND equal query cost.
- fix display bug (C/stale was double-divided by query count; control now correctly reads 90% at t=0, validating the negative control).
- drift modelled via transform closure (diag/rot/nonlin share one code path).
- clippy: idiomatic char-class split in m1_arxiv reader.

ADR-200 + FUTURE-DIRECTIONS updated: WIN across diagonal/rotational/non-linear drift; only open caveats are scale (n>=1e5, decisive next), region-local drift, incremental-rebuild baseline.
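
The residual tanh warp v + s*tanh(Wv) can be sketched as a small function. `tanh_warp` is a hypothetical name, and W is assumed to be a square matrix stored as rows; how the harness generates W and s is not shown in the commit message.

```rust
/// Residual tanh warp: v' = v + s * tanh(W v), applied per coordinate.
/// `w` is assumed square, with `w[i]` the i-th row.
pub fn tanh_warp(v: &[f32], w: &[Vec<f32>], s: f32) -> Vec<f32> {
    v.iter()
        .zip(w)
        .map(|(&vi, row)| {
            // i-th entry of the matrix-vector product W v
            let wv: f32 = row.iter().zip(v).map(|(a, b)| a * b).sum();
            vi + s * wv.tanh()
        })
        .collect()
}
```

Since tanh is bounded by 1 in magnitude, each coordinate moves by at most |s|, so s acts as a drift-strength knob and s = 0 recovers the original vectors.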

Extract the Vamana-lite ANN + metric-drift helpers into a reusable lib module (src/ann.rs) with an efficient two-heap greedy beam search (replaces the O(L) linear-scan beam, ~2x faster, needed for n>=1e5). Thin reweight_vs_rebuild to use it (regression-checked: identical recall/evals at n=2000). Add scale_drift example: sweeps N (5k..100k), measures recall(reuse) vs recall(rebuild) at the adversarial rotational drift point plus the rebuild-cost curve.

… widens

scale_drift sweep (5k->100k, rotational drift ~40% churn, recall@10):
  N       A reuse  B rebuild  gap    rebuild  update  ratio
  5000    90.2%    90.0%      +0.2%  3.6s     0.001s  ~3600x
  10000   89.5%    90.3%      -0.8%  10.2s    0.004s  ~2500x
  25000   88.5%    89.2%      -0.7%  21.4s    0.009s  ~2400x
  50000   87.7%    88.6%      -0.9%  47.1s    0.043s  ~1100x
  100000  85.0%    86.7%      -1.7%  141.8s   0.035s  ~4000x

Verdict: WIN within the 2% gate through 100k at ~1000-4000x lower update cost, BUT the recall gap widens with N (+0.2% -> -1.7%) => defer/batch rebuilds, not never-rebuild.

Honest caveats: both A&B recall fall with N (fixed beam); 100 queries => ~+-1% noise, confirm trend with more queries.

Also: rename Rng::next->next_u64 (clippy). ADR-200 + FUTURE-DIRECTIONS updated with scale evidence, widening-gap caveat, and a hybrid re-weight+periodic-rebuild policy as a next step.

…, gate PASS

examples/region_drift.rs: warp only a 15% local cluster, grade recall separately for queries INSIDE vs OUTSIDE the drifted region (a global average would hide a local failure).

Result (n=20k, recall@10):
  t     churnIn  A_in   B_in   | churnOut  A_out  B_out
  0.25  44%      89.8%  81.4%  | 21%       87.9%  89.0%
  0.50  53%      89.3%  90.0%  | 21%       87.9%  89.0%
  1.00  45%      89.5%  90.0%  | 21%       87.9%  89.0%

Gate PASS: reuse holds inside the drifted region (A_in within 0.7% of B_in, and ABOVE it at t=0.25) even at 53% in-region churn; out-region ~unchanged. Region-local drift did NOT break reuse.

Honest caveat: the t=0.25 B_in dip to 81% (reuse beats rebuild by 8pts) is a build-variance artifact of the simplified single-pass Vamana baseline, not a smooth effect. This is the strongest argument to port the baseline to production ruvector-diskann.

ADR-200 + FUTURE-DIRECTIONS + status updated.
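
Grading recall separately inside and outside the drifted region can be sketched as below. `QueryResult`, `recall_at_k`, and `split_recall` are hypothetical names, not the region_drift.rs code, and result ids are assumed to be unique within each list.

```rust
use std::collections::HashSet;

/// One graded query: whether it falls in the drifted region, the exact
/// top-k ids, and the ids the ANN search returned.
pub struct QueryResult {
    pub in_region: bool,
    pub truth: Vec<usize>,
    pub found: Vec<usize>,
}

/// Fraction of the true top-k recovered by the search (0.0 if truth is empty).
pub fn recall_at_k(truth: &[usize], found: &[usize], k: usize) -> f64 {
    let want: HashSet<&usize> = truth.iter().take(k).collect();
    if want.is_empty() {
        return 0.0;
    }
    let hits = found.iter().take(k).filter(|id| want.contains(id)).count();
    hits as f64 / want.len() as f64
}

/// Mean recall for (inside, outside) queries; None when a group is empty.
pub fn split_recall(results: &[QueryResult], k: usize) -> (Option<f64>, Option<f64>) {
    let mean = |inside: bool| {
        let r: Vec<f64> = results
            .iter()
            .filter(|q| q.in_region == inside)
            .map(|q| recall_at_k(&q.truth, &q.found, k))
            .collect();
        if r.is_empty() {
            None
        } else {
            Some(r.iter().sum::<f64>() / r.len() as f64)
        }
    };
    (mean(true), mean(false))
}
```

With one in-region query at recall 0.5 and one out-of-region query at 1.0, a pooled average would report 0.75 and mask the local miss, which is the reason for reporting the two groups separately.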

…onfirmed

examples/diskann_drift.rs: re-run the re-weight-vs-rebuild test on the shipping ruvector-diskann VamanaGraph (added as dev-dependency). The reuse trick is native: the graph stores only topology and greedy_search takes vectors externally, so drift = search a graph-built-on-original with the transformed vectors.

Result (n=20k, recall@10):
- GLOBAL rotational: A reuse vs B rebuild within 2% (95.6 vs 97.1 worst, t=0.5)
- REGION-LOCAL in-region: A_in within ~1.5% of B_in; A_in 98.6 vs B_in 94.5 at t=0.25
- absolute recall 96-99% (stronger/fairer baseline than lite-Vamana ~90%)

Confirms BET 1 on the production index. The t=0.25 reuse-beats-rebuild dip REPRODUCED on diskann => it is a real property (fresh Vamana build on a half-warped region underperforms reuse), not lite-Vamana noise. Baseline-variance caveat RESOLVED. Remaining caveat: gap widens with scale/churn (defer/batch rebuilds).

ADR-200 + FUTURE-DIRECTIONS + status updated.

…ippable

examples/hybrid_policy.rs: simulate a compounding random-walk metric-drift trajectory and compare operating policies on the production diskann Vamana: always / never / periodic-K / drift-triggered (Frobenius monitor).

Result (n=10k, 24 steps, aggressive random-walk drift, recall@10):
  always      99.1% mean  98.4% min  24 rebuilds
  never       94.4% mean  89.7% min  1 rebuild    <- decays under heavy drift
  periodic-4  98.8% mean  97.9% min  6 rebuilds   <- ~always at 25% cost
  periodic-8  98.4% mean  96.5% min  3 rebuilds   <- at 12.5% cost

Shippable operating point: re-weight every step + rebuild every ~4 steps recovers near-full recall at a fraction of the cost.

Honest sub-finding: the drift-TRIGGERED monitor (Frobenius of cumulative transform) underperformed simple periodic, so periodic-K is the recommended knob; a sampled-recall probe trigger is future work. Under gentle single-direction drift (n=5k) the 'never' policy did NOT decay, so the hybrid only matters under large/compounding drift.

ADR-200 status/boundaries/next-steps + FUTURE-DIRECTIONS updated; stale 'n=2000/scale-unconfirmed' caveats removed (now resolved).
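
The rebuild counts in the table follow from a simple schedule. The sketch below is an assumption that reproduces those counts, namely that the initial build counts as a rebuild at step 0; `Policy`, `should_rebuild`, and `rebuild_count` are hypothetical names, not the hybrid_policy.rs code, and the drift-triggered policy is left out.

```rust
/// Operating policies compared in the experiment (drift-triggered omitted).
#[derive(Clone, Copy)]
pub enum Policy {
    Always,
    Never,
    Periodic(usize),
}

/// Whether to rebuild the index at `step`. Re-weighting happens every step
/// regardless. Step 0 is the initial build (assumed to count as a rebuild).
pub fn should_rebuild(policy: Policy, step: usize) -> bool {
    match policy {
        Policy::Always => true,
        Policy::Never => step == 0,
        Policy::Periodic(k) => k > 0 && step % k == 0,
    }
}

/// Number of rebuilds a policy performs over `steps` drift steps.
pub fn rebuild_count(policy: Policy, steps: usize) -> usize {
    (0..steps).filter(|&s| should_rebuild(policy, s)).count()
}
```

Over 24 steps this gives 24, 1, 6, and 3 rebuilds for always, never, periodic-4, and periodic-8, so periodic-4 pays 6/24 = 25% of the always-rebuild cost.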

shaal added 15 commits June 4, 2026 02:11

chore(seprag): idiomatic char-array split (clippy clean)

11bef01

chore(seprag): add ruvector-seprag to Cargo.lock

44ee4db

shaal wants to merge 15 commits into ruvnet:main from shaal:docs/seprag-cch-retrieval
