SepRAG: CCH-inspired retrieval exploration + customizable re-weighting (ADRs 196-200, ruvector-seprag)#535
shaal wants to merge 15 commits into
Add design ADRs and milestone plans for adapting Customizable Contraction Hierarchies (nested dissection, separators, contraction shortcuts, elimination trees, separator-tree k-NN) to RuVector's hybrid vector + knowledge-graph memory.

ADRs:
- ADR-196: SepRAG keystone (separator-tree retrieval; complements HNSW/DiskANN)
- ADR-197: navigation-graph construction + metric-independent ND ordering
- ADR-198: customizable metric layer (CCH customization <-> GNN self-learning loop)
- ADR-199: public-corpus benchmark & evaluation harness

Plans (docs/plans/seprag-cch-retrieval/): M0 correctness gate -> M1 blowup go/no-go on ogbn-arxiv -> M2 customization -> M3 full hybrid -> M4 integration. Maps decisions onto existing crates (ruvector-mincut/jtree, solver/bmssp, sparsifier, diskann, gnn, attn-mincut).
…aphs

New crate ruvector-seprag implementing the SepRAG M0 milestone (docs/plans/seprag-cch-retrieval/M0-correctness-gate.md):
- graph: undirected weighted graph + Dijkstra brute-force k-NN oracle
- order: nested-dissection ordering via BFS-layer separators + separator tree
- contraction: metric-free symbolic contraction -> chordal upward graph + elim tree
- customize: bottom-up triangle-sweep shortcut weighting (re-runnable per metric)
- query: upward search, exhaustive CCH k-NN, bucket-based branch-and-bound k-NN with admissible early-stop and search-space accounting
- gen: deterministic SBM/grid/path/clique generators (SplitMix64)
- examples/blowup_report: M0->M1 diagnostic (blowup ratio, elim height, pruning)

Tests (8 + doctest, all green): SepRAG k-NN == Dijkstra oracle on SBM/grid/path/clique; pruned == unpruned (pruning sound); pruning reduces search space; determinism; bounded blowup. cargo clippy clean.

Finding: query pruning eliminates 95-100% of scans regardless of structure; blowup and elim height are dominated by separator quality, and the naive BFS separator degenerates on low-diameter dense graphs (SBM 18.6x), motivating the ruvector-mincut separator swap planned for M1 (ADR-197).
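The customize step re-weights a fixed upward graph bottom-up. A minimal sketch of that triangle sweep, assuming vertices are already numbered by contraction rank and the upward graph is chordal; UpwardGraph and customize are hypothetical names, not the crate's API:

```rust
use std::collections::HashMap;

/// Hypothetical upward graph: vertex ids equal contraction ranks and
/// up[v] lists the neighbours of v with a higher rank (original edges + shortcuts).
pub struct UpwardGraph {
    pub up: Vec<Vec<usize>>,
}

/// Undirected edge key with the lower-ranked endpoint first.
fn key(a: usize, b: usize) -> (usize, usize) {
    if a < b { (a, b) } else { (b, a) }
}

/// Basic CCH-style customization sketch. Every upward edge starts at its input
/// weight (pure shortcuts start at +inf); then vertices are swept bottom-up and
/// each lower triangle {v, a, b} with rank(v) < rank(a), rank(b) relaxes
/// w(a, b) <- min(w(a, b), w(v, a) + w(v, b)). Chordality guarantees that the
/// upward neighbours of v form a clique, so edge (a, b) always exists.
pub fn customize(
    g: &UpwardGraph,
    metric: &HashMap<(usize, usize), f64>,
) -> HashMap<(usize, usize), f64> {
    let mut w = HashMap::new();
    for (v, ups) in g.up.iter().enumerate() {
        for &u in ups {
            let k = key(v, u);
            w.insert(k, metric.get(&k).copied().unwrap_or(f64::INFINITY));
        }
    }
    // When v is processed, w(v, a) and w(v, b) are final: their own lower
    // triangles involve only vertices ranked below v, which were swept earlier.
    for (v, ups) in g.up.iter().enumerate() {
        for (i, &a) in ups.iter().enumerate() {
            for &b in &ups[i + 1..] {
                let via_v = w[&key(v, a)] + w[&key(v, b)];
                let e = w.get_mut(&key(a, b)).expect("upward graph must be chordal");
                if via_v < *e {
                    *e = via_v;
                }
            }
        }
    }
    w
}
```

Calling customize again with a different metric reuses the same UpwardGraph unchanged, which is what makes the step re-runnable per metric.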
…ion subgraph

examples/m1_arxiv: ingest the real ogbn-arxiv edge list (169K nodes / 1.16M edges), induce a connected BFS-ball subgraph, build the SepRAG hierarchy, and report the ADR-199 go/no-go metrics (blowup ratio, elim-tree height, build time) plus a sampled Dijkstra-oracle recall check.

First-pass result (N=1500 citation-only subgraph, M0 BFS separator):
- recall 50/50 vs Dijkstra oracle -> CORRECTNESS HOLDS on real data
- query pruning saves ~100% of scans (364 vs 418555 bucket scans/query)
- BUT blowup 56.9x and elim height ~= n -> the BFS separator degenerates completely on small-world citation graphs (it picks a giant BFS layer as the separator), and the raw citation ball is dense (avg degree ~24).

Verdict (ADR-199 fallback ladder): NO-GO for the naive separator + dense backbone. The verdict is trustworthy precisely because M0 proved the algorithm correct, so 57x is a separator-quality/backbone artifact, not a SepRAG refutation. Next: ruvector-mincut balanced separators + alpha-pruned sparse backbone (ADR-197).
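The oracle side of the recall check is plain Dijkstra. A sketch assuming non-negative edge weights and an adjacency list of (neighbour, weight) pairs; dijkstra_knn and recall are hypothetical names, not the crate's API. Ties at the k-th distance can make id-based recall under-count:

```rust
use std::cmp::Ordering;
use std::collections::BinaryHeap;

/// Heap entry; Ord is reversed so std's max-heap BinaryHeap pops the smallest distance.
#[derive(Clone, Copy)]
struct Entry {
    dist: f64,
    node: usize,
}
impl Ord for Entry {
    fn cmp(&self, other: &Self) -> Ordering {
        other.dist.total_cmp(&self.dist).then_with(|| other.node.cmp(&self.node))
    }
}
impl PartialOrd for Entry {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}
impl PartialEq for Entry {
    fn eq(&self, other: &Self) -> bool {
        self.cmp(other) == Ordering::Equal
    }
}
impl Eq for Entry {}

/// Brute-force k-NN oracle: the k nodes nearest to src (src excluded) by
/// shortest-path distance.
pub fn dijkstra_knn(adj: &[Vec<(usize, f64)>], src: usize, k: usize) -> Vec<(usize, f64)> {
    let mut out = Vec::with_capacity(k);
    if k == 0 {
        return out;
    }
    let mut dist = vec![f64::INFINITY; adj.len()];
    let mut settled = vec![false; adj.len()];
    let mut heap = BinaryHeap::new();
    dist[src] = 0.0;
    heap.push(Entry { dist: 0.0, node: src });
    while let Some(Entry { dist: d, node: v }) = heap.pop() {
        if settled[v] {
            continue; // stale heap entry
        }
        settled[v] = true;
        if v != src {
            out.push((v, d));
            if out.len() == k {
                break; // nodes settle in nondecreasing distance, so these are exact
            }
        }
        for &(u, w) in &adj[v] {
            if d + w < dist[u] {
                dist[u] = d + w;
                heap.push(Entry { dist: d + w, node: u });
            }
        }
    }
    out
}

/// Fraction of the oracle's node ids that also appear in a candidate answer.
pub fn recall(found: &[usize], oracle: &[(usize, f64)]) -> f64 {
    if oracle.is_empty() {
        return 1.0;
    }
    let hits = oracle.iter().filter(|e| found.contains(&e.0)).count();
    hits as f64 / oracle.len() as f64
}
```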
…tion

order.rs: add SeparatorKind::Balanced (grow a half-size region, take only its boundary) alongside the M0 BfsLayer strategy; make Balanced the default. lib.rs: SepRag::build_with(graph, kind). m1_arxiv: add max_degree backbone sparsification + separator-kind args for A/B attribution.

M1 attribution (ogbn-arxiv N=1500 citation BFS-ball):

backbone + separator   blowup   elim_h   build
raw + layer            56.9x    1443     56s
raw + balanced         23.8x    941      13s    <- best
deg<=10 + layer        89.6x    1295     38s
deg<=10 + balanced     60.1x    1035     18s

(recall 50/50 and ~100% query pruning in all configs)

Findings:
(1) the balanced separator is a real win (2.4x less fill, 4x faster);
(2) hub-dampening via a degree bound BACKFIRES (it shrinks the denominator faster than |G+| and destroys good cuts); discard it;
(3) even the best config leaves 23.8x / elim_h ~0.6n: the dense small-world citation ball is intrinsically high-treewidth (ADR-197 expander risk, measured).

Next: feature-manifold/hyperbolic backbone, not the citation topology.
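The Balanced strategy can be pictured as follows. A rough sketch, assuming an unweighted adjacency list and a caller-chosen seed; balanced_separator is a hypothetical helper, and the crate's SeparatorKind::Balanced may grow and balance the region differently:

```rust
use std::collections::VecDeque;

/// Grow a BFS region from `seed` until it holds about half the vertices, then
/// return only the region's boundary: region vertices with at least one
/// neighbour outside it. Removing the boundary disconnects the region interior
/// from the rest of the graph, unlike a BFS layer, which may be huge.
pub fn balanced_separator(adj: &[Vec<usize>], seed: usize) -> Vec<usize> {
    let n = adj.len();
    let target = n / 2;
    let mut in_region = vec![false; n];
    let mut queue = VecDeque::from([seed]);
    in_region[seed] = true;
    let mut size = 1;
    'grow: while let Some(v) = queue.pop_front() {
        for &u in &adj[v] {
            if size >= target {
                break 'grow;
            }
            if !in_region[u] {
                in_region[u] = true;
                size += 1;
                queue.push_back(u);
            }
        }
    }
    (0..n)
        .filter(|&v| in_region[v] && adj[v].iter().any(|&u| !in_region[u]))
        .collect()
}
```

On a 6-vertex path seeded at one end, the region is the first three vertices and the separator is the single vertex at its edge.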
m1_arxiv: robust edge reader (skips # comments, accepts comma/tab/space separators) so SNAP road networks load through the same harness.
m1_manifold: alpha-pruned kNN backbone over real ogbn-arxiv 128-d node features (Vamana RobustPrune), the decisive ADR-197 thesis test.

Results (N=1500, balanced separator, recall 50/50 everywhere):

backbone               blowup   elim_h
roadNet-PA             7.6x     136 (~3.5 sqrt n)   <- CCH works
citation (arxiv)       23.8x    941 (~0.6 n)
feature-manifold k10   42.4x    837 (~0.56 n)       <- worse
feature-manifold k6    45.1x    699 (~0.47 n)

Conclusion: the road control proves the implementation is sound (planar sqrt(n) separators -> 7.6x, instant build). Both embedding-derived backbones (citation small-world AND Euclidean feature kNN) are intrinsically high-treewidth (elim_h ~ 0.5n), so CCH contraction blows up regardless of separator quality or degree. The expander risk (ADR-197) is confirmed across two independent backbones. Query pruning stays ~100% effective; the cost is preprocessing. Last untested rung: hyperbolic backbone (needs real hyperbolic embeddings).
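m1_manifold's backbone relies on Vamana's RobustPrune. A sketch of the rule as described in the DiskANN paper, assuming Euclidean distance and in-memory f32 vectors; robust_prune is a hypothetical stand-alone function, not the crate's code. A candidate c is dropped once an already selected neighbour p* satisfies alpha * d(p*, c) <= d(p, c):

```rust
/// Euclidean distance between two equal-length vectors.
fn dist(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum::<f32>().sqrt()
}

/// Pick at most r out-neighbours for point p from candidates, closest first,
/// discarding every candidate that an already chosen neighbour covers.
/// alpha > 1 keeps more long-range edges; alpha = 1 prunes most aggressively.
pub fn robust_prune(
    points: &[Vec<f32>],
    p: usize,
    candidates: &[usize],
    alpha: f32,
    r: usize,
) -> Vec<usize> {
    let mut ids = candidates.to_vec();
    ids.sort_unstable();
    ids.dedup();
    let mut pool: Vec<(f32, usize)> = ids
        .into_iter()
        .filter(|&c| c != p)
        .map(|c| (dist(&points[p], &points[c]), c))
        .collect();
    // Stable sort: equal distances keep ascending-id order from the dedup step.
    pool.sort_by(|a, b| a.0.total_cmp(&b.0));
    let mut out = Vec::new();
    while !pool.is_empty() && out.len() < r {
        let (_, best) = pool.remove(0);
        out.push(best);
        // Keep only candidates that `best` does not alpha-dominate.
        pool.retain(|&(d_pc, c)| alpha * dist(&points[best], &points[c]) > d_pc);
    }
    out
}
```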
The M1 go/no-go gate ran on real public data and returned NO-GO for CCH full contraction on embedding/citation retrieval graphs. Recorded the measured evidence in ADR-199 (Empirical Outcome section), updated ADR-196/197 status, and marked the milestone tracker (M0 done, M1 NO-GO, M2-M4 not pursued).

Evidence (N=1500, recall 50/50 everywhere):

roadNet-PA control       blowup 7.6x    elim_h ~3.5 sqrt n   (CCH works)
ogbn-arxiv citation      blowup 23.8x   elim_h ~0.6 n
ogbn-arxiv feature kNN   blowup 42.4x   elim_h ~0.56 n

Implementation is sound (road control + exact recall); embedding backbones are intrinsically high-treewidth. Query pruning works (~100%); preprocessing fill-in is the blocker. ruvector-seprag retained as a validated reference.
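Blowup and elim_h both fall out of the metric-free symbolic contraction. A minimal sketch, assuming vertices are already renumbered so that id = contraction rank, blowup = |E+| / |E|, and elim_h counted as the number of vertices on the longest elimination-tree path; contract and Contraction are hypothetical, and the crate's exact definitions may differ:

```rust
use std::collections::BTreeSet;

/// Result of a symbolic contraction (hypothetical type).
pub struct Contraction {
    pub original_edges: usize,
    pub upward_edges: usize, // |E+|: original edges + fill-in shortcuts
    pub elim_height: usize,  // vertices on the longest elimination-tree path
}

impl Contraction {
    pub fn blowup(&self) -> f64 {
        self.upward_edges as f64 / self.original_edges.max(1) as f64
    }
}

/// Contract vertices in id order (id = rank). The upward neighbours of each
/// contracted vertex are joined into a clique; the elimination-tree parent
/// is the lowest-ranked upward neighbour.
pub fn contract(n: usize, edges: &[(usize, usize)]) -> Contraction {
    let mut up: Vec<BTreeSet<usize>> = vec![BTreeSet::new(); n];
    for &(a, b) in edges {
        if a != b {
            up[a.min(b)].insert(a.max(b));
        }
    }
    let original_edges: usize = up.iter().map(|s| s.len()).sum();
    let mut parent: Vec<Option<usize>> = vec![None; n];
    for v in 0..n {
        let ups: Vec<usize> = up[v].iter().copied().collect(); // sorted ascending
        for (i, &a) in ups.iter().enumerate() {
            for &b in &ups[i + 1..] {
                up[a].insert(b); // fill-in edge (a, b) with a < b
            }
        }
        parent[v] = ups.first().copied();
    }
    // Parents always have a higher rank, so a top-down sweep sets depths.
    let mut depth = vec![0usize; n];
    for v in (0..n).rev() {
        if let Some(p) = parent[v] {
            depth[v] = depth[p] + 1;
        }
    }
    Contraction {
        original_edges,
        upward_edges: up.iter().map(|s| s.len()).sum(),
        elim_height: depth.iter().max().map_or(0, |d| d + 1),
    }
}
```

A 4-vertex star shows why the order (hence the separator) dominates: contracting the hub first turns its three leaves into a clique (blowup 2.0, elim_h 4), while contracting it last adds no fill (blowup 1.0, elim_h 2).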
… on linear drift)

Salvage ADR-198's customizable-metric idea, decoupled from CCH, as a rigorous pre-registered head-to-head: does a fixed ANN topology + recomputed distances absorb metric drift as well as a full rebuild?

examples/reweight_vs_rebuild.rs: self-contained Vamana-lite (RobustPrune + greedy beam search); drift modelled as a vector-space transform M=A^T A; sweeps diagonal AND adversarial dense-Mahalanobis (rotational) drift; A=reuse topology, B=rebuild, C=stale control.

Result (n=2000 ogbn-arxiv embeddings, recall@10, pre-registered gate): A (re-weight, 0 rebuild) within 0.2% of B (full rebuild) up to 36% relevant-set churn, under both drift modes. C (stale) loses up to 29 points -> the benchmark has teeth, and A's parity is genuine adaptation. WIN.

Honest claim: COST win at equal quality (rebuilds become free under LINEAR drift). Boundaries (ADR-200): non-linear/region-local drift + scale untested. Next: non-linear learned metric (decisive adversarial test).

Also adds docs/plans/.../FUTURE-DIRECTIONS.md (4-bet backlog + prove-not-hype protocol) and ADR-200.
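The reason a fixed topology can absorb this drift: for M = A^T A, (x - y)^T M (x - y) = ||Ax - Ay||^2, so re-weighting under M is the same as scoring the unchanged graph with transformed vectors Av. A small sketch checking that identity; apply, mahalanobis2, dist2 and reweight are illustrative helpers, not the harness's code:

```rust
/// Row-major dense matrix.
pub type Mat = Vec<Vec<f32>>;

/// y = A v
pub fn apply(a: &Mat, v: &[f32]) -> Vec<f32> {
    a.iter().map(|row| row.iter().zip(v).map(|(x, y)| x * y).sum::<f32>()).collect()
}

/// (x - y)^T M (x - y) with M = A^T A formed explicitly, entry by entry.
pub fn mahalanobis2(a: &Mat, x: &[f32], y: &[f32]) -> f32 {
    let d = x.len();
    let diff: Vec<f32> = x.iter().zip(y).map(|(p, q)| p - q).collect();
    let mut total = 0.0;
    for i in 0..d {
        for j in 0..d {
            // M[i][j] = sum_r A[r][i] * A[r][j]
            let m_ij: f32 = a.iter().map(|row| row[i] * row[j]).sum();
            total += diff[i] * m_ij * diff[j];
        }
    }
    total
}

/// Squared Euclidean distance.
pub fn dist2(x: &[f32], y: &[f32]) -> f32 {
    x.iter().zip(y).map(|(p, q)| (p - q) * (p - q)).sum()
}

/// "Re-weighting": transform the stored vectors; the graph adjacency is untouched.
pub fn reweight(points: &[Vec<f32>], a: &Mat) -> Vec<Vec<f32>> {
    points.iter().map(|p| apply(a, p)).collect()
}
```

Because the vectors, not the edges, carry the new metric, the reuse arm only pays an O(n d^2) transform pass instead of a graph rebuild.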
…otal WIN)

Extend the re-weight-vs-rebuild harness with the decisive adversarial cases and close the proof:
- non-linear drift mode (residual tanh warp v + s*tanh(Wv)) removes the 'linear only' caveat; A still matches B within 0.2% up to 35% churn.
- per-query distance-eval columns: A and B match within ~1%, disproving any hidden query-cost trade-off. Reuse gives equal recall AND equal query cost.
- fix display bug (C/stale was double-divided by query count; the control now correctly reads 90% at t=0, validating the negative control).
- drift modelled via a transform closure (diag/rot/nonlin share one code path).
- clippy: idiomatic char-class split in the m1_arxiv reader.

ADR-200 + FUTURE-DIRECTIONS updated: WIN across diagonal/rotational/non-linear drift; the only open caveats are scale (n>=1e5, decisive next), region-local drift, and an incremental-rebuild baseline.
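One way to let diagonal, rotational and non-linear drift share a code path is to represent each mode as a boxed closure over a vector. A sketch assuming dense f32 vectors; Drift, diagonal, linear, nonlinear and drift_all are hypothetical names, and the non-linear mode follows the residual warp v + s*tanh(Wv):

```rust
/// A drift mode maps an original vector to its drifted image.
pub type Drift = Box<dyn Fn(&[f32]) -> Vec<f32>>;

fn matvec(w: &[Vec<f32>], v: &[f32]) -> Vec<f32> {
    w.iter().map(|row| row.iter().zip(v).map(|(a, b)| a * b).sum::<f32>()).collect()
}

/// Per-dimension rescaling.
pub fn diagonal(scale: Vec<f32>) -> Drift {
    Box::new(move |v: &[f32]| -> Vec<f32> { v.iter().zip(&scale).map(|(x, s)| x * s).collect() })
}

/// Dense linear map, e.g. a rotation or a Mahalanobis factor A.
pub fn linear(a: Vec<Vec<f32>>) -> Drift {
    Box::new(move |v: &[f32]| -> Vec<f32> { matvec(&a, v) })
}

/// Residual tanh warp: v + s * tanh(W v).
pub fn nonlinear(w: Vec<Vec<f32>>, s: f32) -> Drift {
    Box::new(move |v: &[f32]| -> Vec<f32> {
        let wv = matvec(&w, v);
        v.iter().zip(&wv).map(|(x, y)| x + s * y.tanh()).collect()
    })
}

/// Apply any drift mode to a whole point set through the same code path.
pub fn drift_all(points: &[Vec<f32>], drift: &Drift) -> Vec<Vec<f32>> {
    points.iter().map(|p| drift(p.as_slice())).collect()
}
```

With W = 0 the warp reduces to the identity, which is a convenient sanity check for the non-linear arm.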
Extract the Vamana-lite ANN + metric-drift helpers into a reusable lib module (src/ann.rs) with an efficient two-heap greedy beam search (replaces the O(L) linear-scan beam, ~2x faster, needed for n>=1e5). Thin reweight_vs_rebuild to use it (regression-checked: identical recall/evals at n=2000). Add scale_drift example: sweeps N (5k..100k), measures recall(reuse) vs recall(rebuild) at the adversarial rotational drift point plus the rebuild-cost curve.
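A two-heap greedy beam search keeps a min-heap frontier of unexpanded candidates and a bounded max-heap of the best beam-width results, so each update costs O(log L) instead of a linear scan. A sketch under stated assumptions (squared Euclidean distance, a single entry point, adjacency as Vec<Vec<usize>>); beam_search is a hypothetical function, not the src/ann.rs API. Because the vectors are passed in separately from the graph, the same topology can be searched after drift:

```rust
use std::cmp::{Ordering, Reverse};
use std::collections::{BinaryHeap, HashSet};

/// Candidate ordered by distance, then id (total order via f32::total_cmp).
#[derive(Clone, Copy)]
struct Cand {
    d: f32,
    id: usize,
}
impl Ord for Cand {
    fn cmp(&self, o: &Self) -> Ordering {
        self.d.total_cmp(&o.d).then(self.id.cmp(&o.id))
    }
}
impl PartialOrd for Cand {
    fn partial_cmp(&self, o: &Self) -> Option<Ordering> {
        Some(self.cmp(o))
    }
}
impl PartialEq for Cand {
    fn eq(&self, o: &Self) -> bool {
        self.cmp(o) == Ordering::Equal
    }
}
impl Eq for Cand {}

fn dist2(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Returns up to k ids (nearest first) and the number of distance evaluations.
pub fn beam_search(
    graph: &[Vec<usize>],
    points: &[Vec<f32>],
    entry: usize,
    query: &[f32],
    beam: usize,
    k: usize,
) -> (Vec<usize>, usize) {
    let beam = beam.max(k).max(1);
    let start = Cand { d: dist2(&points[entry], query), id: entry };
    let mut evals = 1;
    let mut visited = HashSet::from([entry]);
    let mut frontier = BinaryHeap::from([Reverse(start)]); // min-heap: next to expand
    let mut best = BinaryHeap::from([start]); // max-heap: worst kept result on top
    while let Some(Reverse(c)) = frontier.pop() {
        // Nothing left in the frontier can improve a full result set.
        if best.len() >= beam && c.d > best.peek().unwrap().d {
            break;
        }
        for &n in &graph[c.id] {
            if !visited.insert(n) {
                continue;
            }
            let d = dist2(&points[n], query);
            evals += 1;
            if best.len() < beam || d < best.peek().unwrap().d {
                frontier.push(Reverse(Cand { d, id: n }));
                best.push(Cand { d, id: n });
                if best.len() > beam {
                    best.pop(); // evict the current worst
                }
            }
        }
    }
    let mut out = best.into_sorted_vec(); // ascending by distance
    out.truncate(k);
    (out.into_iter().map(|c| c.id).collect(), evals)
}
```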
… widens

scale_drift sweep (5k->100k, rotational drift ~40% churn, recall@10):

N        A reuse   B rebuild   gap     rebuild time   update time   ratio
5000     90.2%     90.0%       +0.2%   3.6s           0.001s        ~3600x
10000    89.5%     90.3%       -0.8%   10.2s          0.004s        ~2500x
25000    88.5%     89.2%       -0.7%   21.4s          0.009s        ~2400x
50000    87.7%     88.6%       -0.9%   47.1s          0.043s        ~1100x
100000   85.0%     86.7%       -1.7%   141.8s         0.035s        ~4000x

Verdict: WIN within the 2% gate through 100k at ~1000-4000x lower update cost, BUT the recall gap widens with N (+0.2% -> -1.7%) => defer/batch rebuilds, not never-rebuild. Honest caveats: both A and B recall fall with N (fixed beam); 100 queries => ~+-1% noise, so the trend should be confirmed with more queries. Also: rename Rng::next->next_u64 (clippy). ADR-200 + FUTURE-DIRECTIONS updated with scale evidence, the widening-gap caveat, and a hybrid re-weight + periodic-rebuild policy as a next step.
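The "~+-1% noise" estimate can be made explicit with the standard error of per-query recall: with q queries, se = sd / sqrt(q), so an assumed per-query spread of about 0.1 over 100 queries gives roughly +-0.01. A small sketch; recall_at_k and mean_and_stderr are hypothetical helpers, and treating queries as independent samples is an assumption:

```rust
/// Per-query recall@k: fraction of the true top-k ids that the search returned.
pub fn recall_at_k(found: &[usize], truth: &[usize]) -> f64 {
    if truth.is_empty() {
        return 1.0;
    }
    let hits = truth.iter().filter(|t| found.contains(*t)).count();
    hits as f64 / truth.len() as f64
}

/// Mean and standard error of per-query recalls (needs at least 2 queries).
pub fn mean_and_stderr(per_query: &[f64]) -> Option<(f64, f64)> {
    let q = per_query.len();
    if q < 2 {
        return None;
    }
    let n = q as f64;
    let mean = per_query.iter().sum::<f64>() / n;
    // Sample variance with Bessel's correction.
    let var = per_query.iter().map(|r| (r - mean).powi(2)).sum::<f64>() / (n - 1.0);
    Some((mean, (var / n).sqrt()))
}
```

For the A-vs-B gap, computing the same statistic on paired per-query differences (A minus B on the same query) usually gives a tighter error bar than comparing two independent means.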
…, gate PASS

examples/region_drift.rs: warp only a 15% local cluster, and grade recall separately for queries INSIDE vs OUTSIDE the drifted region (a global average would hide a local failure).

Result (n=20k, recall@10):

t      churnIn   A_in    B_in    | churnOut   A_out   B_out
0.25   44%       89.8%   81.4%   | 21%        87.9%   89.0%
0.50   53%       89.3%   90.0%   | 21%        87.9%   89.0%
1.00   45%       89.5%   90.0%   | 21%        87.9%   89.0%

Gate PASS: reuse holds inside the drifted region (A_in within 0.7% of B_in, and ABOVE it at t=0.25) even at 53% in-region churn; out-region recall is ~unchanged. Region-local drift did NOT break reuse. Honest caveat: the t=0.25 B_in dip to 81% (reuse beats rebuild by 8 pts) is a build-variance artifact of the simplified single-pass Vamana baseline, not a smooth effect; this is the strongest argument for porting the baseline to production ruvector-diskann. ADR-200 + FUTURE-DIRECTIONS + status updated.
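The in/out grading is what keeps a local failure from being averaged away. A sketch of the two pieces, assuming a boolean membership mask for the drifted cluster and per-query recalls computed elsewhere; region_warp and split_recall are hypothetical helpers:

```rust
/// Apply `warp` only to points flagged in `in_region`; the rest are copied unchanged.
pub fn region_warp(
    points: &[Vec<f32>],
    in_region: &[bool],
    warp: impl Fn(&[f32]) -> Vec<f32>,
) -> Vec<Vec<f32>> {
    points
        .iter()
        .zip(in_region)
        .map(|(p, &r)| if r { warp(p.as_slice()) } else { p.clone() })
        .collect()
}

/// Mean recall graded separately for queries inside vs outside the region.
/// Returns (recall_in, recall_out); an empty group yields NaN.
pub fn split_recall(per_query: &[f64], query_in_region: &[bool]) -> (f64, f64) {
    let mean = |want: bool| {
        let xs: Vec<f64> = per_query
            .iter()
            .zip(query_in_region)
            .filter(|pair| *pair.1 == want)
            .map(|pair| *pair.0)
            .collect();
        if xs.is_empty() {
            f64::NAN
        } else {
            xs.iter().sum::<f64>() / xs.len() as f64
        }
    };
    (mean(true), mean(false))
}
```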
…onfirmed

examples/diskann_drift.rs: re-run the re-weight-vs-rebuild test on the shipping ruvector-diskann VamanaGraph (added as a dev-dependency). The reuse trick is native: the graph stores only topology and greedy_search takes vectors externally, so drift = searching a graph built on the original vectors with the transformed vectors.

Result (n=20k, recall@10):
- GLOBAL rotational: A reuse vs B rebuild within 2% (95.6 vs 97.1 worst, t=0.5)
- REGION-LOCAL in-region: A_in within ~1.5% of B_in; A_in 98.6 vs B_in 94.5 at t=0.25
- absolute recall 96-99% (a stronger/fairer baseline than lite-Vamana's ~90%)

Confirms BET 1 on the production index. The t=0.25 reuse-beats-rebuild dip REPRODUCED on diskann => it is a real property (a fresh Vamana build on a half-warped region underperforms reuse), not lite-Vamana noise. Baseline-variance caveat RESOLVED. Remaining caveat: the gap widens with scale/churn (defer/batch rebuilds). ADR-200 + FUTURE-DIRECTIONS + status updated.
…ippable

examples/hybrid_policy.rs: simulate a compounding random-walk metric-drift trajectory and compare operating policies on the production diskann Vamana: always / never / periodic-K / drift-triggered (Frobenius monitor).

Result (n=10k, 24 steps, aggressive random-walk drift, recall@10):

policy       mean    min     rebuilds
always       99.1%   98.4%   24
never        94.4%   89.7%   1          <- decays under heavy drift
periodic-4   98.8%   97.9%   6          <- ~always at 25% cost
periodic-8   98.4%   96.5%   3          <- at 12.5% cost

Shippable operating point: re-weight every step + rebuild every ~4 steps recovers near-full recall at a fraction of the cost. Honest sub-finding: the drift-TRIGGERED monitor (Frobenius monitor on the cumulative transform) underperformed simple periodic rebuilding, so periodic-K is the recommended knob; a sampled-recall probe trigger is future work. Under gentle single-direction drift (n=5k), never did NOT decay, so the hybrid only matters under large/compounding drift. ADR-200 status/boundaries/next-steps + FUTURE-DIRECTIONS updated; stale 'n=2000/scale-unconfirmed' caveats removed (now resolved).
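The policies compared here reduce to a small decision rule evaluated after each re-weighting step. A sketch with hypothetical names (Policy, should_rebuild) and an assumed monitor, ||T - I||_F on the transform accumulated since the last rebuild; the harness's Frobenius monitor may be defined differently. Over 24 steps, Periodic(4) and Periodic(8) fire 6 and 3 times, consistent with the rebuild counts reported above:

```rust
/// Hypothetical operating policies; re-weighting happens every step regardless.
#[derive(Clone, Copy, Debug)]
pub enum Policy {
    Always,
    Never,
    /// Rebuild every K steps.
    Periodic(usize),
    /// Rebuild when the drift monitor exceeds a threshold.
    Triggered(f32),
}

/// Frobenius distance of a square transform from the identity, ||T - I||_F.
pub fn frobenius_from_identity(t: &[Vec<f32>]) -> f32 {
    let mut s = 0.0f32;
    for (i, row) in t.iter().enumerate() {
        for (j, &x) in row.iter().enumerate() {
            let target = if i == j { 1.0 } else { 0.0 };
            s += (x - target) * (x - target);
        }
    }
    s.sqrt()
}

/// Should the index be rebuilt at `step` (1-based), given the transform
/// accumulated since the last rebuild?
pub fn should_rebuild(policy: Policy, step: usize, since_rebuild: &[Vec<f32>]) -> bool {
    match policy {
        Policy::Always => true,
        Policy::Never => false,
        Policy::Periodic(k) => k > 0 && step % k == 0,
        Policy::Triggered(threshold) => frobenius_from_identity(since_rebuild) > threshold,
    }
}
```

In this sketch the initial build is not counted, so Never fires 0 times over the run.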
This was referenced Jun 4, 2026
Part of #534.
A disciplined, evidence-gated exploration of CCH-inspired retrieval ("SepRAG") for hybrid
vector + KG memory. Produces one honest negative (CCH contraction ruled out for
embedding graphs) and one proven, shippable positive (customizable re-weighting of a
fixed ANN topology under metric drift). Docs + a reference crate + six reproducible
experiment harnesses; no changes to existing crates.
Measured impact (proven, with scope)
A new capability — adapt the ANN index to a changed relevance metric by re-weighting
the existing graph instead of rebuilding. Does not change query latency or absolute
recall of existing search; it removes the rebuild cost a metric change otherwise incurs.
- … ruvector-diskann)
- Update-cost ratio: ~1,000–4,000× across n=5k–100k (rebuild is super-linear).
- Not yet proven: drift is synthetic (parametric), not a live ruvector-gnn learned-metric trajectory; production-loop integration is backlog item #1.
What's here
- ADRs 196–200 (docs/adr/): full decision record.
- docs/plans/seprag-cch-retrieval/: milestone plans + FUTURE-DIRECTIONS.md (backlog …)
- crates/ruvector-seprag/: reference CCH nested-dissection + separator-tree k-NN, a shared Vamana-lite ANN engine (src/ann.rs), and 6 harnesses. Unit tests + clippy green.
- … ruvector-seprag to the workspace; ruvector-diskann is used only as a dev-dependency (one harness confirms results on the production index).
Finding 1 — CCH full-contraction is NO-GO for embedding retrieval (measured, ADR-199)
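Implementation validated (separator-tree k-NN == Dijkstra oracle on toy + real graphs), then measured at N=1500 (recall 50/50 everywhere):
- roadNet-PA control: blowup 7.6x, elim_h ~3.5 sqrt n (CCH works)
- ogbn-arxiv citation: blowup 23.8x, elim_h ~0.6 n
- ogbn-arxiv feature kNN: blowup 42.4x, elim_h ~0.56 n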
Road control proves the code is correct; embedding-derived backbones are intrinsically
high-treewidth → contraction blows up. (HNSW already handles embedding kNN.)
Finding 2 — Customizable re-weighting is a WIN (ADR-200)
In a self-learning store the metric drifts. Instead of rebuilding the ANN index, reuse the topology and re-score under the new metric. On the production ruvector-diskann Vamana:
- … region-local / compounding drift, up to n=100k, at ~1,000–4,000× lower update cost.
- Hybrid policy (re-weight every step + rebuild every 4 steps): 98.8% mean recall vs 99.1% always-rebuild and 94.4% never, at 25% of the rebuild cost.
- … never); the drift-triggered monitor underperformed simple periodic rebuilding.
Testing
cargo test -p ruvector-seprag (8 + doctest green), cargo clippy -p ruvector-seprag clean. Experiment harnesses run on public data (ogbn-arxiv, SNAP roadNet-PA); each prints its pre-registered win/kill gate.
Scope / safety
Additive only — new crate + docs; no existing crate modified.
ruvector-diskann is a dev-dependency (examples only), not a runtime dependency.