SepRAG: CCH-inspired retrieval exploration + customizable re-weighting (ADRs 196-200, ruvector-seprag)#535

Open
shaal wants to merge 15 commits into ruvnet:main from shaal:docs/seprag-cch-retrieval

@shaal shaal commented Jun 4, 2026

Part of #534.

A disciplined, evidence-gated exploration of CCH-inspired retrieval ("SepRAG") for hybrid
vector + KG memory. Produces one honest negative (CCH contraction ruled out for
embedding graphs) and one proven, shippable positive (customizable re-weighting of a
fixed ANN topology under metric drift). Docs + a reference crate + six reproducible
experiment harnesses; no changes to existing crates.

Measured impact (proven, with scope)

A new capability — adapt the ANN index to a changed relevance metric by re-weighting
the existing graph instead of rebuilding. Does not change query latency or absolute
recall of existing search; it removes the rebuild cost a metric change otherwise incurs.

|  | full rebuild (current) | re-weight (this work) |
|---|---|---|
| Update on metric change, n=100k (reference Vamana) | 141.8s | 0.035s (~4,000× lower) |
| Update on metric change, n=20k (production ruvector-diskann) | ~7s | ~0.01s; recall within 2% |
| Recall@10 vs rebuild (diag/rot/non-linear/region-local/compounding, ≤100k) | baseline | within 2% |
| Hybrid (re-weight + rebuild every ~4 steps) | 99.1% @ 24 rebuilds | 98.8% @ 6 rebuilds (25% cost) |

Update-cost ratio ~1,000–4,000× across n=5k–100k (rebuild is super-linear). Not yet
proven: drift is synthetic (parametric), not a live ruvector-gnn learned-metric
trajectory; production-loop integration is backlog item #1.

What's here

  • ADR-196…200 (docs/adr/) — full decision record.
  • docs/plans/seprag-cch-retrieval/ — milestone plans + FUTURE-DIRECTIONS.md (backlog
    + "prove-not-hype" protocol).
  • crates/ruvector-seprag/ — reference CCH nested-dissection + separator-tree k-NN, a
    shared Vamana-lite ANN engine (src/ann.rs), and 6 harnesses. Unit tests + clippy green.
  • Adds ruvector-seprag to the workspace; ruvector-diskann used only as a dev-dependency
    (one harness confirms results on the production index).

Finding 1 — CCH full-contraction is NO-GO for embedding retrieval (measured, ADR-199)

Implementation validated (separator-tree k-NN == Dijkstra oracle on toy + real graphs), then:

| Backbone (n=1500) | shortcut blowup | elim-tree height |
|---|---|---|
| roadNet-PA (control) | 7.6× | ~3.5·√n |
| ogbn-arxiv citation | 23.8× | ~0.6·n |
| ogbn-arxiv feature kNN | 42.4× | ~0.56·n |

Road control proves the code is correct; embedding-derived backbones are intrinsically
high-treewidth → contraction blows up. (HNSW already handles embedding kNN.)
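
To make the two measured columns concrete, here is a minimal sketch of symbolic elimination. It assumes "shortcut blowup" means upward edges after contraction divided by original edges, and that elimination-tree height is counted in vertices. The name `eliminate` and the edge-list representation are illustrative assumptions, not the crate's API.

```rust
use std::collections::BTreeSet;

/// Symbolic (metric-free) elimination under a given order.
/// Returns (edges in the filled upward graph, elimination-tree height in vertices).
/// `order[i]` is the vertex eliminated at step i. Hypothetical helper for illustration.
fn eliminate(n: usize, edges: &[(usize, usize)], order: &[usize]) -> (usize, usize) {
    let mut rank = vec![0usize; n];
    for (i, &v) in order.iter().enumerate() {
        rank[v] = i;
    }
    // up[r] = higher-ranked neighbours (as ranks) of the vertex with rank r.
    let mut up: Vec<BTreeSet<usize>> = vec![BTreeSet::new(); n];
    for &(a, b) in edges {
        let (lo, hi) = if rank[a] < rank[b] { (rank[a], rank[b]) } else { (rank[b], rank[a]) };
        if lo != hi {
            up[lo].insert(hi);
        }
    }
    let mut depth = vec![1usize; n]; // height of the elimination subtree below each rank
    let mut height = 0;
    for r in 0..n {
        let nbrs: Vec<usize> = up[r].iter().copied().collect();
        // The lowest upward neighbour is the elimination-tree parent. Passing the
        // remaining upward neighbours to it yields the same fill as forming a clique.
        if let Some((&parent, rest)) = nbrs.split_first() {
            for &h in rest {
                up[parent].insert(h);
            }
            depth[parent] = depth[parent].max(depth[r] + 1);
        }
        height = height.max(depth[r]);
    }
    let filled = up.iter().map(|s| s.len()).sum();
    (filled, height)
}
```

Dividing the first returned value by `edges.len()` gives the blowup ratio. On a 3-leaf star, eliminating the hub first doubles the edge count, while eliminating the leaves first adds no fill, which illustrates why ordering quality dominates the numbers above.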

Finding 2 — Customizable re-weighting is a WIN (ADR-200)

In a self-learning store the metric drifts. Instead of rebuilding the ANN index, reuse the
topology and re-score under the new metric.
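Measured on the reference Vamana-lite engine and confirmed on the production
ruvector-diskann Vamana:

  • Recall within 2% of full rebuild across diagonal / rotational / non-linear /
    region-local / compounding drift, up to n=100k on the reference engine and n=20k on
    ruvector-diskann, at ~1,000–4,000× lower update cost.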
  • Hybrid policy (re-weight every step + rebuild every ~4): 98.8% recall vs 99.1% always /
    94.4% never, at 25% of the rebuild cost.
  • Honest caveats kept in the ADR: gap widens with scale/churn (defer/batch rebuilds, not
    never); the drift-triggered monitor underperformed simple periodic.
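
The re-weighting idea can be sketched as follows: the graph holds only adjacency, and the search receives the vectors as an argument, so a metric change only swaps the vectors. This is a simplified illustration under assumed names (`greedy_search`, `scale`, a sorted-vector beam), not the ruvector-diskann or `src/ann.rs` implementation.

```rust
/// Squared Euclidean distance.
fn dist2(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Diagonal drift: scale each coordinate by a weight (one possible metric change).
fn scale(v: &[f32], w: &[f32]) -> Vec<f32> {
    v.iter().zip(w).map(|(x, s)| x * s).collect()
}

/// Greedy beam search over a FIXED adjacency list. The vectors are supplied by the
/// caller, so the same topology can be searched under a new metric without a rebuild.
fn greedy_search(
    adj: &[Vec<usize>],
    vecs: &[Vec<f32>],
    query: &[f32],
    start: usize,
    beam: usize,
    k: usize,
) -> Vec<usize> {
    let mut visited = vec![false; adj.len()];
    let mut expanded = vec![false; adj.len()];
    let mut cand = vec![(dist2(&vecs[start], query), start)];
    visited[start] = true;
    loop {
        // `cand` is kept sorted, so the first unexpanded entry is the closest one.
        let next = cand.iter().find(|&&(_, v)| !expanded[v]).map(|&(_, v)| v);
        let Some(v) = next else { break };
        expanded[v] = true;
        for &u in &adj[v] {
            if !visited[u] {
                visited[u] = true;
                cand.push((dist2(&vecs[u], query), u));
            }
        }
        cand.sort_by(|a, b| a.0.total_cmp(&b.0));
        cand.truncate(beam);
    }
    cand.into_iter().take(k).map(|(_, v)| v).collect()
}
```

To re-weight, transform the stored vectors and the query with the new metric (for example `scale(v, &w)`) and call `greedy_search` on the unchanged `adj`. The update cost is then the transform itself, with no graph construction.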

Testing

cargo test -p ruvector-seprag (8 + doctest green), cargo clippy -p ruvector-seprag clean.
Experiment harnesses run on public data (ogbn-arxiv, SNAP roadNet-PA); each prints its
pre-registered win/kill gate.

Scope / safety

Additive only — new crate + docs; no existing crate modified. ruvector-diskann is a
dev-dependency (examples only), not a runtime dependency.

shaal added 15 commits June 4, 2026 02:11
Add design ADRs and milestone plans for adapting Customizable Contraction
Hierarchies (nested dissection, separators, contraction shortcuts, elimination
trees, separator-tree k-NN) to RuVector's hybrid vector + knowledge-graph memory.

ADRs:
- ADR-196: SepRAG keystone (separator-tree retrieval; complements HNSW/DiskANN)
- ADR-197: navigation-graph construction + metric-independent ND ordering
- ADR-198: customizable metric layer (CCH customization <-> GNN self-learning loop)
- ADR-199: public-corpus benchmark & evaluation harness

Plans (docs/plans/seprag-cch-retrieval/): M0 correctness gate -> M1 blowup
go/no-go on ogbn-arxiv -> M2 customization -> M3 full hybrid -> M4 integration.
Maps decisions onto existing crates (ruvector-mincut/jtree, solver/bmssp,
sparsifier, diskann, gnn, attn-mincut).
…aphs

New crate ruvector-seprag implementing the SepRAG M0 milestone
(docs/plans/seprag-cch-retrieval/M0-correctness-gate.md):

- graph: undirected weighted graph + Dijkstra brute-force k-NN oracle
- order: nested-dissection ordering via BFS-layer separators + separator tree
- contraction: metric-free symbolic contraction -> chordal upward graph + elim tree
- customize: bottom-up triangle-sweep shortcut weighting (re-runnable per metric)
- query: upward search, exhaustive CCH k-NN, bucket-based branch-and-bound k-NN
  with admissible early-stop and search-space accounting
- gen: deterministic SBM/grid/path/clique generators (SplitMix64)
- examples/blowup_report: M0->M1 diagnostic (blowup ratio, elim height, pruning)
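
For readers unfamiliar with SplitMix64, the following is a minimal sketch of the standard generator and of how a seeded generator makes a test graph reproducible. The `path_edges` helper and its weight range are assumptions for illustration; the crate's `gen` module may differ.

```rust
/// SplitMix64: a small deterministic 64-bit generator (standard published constants).
struct SplitMix64 {
    state: u64,
}

impl SplitMix64 {
    fn new(seed: u64) -> Self {
        Self { state: seed }
    }

    fn next_u64(&mut self) -> u64 {
        self.state = self.state.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.state;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
}

/// Hypothetical generator: a path 0-1-...-(n-1) with seeded weights in 1..=10.
/// The same (n, seed) pair always yields the same edge list.
fn path_edges(n: usize, seed: u64) -> Vec<(usize, usize, u64)> {
    let mut rng = SplitMix64::new(seed);
    (1..n).map(|i| (i - 1, i, 1 + rng.next_u64() % 10)).collect()
}
```

Because the generator has no hidden state beyond the seed, a failing test can be replayed exactly by reusing its seed.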

Tests (8 + doctest, all green): SepRAG k-NN == Dijkstra oracle on SBM/grid/
path/clique; pruned == unpruned (pruning sound); pruning reduces search space;
determinism; bounded blowup. cargo clippy clean.
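
The oracle side of these tests can be pictured as a plain Dijkstra run that stops after k settled vertices. The sketch below assumes integer edge weights and an adjacency-list graph; `dijkstra_knn` is an illustrative name, not the crate's function.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Brute-force k-NN by graph distance: run Dijkstra from `src` and return the first
/// `k` settled vertices (excluding `src`) with their shortest-path distances.
fn dijkstra_knn(adj: &[Vec<(usize, u64)>], src: usize, k: usize) -> Vec<(usize, u64)> {
    let mut dist = vec![u64::MAX; adj.len()];
    let mut heap = BinaryHeap::new();
    let mut out = Vec::new();
    dist[src] = 0;
    heap.push(Reverse((0u64, src)));
    while let Some(Reverse((d, v))) = heap.pop() {
        if d > dist[v] {
            continue; // stale heap entry, a shorter path was already settled
        }
        if v != src {
            out.push((v, d));
            if out.len() == k {
                break;
            }
        }
        for &(u, w) in &adj[v] {
            let nd = d + w;
            if nd < dist[u] {
                dist[u] = nd;
                heap.push(Reverse((nd, u)));
            }
        }
    }
    out
}
```

A correctness test then asserts that the hierarchy-based k-NN returns the same vertex set and distances as this function for every sampled source.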

Finding: query pruning eliminates 95-100% of scans regardless of structure;
blowup/elim-height are dominated by separator quality, and the naive BFS
separator degenerates on low-diameter dense graphs (SBM 18.6x) — motivating the
ruvector-mincut separator swap planned for M1 (ADR-197).
…ion subgraph

examples/m1_arxiv: ingest the real ogbn-arxiv edge list (169K nodes / 1.16M
edges), induce a connected BFS-ball subgraph, build the SepRAG hierarchy, and
report the ADR-199 go/no-go metrics (blowup ratio, elim-tree height, build
time) plus a sampled Dijkstra-oracle recall check.

First-pass result (N=1500 citation-only subgraph, M0 BFS separator):
- recall 50/50 vs Dijkstra oracle  -> CORRECTNESS HOLDS on real data
- query pruning saves ~100% of scans (364 vs 418555 bucket scans/query)
- BUT blowup 56.9x and elim height ~= n  -> the BFS separator degenerates
  completely on small-world citation graphs (picks a giant BFS layer as the
  separator), and the raw citation ball is dense (avg degree ~24).

Verdict (ADR-199 fallback ladder): NO-GO for the naive separator + dense
backbone. Trustworthy precisely because M0 proved the algorithm correct, so 57x
is a separator-quality/backbone artifact, not a SepRAG refutation. Next:
ruvector-mincut balanced separators + alpha-pruned sparse backbone (ADR-197).
…tion

order.rs: add SeparatorKind::Balanced (grow half-size region, take only its
boundary) alongside the M0 BfsLayer strategy; make Balanced the default.
lib.rs: SepRag::build_with(graph, kind). m1_arxiv: add max_degree backbone
sparsification + separator-kind args for A/B attribution.
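
One way to read "grow half-size region, take only its boundary" is sketched below: a BFS grows a region to about n/2 vertices, and the separator is the set of region vertices that still touch the outside. This is an assumed interpretation with an illustrative function name; the actual `SeparatorKind::Balanced` code in order.rs may differ.

```rust
use std::collections::VecDeque;

/// Returns (interior of the grown region, separator, remaining vertices).
/// Removing the separator leaves no edge between the first and last sets.
fn balanced_separator(adj: &[Vec<usize>], seed: usize) -> (Vec<usize>, Vec<usize>, Vec<usize>) {
    let n = adj.len();
    let mut in_region = vec![false; n];
    let mut queue = VecDeque::from([seed]);
    in_region[seed] = true;
    let mut size = 1;
    // Grow a BFS region until it holds about half of the vertices.
    while let Some(v) = queue.pop_front() {
        for &u in &adj[v] {
            if size >= n / 2 {
                break;
            }
            if !in_region[u] {
                in_region[u] = true;
                size += 1;
                queue.push_back(u);
            }
        }
    }
    let (mut inner, mut sep, mut outer) = (Vec::new(), Vec::new(), Vec::new());
    for v in 0..n {
        if !in_region[v] {
            outer.push(v);
        } else if adj[v].iter().any(|&u| !in_region[u]) {
            sep.push(v); // region vertex with a neighbour outside: on the boundary
        } else {
            inner.push(v);
        }
    }
    (inner, sep, outer)
}
```

Compared with taking a whole BFS layer, only the boundary of the grown region enters the separator, which is consistent with the smaller fill reported for the balanced configurations.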

M1 attribution (ogbn-arxiv N=1500 citation BFS-ball):
  raw + layer      blowup 56.9x  elim_h 1443  build 56s
  raw + balanced   blowup 23.8x  elim_h  941  build 13s   <- best
  deg<=10 + layer  blowup 89.6x  elim_h 1295  build 38s
  deg<=10 + bal    blowup 60.1x  elim_h 1035  build 18s
  (recall 50/50 and ~100% query pruning in all configs)

Findings: (1) balanced separator is a real win (2.4x less fill, 4x faster);
(2) hub-dampening degree-bound BACKFIRES (shrinks denominator faster than
|G+|, destroys good cuts) — discard it; (3) even best config leaves 23.8x /
elim_h~0.6n: the dense small-world citation ball is intrinsically high-treewidth
(ADR-197 expander risk, measured). Next: feature-manifold/hyperbolic backbone,
not the citation topology.

m1_arxiv: robust edge reader (skip # comments, comma/tab/space separators) so
SNAP road networks load through the same harness.
m1_manifold: alpha-pruned kNN backbone over real ogbn-arxiv 128-d node features
(Vamana RobustPrune) — the decisive ADR-197 thesis test.

Results (N=1500, balanced separator, recall 50/50 everywhere):
  roadNet-PA          blowup  7.6x  elim_h 136 (~3.5 sqrt n)   <- CCH works
  citation (arxiv)    blowup 23.8x  elim_h 941 (~0.6 n)
  feature-manifold k10 blowup 42.4x elim_h 837 (~0.56 n)       <- worse
  feature-manifold k6  blowup 45.1x elim_h 699 (~0.47 n)

Conclusion: the road control proves the implementation is sound (planar sqrt(n)
separators -> 7.6x, instant build). Both embedding-derived backbones (citation
small-world AND Euclidean feature kNN) are intrinsically high-treewidth
(elim_h ~ 0.5n), so CCH contraction blows up regardless of separator quality or
degree. The expander risk (ADR-197) is confirmed across two independent
backbones. Query pruning stays ~100% effective; the cost is preprocessing.
Last untested rung: hyperbolic backbone (needs real hyperbolic embeddings).

The M1 go/no-go gate ran on real public data and returned NO-GO for CCH full
contraction on embedding/citation retrieval graphs. Recorded the measured
evidence in ADR-199 (Empirical Outcome section), updated ADR-196/197 status, and
marked the milestone tracker (M0 done, M1 NO-GO, M2-M4 not pursued).

Evidence (N=1500, recall 50/50 everywhere):
  roadNet-PA control       blowup  7.6x  elim_h ~3.5 sqrt n   (CCH works)
  ogbn-arxiv citation      blowup 23.8x  elim_h ~0.6 n
  ogbn-arxiv feature kNN   blowup 42.4x  elim_h ~0.56 n

Implementation is sound (road control + exact recall); embedding backbones are
intrinsically high-treewidth. Query pruning works (~100%); preprocessing
fill-in is the blocker. ruvector-seprag retained as a validated reference.
… on linear drift)

Salvage ADR-198's customizable-metric idea, decoupled from CCH, as a rigorous
pre-registered head-to-head: does a fixed ANN topology + recomputed distances
absorb metric drift as well as a full rebuild?

examples/reweight_vs_rebuild.rs: self-contained Vamana-lite (RobustPrune +
greedy beam search); drift modelled as a vector-space transform M=A^T A; sweeps
diagonal AND adversarial dense-Mahalanobis (rotational) drift; A=reuse topology,
B=rebuild, C=stale control.
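
The phrase "vector-space transform M=A^T A" rests on the identity (x-y)^T A^T A (x-y) = ||Ax - Ay||^2: a Mahalanobis-style metric M is the same as plain Euclidean distance after mapping every vector through A. A small sketch checking both sides numerically follows; the function names are illustrative assumptions.

```rust
/// Matrix-vector product for a row-major matrix.
fn matvec(a: &[Vec<f64>], x: &[f64]) -> Vec<f64> {
    a.iter()
        .map(|row| row.iter().zip(x).map(|(r, v)| r * v).sum::<f64>())
        .collect()
}

/// Left side: build M = A^T A explicitly and evaluate (x-y)^T M (x-y).
fn metric_dist2(a: &[Vec<f64>], x: &[f64], y: &[f64]) -> f64 {
    let d: Vec<f64> = x.iter().zip(y).map(|(p, q)| p - q).collect();
    let n = d.len();
    let mut total = 0.0;
    for i in 0..n {
        for j in 0..n {
            let m_ij: f64 = a.iter().map(|row| row[i] * row[j]).sum();
            total += d[i] * m_ij * d[j];
        }
    }
    total
}

/// Right side: transform both vectors by A, then take squared Euclidean distance.
fn transformed_dist2(a: &[Vec<f64>], x: &[f64], y: &[f64]) -> f64 {
    let (ax, ay) = (matvec(a, x), matvec(a, y));
    ax.iter().zip(&ay).map(|(p, q)| (p - q) * (p - q)).sum()
}
```

This is why drift can be applied by transforming the stored vectors once: an unchanged Euclidean search routine then operates under the new metric.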

Result (n=2000 ogbn-arxiv embeddings, recall@10, pre-registered gate):
  A (re-weight, 0 rebuild) within 0.2% of B (full rebuild) up to 36% relevant-set
  churn, under both drift modes. C (stale) loses up to 29 points -> benchmark has
  teeth, A's parity is genuine adaptation. WIN.

Honest claim: COST win at equal quality (rebuilds become free under LINEAR drift).
Boundaries (ADR-200): non-linear/region-local drift + scale untested. Next:
non-linear learned metric (decisive adversarial test).

Also adds docs/plans/.../FUTURE-DIRECTIONS.md (4-bet backlog + prove-not-hype
protocol) and ADR-200.
…otal WIN)

Extend the re-weight-vs-rebuild harness with the decisive adversarial cases and
close the proof:
- non-linear drift mode (residual tanh warp v + s*tanh(Wv)) — removes the
  'linear only' caveat; A still matches B within 0.2% up to 35% churn.
- per-query distance-eval columns — A and B match within ~1%, disproving any
  hidden query-cost trade. Reuse is equal recall AND equal query cost.
- fix display bug (C/stale was double-divided by query count; control now
  correctly reads 90% at t=0, validating the negative control).
- drift modelled via transform closure (diag/rot/nonlin share one code path).
- clippy: idiomatic char-class split in m1_arxiv reader.
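
A minimal sketch of the residual tanh warp v + s*tanh(Wv) and of routing every drift mode through one closure-taking function. The names `tanh_warp` and `apply_drift` are assumptions for illustration, and W is taken to be a square row-major matrix.

```rust
/// Non-linear drift: each output coordinate is v_i + s * tanh((W v)_i).
/// With s = 0 this is the identity, so s controls the drift strength.
fn tanh_warp(v: &[f32], w: &[Vec<f32>], s: f32) -> Vec<f32> {
    v.iter()
        .zip(w)
        .map(|(&vi, row)| {
            let wv: f32 = row.iter().zip(v).map(|(a, b)| a * b).sum();
            vi + s * wv.tanh()
        })
        .collect()
}

/// Apply any drift (diagonal, rotational, non-linear) expressed as a closure,
/// so all modes share one code path.
fn apply_drift<F: Fn(&[f32]) -> Vec<f32>>(vecs: &[Vec<f32>], f: F) -> Vec<Vec<f32>> {
    vecs.iter().map(|v| f(v.as_slice())).collect()
}
```

A caller would pass something like `|v| tanh_warp(v, &w, s)` for the non-linear mode and a matrix-multiply closure for the linear ones, leaving the recall measurement code unchanged.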

ADR-200 + FUTURE-DIRECTIONS updated: WIN across diagonal/rotational/non-linear
drift; only open caveats are scale (n>=1e5, decisive next), region-local drift,
incremental-rebuild baseline.

Extract the Vamana-lite ANN + metric-drift helpers into a reusable lib module
(src/ann.rs) with an efficient two-heap greedy beam search (replaces the O(L)
linear-scan beam, ~2x faster, needed for n>=1e5). Thin reweight_vs_rebuild to
use it (regression-checked: identical recall/evals at n=2000). Add scale_drift
example: sweeps N (5k..100k), measures recall(reuse) vs recall(rebuild) at the
adversarial rotational drift point plus the rebuild-cost curve.
… widens

scale_drift sweep (5k->100k, rotational drift ~40% churn, recall@10):
  N        A reuse  B rebuild  gap     rebuild   update    ratio
  5000     90.2%    90.0%      +0.2%   3.6s      0.001s    ~3600x
  10000    89.5%    90.3%      -0.8%   10.2s     0.004s    ~2500x
  25000    88.5%    89.2%      -0.7%   21.4s     0.009s    ~2400x
  50000    87.7%    88.6%      -0.9%   47.1s     0.043s    ~1100x
  100000   85.0%    86.7%      -1.7%   141.8s    0.035s    ~4000x

Verdict: WIN within the 2% gate through 100k at ~1000-4000x lower update cost,
BUT the recall gap widens with N (+0.2% -> -1.7%) => defer/batch rebuilds, not
never-rebuild. Honest caveats: recall of both A and B falls with N (fixed beam); 100
queries => ~±1% noise, so confirm the trend with more queries.

Also: rename Rng::next->next_u64 (clippy). ADR-200 + FUTURE-DIRECTIONS updated
with scale evidence, widening-gap caveat, and a hybrid re-weight+periodic-rebuild
policy as a next step.
…, gate PASS

examples/region_drift.rs: warp only a 15% local cluster, grade recall separately
for queries INSIDE vs OUTSIDE the drifted region (a global average would hide a
local failure).

Result (n=20k, recall@10):
  t      churnIn  A_in   B_in  | churnOut A_out  B_out
  0.25   44%      89.8%  81.4% | 21%      87.9%  89.0%
  0.50   53%      89.3%  90.0% | 21%      87.9%  89.0%
  1.00   45%      89.5%  90.0% | 21%      87.9%  89.0%

Gate PASS: reuse holds inside the drifted region (A_in within 0.7% of B_in, and
ABOVE it at t=0.25) even at 53% in-region churn; out-region ~unchanged. Region-
local drift did NOT break reuse.
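
Grading inside and outside the drifted region separately can be sketched as below. The function names and the `(inside, recall)` pair representation are assumptions for illustration, not the harness's actual types.

```rust
/// recall@k for one query: fraction of the true top-k ids present in the result.
fn recall_at_k(found: &[usize], truth: &[usize]) -> f64 {
    let hits = truth.iter().filter(|t| found.contains(*t)).count();
    hits as f64 / truth.len() as f64
}

/// Mean recall for queries inside vs outside the drifted region.
/// Each entry is (query is inside the region, recall for that query).
fn split_recall(per_query: &[(bool, f64)]) -> (f64, f64) {
    let mean = |inside: bool| {
        let xs: Vec<f64> = per_query
            .iter()
            .filter(|q| q.0 == inside)
            .map(|q| q.1)
            .collect();
        xs.iter().sum::<f64>() / xs.len() as f64 // NaN if the group is empty
    };
    (mean(true), mean(false))
}
```

With a 15% region, a global mean would weight in-region queries at roughly 15%, so a sizeable local drop could stay within the gate unnoticed; the split makes it visible.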

Honest caveat: the t=0.25 B_in dip to 81% (reuse beats rebuild by 8pts) is a
build-variance artifact of the simplified single-pass Vamana baseline, not a
smooth effect — strongest argument to port the baseline to production
ruvector-diskann. ADR-200 + FUTURE-DIRECTIONS + status updated.
…onfirmed

examples/diskann_drift.rs: re-run the re-weight-vs-rebuild test on the shipping
ruvector-diskann VamanaGraph (added as dev-dependency). The reuse trick is native:
the graph stores only topology and greedy_search takes vectors externally, so
drift = search a graph-built-on-original with the transformed vectors.

Result (n=20k, recall@10):
  GLOBAL rotational: A reuse vs B rebuild within 2% (95.6 vs 97.1 worst, t=0.5)
  REGION-LOCAL in-region: A_in within ~1.5% of B_in; A_in 98.6 vs B_in 94.5 at t=0.25
  absolute recall 96-99% (stronger/fairer baseline than lite-Vamana ~90%)

Confirms BET 1 on the production index. The t=0.25 reuse-beats-rebuild dip
REPRODUCED on diskann => it is a real property (fresh Vamana build on a
half-warped region underperforms reuse), not lite-Vamana noise. Baseline-variance
caveat RESOLVED. Remaining caveat: gap widens with scale/churn (defer/batch
rebuilds). ADR-200 + FUTURE-DIRECTIONS + status updated.
…ippable

examples/hybrid_policy.rs: simulate a compounding random-walk metric-drift
trajectory and compare operating policies on the production diskann Vamana —
always / never / periodic-K / drift-triggered (Frobenius monitor).

Result (n=10k, 24 steps, aggressive random-walk drift, recall@10):
  always       99.1% mean  98.4% min   24 rebuilds
  never        94.4% mean  89.7% min    1 rebuild   <- decays under heavy drift
  periodic-4   98.8% mean  97.9% min    6 rebuilds  <- ~always at 25% cost
  periodic-8   98.4% mean  96.5% min    3 rebuilds  <- at 12.5% cost

Shippable operating point: re-weight every step + rebuild every ~4 steps recovers
near-full recall at a fraction of the cost. Honest sub-finding: the drift-TRIGGERED
monitor (Frobenius of cumulative transform) underperformed simple periodic —
periodic-K is the recommended knob; a sampled-recall probe trigger is future work.
Under gentle single-direction drift (n=5k) the never policy did NOT decay, so the
hybrid only matters under large/compounding drift.
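
The rebuild counts in the table follow from a simple schedule, sketched here. It assumes the initial build at step 0 counts as a rebuild, which matches the reported 24 / 1 / 6 / 3 counts; the `Policy` enum and `rebuild_steps` are illustrative names, and the drift-triggered policy is omitted.

```rust
/// Operating policies compared in the harness (drift-triggered omitted here).
#[derive(Clone, Copy)]
enum Policy {
    Always,
    Never,
    Periodic(usize),
}

/// Steps at which the index is rebuilt; every other step only re-weights.
/// Assumption: step 0 is the initial build and is counted as a rebuild.
fn rebuild_steps(policy: Policy, steps: usize) -> Vec<usize> {
    (0..steps)
        .filter(|&t| match policy {
            Policy::Always => true,
            Policy::Never => t == 0,
            // max(1) guards against a zero period.
            Policy::Periodic(k) => t % k.max(1) == 0,
        })
        .collect()
}
```

For 24 steps, `Periodic(4)` rebuilds at steps 0, 4, 8, 12, 16 and 20, which is 6 of 24 builds, or the 25% cost quoted above.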

ADR-200 status/boundaries/next-steps + FUTURE-DIRECTIONS updated; stale
'n=2000/scale-unconfirmed' caveats removed (now resolved).