SepRAG BET 3: curated-KG treewidth probe → NO-GO (finding, ADR-203) #538
Open
shaal wants to merge 18 commits into
Add design ADRs and milestone plans for adapting Customizable Contraction Hierarchies (nested dissection, separators, contraction shortcuts, elimination trees, separator-tree k-NN) to RuVector's hybrid vector + knowledge-graph memory.
ADRs:
- ADR-196: SepRAG keystone (separator-tree retrieval; complements HNSW/DiskANN)
- ADR-197: navigation-graph construction + metric-independent ND ordering
- ADR-198: customizable metric layer (CCH customization <-> GNN self-learning loop)
- ADR-199: public-corpus benchmark & evaluation harness
Plans (docs/plans/seprag-cch-retrieval/): M0 correctness gate -> M1 blowup go/no-go on ogbn-arxiv -> M2 customization -> M3 full hybrid -> M4 integration.
Maps decisions onto existing crates (ruvector-mincut/jtree, solver/bmssp, sparsifier, diskann, gnn, attn-mincut).
…aphs
New crate ruvector-seprag implementing the SepRAG M0 milestone (docs/plans/seprag-cch-retrieval/M0-correctness-gate.md):
- graph: undirected weighted graph + Dijkstra brute-force k-NN oracle
- order: nested-dissection ordering via BFS-layer separators + separator tree
- contraction: metric-free symbolic contraction -> chordal upward graph + elim tree
- customize: bottom-up triangle-sweep shortcut weighting (re-runnable per metric)
- query: upward search, exhaustive CCH k-NN, bucket-based branch-and-bound k-NN with admissible early-stop and search-space accounting
- gen: deterministic SBM/grid/path/clique generators (SplitMix64)
- examples/blowup_report: M0->M1 diagnostic (blowup ratio, elim height, pruning)
Tests (8 + doctest, all green): SepRAG k-NN == Dijkstra oracle on SBM/grid/path/clique; pruned == unpruned (pruning sound); pruning reduces search space; determinism; bounded blowup. cargo clippy clean.
Finding: query pruning eliminates 95-100% of scans regardless of structure; blowup/elim-height are dominated by separator quality, and the naive BFS separator degenerates on low-diameter dense graphs (SBM 18.6x), motivating the ruvector-mincut separator swap planned for M1 (ADR-197).
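Blowup and elimination-tree height, the two numbers every later commit reports, both fall out of the symbolic contraction step. The sketch below is a from-scratch illustration of generic symbolic contraction, not the crate's code: `contract`, `Contraction`, and the definition blowup = |E(G+)| / |E(G)| are assumptions (a later commit's remark that a degree bound "shrinks the denominator faster than |G+|" is consistent with that ratio but does not confirm it).

```rust
use std::collections::BTreeSet;

/// Output of a metric-free symbolic contraction (illustrative sketch).
pub struct Contraction {
    /// up[v]: neighbors of v ranked above v, i.e. the chordal upward graph G+.
    pub up: Vec<BTreeSet<usize>>,
    /// parent[v]: lowest-ranked upward neighbor of v (elimination-tree parent).
    pub parent: Vec<Option<usize>>,
}

/// Store an undirected edge once, at its lower-ranked endpoint.
fn add_edge(up: &mut [BTreeSet<usize>], rank: &[usize], a: usize, b: usize) {
    if a == b {
        return;
    }
    let (lo, hi) = if rank[a] < rank[b] { (a, b) } else { (b, a) };
    up[lo].insert(hi);
}

/// Contract vertices in `order` (order[0] first). Eliminating v turns its
/// upward neighbors into a clique; the new pairs are fill-in shortcuts.
pub fn contract(n: usize, edges: &[(usize, usize)], order: &[usize]) -> Contraction {
    let mut rank = vec![0usize; n];
    for (r, &v) in order.iter().enumerate() {
        rank[v] = r;
    }
    let mut up: Vec<BTreeSet<usize>> = vec![BTreeSet::new(); n];
    for &(a, b) in edges {
        add_edge(&mut up, &rank, a, b);
    }
    let mut parent = vec![None; n];
    for &v in order {
        let nbrs: Vec<usize> = up[v].iter().copied().collect();
        parent[v] = nbrs.iter().copied().min_by_key(|&u| rank[u]);
        // Fill: every pair of upward neighbors becomes adjacent.
        for (i, &a) in nbrs.iter().enumerate() {
            for &b in &nbrs[i + 1..] {
                add_edge(&mut up, &rank, a, b);
            }
        }
    }
    Contraction { up, parent }
}

impl Contraction {
    /// |E(G+)|: original edges plus fill-in shortcuts.
    pub fn num_edges(&self) -> usize {
        self.up.iter().map(BTreeSet::len).sum()
    }

    /// Assumed definition: |E(G+)| / |E(G)| for a simple input graph.
    pub fn blowup_ratio(&self, original_edges: usize) -> f64 {
        self.num_edges() as f64 / original_edges as f64
    }

    /// Number of vertices on the longest leaf-to-root path of the elim tree.
    pub fn elim_height(&self) -> usize {
        (0..self.parent.len())
            .map(|start| {
                let (mut v, mut h) = (start, 1);
                while let Some(p) = self.parent[v] {
                    v = p;
                    h += 1;
                }
                h
            })
            .max()
            .unwrap_or(0)
    }
}
```

On a 3-vertex path, contracting the middle vertex first adds one fill edge (blowup 1.5x), while contracting it last, the separator-on-top order nested dissection aims for, adds none and yields a height-2 tree.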
…ion subgraph
examples/m1_arxiv: ingest the real ogbn-arxiv edge list (169K nodes / 1.16M edges), induce a connected BFS-ball subgraph, build the SepRAG hierarchy, and report the ADR-199 go/no-go metrics (blowup ratio, elim-tree height, build time) plus a sampled Dijkstra-oracle recall check.
First-pass result (N=1500 citation-only subgraph, M0 BFS separator):
- recall 50/50 vs Dijkstra oracle -> CORRECTNESS HOLDS on real data
- query pruning saves ~100% of scans (364 vs 418555 bucket scans/query)
- BUT blowup 56.9x and elim height ~= n -> the BFS separator degenerates completely on small-world citation graphs (it picks a giant BFS layer as the separator), and the raw citation ball is dense (avg degree ~24).
Verdict (ADR-199 fallback ladder): NO-GO for the naive separator + dense backbone. The verdict is trustworthy precisely because M0 proved the algorithm correct, so 57x is a separator-quality/backbone artifact, not a SepRAG refutation.
Next: ruvector-mincut balanced separators + alpha-pruned sparse backbone (ADR-197).
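The recall check compares SepRAG answers against an exact shortest-path oracle. A minimal sketch of such an oracle follows, assuming non-negative integer edge weights and that "k-NN" means the k vertices closest to a source; the crate's oracle may instead rank objects attached to vertices, and `dijkstra_knn` is a hypothetical name.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Exact k nearest vertices to `src` by shortest-path distance, excluding
/// `src` itself. `adj[v]` lists (neighbor, non-negative weight) pairs.
pub fn dijkstra_knn(adj: &[Vec<(usize, u64)>], src: usize, k: usize) -> Vec<(usize, u64)> {
    let mut out = Vec::with_capacity(k);
    if k == 0 {
        return out;
    }
    let mut dist = vec![u64::MAX; adj.len()];
    let mut settled = vec![false; adj.len()];
    let mut heap = BinaryHeap::from([Reverse((0u64, src))]);
    dist[src] = 0;
    while let Some(Reverse((d, v))) = heap.pop() {
        if settled[v] {
            continue; // stale heap entry
        }
        settled[v] = true;
        if v != src {
            out.push((v, d));
            if out.len() == k {
                break; // vertices are settled in nondecreasing distance order
            }
        }
        for &(u, w) in &adj[v] {
            let nd = d + w;
            if nd < dist[u] {
                dist[u] = nd;
                heap.push(Reverse((nd, u)));
            }
        }
    }
    out
}
```

Stopping after the k-th settled vertex is what keeps the oracle cheap enough to run on sampled queries; it remains exact because Dijkstra settles vertices in distance order.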
…tion
order.rs: add SeparatorKind::Balanced (grow a half-size region, take only its boundary) alongside the M0 BfsLayer strategy; make Balanced the default.
lib.rs: SepRag::build_with(graph, kind).
m1_arxiv: add max_degree backbone sparsification + separator-kind args for A/B attribution.
M1 attribution (ogbn-arxiv N=1500 citation BFS-ball):
  backbone  separator  blowup  elim_h  build
  raw       layer       56.9x    1443    56s
  raw       balanced    23.8x     941    13s   <- best
  deg<=10   layer       89.6x    1295    38s
  deg<=10   balanced    60.1x    1035    18s
(recall 50/50 and ~100% query pruning in all configs)
Findings: (1) the balanced separator is a real win (2.4x less fill, 4x faster); (2) hub-dampening via a degree bound BACKFIRES (it shrinks the denominator faster than |G+| and destroys good cuts), so discard it; (3) even the best config leaves 23.8x / elim_h ~0.6n: the dense small-world citation ball is intrinsically high-treewidth (ADR-197 expander risk, measured).
Next: feature-manifold/hyperbolic backbone, not the citation topology.
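The difference between the two separator kinds is easiest to see in code. A minimal sketch of the balanced strategy, assuming "boundary" means region vertices that still have a neighbor outside the region; `balanced_separator` is a hypothetical name, and the real order.rs may choose seeds, break ties, or define the boundary differently.

```rust
use std::collections::VecDeque;

/// BFS-grow a region from `seed` until it holds half the vertices, then
/// return only its boundary: region vertices with a neighbor outside it.
pub fn balanced_separator(adj: &[Vec<usize>], seed: usize) -> Vec<usize> {
    let n = adj.len();
    let target = n / 2;
    let mut in_region = vec![false; n];
    in_region[seed] = true;
    let mut size = 1;
    let mut queue = VecDeque::from([seed]);
    while let Some(v) = queue.pop_front() {
        if size >= target {
            break;
        }
        for &u in &adj[v] {
            if !in_region[u] && size < target {
                in_region[u] = true;
                size += 1;
                queue.push_back(u);
            }
        }
    }
    (0..n)
        .filter(|&v| in_region[v] && adj[v].iter().any(|&u| !in_region[u]))
        .collect()
}
```

On a cycle of 8 seeded at vertex 0 this returns {2, 7}, which splits off {0, 1} from {3, 4, 5, 6}. Taking the region's boundary rather than a whole BFS layer is what avoids the "giant layer" failure seen on small-world graphs.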
m1_arxiv: robust edge reader (skips # comments; accepts comma/tab/space separators) so SNAP road networks load through the same harness.
m1_manifold: alpha-pruned kNN backbone over real ogbn-arxiv 128-d node features (Vamana RobustPrune), the decisive ADR-197 thesis test.
Results (N=1500, balanced separator, recall 50/50 everywhere):
  backbone              blowup  elim_h
  roadNet-PA              7.6x     136  (~3.5 sqrt n)   <- CCH works
  citation (arxiv)       23.8x     941  (~0.6 n)
  feature-manifold k10   42.4x     837  (~0.56 n)       <- worse
  feature-manifold k6    45.1x     699  (~0.47 n)
Conclusion: the road control proves the implementation is sound (planar sqrt(n) separators -> 7.6x, instant build). Both embedding-derived backbones (citation small-world AND Euclidean feature kNN) are intrinsically high-treewidth (elim_h ~ 0.5n), so CCH contraction blows up regardless of separator quality or degree. The expander risk (ADR-197) is confirmed across two independent backbones. Query pruning stays ~100% effective; the cost is preprocessing.
Last untested rung: hyperbolic backbone (needs real hyperbolic embeddings).
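RobustPrune is what makes the feature backbone "alpha-pruned": it keeps the nearest candidate, then discards every remaining candidate that the kept one already covers within a factor alpha. The sketch below follows the rule as published for Vamana/DiskANN, assuming Euclidean distance; `robust_prune` and its parameters are hypothetical names, not m1_manifold's code.

```rust
/// Vamana-style RobustPrune: pick the nearest remaining candidate, then drop
/// every candidate c with alpha * d(kept, c) <= d(p, c).
pub fn robust_prune(
    points: &[Vec<f32>],
    p: usize,
    candidates: &[usize],
    alpha: f32,
    max_degree: usize,
) -> Vec<usize> {
    let dist = |a: usize, b: usize| -> f32 {
        points[a]
            .iter()
            .zip(&points[b])
            .map(|(x, y)| (x - y) * (x - y))
            .sum::<f32>()
            .sqrt()
    };
    // Deduplicate by id first, then order by distance to p (stable sort).
    let mut pool: Vec<usize> = candidates.iter().copied().filter(|&c| c != p).collect();
    pool.sort_unstable();
    pool.dedup();
    pool.sort_by(|&a, &b| dist(p, a).total_cmp(&dist(p, b)));
    let mut out = Vec::new();
    while !pool.is_empty() && out.len() < max_degree {
        let best = pool.remove(0);
        out.push(best);
        pool.retain(|&c| alpha * dist(best, c) > dist(p, c));
    }
    out
}
```

With points on a line at 0, 1, 2, 3 and -1, pruning around 0 with alpha = 1.2 keeps only the neighbors at 1 and -1: the points at 2 and 3 are reachable through 1, so the graph spends no degree on them.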
The M1 go/no-go gate ran on real public data and returned NO-GO for CCH full contraction on embedding/citation retrieval graphs. Recorded the measured evidence in ADR-199 (Empirical Outcome section), updated ADR-196/197 status, and marked the milestone tracker (M0 done, M1 NO-GO, M2-M4 not pursued).
Evidence (N=1500, recall 50/50 everywhere):
  roadNet-PA control       blowup  7.6x   elim_h ~3.5 sqrt n   (CCH works)
  ogbn-arxiv citation      blowup 23.8x   elim_h ~0.6 n
  ogbn-arxiv feature kNN   blowup 42.4x   elim_h ~0.56 n
Implementation is sound (road control + exact recall); embedding backbones are intrinsically high-treewidth. Query pruning works (~100%); preprocessing fill-in is the blocker. ruvector-seprag is retained as a validated reference.
… on linear drift)
Salvage ADR-198's customizable-metric idea, decoupled from CCH, as a rigorous pre-registered head-to-head: does a fixed ANN topology + recomputed distances absorb metric drift as well as a full rebuild?
examples/reweight_vs_rebuild.rs: self-contained Vamana-lite (RobustPrune + greedy beam search); drift modelled as a vector-space transform M=A^T A; sweeps diagonal AND adversarial dense-Mahalanobis (rotational) drift.
Arms: A = reuse topology, B = rebuild, C = stale control.
Result (n=2000 ogbn-arxiv embeddings, recall@10, pre-registered gate): A (re-weight, 0 rebuilds) stays within 0.2% of B (full rebuild) up to 36% relevant-set churn, under both drift modes. C (stale) loses up to 29 points -> the benchmark has teeth, and A's parity is genuine adaptation. WIN.
Honest claim: a COST win at equal quality (rebuilds become free under LINEAR drift).
Boundaries (ADR-200): non-linear/region-local drift + scale untested.
Next: non-linear learned metric (decisive adversarial test).
Also adds docs/plans/.../FUTURE-DIRECTIONS.md (4-bet backlog + prove-not-hype protocol) and ADR-200.
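Why reuse can absorb this kind of drift at all: with M = A^T A, (x - y)^T M (x - y) = ||Ax - Ay||^2, so searching the old topology with transformed vectors is exactly a search under the drifted Mahalanobis metric. The sketch checks that identity numerically; `apply`, `mahalanobis_sq` and `euclid_sq` are hypothetical helpers, not the example's code.

```rust
/// Apply a dense linear map A (row-major) to v.
pub fn apply(a: &[Vec<f64>], v: &[f64]) -> Vec<f64> {
    a.iter()
        .map(|row| row.iter().zip(v).map(|(r, x)| r * x).sum::<f64>())
        .collect()
}

/// (x - y)^T M (x - y) with M = A^T A, built entry by entry.
pub fn mahalanobis_sq(a: &[Vec<f64>], x: &[f64], y: &[f64]) -> f64 {
    let d: Vec<f64> = x.iter().zip(y).map(|(p, q)| p - q).collect();
    let mut s = 0.0;
    for i in 0..d.len() {
        for j in 0..d.len() {
            // M[i][j] = sum_k A[k][i] * A[k][j]
            let m_ij: f64 = a.iter().map(|row| row[i] * row[j]).sum();
            s += d[i] * m_ij * d[j];
        }
    }
    s
}

/// Plain squared Euclidean distance.
pub fn euclid_sq(x: &[f64], y: &[f64]) -> f64 {
    x.iter().zip(y).map(|(p, q)| (p - q) * (p - q)).sum()
}
```

The practical consequence is that a "drift" costs one pass of O(n d^2) vector transforms rather than a graph rebuild, which is the cost asymmetry the commit measures.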
…otal WIN)
Extend the re-weight-vs-rebuild harness with the decisive adversarial cases and close the proof:
- non-linear drift mode (residual tanh warp v + s*tanh(Wv)): removes the 'linear only' caveat; A still matches B within 0.2% up to 35% churn.
- per-query distance-eval columns: A and B match within ~1%, disproving any hidden query-cost trade-off. Reuse gives equal recall AND equal query cost.
- fix display bug (C/stale was double-divided by query count; the control now correctly reads 90% at t=0, validating the negative control).
- drift modelled via a transform closure (diag/rot/nonlin share one code path).
- clippy: idiomatic char-class split in the m1_arxiv reader.
ADR-200 + FUTURE-DIRECTIONS updated: WIN across diagonal/rotational/non-linear drift; the only open caveats are scale (n>=1e5, decisive next), region-local drift, and an incremental-rebuild baseline.
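"Drift modelled via a transform closure" means all three drift modes become values of one type, so the harness needs a single code path. A sketch of that design under assumed names (`Drift`, `diagonal`, `linear`, `nonlinear` and `drift_all` are hypothetical, not the example's identifiers), including the residual tanh warp v + s*tanh(Wv):

```rust
/// Any drift is a map from an original vector to its drifted version.
pub type Drift = Box<dyn Fn(&[f64]) -> Vec<f64>>;

fn dot(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Diagonal drift: rescale each dimension independently.
pub fn diagonal(scale: Vec<f64>) -> Drift {
    Box::new(move |v: &[f64]| v.iter().zip(&scale).map(|(x, s)| x * s).collect::<Vec<f64>>())
}

/// Dense linear ("rotational") drift: v -> A v.
pub fn linear(a: Vec<Vec<f64>>) -> Drift {
    Box::new(move |v: &[f64]| a.iter().map(|row| dot(row, v)).collect::<Vec<f64>>())
}

/// Residual tanh warp: v -> v + s * tanh(W v), with W square.
pub fn nonlinear(w: Vec<Vec<f64>>, s: f64) -> Drift {
    Box::new(move |v: &[f64]| {
        v.iter()
            .zip(&w)
            .map(|(x, row)| x + s * dot(row, v).tanh())
            .collect::<Vec<f64>>()
    })
}

/// The single shared code path: drift a whole dataset with any mode.
pub fn drift_all(d: &Drift, data: &[Vec<f64>]) -> Vec<Vec<f64>> {
    data.iter().map(|v| d(v.as_slice())).collect()
}
```

With s = 0 the warp reduces to the identity, which is a convenient t = 0 sanity check for any sweep built on it.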
Extract the Vamana-lite ANN + metric-drift helpers into a reusable lib module (src/ann.rs) with an efficient two-heap greedy beam search (replaces the O(L) linear-scan beam, ~2x faster, needed for n>=1e5). Thin reweight_vs_rebuild to use it (regression-checked: identical recall/evals at n=2000). Add scale_drift example: sweeps N (5k..100k), measures recall(reuse) vs recall(rebuild) at the adversarial rotational drift point plus the rebuild-cost curve.
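A two-heap greedy beam search keeps a min-heap of nodes still to expand and a bounded max-heap of the best L found so far, so neither step needs a linear scan of the beam. The sketch below is a generic version of that technique, not src/ann.rs; `beam_search`, `Dist` and the (ids, evals) return shape are assumptions. Vectors are passed separately from the topology, which is what lets the same graph be searched with drifted vectors.

```rust
use std::cmp::{Ordering, Reverse};
use std::collections::{BinaryHeap, HashSet};

/// f32 distance with a total order so it can live in a BinaryHeap.
#[derive(Clone, Copy, Debug)]
struct Dist(f32);
impl PartialEq for Dist {
    fn eq(&self, other: &Self) -> bool {
        self.cmp(other) == Ordering::Equal
    }
}
impl Eq for Dist {}
impl PartialOrd for Dist {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}
impl Ord for Dist {
    fn cmp(&self, other: &Self) -> Ordering {
        self.0.total_cmp(&other.0)
    }
}

/// Greedy beam search (beam >= 1) over a fixed graph `adj`.
/// Returns (top-k ids, number of distance evaluations).
pub fn beam_search(
    adj: &[Vec<usize>],
    vectors: &[Vec<f32>],
    entry: usize,
    query: &[f32],
    beam: usize,
    k: usize,
) -> (Vec<usize>, usize) {
    let dist = |v: usize| Dist(vectors[v].iter().zip(query).map(|(a, b)| (a - b) * (a - b)).sum::<f32>());
    let d0 = dist(entry);
    let mut evals = 1;
    let mut visited = HashSet::from([entry]);
    // Min-heap of nodes still to expand, closest first.
    let mut frontier = BinaryHeap::from([Reverse((d0, entry))]);
    // Max-heap of the best `beam` nodes so far, worst on top.
    let mut best = BinaryHeap::from([(d0, entry)]);
    while let Some(Reverse((d, v))) = frontier.pop() {
        // The closest unexpanded node is worse than everything kept: stop.
        if best.len() >= beam && d > best.peek().unwrap().0 {
            break;
        }
        for &u in &adj[v] {
            if !visited.insert(u) {
                continue;
            }
            let du = dist(u);
            evals += 1;
            if best.len() < beam || du < best.peek().unwrap().0 {
                frontier.push(Reverse((du, u)));
                best.push((du, u));
                if best.len() > beam {
                    best.pop();
                }
            }
        }
    }
    let mut ranked = best.into_sorted_vec(); // ascending distance
    ranked.truncate(k);
    (ranked.into_iter().map(|(_, id)| id).collect(), evals)
}
```

Calling it again on the same `adj` with a second, drifted `vectors` slice is the "reuse topology" arm in miniature: nothing about the graph is rebuilt.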
… widens
scale_drift sweep (5k->100k, rotational drift ~40% churn, recall@10):
       N  A reuse  B rebuild    gap  rebuild  update   ratio
    5000    90.2%      90.0%  +0.2%     3.6s  0.001s  ~3600x
   10000    89.5%      90.3%  -0.8%    10.2s  0.004s  ~2500x
   25000    88.5%      89.2%  -0.7%    21.4s  0.009s  ~2400x
   50000    87.7%      88.6%  -0.9%    47.1s  0.043s  ~1100x
  100000    85.0%      86.7%  -1.7%   141.8s  0.035s  ~4000x
Verdict: WIN within the 2% gate through 100k at ~1000-4000x lower update cost, BUT the recall gap widens with N (+0.2% -> -1.7%) => defer/batch rebuilds rather than never rebuilding.
Honest caveats: both A and B recall fall with N (fixed beam); 100 queries => ~+-1% noise, so confirm the trend with more queries.
Also: rename Rng::next -> next_u64 (clippy).
ADR-200 + FUTURE-DIRECTIONS updated with scale evidence, the widening-gap caveat, and a hybrid re-weight + periodic-rebuild policy as a next step.
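The table's recall columns and the 2% gate can be pinned down in a few lines. A sketch assuming recall@k is the overlap with the exact top-k and that the gate is measured in percentage points of mean recall (B minus A); `recall_at_k` and `passes_gate` are hypothetical names.

```rust
use std::collections::HashSet;

/// recall@k: share of the exact top-k ids that also appear in the
/// approximate top-k (ids assumed distinct).
pub fn recall_at_k(approx: &[usize], exact: &[usize], k: usize) -> f64 {
    let truth: HashSet<usize> = exact.iter().take(k).copied().collect();
    let hits = approx.iter().take(k).filter(|&&id| truth.contains(&id)).count();
    hits as f64 / truth.len().max(1) as f64
}

/// Assumed gate: reuse (A) passes if it trails rebuild (B) by at most
/// `max_gap_pts` percentage points; reuse beating rebuild always passes.
pub fn passes_gate(reuse_pct: f64, rebuild_pct: f64, max_gap_pts: f64) -> bool {
    rebuild_pct - reuse_pct <= max_gap_pts
}
```

Under that reading the 100k row (85.0% vs 86.7%) passes with 0.3 points to spare, which is why the widening trend, not the current gap, is the caveat.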
…, gate PASS
examples/region_drift.rs: warp only a 15% local cluster and grade recall separately for queries INSIDE vs OUTSIDE the drifted region (a global average would hide a local failure).
Result (n=20k, recall@10):
     t  churnIn   A_in   B_in | churnOut  A_out  B_out
  0.25      44%  89.8%  81.4% |      21%  87.9%  89.0%
  0.50      53%  89.3%  90.0% |      21%  87.9%  89.0%
  1.00      45%  89.5%  90.0% |      21%  87.9%  89.0%
Gate PASS: reuse holds inside the drifted region (A_in within 0.7% of B_in, and ABOVE it at t=0.25) even at 53% in-region churn; out-region is ~unchanged. Region-local drift did NOT break reuse.
Honest caveat: the t=0.25 B_in dip to 81% (reuse beats rebuild by 8 pts) is a build-variance artifact of the simplified single-pass Vamana baseline, not a smooth effect; this is the strongest argument for porting the baseline to production ruvector-diskann.
ADR-200 + FUTURE-DIRECTIONS + status updated.
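Grading inside and outside the region separately matters because a small drifted cluster barely moves a global mean. A sketch, assuming "churn" means the fraction of a query's pre-drift top-k that leaves its post-drift top-k (the example's exact definition may differ); `churn` and `split_recall` are hypothetical names.

```rust
use std::collections::HashSet;

/// Assumed definition: fraction of the pre-drift top-k that is no longer
/// in the post-drift top-k.
pub fn churn(before: &[usize], after: &[usize]) -> f64 {
    let after: HashSet<usize> = after.iter().copied().collect();
    let kept = before.iter().filter(|&&id| after.contains(&id)).count();
    1.0 - kept as f64 / before.len().max(1) as f64
}

/// Mean recall (%) for queries inside vs. outside the drifted region.
/// Each entry is (query_is_in_region, recall in [0, 1]).
pub fn split_recall(per_query: &[(bool, f64)]) -> (f64, f64) {
    let mean_pct = |inside: bool| {
        let vals: Vec<f64> = per_query.iter().filter(|q| q.0 == inside).map(|q| q.1).collect();
        if vals.is_empty() {
            f64::NAN
        } else {
            100.0 * vals.iter().sum::<f64>() / vals.len() as f64
        }
    };
    (mean_pct(true), mean_pct(false))
}
```

For example, if 15% of queries sat in a region where recall collapsed to 50% while the rest held 90%, the global mean would still read 84%; the split reports 50% inside and 90% outside.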
…onfirmed
examples/diskann_drift.rs: re-run the re-weight-vs-rebuild test on the shipping ruvector-diskann VamanaGraph (added as a dev-dependency). The reuse trick is native: the graph stores only topology and greedy_search takes vectors externally, so drift = searching a graph built on the original vectors with the transformed vectors.
Result (n=20k, recall@10):
  GLOBAL rotational: A reuse vs B rebuild within 2% (95.6 vs 97.1 worst, t=0.5)
  REGION-LOCAL in-region: A_in within ~1.5% of B_in; A_in 98.6 vs B_in 94.5 at t=0.25
  absolute recall 96-99% (a stronger/fairer baseline than lite-Vamana's ~90%)
Confirms BET 1 on the production index. The t=0.25 reuse-beats-rebuild dip REPRODUCED on diskann => it is a real property (a fresh Vamana build on a half-warped region underperforms reuse), not lite-Vamana noise. Baseline-variance caveat RESOLVED.
Remaining caveat: the gap widens with scale/churn (defer/batch rebuilds).
ADR-200 + FUTURE-DIRECTIONS + status updated.
…ippable
examples/hybrid_policy.rs: simulate a compounding random-walk metric-drift trajectory and compare operating policies on the production diskann Vamana: always / never / periodic-K / drift-triggered (Frobenius monitor).
Result (n=10k, 24 steps, aggressive random-walk drift, recall@10):
  policy      mean    min    rebuilds
  always      99.1%   98.4%  24
  never       94.4%   89.7%   1   <- decays under heavy drift
  periodic-4  98.8%   97.9%   6   <- ~always at 25% cost
  periodic-8  98.4%   96.5%   3   <- at 12.5% cost
Shippable operating point: re-weight every step + rebuild every ~4 steps recovers near-full recall at a fraction of the cost.
Honest sub-finding: the drift-TRIGGERED monitor (Frobenius norm of the cumulative transform) underperformed simple periodic rebuilds, so periodic-K is the recommended knob; a sampled-recall probe trigger is future work. Under gentle single-direction drift (n=5k), never did NOT decay, so the hybrid only matters under large/compounding drift.
ADR-200 status/boundaries/next-steps + FUTURE-DIRECTIONS updated; stale 'n=2000/scale-unconfirmed' caveats removed (now resolved).
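The four policies differ only in when they trigger a rebuild; re-weighting happens every step regardless. A sketch under assumed names (`Policy`, `should_rebuild`, `frobenius_from_identity` are hypothetical), where the drift-triggered variant watches ||T - I||_F of the transform accumulated since the last rebuild. Rebuild counts here exclude the initial build, which may not match the table's counting convention.

```rust
/// Operating policies compared by the hybrid-policy experiment (sketch).
#[derive(Clone, Copy, Debug)]
pub enum Policy {
    Always,
    Never,
    /// Rebuild every k-th step.
    Periodic(usize),
    /// Rebuild once ||T - I||_F since the last rebuild exceeds this threshold.
    DriftTriggered(f64),
}

impl Policy {
    /// `step` is 1-based; re-weighting is assumed to happen every step.
    pub fn should_rebuild(self, step: usize, drift_since_rebuild: f64) -> bool {
        match self {
            Policy::Always => true,
            Policy::Never => false,
            Policy::Periodic(k) => k > 0 && step % k == 0,
            Policy::DriftTriggered(threshold) => drift_since_rebuild > threshold,
        }
    }
}

/// Frobenius distance of a square transform T from the identity.
pub fn frobenius_from_identity(t: &[Vec<f64>]) -> f64 {
    let mut sum = 0.0_f64;
    for (i, row) in t.iter().enumerate() {
        for (j, &x) in row.iter().enumerate() {
            let d = if i == j { x - 1.0 } else { x };
            sum += d * d;
        }
    }
    sum.sqrt()
}
```

Over 24 steps, Periodic(4) fires 6 times and Periodic(8) 3 times, matching the 25% and 12.5% cost fractions above. One plausible reason the Frobenius trigger lags is that the norm measures how far the metric moved, not how much recall it cost.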
BET 3 go/no-go gate, frozen before any data: probe whether a curated bounded-degree KG (WN18RR / FB15k-237 / 2WikiMultiHop, the backbone under HotpotQA/MuSiQue) is low-treewidth enough for CCH contraction/build to stay cheap. This is the last untested backbone for the salvaged separator-tree kernel (treewidth-independent at query time, but the build blew up on high-treewidth embedding/citation graphs, ADR-199).
Primary metric: elim-tree-height exponent p (elim_h ~ n^p).
  GO            p<=0.6 AND blowup<=10x
  KILL          p>=0.8 OR blowup>=23x
  inconclusive  0.6<p<0.8
  VOID          the roadNet-PA control fails to reproduce ~sqrt(n)/7.6x
Adversarial upgrade over ADR-199: add a minor-min-width treewidth LOWER bound + run both separators, so a NO-GO is structurally certain rather than heuristic-limited.
QA harness NOT built unless GO. Branch off PR ruvnet#535 (where ruvector-seprag lives); re-home if GO.
Refs ruvnet#534
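Writing the frozen gate as a function makes its precedence explicit. The rule as stated does not say what happens when p <= 0.6 but the blowup falls between 10x and 23x; this sketch assumes that case is also inconclusive, and checks VOID first. `gate` and `Verdict` are hypothetical names, not kg_treewidth_probe.rs's code.

```rust
#[derive(Debug, PartialEq)]
pub enum Verdict {
    Go,
    Kill,
    Inconclusive,
    Void,
}

/// Frozen BET 3 gate (sketch). `p`: exponent in elim_h ~ n^p;
/// `blowup`: contraction blowup ratio; `control_ok`: whether the
/// roadNet-PA control reproduced its calibrated result.
pub fn gate(p: f64, blowup: f64, control_ok: bool) -> Verdict {
    if !control_ok {
        return Verdict::Void;
    }
    if p >= 0.8 || blowup >= 23.0 {
        return Verdict::Kill;
    }
    if p <= 0.6 && blowup <= 10.0 {
        return Verdict::Go;
    }
    // 0.6 < p < 0.8, or a low exponent with a 10x-23x blowup (assumed).
    Verdict::Inconclusive
}
```

Because KILL is a disjunction, a backbone with a low exponent can still be killed on absolute blowup alone, and GO requires both conditions at once: this is the "conjunction gate" the results commit credits.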
…control
New example kg_treewidth_probe.rs (reuses SepRag::build_with, blowup_ratio, elim_depth, and the Dijkstra oracle from the validated kernel). Adds only what the frozen gate needs:
- a generic KG/triple loader (WordNet/Freebase/Wikidata)
- a scale sweep with an OLS log-log exponent fit (elim_h ~ n^p, the primary metric)
- both separators
- a minor-min-width treewidth LOWER bound (the adversarial upgrade over ADR-199)
- an adaptive build-time budget, so high-treewidth backbones reveal themselves at moderate n without runaway contraction.
The run prints every KG bracketed between the two calibrated controls and applies the frozen gate automatically.
Refs ruvnet#534
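The primary metric is a slope: taking logs of elim_h ~ c * n^p gives ln(elim_h) = ln(c) + p * ln(n), so p is the ordinary-least-squares slope over the sweep. A minimal sketch of that fit (`fit_exponent` is a hypothetical name; it needs at least two distinct n):

```rust
/// Fit elim_h ~ c * n^p by OLS on (ln n, ln elim_h); returns (p, c).
pub fn fit_exponent(samples: &[(f64, f64)]) -> (f64, f64) {
    let m = samples.len() as f64;
    let (xs, ys): (Vec<f64>, Vec<f64>) = samples.iter().map(|&(n, h)| (n.ln(), h.ln())).unzip();
    let mx = xs.iter().sum::<f64>() / m;
    let my = ys.iter().sum::<f64>() / m;
    let sxy: f64 = xs.iter().zip(&ys).map(|(x, y)| (x - mx) * (y - my)).sum();
    let sxx: f64 = xs.iter().map(|x| (x - mx).powi(2)).sum();
    let p = sxy / sxx;
    let c = (my - p * mx).exp();
    (p, c)
}
```

A road-like sweep with elim_h = 3 * sqrt(n) recovers p = 0.5, and elim_h proportional to n recovers p = 1, so the gate's 0.6 and 0.8 thresholds sit between "planar-like" and "linear".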
All three curated KGs (WN18RR/FB15k-237/CoDEx-L = WordNet/Freebase/Wikidata) fail the frozen treewidth gate, each with a distinct signature that the conjunction gate caught:
- WN18RR: blowup 59.9x (low exponent but huge absolute height)
- FB15k-237: elim_h ~0.46n + tw>=42
- CoDEx-L: exponent p=1.83 (tree-like periphery, but the hub-dense core collapses treewidth; a false GO that the scaling metric caught)
Minor-min-width LOWER bounds (28-44) are 7-11x the road control's (4) => structurally certain, not heuristic.
Cause: KGs are small-world WITH hubs (max deg 520/1999/4999 vs road 9).
Combined with ADR-199, this closes the CCH-contraction line. The query kernel stays exact (recall 30/30) + treewidth-independent; the QA harness was correctly never built.
Records 3 honest deviations from the frozen pre-reg (none changed a verdict).
Refs ruvnet#534
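The minor-min-width bound works because treewidth is at least the minimum degree and never increases when you take a minor. The sketch is the textbook heuristic written from scratch: repeatedly take a minimum-degree vertex, record its degree, and contract it into its minimum-degree neighbor. `minor_min_width` and its adjacency-list input are hypothetical; the probe's implementation may break ties or pick contraction partners differently.

```rust
use std::collections::BTreeSet;

/// Minor-min-width treewidth lower bound on an undirected simple graph.
pub fn minor_min_width(adj_in: &[Vec<usize>]) -> usize {
    let n = adj_in.len();
    let mut adj: Vec<BTreeSet<usize>> = adj_in.iter().map(|l| l.iter().copied().collect()).collect();
    let mut alive = vec![true; n];
    let mut remaining = n;
    let mut lb = 0;
    while remaining > 1 {
        // Minimum-degree live vertex (ties -> smallest id).
        let v = (0..n).filter(|&x| alive[x]).min_by_key(|&x| adj[x].len()).unwrap();
        lb = lb.max(adj[v].len());
        // Minimum-degree neighbor of v; None if v is isolated.
        let u = adj[v].iter().copied().min_by_key(|&x| adj[x].len());
        let nbrs: Vec<usize> = adj[v].iter().copied().collect();
        for &w in &nbrs {
            adj[w].remove(&v);
        }
        if let Some(u) = u {
            // Contract edge (u, v): u inherits v's other neighbors.
            for &w in &nbrs {
                if w != u {
                    adj[u].insert(w);
                    adj[w].insert(u);
                }
            }
        }
        adj[v].clear();
        alive[v] = false;
        remaining -= 1;
    }
    lb
}
```

On K5 it returns 4 (K5's exact treewidth), on a cycle 2, and on a path 1. On hub-heavy graphs the contractions merge neighborhoods into increasingly dense cores, which is how bounds like 28-44 can appear even where average degree is moderate.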
SepRAG BET 3 — curated-KG treewidth probe → NO-GO (a finding, not a feature)
This is a reportable finding, not a merge request — no urgency, framed like #536.
It records the third KILL in the SepRAG thread (issue #534) and closes the
CCH-full-contraction line.
TL;DR
The salvaged separator-tree branch-and-bound k-NN kernel is validated and
treewidth-independent at query time (recall stays exact), but CCH contraction/build
blew up on high-treewidth embedding/citation graphs (ADR-199). BET 3 tested the last
untested backbone, a curated, bounded-degree knowledge graph (Wikidata-style), and
probed it first with a cheap treewidth go/no-go gate before building any multi-hop-QA harness.
Verdict: NO-GO. All three curated KGs — WN18RR (WordNet), FB15k-237 (Freebase),
CoDEx-L (genuine Wikidata) — are high-treewidth. Scoreboard: 1 WIN, 3 KILLS.
Why it's a robust NO-GO
- Structurally certain, not heuristic: minor-min-width treewidth lower bounds (28–44) are
  7–11× the road control's (4) and on par with the citation reference; a lower bound
  guarantees treewidth is at least that large. This is the adversarial upgrade over
  ADR-199 (which used the separator upper bound only).
- Three distinct failure signatures. WN18RR: low exponent but huge absolute blowup
  (59.9×; the "judge absolute height" lesson). FB15k-237: near-linear elim_h. CoDEx:
  tree-like periphery (5.1×@2k) whose hub-dense core collapses treewidth (p=1.83). A
  single-n blowup snapshot would have been a false GO; the scaling exponent (the
  pre-registered primary metric) caught it.
- Cause: KGs are small-world with hubs (max degree 520/1999/4999 vs. 9 for the road
  control). Hubs kill balanced separators regardless of average degree; road-like low
  treewidth needs near-planar geometric locality, which no semantic graph has.
What's here (review only the 3 bet3 commits)
Prove-not-hype protocol honored: frozen pre-registered gate, calibrated road control,
treewidth lower bound as adversarial check, 3 honest deviations recorded (none changed a
verdict). The multi-hop QA benchmark was correctly never built — the cheap probe
killed the backbone first.
Refs #534. Writeup:
docs/adr/ADR-203-curated-kg-treewidth-probe.md.