SepRAG BET 3 — curated-KG treewidth probe → NO-GO (finding, ADR-203) by shaal · Pull Request #538 · ruvnet/RuVector

shaal · 2026-06-05T01:51:09Z

SepRAG BET 3 — curated-KG treewidth probe → NO-GO (a finding, not a feature)

This is a reportable finding, not a merge request — no urgency, framed like #536.
It records the third KILL in the SepRAG thread (issue #534) and closes the
CCH-full-contraction line.

TL;DR

The salvaged separator-tree branch-and-bound k-NN kernel is validated and
treewidth-independent at query time (recall stays exact), but CCH contraction/build
blew up on high-treewidth embedding/citation graphs (ADR-199). BET 3 tested the last
untested backbone — a curated, bounded-degree knowledge graph (Wikidata-style). It was
probed first (cheap treewidth go/no-go gate) before any multi-hop-QA harness.

Verdict: NO-GO. All three curated KGs — WN18RR (WordNet), FB15k-237 (Freebase),
CoDEx-L (genuine Wikidata) — are high-treewidth. Scoreboard: 1 WIN, 3 KILLS.

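| Backbone | Role | Exponent p (elim_h ~ n^p) | Blowup at reference n = 2k | elim_h / n | Treewidth lower bound | Verdict |
|---|---|---|---|---|---|---|
| roadNet-PA | control | 0.613 | 7.2× | 0.04 | tw ≥ 4 | calibrated ✓ |
| ogbn-arxiv | reference | 1.259 | 17.4× | 0.28 | tw ≥ 44 | NO-GO (ADR-199) |
| WN18RR | KG | 0.508 | 59.9× | 0.21 | tw ≥ 28 | NO-GO |
| FB15k-237 | KG | — | 42.3× | 0.46 | tw ≥ 42 | NO-GO |
| CoDEx-L (Wikidata) | KG | 1.826 | 5.1× | 0.17 | tw ≥ 28 | NO-GO |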

Why it's a robust NO-GO

- Structurally certain, not a heuristic artifact. The minor-min-width treewidth lower bounds (28–44) are 7–11× the road control's (4) and on par with the citation reference. A lower bound guarantees treewidth is at least that large, so no better separator heuristic can rescue these graphs. This is the adversarial upgrade over ADR-199, which used the separator upper bound only.
- Three distinct failure signatures; the conjunction gate caught each. WN18RR: low exponent but huge absolute blowup (the "judge absolute height" lesson). FB15k-237: near-linear elim_h. CoDEx-L: tree-like periphery (5.1× at n = 2k) whose hub-dense core drives treewidth up as n grows (p = 1.83). A single-n blowup snapshot would have been a false GO; the scaling exponent (the pre-registered primary metric) caught it.
- Root cause: KGs are small-world with hubs (max degree 520/1999/4999 vs road's 9). Hubs kill balanced separators regardless of average degree. Road-like low treewidth needs near-planar geometric locality, which none of the semantic graphs tested so far has.
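For readers unfamiliar with the lower bound used above, here is a minimal sketch of a minor-min-width style bound. It is an illustration under assumptions, not the code in `kg_treewidth_probe.rs`: the function name, the edge-list input, and the "contract into the least-degree neighbor" tie-breaking are all hypothetical. The bound is valid because every graph produced by contraction is a minor of the input, and the minimum degree of any minor is at most the treewidth.

```rust
use std::collections::BTreeSet;

/// Minor-min-width style lower bound on treewidth (illustrative sketch).
/// Repeatedly takes a minimum-degree vertex, records its degree, and
/// contracts it into its least-degree neighbor. The largest recorded
/// degree is a lower bound on the treewidth of the input graph.
pub fn minor_min_width(n: usize, edges: &[(usize, usize)]) -> usize {
    let mut adj: Vec<BTreeSet<usize>> = vec![BTreeSet::new(); n];
    for &(a, b) in edges {
        if a != b {
            adj[a].insert(b);
            adj[b].insert(a);
        }
    }
    let mut alive = vec![true; n];
    let mut remaining = n;
    let mut lower_bound = 0;
    while remaining > 1 {
        // Pick a live vertex of minimum degree in the current minor.
        let v = (0..n)
            .filter(|&i| alive[i])
            .min_by_key(|&i| adj[i].len())
            .expect("at least two live vertices remain");
        lower_bound = lower_bound.max(adj[v].len());
        // Contract v into its least-degree neighbor (assumed tie-breaking rule).
        let target = adj[v].iter().copied().min_by_key(|&w| adj[w].len());
        if let Some(u) = target {
            let nbrs: Vec<usize> = adj[v].iter().copied().collect();
            for w in nbrs {
                adj[w].remove(&v);
                if w != u {
                    adj[w].insert(u);
                    adj[u].insert(w);
                }
            }
        }
        // An isolated vertex is simply dropped.
        adj[v].clear();
        alive[v] = false;
        remaining -= 1;
    }
    lower_bound
}
```

On a 4-clique this returns 3 and on a 5-cycle it returns 2, which match the true treewidths; on hub-heavy graphs the bound is generally not tight, which is why it is only used here as a one-sided check.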

What's here (review only the 3 `bet3` commits)

⚠️ Stacked on #535 (the BET-1/CCH branch where ruvector-seprag lives). The first
15 commits are #535's and are already under review there. Only these 3 are new:

docs(bet3): frozen pre-registration (gate fixed before any data)

feat(bet3): kg_treewidth_probe.rs — reuses the validated kernel + road control

docs(bet3): ADR-203 (the NO-GO writeup)

Will rebase onto main if/when #535 merges. No new production code; the probe is an
example, KG data stays under target/ (gitignored).

Prove-not-hype protocol honored: frozen pre-registered gate, calibrated road control,
treewidth lower bound as adversarial check, 3 honest deviations recorded (none changed a
verdict). The multi-hop QA benchmark was correctly never built — the cheap probe
killed the backbone first.

Refs #534. Writeup: docs/adr/ADR-203-curated-kg-treewidth-probe.md.

Add design ADRs and milestone plans for adapting Customizable Contraction Hierarchies (nested dissection, separators, contraction shortcuts, elimination trees, separator-tree k-NN) to RuVector's hybrid vector + knowledge-graph memory.

ADRs:
- ADR-196: SepRAG keystone (separator-tree retrieval; complements HNSW/DiskANN)
- ADR-197: navigation-graph construction + metric-independent ND ordering
- ADR-198: customizable metric layer (CCH customization <-> GNN self-learning loop)
- ADR-199: public-corpus benchmark & evaluation harness

Plans (docs/plans/seprag-cch-retrieval/): M0 correctness gate -> M1 blowup go/no-go on ogbn-arxiv -> M2 customization -> M3 full hybrid -> M4 integration.

Maps decisions onto existing crates (ruvector-mincut/jtree, solver/bmssp, sparsifier, diskann, gnn, attn-mincut).

…aphs

New crate ruvector-seprag implementing the SepRAG M0 milestone (docs/plans/seprag-cch-retrieval/M0-correctness-gate.md):
- graph: undirected weighted graph + Dijkstra brute-force k-NN oracle
- order: nested-dissection ordering via BFS-layer separators + separator tree
- contraction: metric-free symbolic contraction -> chordal upward graph + elim tree
- customize: bottom-up triangle-sweep shortcut weighting (re-runnable per metric)
- query: upward search, exhaustive CCH k-NN, bucket-based branch-and-bound k-NN with admissible early-stop and search-space accounting
- gen: deterministic SBM/grid/path/clique generators (SplitMix64)
- examples/blowup_report: M0->M1 diagnostic (blowup ratio, elim height, pruning)

Tests (8 + doctest, all green): SepRAG k-NN == Dijkstra oracle on SBM/grid/path/clique; pruned == unpruned (pruning sound); pruning reduces search space; determinism; bounded blowup. cargo clippy clean.

Finding: query pruning eliminates 95-100% of scans regardless of structure; blowup/elim-height are dominated by separator quality, and the naive BFS separator degenerates on low-diameter dense graphs (SBM 18.6x), motivating the ruvector-mincut separator swap planned for M1 (ADR-197).
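To make the two diagnostics concrete, the following is a minimal sketch of metric-free symbolic contraction under a given elimination order. It is not the ruvector-seprag implementation: the type and function names are hypothetical, and it assumes that "blowup ratio" means edges in the chordal supergraph G+ divided by original edges, and that elimination-tree height is counted in vertices.

```rust
use std::collections::BTreeSet;

/// Diagnostics of a symbolic contraction (hypothetical type for illustration).
#[derive(Debug, Clone, PartialEq)]
pub struct ContractionStats {
    pub original_edges: usize,
    pub plus_edges: usize, // edges in the chordal supergraph G+
    pub elim_height: usize, // elimination-tree height, counted in vertices
}

impl ContractionStats {
    /// Assumed definition: |E(G+)| / |E(G)|.
    pub fn blowup_ratio(&self) -> f64 {
        self.plus_edges as f64 / self.original_edges.max(1) as f64
    }
}

/// Eliminates vertices in `order`; each eliminated vertex turns its
/// not-yet-eliminated neighbors into a clique (the fill-in shortcuts).
pub fn symbolic_contraction(
    n: usize,
    edges: &[(usize, usize)],
    order: &[usize],
) -> ContractionStats {
    let mut rank = vec![0usize; n];
    for (i, &v) in order.iter().enumerate() {
        rank[v] = i;
    }
    let mut adj: Vec<BTreeSet<usize>> = vec![BTreeSet::new(); n];
    for &(a, b) in edges {
        if a != b {
            adj[a].insert(b);
            adj[b].insert(a);
        }
    }
    let original_edges = adj.iter().map(|s| s.len()).sum::<usize>() / 2;
    let mut parent: Vec<Option<usize>> = vec![None; n];
    let mut plus_edges = 0;
    for &v in order {
        // Upward neighbors: those eliminated later than v.
        let higher: Vec<usize> = adj[v]
            .iter()
            .copied()
            .filter(|&w| rank[w] > rank[v])
            .collect();
        plus_edges += higher.len();
        for (i, &a) in higher.iter().enumerate() {
            for &b in &higher[i + 1..] {
                adj[a].insert(b);
                adj[b].insert(a);
            }
        }
        // Elimination-tree parent: the lowest-ranked upward neighbor.
        parent[v] = higher.iter().copied().min_by_key(|&w| rank[w]);
    }
    let mut depth = vec![1usize; n];
    let mut elim_height = 0;
    // Parents have higher rank, so a reverse sweep sees them first.
    for &v in order.iter().rev() {
        if let Some(p) = parent[v] {
            depth[v] = depth[p] + 1;
        }
        elim_height = elim_height.max(depth[v]);
    }
    ContractionStats { original_edges, plus_edges, elim_height }
}
```

A 5-vertex star shows how much the order matters: eliminating the hub first fills in a clique on the leaves (ratio 2.5, height 5), while eliminating the leaves first adds nothing (ratio 1.0, height 2).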

…ion subgraph

examples/m1_arxiv: ingest the real ogbn-arxiv edge list (169K nodes / 1.16M edges), induce a connected BFS-ball subgraph, build the SepRAG hierarchy, and report the ADR-199 go/no-go metrics (blowup ratio, elim-tree height, build time) plus a sampled Dijkstra-oracle recall check.

First-pass result (N=1500 citation-only subgraph, M0 BFS separator):
- recall 50/50 vs Dijkstra oracle -> CORRECTNESS HOLDS on real data
- query pruning saves ~100% of scans (364 vs 418555 bucket scans/query)
- BUT blowup 56.9x and elim height ~= n -> the BFS separator degenerates completely on small-world citation graphs (picks a giant BFS layer as the separator), and the raw citation ball is dense (avg degree ~24).

Verdict (ADR-199 fallback ladder): NO-GO for the naive separator + dense backbone. Trustworthy precisely because M0 proved the algorithm correct, so 57x is a separator-quality/backbone artifact, not a SepRAG refutation.

Next: ruvector-mincut balanced separators + alpha-pruned sparse backbone (ADR-197).

…tion

order.rs: add SeparatorKind::Balanced (grow half-size region, take only its boundary) alongside the M0 BfsLayer strategy; make Balanced the default. lib.rs: SepRag::build_with(graph, kind). m1_arxiv: add max_degree backbone sparsification + separator-kind args for A/B attribution.

M1 attribution (ogbn-arxiv N=1500 citation BFS-ball):
- raw + layer: blowup 56.9x, elim_h 1443, build 56s
- raw + balanced: blowup 23.8x, elim_h 941, build 13s <- best
- deg<=10 + layer: blowup 89.6x, elim_h 1295, build 38s
- deg<=10 + bal: blowup 60.1x, elim_h 1035, build 18s

Recall 50/50 and ~100% query pruning in all configs.

Findings: (1) balanced separator is a real win (2.4x less fill, 4x faster); (2) hub-dampening degree-bound BACKFIRES (shrinks denominator faster than |G+|, destroys good cuts), so discard it; (3) even best config leaves 23.8x / elim_h~0.6n: the dense small-world citation ball is intrinsically high-treewidth (ADR-197 expander risk, measured).

Next: feature-manifold/hyperbolic backbone, not the citation topology.
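The "grow half-size region, take only its boundary" idea can be sketched as follows. This is a guess at the shape of the strategy, not the code behind SeparatorKind::Balanced: the function name is hypothetical, and it assumes the boundary is the set of region vertices that still have a neighbor outside the region.

```rust
use std::collections::VecDeque;

/// Grows a BFS region from `seed` until it holds about half the vertices,
/// then returns the region vertices that touch the outside (assumed
/// definition of "boundary"). Removing them disconnects the region's
/// interior from the rest of the graph.
pub fn balanced_separator(adj: &[Vec<usize>], seed: usize) -> Vec<usize> {
    let n = adj.len();
    if n == 0 {
        return Vec::new();
    }
    let target = n / 2;
    let mut in_region = vec![false; n];
    let mut queue = VecDeque::new();
    in_region[seed] = true;
    let mut size = 1;
    queue.push_back(seed);
    'grow: while let Some(v) = queue.pop_front() {
        for &w in &adj[v] {
            if size >= target {
                break 'grow;
            }
            if !in_region[w] {
                in_region[w] = true;
                size += 1;
                queue.push_back(w);
            }
        }
    }
    (0..n)
        .filter(|&v| in_region[v] && adj[v].iter().any(|&w| !in_region[w]))
        .collect()
}
```

On a 6-vertex path seeded at one end this yields the single middle vertex 2, whereas a whole BFS layer on a low-diameter graph can contain a large fraction of all vertices, which is the degeneration reported for the M0 strategy.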

m1_arxiv: robust edge reader (skip # comments, comma/tab/space separators) so SNAP road networks load through the same harness.

m1_manifold: alpha-pruned kNN backbone over real ogbn-arxiv 128-d node features (Vamana RobustPrune), the decisive ADR-197 thesis test.

Results (N=1500, balanced separator, recall 50/50 everywhere):
- roadNet-PA: blowup 7.6x, elim_h 136 (~3.5 sqrt n) <- CCH works
- citation (arxiv): blowup 23.8x, elim_h 941 (~0.6 n)
- feature-manifold k10: blowup 42.4x, elim_h 837 (~0.56 n) <- worse
- feature-manifold k6: blowup 45.1x, elim_h 699 (~0.47 n)

Conclusion: the road control proves the implementation is sound (planar sqrt(n) separators -> 7.6x, instant build). Both embedding-derived backbones (citation small-world AND Euclidean feature kNN) are intrinsically high-treewidth (elim_h ~ 0.5n), so CCH contraction blows up regardless of separator quality or degree. The expander risk (ADR-197) is confirmed across two independent backbones. Query pruning stays ~100% effective; the cost is preprocessing.

Last untested rung: hyperbolic backbone (needs real hyperbolic embeddings).

The M1 go/no-go gate ran on real public data and returned NO-GO for CCH full contraction on embedding/citation retrieval graphs. Recorded the measured evidence in ADR-199 (Empirical Outcome section), updated ADR-196/197 status, and marked the milestone tracker (M0 done, M1 NO-GO, M2-M4 not pursued).

Evidence (N=1500, recall 50/50 everywhere):
- roadNet-PA control: blowup 7.6x, elim_h ~3.5 sqrt n (CCH works)
- ogbn-arxiv citation: blowup 23.8x, elim_h ~0.6 n
- ogbn-arxiv feature kNN: blowup 42.4x, elim_h ~0.56 n

Implementation is sound (road control + exact recall); embedding backbones are intrinsically high-treewidth. Query pruning works (~100%); preprocessing fill-in is the blocker. ruvector-seprag retained as a validated reference.

… on linear drift)

Salvage ADR-198's customizable-metric idea, decoupled from CCH, as a rigorous pre-registered head-to-head: does a fixed ANN topology + recomputed distances absorb metric drift as well as a full rebuild?

examples/reweight_vs_rebuild.rs: self-contained Vamana-lite (RobustPrune + greedy beam search); drift modelled as a vector-space transform M=A^T A; sweeps diagonal AND adversarial dense-Mahalanobis (rotational) drift; A=reuse topology, B=rebuild, C=stale control.

Result (n=2000 ogbn-arxiv embeddings, recall@10, pre-registered gate): A (re-weight, 0 rebuild) within 0.2% of B (full rebuild) up to 36% relevant-set churn, under both drift modes. C (stale) loses up to 29 points -> benchmark has teeth, A's parity is genuine adaptation. WIN.

Honest claim: COST win at equal quality (rebuilds become free under LINEAR drift). Boundaries (ADR-200): non-linear/region-local drift + scale untested. Next: non-linear learned metric (decisive adversarial test).

Also adds docs/plans/.../FUTURE-DIRECTIONS.md (4-bet backlog + prove-not-hype protocol) and ADR-200.
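The reason a vector-space transform is a convenient drift model: the squared Mahalanobis distance under M = A^T A equals the plain squared Euclidean distance after applying A to both vectors, so "re-weighting" needs no graph change, only transformed vectors. The sketch below checks that identity numerically. All function names are hypothetical, and the matrix is called `transform` to avoid clashing with policy A (reuse) above.

```rust
/// Applies a row-major matrix to a vector: returns transform * x.
pub fn apply(transform: &[Vec<f64>], x: &[f64]) -> Vec<f64> {
    transform
        .iter()
        .map(|row| row.iter().zip(x).map(|(r, xi)| r * xi).sum::<f64>())
        .collect()
}

/// Squared Euclidean distance.
pub fn sq_dist(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum::<f64>()
}

/// Gram matrix M = A^T A of a row-major matrix A.
pub fn gram(a: &[Vec<f64>]) -> Vec<Vec<f64>> {
    let d = a.first().map_or(0, |row| row.len());
    (0..d)
        .map(|i| {
            (0..d)
                .map(|j| a.iter().map(|row| row[i] * row[j]).sum::<f64>())
                .collect()
        })
        .collect()
}

/// Quadratic form d^T M d, i.e. the squared distance under metric M.
pub fn quad_form(m: &[Vec<f64>], d: &[f64]) -> f64 {
    m.iter()
        .zip(d)
        .map(|(row, di)| di * row.iter().zip(d).map(|(mij, dj)| mij * dj).sum::<f64>())
        .sum::<f64>()
}
```

For A = [[2, 0], [1, 1]], x = (1, 2), y = (0, 0), both routes give 13: A x = (2, 3) has squared norm 13, and x^T M x with M = [[5, 1], [1, 1]] is also 13.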

…otal WIN)

Extend the re-weight-vs-rebuild harness with the decisive adversarial cases and close the proof:
- non-linear drift mode (residual tanh warp v + s*tanh(Wv)): removes the 'linear only' caveat; A still matches B within 0.2% up to 35% churn.
- per-query distance-eval columns: A and B match within ~1%, disproving any hidden query-cost trade. Reuse is equal recall AND equal query cost.
- fix display bug (C/stale was double-divided by query count; control now correctly reads 90% at t=0, validating the negative control).
- drift modelled via transform closure (diag/rot/nonlin share one code path).
- clippy: idiomatic char-class split in m1_arxiv reader.

ADR-200 + FUTURE-DIRECTIONS updated: WIN across diagonal/rotational/non-linear drift; only open caveats are scale (n>=1e5, decisive next), region-local drift, incremental-rebuild baseline.

Extract the Vamana-lite ANN + metric-drift helpers into a reusable lib module (src/ann.rs) with an efficient two-heap greedy beam search (replaces the O(L) linear-scan beam, ~2x faster, needed for n>=1e5). Thin reweight_vs_rebuild to use it (regression-checked: identical recall/evals at n=2000). Add scale_drift example: sweeps N (5k..100k), measures recall(reuse) vs recall(rebuild) at the adversarial rotational drift point plus the rebuild-cost curve.

… widens

scale_drift sweep (5k->100k, rotational drift ~40% churn, recall@10):
- N=5000: A reuse 90.2%, B rebuild 90.0%, gap +0.2%, rebuild 3.6s, update 0.001s, ratio ~3600x
- N=10000: A reuse 89.5%, B rebuild 90.3%, gap -0.8%, rebuild 10.2s, update 0.004s, ratio ~2500x
- N=25000: A reuse 88.5%, B rebuild 89.2%, gap -0.7%, rebuild 21.4s, update 0.009s, ratio ~2400x
- N=50000: A reuse 87.7%, B rebuild 88.6%, gap -0.9%, rebuild 47.1s, update 0.043s, ratio ~1100x
- N=100000: A reuse 85.0%, B rebuild 86.7%, gap -1.7%, rebuild 141.8s, update 0.035s, ratio ~4000x

Verdict: WIN within the 2% gate through 100k at ~1000-4000x lower update cost, BUT the recall gap widens with N (+0.2% -> -1.7%) => defer/batch rebuilds, not never-rebuild.

Honest caveats: both A&B recall fall with N (fixed beam); 100 queries => ~+-1% noise, confirm trend with more queries.

Also: rename Rng::next->next_u64 (clippy). ADR-200 + FUTURE-DIRECTIONS updated with scale evidence, widening-gap caveat, and a hybrid re-weight+periodic-rebuild policy as a next step.

…, gate PASS

examples/region_drift.rs: warp only a 15% local cluster, grade recall separately for queries INSIDE vs OUTSIDE the drifted region (a global average would hide a local failure).

Result (n=20k, recall@10):
- t=0.25: churnIn 44%, A_in 89.8%, B_in 81.4% | churnOut 21%, A_out 87.9%, B_out 89.0%
- t=0.50: churnIn 53%, A_in 89.3%, B_in 90.0% | churnOut 21%, A_out 87.9%, B_out 89.0%
- t=1.00: churnIn 45%, A_in 89.5%, B_in 90.0% | churnOut 21%, A_out 87.9%, B_out 89.0%

Gate PASS: reuse holds inside the drifted region (A_in within 0.7% of B_in, and ABOVE it at t=0.25) even at 53% in-region churn; out-region ~unchanged. Region-local drift did NOT break reuse.

Honest caveat: the t=0.25 B_in dip to 81% (reuse beats rebuild by 8pts) is a build-variance artifact of the simplified single-pass Vamana baseline, not a smooth effect. This is the strongest argument to port the baseline to production ruvector-diskann.

ADR-200 + FUTURE-DIRECTIONS + status updated.

…onfirmed

examples/diskann_drift.rs: re-run the re-weight-vs-rebuild test on the shipping ruvector-diskann VamanaGraph (added as dev-dependency). The reuse trick is native: the graph stores only topology and greedy_search takes vectors externally, so drift = search a graph-built-on-original with the transformed vectors.

Result (n=20k, recall@10):
- GLOBAL rotational: A reuse vs B rebuild within 2% (95.6 vs 97.1 worst, t=0.5)
- REGION-LOCAL in-region: A_in within ~1.5% of B_in; A_in 98.6 vs B_in 94.5 at t=0.25
- absolute recall 96-99% (stronger/fairer baseline than lite-Vamana ~90%)

Confirms BET 1 on the production index. The t=0.25 reuse-beats-rebuild dip REPRODUCED on diskann => it is a real property (fresh Vamana build on a half-warped region underperforms reuse), not lite-Vamana noise. Baseline-variance caveat RESOLVED. Remaining caveat: gap widens with scale/churn (defer/batch rebuilds).

ADR-200 + FUTURE-DIRECTIONS + status updated.
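A toy version of the reuse trick, to show why it costs nothing when topology and vectors are stored separately. This is emphatically not the ruvector-diskann API: the function below is a hypothetical beam-width-1 greedy walk over a plain adjacency list, with the vectors passed in by the caller.

```rust
/// Squared Euclidean distance.
fn sq_dist(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum::<f64>()
}

/// Greedy walk (beam width 1) over a fixed topology. The graph holds only
/// neighbor ids; distances come from whatever `vectors` the caller supplies,
/// so the same topology can be searched under a drifted metric.
pub fn greedy_search(
    neighbors: &[Vec<usize>],
    vectors: &[Vec<f64>],
    entry: usize,
    query: &[f64],
) -> usize {
    let mut current = entry;
    let mut best = sq_dist(&vectors[current], query);
    loop {
        let mut next = current;
        for &w in &neighbors[current] {
            let d = sq_dist(&vectors[w], query);
            if d < best {
                best = d;
                next = w;
            }
        }
        if next == current {
            return current; // local minimum: no neighbor is closer
        }
        current = next;
    }
}
```

Policy A in the experiments corresponds to calling such a search with the original `neighbors` and the transformed vectors; policy B rebuilds `neighbors` from the transformed vectors first.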

…ippable

examples/hybrid_policy.rs: simulate a compounding random-walk metric-drift trajectory and compare operating policies on the production diskann Vamana: always / never / periodic-K / drift-triggered (Frobenius monitor).

Result (n=10k, 24 steps, aggressive random-walk drift, recall@10):
- always: 99.1% mean, 98.4% min, 24 rebuilds
- never: 94.4% mean, 89.7% min, 1 rebuild <- decays under heavy drift
- periodic-4: 98.8% mean, 97.9% min, 6 rebuilds <- ~always at 25% cost
- periodic-8: 98.4% mean, 96.5% min, 3 rebuilds <- at 12.5% cost

Shippable operating point: re-weight every step + rebuild every ~4 steps recovers near-full recall at a fraction of the cost.

Honest sub-finding: the drift-TRIGGERED monitor (Frobenius of cumulative transform) underperformed simple periodic, so periodic-K is the recommended knob; a sampled-recall probe trigger is future work. Under gentle single-direction drift (n=5k) never did NOT decay, so the hybrid only matters under large/compounding drift.

ADR-200 status/boundaries/next-steps + FUTURE-DIRECTIONS updated; stale 'n=2000/scale-unconfirmed' caveats removed (now resolved).
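The rebuild counts above follow directly from the schedule. A minimal sketch, assuming the initial build at step 0 is counted as a rebuild (which is what "never: 1 rebuild" suggests); the enum and function names are hypothetical and the drift-triggered policy is omitted:

```rust
/// Rebuild schedules compared in the experiment (illustrative names).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Policy {
    Always,
    Never,
    Periodic(usize),
}

/// Whether the index is rebuilt at `step`. Re-weighting happens every step
/// regardless; this only decides the expensive rebuild.
pub fn should_rebuild(policy: Policy, step: usize) -> bool {
    match policy {
        Policy::Always => true,
        // Assumption: the initial build at step 0 counts as one rebuild.
        Policy::Never => step == 0,
        Policy::Periodic(k) => step == 0 || (k > 0 && step % k == 0),
    }
}

/// Number of rebuilds over a trajectory of `steps` steps.
pub fn rebuild_count(policy: Policy, steps: usize) -> usize {
    (0..steps).filter(|&s| should_rebuild(policy, s)).count()
}
```

Over 24 steps this gives 24, 1, 6, and 3 rebuilds for always, never, periodic-4, and periodic-8, matching the reported counts and the 25% / 12.5% cost figures.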

BET 3 go/no-go gate, frozen before any data: probe whether a curated bounded-degree KG (WN18RR / FB15k-237 / 2WikiMultiHop, the backbone under HotpotQA/MuSiQue) is low-treewidth enough for CCH contraction/build to stay cheap. This is the last untested backbone for the salvaged separator-tree kernel (treewidth-independent at query, but build blew up on high-treewidth embedding/citation graphs, ADR-199).

Primary metric: elim-tree-height exponent p (elim_h ~ n^p).
- GO: p<=0.6 AND blowup<=10x
- KILL: p>=0.8 OR blowup>=23x
- inconclusive: 0.6<p<0.8
- VOID: the roadNet-PA control fails to reproduce ~sqrt(n)/7.6x

Adversarial upgrade over ADR-199: add a minor-min-width treewidth LOWER bound + run both separators so a NO-GO is structurally certain, not heuristic-limited.

QA harness NOT built unless GO. Branch off PR ruvnet#535 (where ruvector-seprag lives); re-home if GO.

Refs ruvnet#534
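The frozen thresholds can be written down as a small decision function. This is a sketch, not the probe's code: names are hypothetical, a missing exponent (as for FB15k-237) is modelled as `None`, and every case the pre-registration does not spell out (for example p <= 0.6 with blowup between 10x and 23x) is assumed to fall through to inconclusive.

```rust
/// Outcome of the pre-registered gate (illustrative type).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Verdict {
    Go,
    Kill,
    Inconclusive,
    Void,
}

/// Applies the frozen thresholds: GO p<=0.6 AND blowup<=10x;
/// KILL p>=0.8 OR blowup>=23x; VOID if the road control failed.
pub fn frozen_gate(p: Option<f64>, blowup: f64, control_ok: bool) -> Verdict {
    if !control_ok {
        return Verdict::Void;
    }
    // KILL is a disjunction: either signal alone is enough.
    if blowup >= 23.0 || matches!(p, Some(x) if x >= 0.8) {
        return Verdict::Kill;
    }
    // GO is a conjunction: both signals must pass.
    if blowup <= 10.0 && matches!(p, Some(x) if x <= 0.6) {
        return Verdict::Go;
    }
    // Assumption: anything not covered above is treated as inconclusive.
    Verdict::Inconclusive
}
```

Feeding in the scoreboard rows reproduces the three distinct failure routes: WN18RR (p = 0.508, 59.9x) is killed by blowup alone, CoDEx-L (p = 1.826, 5.1x) by the exponent alone, and FB15k-237 (no exponent, 42.3x) by blowup.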

…control

New example kg_treewidth_probe.rs (reuses SepRag::build_with, blowup_ratio, elim_depth, Dijkstra oracle from the validated kernel). Adds only what the frozen gate needs: generic KG/triple loader (WordNet/Freebase/Wikidata), scale sweep with OLS log-log exponent fit (elim_h ~ n^p, the primary metric), both separators, a minor-min-width treewidth LOWER bound (the adversarial upgrade over ADR-199), and an adaptive build-time budget so high-treewidth backbones reveal themselves at moderate n without runaway contraction. Run prints every KG bracketed between the two calibrated controls and applies the frozen gate automatically.

Refs ruvnet#534
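For reference, the exponent fit amounts to an ordinary least-squares slope in log-log space. A minimal sketch, assuming the sweep yields (n, elim_h) pairs; the function name is hypothetical and this is not the probe's own fitting code:

```rust
/// Fits elim_h ~ c * n^p by ordinary least squares on (ln n, ln elim_h)
/// and returns the slope p. Returns None when fewer than two usable
/// samples exist or all n are identical.
pub fn fit_exponent(samples: &[(f64, f64)]) -> Option<f64> {
    // Logs are only defined for strictly positive values.
    let pts: Vec<(f64, f64)> = samples
        .iter()
        .filter(|&&(n, h)| n > 0.0 && h > 0.0)
        .map(|&(n, h)| (n.ln(), h.ln()))
        .collect();
    if pts.len() < 2 {
        return None;
    }
    let m = pts.len() as f64;
    let mean_x = pts.iter().map(|p| p.0).sum::<f64>() / m;
    let mean_y = pts.iter().map(|p| p.1).sum::<f64>() / m;
    let sxx = pts.iter().map(|p| (p.0 - mean_x).powi(2)).sum::<f64>();
    let sxy = pts
        .iter()
        .map(|p| (p.0 - mean_x) * (p.1 - mean_y))
        .sum::<f64>();
    if sxx == 0.0 {
        return None;
    }
    Some(sxy / sxx)
}
```

A road-like sweep with elim_h proportional to sqrt(n) fits p close to 0.5, while elim_h proportional to n fits p close to 1. Because the slope ignores the constant factor, the gate pairs it with an absolute blowup threshold.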

All three curated KGs (WN18RR/FB15k-237/CoDEx-L = WordNet/Freebase/Wikidata) fail the frozen treewidth gate, each with a distinct signature the conjunction gate caught:
- WN18RR: blowup 59.9x (low exponent but huge absolute height)
- FB15k-237: elim_h~0.46n + tw>=42
- CoDEx-L: exponent p=1.83 (tree-like periphery, hub-dense core collapses treewidth; a false-GO the scaling metric caught)

Minor-min-width LOWER bounds (28-44) are 7-11x the road control's (4) => structurally certain, not heuristic. Cause: KGs are small-world WITH hubs (max deg 520/1999/4999 vs road 9). Combined with ADR-199 this closes the CCH-contraction line. Query kernel stays exact (recall 30/30) + treewidth-independent; QA harness correctly never built. Records 3 honest deviations from the frozen pre-reg (none changed a verdict).

Refs ruvnet#534

shaal added 18 commits June 4, 2026 02:11

chore(seprag): idiomatic char-array split (clippy clean)

11bef01

chore(seprag): add ruvector-seprag to Cargo.lock

44ee4db

shaal mentioned this pull request Jun 5, 2026

SepRAG: CCH-inspired retrieval exploration + customizable re-weighting for self-learning ANN #534

Open

5 tasks

shaal wants to merge 18 commits into ruvnet:main from shaal:docs/seprag-bet3-kg-treewidth
