SepRAG BET 1 (finding): incremental reindex WINS the high-recall tier vs reuse + periodic rebuild by shaal · Pull Request #539 · ruvnet/RuVector

shaal · 2026-06-05T03:30:43Z

Addendum carried in this PR — BET 1 live serving hook SCOPED (commit `14bafab0`)

One extra commit (14bafab0) appends an ADR-202 addendum (a #537 file, visible here because
this branch is stacked on #537). It records the scoping of ADR-202 next-step #1 (wire the reuse
policy into the live ruvector-gnn embedding-flush path): the production embedding→index seam
does not exist on either end — gnn has no serving module and only a dev-dep on diskann; the NAPI
serving surface is a static-index API with reuse-under-drift off; mcp-brain-server is
monitor-only with no diskann dep. Building the loop now would mean inventing the producer, so the
minimal seam (a feature-gated DriftingDiskAnn NAPI binding) is recorded, not built. BET 1 is
mechanism-complete, consumer-blocked. See #534.

SepRAG BET 1 — adversarial check: incremental reindex vs reuse vs full rebuild

This is a research finding (a WIN that narrows BET 1), not a feature request — no merge
urgency. It is stacked on #537 (the BET 1 reuse-under-drift PR, where reuse.rs + the
real-trajectory harness live). The incremental-specific commits are the last three:
b388c427 (frozen pre-registration), 05ba882c (IncrementalIndex + harness contender),
5e029aba (ADR-204). Linked to #534.

The question

ADR-200/202 compared only two index-maintenance strategies under metric drift: reuse
everything (ReweightOnly, free, decays) vs rebuild everything (AlwaysRebuild, full cost),
interleaved by Periodic{k}. Missing middle: repair only the part of the graph that went
stale. Does a cheap incremental update beat both topology-reuse and full rebuild?

Cheap pre-check first (per protocol)

ruvector-diskann has no faithful incremental update: insert appends and flags a full
rebuild; delete is a tombstone with no graph repair. So the baseline had to be built, and it
was built faithfully (not a silent FreshDiskANN). Under drift, membership is fixed (nothing is
deleted), so the operation is out-edge recompute + back-edge refresh of the displaced subset,
with no delete-consolidation and no reverse index. The only always-compiled change is
robust_prune → pub(crate) (visibility).
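
How the displaced subset might be chosen can be sketched as follows. The commit message for IncrementalIndex describes the knob as "top-f by displacement-since-last-reindex"; the function name `top_f_displaced`, the use of squared L2 distance as the displacement measure, and the rounding of `f * n` are assumptions for illustration, not the crate's API.

```rust
/// Squared Euclidean distance between two equal-length vectors.
fn sq_dist(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Hypothetical helper: ids of the ceil(f * n) nodes that moved farthest
/// between the snapshot taken at the last repair and the current snapshot,
/// most displaced first.
fn top_f_displaced(at_last_repair: &[Vec<f32>], current: &[Vec<f32>], f: f32) -> Vec<usize> {
    assert_eq!(at_last_repair.len(), current.len());
    let n = current.len();
    let k = ((f.clamp(0.0, 1.0) * n as f32).ceil() as usize).min(n);
    let mut ids: Vec<usize> = (0..n).collect();
    // Sort by displacement, descending; total_cmp gives a total order on f32.
    ids.sort_by(|&a, &b| {
        let da = sq_dist(&at_last_repair[a], &current[a]);
        let db = sq_dist(&at_last_repair[b], &current[b]);
        db.total_cmp(&da)
    });
    ids.truncate(k);
    ids
}
```

With f = 0.5 and four nodes, the two nodes with the largest movement are returned and only those would be repaired; the other two keep their existing edges.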

Result — WIN (scale-qualified, regime-concentrated)

Pre-registered + frozen before any run (PRE-REGISTRATION-incremental.md). Reproduced at
n=20k, n=50k, and a gradual trajectory:

inc-50% matches full-rebuild recall@10 within ~0.2 pts at ~42% of rebuild cost AND
Pareto-dominates Periodic{k=2} — the strongest BET 1 incumbent.

(recall@10, cost) frontier, n=20k overdriven (`<-` = Pareto-optimal):

| policy | recall | cost |
| --- | --- | --- |
| inc 20% | 95.7% | 34s `<-` |
| P k=4 | 95.0% | 54s (dominated) |
| inc 50% | 98.1% | 88s `<-` |
| P k=2 | 95.9% | 105s (dominated) |
| B rebuild | 96.3% | 208s (dominated) |
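
The Pareto labels in the table can be re-derived from the five (recall, cost) pairs alone. The sketch below is not the harness code; `Point`, `dominates`, `pareto_front`, and `table_points` are illustrative names, and the numbers are copied from the table above.

```rust
/// One policy's measured operating point (values from the n=20k table).
#[derive(Debug, Clone, PartialEq)]
struct Point {
    policy: &'static str,
    recall_pct: f64,
    cost_s: f64,
}

/// `a` dominates `b` if it is no worse on both axes and strictly better on one.
fn dominates(a: &Point, b: &Point) -> bool {
    a.recall_pct >= b.recall_pct
        && a.cost_s <= b.cost_s
        && (a.recall_pct > b.recall_pct || a.cost_s < b.cost_s)
}

/// Policies that no other policy dominates, in input order.
fn pareto_front(points: &[Point]) -> Vec<&'static str> {
    points
        .iter()
        .filter(|p| !points.iter().any(|q| dominates(q, p)))
        .map(|p| p.policy)
        .collect()
}

/// The five rows of the table.
fn table_points() -> Vec<Point> {
    vec![
        Point { policy: "inc 20%", recall_pct: 95.7, cost_s: 34.0 },
        Point { policy: "P k=4", recall_pct: 95.0, cost_s: 54.0 },
        Point { policy: "inc 50%", recall_pct: 98.1, cost_s: 88.0 },
        Point { policy: "P k=2", recall_pct: 95.9, cost_s: 105.0 },
        Point { policy: "B rebuild", recall_pct: 96.3, cost_s: 208.0 },
    ]
}
```

Calling `pareto_front(&table_points())` leaves only the two incremental rows: inc 20% dominates P k=4, and inc 50% dominates both P k=2 and B rebuild.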

Mechanism: Periodic{k} rebuilds all n nodes (most didn't move) and sawtooth-decays
between rebuilds; incremental repairs only displaced nodes every step, so recall never decays.
Targeted repair beats lumped blind rebuilds at equal cost — in exactly the decay-tail regime
ADR-202 had assigned to periodic.

Honest narrowings (all measured — see ADR-204)

- Scale-sensitive. The frontier sweep (incremental dominating every periodic) held only at n=20k/93% churn. At n=50k and moderate churn, P k=4/P k=8 reclaim Pareto-optimality: incremental extends the high-recall end, it does not replace periodic.
- Regime-concentrated. The win lives above ~40% churn; below that all policies cluster.
- Degeneracy caveat. inc > B at >90% churn (n=20k) is the fresh-build-on-collapsed-geometry effect (ADR-200/202); at n=50k inc ≈ B exactly → conservative claim is "matches rebuild."
- Eval bar. f=5% failed the ≤1.10× per-query-eval guard at n=50k; clean win regime is f ∈ [0.2, 0.5]. Recall margins vs periodic carry per-run build-noise; the cost and frontier-shape advantages are the robust signals.

Tests / build

IncrementalIndex behind the existing reuse-under-drift feature (default off → shipping build
byte-identical), 3 new unit tests green, clippy clean. Harness measures the incremental contender
on the same trajectory/queries/truth as A/B/P/C.

🤖 Generated with claude-flow

Productionize BET 1 (ADR-200 WIN under synthetic drift) by wiring re-weight + periodic-rebuild into the ruvector-diskann loop behind a feature flag, validated on a REAL contrastive-link-prediction embedding trajectory on ogbn-arxiv (ADR-200 next-step #4). Gate frozen before any contender run (prove-not-hype): WIN = ReweightOnly within 2% recall@10 of AlwaysRebuild + Periodic{k} within 1% at <=50% cumulative rebuild cost; KILL = no transfer from synthetic to real drift. Minimum-drift precondition (>=15% top-10 churn) guards against a vacuous pass. Self-contained off main; independent of PR ruvnet#535. Outcome -> ADR-202. Linked: ruvnet#534

DriftingIndex wraps a VamanaGraph and owns only the rebuild decision (RebuildPolicy: AlwaysRebuild / ReweightOnly / Periodic{k}); the consumer owns the drifting vectors and passes snapshots to on_metric_update + search. Native reuse hook: greedy_search takes vectors externally, so adapt-to-drift recomputes only distances. Feature-gated (reuse-under-drift, default off) — default build byte-identical. 5 unit tests green (cadence + search). Refs ruvnet#534
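
The rebuild decision described above can be sketched as a small state machine. This is a minimal illustration, not the crate's code: the variant names come from the commit message, while `RebuildScheduler`, its field, and the reading of `Periodic{k}` as "rebuild on every k-th metric update" are assumptions.

```rust
/// The three policies named in the commit message.
#[derive(Debug, Clone, Copy, PartialEq)]
enum RebuildPolicy {
    AlwaysRebuild,
    ReweightOnly,
    Periodic { k: usize },
}

/// Hypothetical scheduler that owns only the rebuild decision.
struct RebuildScheduler {
    policy: RebuildPolicy,
    updates_since_rebuild: usize,
}

impl RebuildScheduler {
    fn new(policy: RebuildPolicy) -> Self {
        Self { policy, updates_since_rebuild: 0 }
    }

    /// Call once per metric update. Returns true when the graph should be
    /// rebuilt from the new vectors; false means the topology is reused and
    /// only distances are recomputed at search time.
    fn on_metric_update(&mut self) -> bool {
        self.updates_since_rebuild += 1;
        let rebuild = match self.policy {
            RebuildPolicy::AlwaysRebuild => true,
            RebuildPolicy::ReweightOnly => false,
            // Assumption: k = 0 is treated as "never rebuild".
            RebuildPolicy::Periodic { k } => k > 0 && self.updates_since_rebuild >= k,
        };
        if rebuild {
            self.updates_since_rebuild = 0;
        }
        rebuild
    }
}
```

Under this reading, `Periodic { k: 4 }` rebuilds on one update in four, which is in line with the roughly 20-24% of rebuild cost reported for P k=4 elsewhere in this PR.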

examples/diskann_real_trajectory.rs: generates a REAL learned-GNN metric trajectory via contrastive link-prediction (InfoNCE over ogbn-arxiv citations, ruvector-gnn Optimizer + info_nce_loss, embeddings on the unit sphere so cosine==dot and L2 ranking agrees), then drives the diskann reuse policy (DriftingIndex) through all four contenders step-by-step.

Result (n=20k, gradual trajectory to 67% churn):
- WIN. Reuse holds within 2% recall@10 of full rebuild up to 40% top-10 churn (>= ADR-200's synthetic ~36% regime) -- transfer confirmed on real learned drift. Stale control collapses 92%->33% (teeth).
- Periodic recovers the high-churn tail: P k=4 = 98.7% (gap -0.01%) at 24% of rebuild cost, evals 1.00x B. ADR-200 hybrid reproduced on real drift.
- Honest caveat: pure reuse past the ceiling decays (-4.73% over the whole overdriven trajectory, 1.05x evals); the shippable periodic policy does not.

Refs ruvnet#534
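
The remark that unit-sphere embeddings make cosine equal to dot product, and make L2 ranking agree with both, follows from a one-line identity. It is written out here for unit vectors a and b; the symbols are illustrative and do not come from the harness.

```latex
\|a - b\|^2 = \|a\|^2 + \|b\|^2 - 2\,a \cdot b = 2 - 2\,a \cdot b,
\qquad
\cos(a, b) = \frac{a \cdot b}{\|a\|\,\|b\|} = a \cdot b
\quad \text{when } \|a\| = \|b\| = 1 .
```

Since the squared L2 distance is a strictly decreasing function of the dot product, sorting neighbours by ascending L2 distance gives the same order as sorting by descending dot product or cosine, so recall@10 does not depend on which of the three is used.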

…ectory
Outcome ADR for BET 1 productionization (closes ADR-200 next-step #4). Fixed-topology reuse + periodic rebuild, validated on a real contrastive link-prediction trajectory over ogbn-arxiv (not synthetic A(t)). WIN at n=20k AND n=50k: pure reuse holds within 2% recall@10 of full rebuild up to a 40% top-10 churn ceiling (identical at both scales, >= ADR-200's synthetic ~36%); Periodic{k:4} recovers the high-churn tail to within 0.01% (20k) / above rebuild (50k) at 20-24% of rebuild cost, equal per-query work. Stale control collapses (teeth). Honest caveat: pure reuse past the ceiling decays -- the shippable policy is periodic, not never. Refs ruvnet#534

…plumbing Pre-register (frozen before any run) the ADR-200 next-step #2 bet: does a sampled-recall rebuild trigger beat fixed Periodic{k} under VARIABLE-RATE drift, and beat the Frobenius monitor ADR-200 found wanting? Honest test = the (rebuilds, recall) Pareto frontier; WIN = trigger >=25% fewer rebuilds at matched recall with probe cost counted; KILL = no frontier dominance. Plumbing (allowed pre-freeze): DriftingIndex::force_rebuild + harness. Refs ruvnet#534

…t run was VOID) The first variable-rate run was VOID (0% churn): plain SGD at lr 0.002-0.03 on unit-normalized embeddings doesn't move them. Switched to Adam (real motion in bursts), n=20k for edge density, and ENFORCED the >=15% churn precondition (abort before rendering a verdict) so a no-drift trajectory can't masquerade as a result. Gate criteria unchanged. Result (n=20k, bursty trajectory, per-step Δchurn ~45 burst / ~2 calm, 89% end churn): WIN. Recall{floor=0.95} = 97.2% @ 7 rebuilds beats Periodic{k=2} (96.8% @ 12) on BOTH axes; probe cost ~1s vs ~73s rebuild time saved (trap passed); beats best Frobenius (97.3% @ 9) on rebuilds. Refs ruvnet#534

The sampled-recall trigger WON (ADR-200 next-step #2): under bursty drift it uses ~42% fewer rebuilds than fixed Periodic{k} at matched recall, beats the Frobenius monitor ADR-200 found wanting, and passes the probe-cost trap (~1s probe vs ~73s rebuild saved). Productionized as RecallTrigger in ruvector_diskann::reuse (DriftingIndex in ReweightOnly mode + a probe-driven force_rebuild); its knob 'floor' IS the recall SLA, unlike k/tau. 8 reuse tests (incl. holds-under-no-drift + fires-then-recovers). ADR-202 addendum records the result; pre-registration carries the WIN outcome pointer. Refs ruvnet#534
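
A probe-driven trigger of this kind can be sketched in a few lines. This is not the RecallTrigger implementation: `recall_at_k`, `should_force_rebuild`, and the `(found, truth)` probe representation are hypothetical, and the sketch assumes each probe's exact top-k is computed by brute force outside this function.

```rust
use std::collections::HashSet;

/// Fraction of the exact top-k ids (`truth`) that the index returned (`found`).
/// Assumes `found` contains no duplicate ids.
fn recall_at_k(found: &[usize], truth: &[usize]) -> f64 {
    if truth.is_empty() {
        return 1.0;
    }
    let t: HashSet<&usize> = truth.iter().collect();
    found.iter().filter(|id| t.contains(id)).count() as f64 / truth.len() as f64
}

/// Hypothetical trigger: each probe is (ids returned by the reused graph,
/// exact ids from a brute-force scan). Fire a rebuild when the mean sampled
/// recall drops below `floor`, so `floor` reads directly as the recall SLA.
fn should_force_rebuild(probes: &[(Vec<usize>, Vec<usize>)], floor: f64) -> bool {
    if probes.is_empty() {
        return false; // no evidence of decay, keep reusing
    }
    let mean: f64 = probes
        .iter()
        .map(|(found, truth)| recall_at_k(found, truth))
        .sum::<f64>()
        / probes.len() as f64;
    mean < floor
}
```

The probe set is what the commit message calls the probe cost: it has to stay small relative to a rebuild for the trigger to pay for itself.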

…ctory Frozen-before-run generality check of ADR-202's 40% holding ceiling: does it generalize beyond contrastive link-prediction to a DIFFERENT learned objective? Adds a node-classification trajectory (real arxiv 40-class labels, CE on a linear head, embeddings as params) selectable via an 'objective=nodeclass' arg to the existing harness — same contenders + 2% gate, only the objective changes. CONFIRM = holding ceiling >=30% churn + periodic recovers; CAVEAT = <20% or materially different (reportable). Refs ruvnet#534

…y caveat
Node-classification trajectory (2nd objective) holds reuse within 2% of rebuild up to a 54% churn ceiling (>= link-pred's 40%) -> the ADR-202 holding-ceiling result GENERALIZES across two learned objectives; the objective-dependence caveat is resolved. Honest finding (reported, not buried): past ~60% churn node-class CE collapses embeddings into ~40 class blobs where recall@10 is ill-posed (intra-blob near-ties) and the FULL-REBUILD baseline itself destabilizes (B swings 55-96%). The trajectory-wide 'reuse > rebuild +4.3%' is a benchmark-degeneracy artifact (ADR-200's t=0.25 dip amplified), NOT a genuine superiority claim. Operational conclusion unaffected (reuse+periodic never worse). ADR-202 addendum + next-step #5 (collapse-aware metric). Refs ruvnet#534

…ing middle vs reuse/rebuild Adversarial check on BET 1 (ADR-200/202): does cheap incremental graph repair of the displaced subset beat BOTH topology-reuse AND full rebuild under metric drift? Cheap pre-check recorded: ruvector-diskann has NO faithful incremental update (insert=append+full-rebuild-flag; delete=tombstone, no graph repair). Baseline must be built. Scoped as in-memory out-edge-recompute + back-edge-refresh of the top-f displaced nodes (no delete-consolidation — membership is fixed under drift). Frozen gate: WIN = incremental beats pure-reuse >2pts recall AND <=0.5x rebuild cost AND within 2pts of rebuild in some churn band; adversarial check vs Periodic{k} (the real BET 1 incumbent) reported regardless. NO-GO/PARTIAL are acceptable.
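
The frozen gate reads as a predicate over per-churn-band measurements. The sketch below encodes the three WIN conditions as stated; `BandResult` and its field names are hypothetical, and "within 2pts of rebuild" is interpreted as "no more than 2 points below rebuild", which is an assumption about the pre-registration wording.

```rust
/// Hypothetical per-churn-band measurements (recall in percentage points,
/// cost in any consistent unit such as seconds).
#[derive(Debug, Clone, Copy)]
struct BandResult {
    inc_recall: f64,
    reuse_recall: f64,
    rebuild_recall: f64,
    inc_cost: f64,
    rebuild_cost: f64,
}

/// All three WIN conditions must hold in the same band.
fn band_passes(b: &BandResult) -> bool {
    let beats_reuse = b.inc_recall - b.reuse_recall > 2.0; // > 2 pts over pure reuse
    let cheap_enough = b.inc_cost <= 0.5 * b.rebuild_cost; // <= 0.5x rebuild cost
    let near_rebuild = b.rebuild_recall - b.inc_recall <= 2.0; // assumed reading
    beats_reuse && cheap_enough && near_rebuild
}

/// WIN if the conditions hold "in some churn band".
fn gate_win(bands: &[BandResult]) -> bool {
    bands.iter().any(band_passes)
}
```

Freezing the thresholds as constants in a function like this before any run is one way to keep the gate from drifting after results are seen.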

…+ harness contender
The BET-1 missing middle: repair only the DISPLACED subset of the Vamana graph under metric drift, between ReweightOnly (repair nothing) and AlwaysRebuild (repair everything).

ruvector-diskann (feature reuse-under-drift):
- graph.rs: expose robust_prune as pub(crate) (visibility only, no logic change)
- reuse.rs: IncrementalIndex. For each displaced node, recompute out-edges (greedy_search -> robust_prune at new position) + refresh back-edges; top-f by displacement-since-last-reindex is the cost/recall knob. No delete-consolidation (membership is fixed under drift; nothing is removed). 3 tests.
- lib.rs: export under feature.

harness (diskann_real_trajectory.rs): incremental contender measured on the SAME trajectory/queries/truth as A/B/P/C; reports the full (recall,cost) Pareto frontier + adversarial domination vs Periodic{k}. Frozen thresholds unchanged from the pre-registration; f* selection corrected to 'best knob' (was 'first qualifying') to match the frozen wording. Gate frozen at b388c42 before any contender run.
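
The repair step can be illustrated on a toy in-memory graph. This is a sketch under stated assumptions, not IncrementalIndex: candidates come from an exhaustive scan rather than greedy_search, `robust_prune_sketch` is a textbook Vamana-style prune written for this example (not the crate's robust_prune), and the adjacency-list representation, `alpha`, and `r` are illustrative.

```rust
/// Euclidean distance between two equal-length vectors.
fn dist(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum::<f32>().sqrt()
}

/// Vamana-style prune: keep at most `r` out-neighbours of `p`, dropping any
/// candidate that an already-kept neighbour covers within factor `alpha`.
fn robust_prune_sketch(p: usize, mut cand: Vec<usize>, vecs: &[Vec<f32>], alpha: f32, r: usize) -> Vec<usize> {
    cand.retain(|&c| c != p);
    // Closest first; ties broken by id so duplicates end up adjacent.
    cand.sort_by(|&a, &b| {
        dist(&vecs[p], &vecs[a]).total_cmp(&dist(&vecs[p], &vecs[b])).then(a.cmp(&b))
    });
    cand.dedup();
    let mut kept = Vec::new();
    while !cand.is_empty() && kept.len() < r {
        let best = cand.remove(0);
        kept.push(best);
        cand.retain(|&c| alpha * dist(&vecs[best], &vecs[c]) > dist(&vecs[p], &vecs[c]));
    }
    kept
}

/// Repair only `displaced`: recompute each node's out-edges at its new
/// position, then add the reverse edge to each new neighbour, re-pruning a
/// neighbour whose list grows past `r`. Untouched nodes keep their edges.
fn repair_displaced(graph: &mut [Vec<usize>], vecs: &[Vec<f32>], displaced: &[usize], alpha: f32, r: usize) {
    let n = vecs.len();
    for &p in displaced {
        // Assumption: an exhaustive candidate set stands in for greedy_search.
        let cand: Vec<usize> = (0..n).collect();
        graph[p] = robust_prune_sketch(p, cand, vecs, alpha, r);
        let nbrs = graph[p].clone();
        for q in nbrs {
            if !graph[q].contains(&p) {
                graph[q].push(p);
                if graph[q].len() > r {
                    let c = std::mem::take(&mut graph[q]);
                    graph[q] = robust_prune_sketch(q, c, vecs, alpha, r);
                }
            }
        }
    }
}
```

Because membership is fixed, nothing is ever removed from the node set; stale in-edges from untouched nodes are simply left in place, which matches the "no delete-consolidation" scoping.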

…scale-qualified)
Adversarial check on BET 1 (ADR-200/202): does cheap incremental graph repair beat BOTH topology-reuse AND full rebuild under metric drift? YES, at the high-recall tier. Reproduced at n=20k, n=50k, and on a gradual trajectory: inc-50% matches full-rebuild recall@10 within ~0.2pts at ~42% of rebuild cost AND Pareto-dominates Periodic{k=2} (the strongest BET 1 incumbent). Targeted repair of the displaced subset beats lumped periodic rebuilds at equal cost because it never lets recall sawtooth-decay.

Honest narrowings (all measured, in the ADR):
- Scale-sensitive: frontier SWEEP only at n=20k/93% churn; at n=50k & moderate churn the cheap periodic tiers (k=4,k=8) reclaim Pareto-optimality. Incremental EXTENDS the high-recall end, not a replacement for periodic.
- Regime-concentrated: advantage emerges above ~40% churn; below that all policies cluster.
- Degeneracy: inc>B at >90% churn is fresh-build-on-collapsed-geometry (inc==B at n=50k).
- f=5% fails the per-query-eval bar at n=50k; clean win regime is f in [0.2,0.5].

Frozen gate (b388c42) passed; outcome stamped on the pre-registration.

…(no build) Scoped next-step #1 (wire the reuse policy into the live ruvector-gnn embedding-flush path) before committing any integration code. Finding: the production embedding->index seam does not exist on either end — gnn produces embeddings but has no serving module and only a dev-dep on diskann; the NAPI serving surface is a static-index API with reuse-under-drift off; mcp-brain-server has a monitor-only DriftMonitor and no diskann dep. The only place a drifted embedding meets a diskann index is examples/. Building the loop now would mean inventing the producer. Recorded the minimal seam (feature-gated DriftingDiskAnn NAPI binding) instead. Honors prove-not-hype: 'the path isn't there yet, here's the seam.'

shaal added 14 commits June 4, 2026 17:20

style(bet1): rustfmt the reuse module + trajectory harness

f18742c

docs(bet1): record WIN outcome pointer to ADR-202 in pre-registration

2bb2349

shaal mentioned this pull request Jun 5, 2026

SepRAG: CCH-inspired retrieval exploration + customizable re-weighting for self-learning ANN #534

Open

5 tasks

shaal mentioned this pull request Jun 5, 2026

SepRAG BET 4 (finding): IVF cluster-pruning is structurally redundant with tuned nprobe — NO-GO #540

Open

shaal wants to merge 15 commits into ruvnet:main from shaal:feat/seprag-bet1-incremental-baseline
