SepRAG BET 1 (finding): incremental reindex WINS the high-recall tier vs reuse + periodic rebuild#539
Open
shaal wants to merge 15 commits into
Productionize BET 1 (ADR-200 WIN under synthetic drift) by wiring re-weight + periodic-rebuild into the ruvector-diskann loop behind a feature flag, validated on a REAL contrastive-link-prediction embedding trajectory on ogbn-arxiv (ADR-200 next-step ruvnet#4). Gate frozen before any contender run (prove-not-hype): WIN = ReweightOnly within 2% recall@10 of AlwaysRebuild + Periodic{k} within 1% at <=50% cumulative rebuild cost; KILL = no transfer from synthetic to real drift. Minimum-drift precondition (>=15% top-10 churn) guards against a vacuous pass. Self-contained off main; independent of PR ruvnet#535. Outcome -> ADR-202. Linked: ruvnet#534
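A minimal sketch of how the frozen gate above could be written as a pure function, so the verdict is mechanical once the runs finish. The type and function names (`RunSummary`, `Verdict`, `evaluate_gate`) are hypothetical and not taken from the harness. It assumes the 2% and 1% margins are absolute recall@10 points, and it maps every non-WIN outcome to `Kill`; the pre-registration may draw those lines differently.

```rust
/// Hypothetical per-contender summary; field names are illustrative only.
#[derive(Debug, Clone, Copy)]
pub struct RunSummary {
    /// Mean recall@10 over the trajectory, in [0, 1].
    pub recall_at_10: f64,
    /// Cumulative rebuild cost, in any unit shared by all contenders.
    pub rebuild_cost: f64,
}

#[derive(Debug, PartialEq, Eq)]
pub enum Verdict {
    /// Drift precondition not met: no verdict may be rendered.
    Void,
    Win,
    Kill,
}

/// Applies the gate: a minimum-drift precondition, then the two recall margins
/// and the cost cap. Margins are treated as absolute points (an assumption).
pub fn evaluate_gate(
    max_top10_churn: f64,
    always_rebuild: RunSummary,
    reweight_only: RunSummary,
    periodic: RunSummary,
) -> Verdict {
    // Guard against a vacuous pass: require >= 15% top-10 churn.
    if max_top10_churn < 0.15 {
        return Verdict::Void;
    }
    // ReweightOnly must stay within 2 points of AlwaysRebuild.
    let reuse_ok = always_rebuild.recall_at_10 - reweight_only.recall_at_10 <= 0.02;
    // Periodic{k} must stay within 1 point at <= 50% of the rebuild cost.
    let periodic_ok = always_rebuild.recall_at_10 - periodic.recall_at_10 <= 0.01
        && periodic.rebuild_cost <= 0.5 * always_rebuild.rebuild_cost;
    if reuse_ok && periodic_ok {
        Verdict::Win
    } else {
        Verdict::Kill
    }
}
```

Keeping the precondition inside the same function makes it impossible to report WIN or KILL on a trajectory that never drifted.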
DriftingIndex wraps a VamanaGraph and owns only the rebuild decision
(RebuildPolicy: AlwaysRebuild / ReweightOnly / Periodic{k}); the consumer
owns the drifting vectors and passes snapshots to on_metric_update + search.
Native reuse hook: greedy_search takes vectors externally, so adapt-to-drift
recomputes only distances. Feature-gated (reuse-under-drift, default off) —
default build byte-identical. 5 unit tests green (cadence + search).
Refs ruvnet#534
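The rebuild decision described in this commit can be illustrated with a small standalone sketch. The variant names mirror the `RebuildPolicy` named above, but `CadenceTracker` and its method are hypothetical: the real `DriftingIndex` also wraps a `VamanaGraph` and takes vector snapshots, which this sketch leaves out. Counting updates since the last rebuild is an assumed way to implement the `Periodic{k}` cadence.

```rust
/// Mirrors the three policies named in the commit message.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RebuildPolicy {
    AlwaysRebuild,
    ReweightOnly,
    Periodic { k: usize },
}

/// Hypothetical helper that owns only the rebuild decision.
pub struct CadenceTracker {
    policy: RebuildPolicy,
    updates_since_rebuild: usize,
}

impl CadenceTracker {
    pub fn new(policy: RebuildPolicy) -> Self {
        Self { policy, updates_since_rebuild: 0 }
    }

    /// Call once per metric update; returns true when the graph topology
    /// should be rebuilt rather than reused with recomputed distances.
    pub fn on_metric_update(&mut self) -> bool {
        self.updates_since_rebuild += 1;
        let rebuild = match self.policy {
            RebuildPolicy::AlwaysRebuild => true,
            RebuildPolicy::ReweightOnly => false,
            // k == 0 is treated as "never" to avoid a degenerate cadence.
            RebuildPolicy::Periodic { k } => k > 0 && self.updates_since_rebuild >= k,
        };
        if rebuild {
            self.updates_since_rebuild = 0;
        }
        rebuild
    }
}
```

With `Periodic { k: 4 }`, eight consecutive updates trigger a rebuild on the 4th and the 8th.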
examples/diskann_real_trajectory.rs: generates a REAL learned-GNN metric trajectory via contrastive link-prediction (InfoNCE over ogbn-arxiv citations, ruvector-gnn Optimizer + info_nce_loss, embeddings on the unit sphere so cosine==dot and L2 ranking agrees), then drives the diskann reuse policy (DriftingIndex) through all four contenders step-by-step.
Result (n=20k, gradual trajectory to 67% churn):
- WIN. Reuse holds within 2% recall@10 of full rebuild up to 40% top-10 churn (>= ADR-200's synthetic ~36% regime) -- transfer confirmed on real learned drift. Stale control collapses 92%->33% (teeth).
- Periodic recovers the high-churn tail: P k=4 = 98.7% (gap -0.01%) at 24% of rebuild cost, evals 1.00x B. ADR-200 hybrid reproduced on real drift.
- Honest caveat: pure reuse past the ceiling decays (-4.73% over the whole overdriven trajectory, 1.05x evals); the shippable periodic policy does not.
Refs ruvnet#534
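The remark that cosine equals dot product and that L2 ranking agrees on the unit sphere follows from a standard identity, spelled out here for reference. The symbols $a$, $b$ (two embeddings) and $q$ (a query) are notation introduced for this note only.

$$
\|a\| = \|b\| = 1 \;\Longrightarrow\; \cos(a,b) = \frac{a \cdot b}{\|a\|\,\|b\|} = a \cdot b,
\qquad
\|a - b\|^2 = \|a\|^2 + \|b\|^2 - 2\,a \cdot b = 2 - 2\,a \cdot b .
$$

Since $\|q - x\|^2 = 2 - 2\,q \cdot x$ is strictly decreasing in $q \cdot x$, sorting candidates $x$ by ascending L2 distance to $q$ gives the same order as sorting by descending dot product or cosine similarity, as long as every vector is unit-normalized.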
…ectory Outcome ADR for BET 1 productionization (closes ADR-200 next-step ruvnet#4). Fixed-topology reuse + periodic rebuild, validated on a real contrastive- link-prediction trajectory over ogbn-arxiv (not synthetic A(t)). WIN at n=20k AND n=50k: pure reuse holds within 2% recall@10 of full rebuild up to a 40% top-10 churn ceiling (identical at both scales, >= ADR-200's synthetic ~36%); Periodic{k:4} recovers the high-churn tail to within 0.01% (20k) / above rebuild (50k) at 20-24% of rebuild cost, equal per-query work. Stale control collapses (teeth). Honest caveat: pure reuse past the ceiling decays -- the shippable policy is periodic, not never. Refs ruvnet#534
…plumbing Pre-register (frozen before any run) the ADR-200 next-step #2 bet: does a sampled-recall rebuild trigger beat fixed Periodic{k} under VARIABLE-RATE drift, and beat the Frobenius monitor ADR-200 found wanting? Honest test = the (rebuilds, recall) Pareto frontier; WIN = trigger >=25% fewer rebuilds at matched recall with probe cost counted; KILL = no frontier dominance. Plumbing (allowed pre-freeze): DriftingIndex::force_rebuild + harness. Refs ruvnet#534
…t run was VOID)
The first variable-rate run was VOID (0% churn): plain SGD at lr 0.002-0.03
on unit-normalized embeddings doesn't move them. Switched to Adam (real
motion in bursts), n=20k for edge density, and ENFORCED the >=15% churn
precondition (abort before rendering a verdict) so a no-drift trajectory
can't masquerade as a result. Gate criteria unchanged.
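A sketch of the enforced churn precondition, under an assumed definition of top-10 churn: the fraction of each query's baseline top-k neighbors that are missing from its current top-k, averaged over queries. The harness may define churn differently, and the function names here are hypothetical.

```rust
use std::collections::HashSet;

/// Fraction of baseline top-k neighbor ids that no longer appear in the
/// current top-k, pooled over all queries (assumed definition of churn).
pub fn top_k_churn(before: &[Vec<u32>], after: &[Vec<u32>]) -> f64 {
    assert_eq!(before.len(), after.len(), "one neighbor list per query");
    let mut changed = 0usize;
    let mut total = 0usize;
    for (b, a) in before.iter().zip(after) {
        let now: HashSet<u32> = a.iter().copied().collect();
        changed += b.iter().filter(|id| !now.contains(*id)).count();
        total += b.len();
    }
    if total == 0 {
        0.0
    } else {
        changed as f64 / total as f64
    }
}

/// Aborts before any verdict when the trajectory did not drift enough.
pub fn check_min_drift(churn: f64, min_churn: f64) -> Result<(), String> {
    if churn >= min_churn {
        Ok(())
    } else {
        Err(format!(
            "VOID: churn {:.1}% is below the required {:.1}%",
            churn * 100.0,
            min_churn * 100.0
        ))
    }
}
```

Returning an `Err` instead of a low score keeps a no-drift trajectory from being reported as either a pass or a failure.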
Result (n=20k, bursty trajectory, per-step Δchurn ~45 burst / ~2 calm,
89% end churn): WIN. Recall{floor=0.95} = 97.2% @ 7 rebuilds beats
Periodic{k=2} (96.8% @ 12) on BOTH axes; probe cost ~1s vs ~73s rebuild
time saved (trap passed); beats best Frobenius (97.3% @ 9) on rebuilds.
Refs ruvnet#534
The sampled-recall trigger WON (ADR-200 next-step #2): under bursty drift it uses ~42% fewer rebuilds than fixed Periodic{k} at matched recall, beats the Frobenius monitor ADR-200 found wanting, and passes the probe-cost trap (~1s probe vs ~73s rebuild saved). Productionized as RecallTrigger in ruvector_diskann::reuse (DriftingIndex in ReweightOnly mode + a probe-driven force_rebuild); its knob 'floor' IS the recall SLA, unlike k/tau. 8 reuse tests (incl. holds-under-no-drift + fires-then-recovers). ADR-202 addendum records the result; pre-registration carries the WIN outcome pointer. Refs ruvnet#534
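The trigger idea can be sketched as follows: probe a small sample of queries, compare the served neighbors against exact ground truth, and request a rebuild when the mean sampled recall drops below the floor. `RecallProbe` and `recall_at_k` are hypothetical names, not the crate's `RecallTrigger` API, and firing on "mean below floor" is an assumption about how the floor is applied.

```rust
use std::collections::HashSet;

/// Share of ground-truth neighbors that the index actually returned.
pub fn recall_at_k(found: &[u32], truth: &[u32]) -> f64 {
    if truth.is_empty() {
        return 1.0;
    }
    let got: HashSet<u32> = found.iter().copied().collect();
    let hits = truth.iter().filter(|id| got.contains(*id)).count();
    hits as f64 / truth.len() as f64
}

/// Hypothetical probe: `floor` plays the role of the recall SLA.
pub struct RecallProbe {
    floor: f64,
}

impl RecallProbe {
    pub fn new(floor: f64) -> Self {
        Self { floor }
    }

    /// `samples` holds (found, truth) neighbor-id pairs for the probe queries.
    /// Returns true when a rebuild should be forced.
    pub fn should_rebuild(&self, samples: &[(Vec<u32>, Vec<u32>)]) -> bool {
        if samples.is_empty() {
            return false; // no evidence, so do not pay for a rebuild
        }
        let mean = samples
            .iter()
            .map(|(found, truth)| recall_at_k(found, truth))
            .sum::<f64>()
            / samples.len() as f64;
        mean < self.floor
    }
}
```

Unlike a fixed `k`, the only knob here is the recall level the caller is willing to accept, which is why the probe cost has to be counted against the rebuilds it saves.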
…ctory Frozen-before-run generality check of ADR-202's 40% holding ceiling: does it generalize beyond contrastive link-prediction to a DIFFERENT learned objective? Adds a node-classification trajectory (real arxiv 40-class labels, CE on a linear head, embeddings as params) selectable via an 'objective=nodeclass' arg to the existing harness — same contenders + 2% gate, only the objective changes. CONFIRM = holding ceiling >=30% churn + periodic recovers; CAVEAT = <20% or materially different (reportable). Refs ruvnet#534
…y caveat Node-classification trajectory (2nd objective) holds reuse within 2% of rebuild up to a 54% churn ceiling (>= link-pred's 40%) -> the ADR-202 holding-ceiling result GENERALIZES across two learned objectives; the objective-dependence caveat is resolved. Honest finding (reported, not buried): past ~60% churn node-class CE collapses embeddings into ~40 class blobs where recall@10 is ill-posed (intra-blob near-ties) and the FULL-REBUILD baseline itself destabilizes (B swings 55-96%). The trajectory-wide 'reuse > rebuild +4.3%' is a benchmark-degeneracy artifact (ADR-200's t=0.25 dip amplified), NOT a genuine superiority claim. Operational conclusion unaffected (reuse+periodic never worse). ADR-202 addendum + next-step ruvnet#5 (collapse-aware metric). Refs ruvnet#534
…ing middle vs reuse/rebuild
Adversarial check on BET 1 (ADR-200/202): does cheap incremental graph repair of
the displaced subset beat BOTH topology-reuse AND full rebuild under metric drift?
Cheap pre-check recorded: ruvector-diskann has NO faithful incremental update
(insert=append+full-rebuild-flag; delete=tombstone, no graph repair). Baseline
must be built. Scoped as in-memory out-edge-recompute + back-edge-refresh of the
top-f displaced nodes (no delete-consolidation — membership is fixed under drift).
Frozen gate: WIN = incremental beats pure-reuse >2pts recall AND <=0.5x rebuild
cost AND within 2pts of rebuild in some churn band; adversarial check vs Periodic{k}
(the real BET 1 incumbent) reported regardless. NO-GO/PARTIAL are acceptable.
…+ harness contender
The BET-1 missing middle: repair only the DISPLACED subset of the Vamana graph
under metric drift, between ReweightOnly (repair nothing) and AlwaysRebuild
(repair everything).
ruvector-diskann (feature reuse-under-drift):
- graph.rs: expose robust_prune as pub(crate) (visibility only, no logic change)
- reuse.rs: IncrementalIndex — for each displaced node, recompute out-edges
(greedy_search -> robust_prune at new position) + refresh back-edges; top-f
by displacement-since-last-reindex is the cost/recall knob. No delete-
consolidation (membership is fixed under drift; nothing is removed). 3 tests.
- lib.rs: export under feature.
harness (diskann_real_trajectory.rs): incremental contender measured on the SAME
trajectory/queries/truth as A/B/P/C; reports the full (recall,cost) Pareto frontier
+ adversarial domination vs Periodic{k}. Frozen thresholds unchanged from the
pre-registration; f* selection corrected to 'best knob' (was 'first qualifying')
to match the frozen wording.
Gate frozen at b388c42 before any contender run.
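The "top-f by displacement-since-last-reindex" knob can be illustrated with a standalone selection routine. This is a sketch under assumptions: displacement is measured as squared L2 distance between a node's vector at its last reindex and its current vector, and the budget is `ceil(f * n)`. The function name `top_displaced` is hypothetical, and the edge repair itself (`greedy_search` followed by `robust_prune`) is not shown.

```rust
/// Returns the indices of the nodes that moved the most since they were last
/// indexed, largest displacement first, limited to a fraction `f` of all nodes.
pub fn top_displaced(last_indexed: &[Vec<f32>], current: &[Vec<f32>], f: f64) -> Vec<usize> {
    assert_eq!(last_indexed.len(), current.len(), "same node set on both sides");
    let n = current.len();
    // Repair budget: fraction f of the nodes, clamped to [0, n].
    let budget = ((f.clamp(0.0, 1.0) * n as f64).ceil() as usize).min(n);

    // Squared L2 displacement per node (assumed displacement measure).
    let mut scored: Vec<(usize, f32)> = last_indexed
        .iter()
        .zip(current)
        .enumerate()
        .map(|(i, (old, new))| {
            let d2: f32 = old.iter().zip(new).map(|(a, b)| (a - b) * (a - b)).sum();
            (i, d2)
        })
        .collect();

    // Largest displacement first; ties broken by node index for determinism.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1).then(a.0.cmp(&b.0)));
    scored.into_iter().take(budget).map(|(i, _)| i).collect()
}
```

Setting `f = 0.0` selects nothing and `f = 1.0` selects every node, which correspond to the two ends of the repair spectrum that the incremental contender sits between.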
…scale-qualified)
Adversarial check on BET 1 (ADR-200/202): does cheap incremental graph repair beat
BOTH topology-reuse AND full rebuild under metric drift? YES, at the high-recall tier.
Reproduced at n=20k, n=50k, and on a gradual trajectory: inc-50% matches full-rebuild
recall@10 within ~0.2pts at ~42% of rebuild cost AND Pareto-dominates Periodic{k=2}
(the strongest BET 1 incumbent). Targeted repair of the displaced subset beats lumped
periodic rebuilds at equal cost because it never lets recall sawtooth-decay.
Honest narrowings (all measured, in the ADR):
- Scale-sensitive: frontier SWEEP only at n=20k/93% churn; at n=50k & moderate churn
the cheap periodic tiers (k=4,k=8) reclaim Pareto-optimality. Incremental EXTENDS the
high-recall end, not a replacement for periodic.
- Regime-concentrated: advantage emerges above ~40% churn; below that all policies cluster.
- Degeneracy: inc>B at >90% churn is fresh-build-on-collapsed-geometry (inc==B at n=50k).
- f=5% fails the per-query-eval bar at n=50k; clean win regime is f in [0.2,0.5].
Frozen gate (b388c42) passed; outcome stamped on the pre-registration.
…(no build) Scoped next-step #1 (wire the reuse policy into the live ruvector-gnn embedding-flush path) before committing any integration code. Finding: the production embedding->index seam does not exist on either end — gnn produces embeddings but has no serving module and only a dev-dep on diskann; the NAPI serving surface is a static-index API with reuse-under-drift off; mcp-brain-server has a monitor-only DriftMonitor and no diskann dep. The only place a drifted embedding meets a diskann index is examples/. Building the loop now would mean inventing the producer. Recorded the minimal seam (feature-gated DriftingDiskAnn NAPI binding) instead. Honors prove-not-hype: 'the path isn't there yet, here's the seam.'
Addendum carried in this PR: BET 1 live serving hook SCOPED (commit `14bafab0`)
One extra commit (`14bafab0`) appends an ADR-202 addendum (a #537 file, visible here because this branch is stacked on #537). It records the scoping of ADR-202 next-step #1 (wire the reuse policy into the live `ruvector-gnn` embedding-flush path): the production embedding→index seam does not exist on either end. gnn has no serving module and only a dev-dep on diskann; the NAPI serving surface is a static-index API with `reuse-under-drift` off; `mcp-brain-server` is monitor-only with no diskann dep. Building the loop now would mean inventing the producer, so the minimal seam (a feature-gated `DriftingDiskAnn` NAPI binding) is recorded, not built. BET 1 is mechanism-complete, consumer-blocked. See #534.
SepRAG BET 1 — adversarial check: incremental reindex vs reuse vs full rebuild
This is a research finding (a WIN that narrows BET 1), not a feature request, so there is no merge urgency. It is stacked on #537 (the BET 1 reuse-under-drift PR, where `reuse.rs` + the real-trajectory harness live). The incremental-specific commits are the last three: `b388c427` (frozen pre-registration), `05ba882c` (`IncrementalIndex` + harness contender), `5e029aba` (ADR-204). Linked to #534.
The question
ADR-200/202 compared only two index-maintenance strategies under metric drift: reuse everything (`ReweightOnly`, free, decays) vs rebuild everything (`AlwaysRebuild`, full cost), interleaved by `Periodic{k}`. Missing middle: repair only the part of the graph that went stale. Does a cheap incremental update beat both topology-reuse and full rebuild?
Cheap pre-check first (per protocol)
`ruvector-diskann` has no faithful incremental update: `insert` appends + flags a full rebuild; `delete` is a tombstone with no graph repair. So the baseline was built faithfully, not a silent FreshDiskANN. Under drift membership is fixed (nothing is deleted), so the operation is out-edge recompute + back-edge refresh of the displaced subset: no delete-consolidation, no reverse index. Only always-compiled change: `robust_prune` → `pub(crate)` (visibility).
Result: WIN (scale-qualified, regime-concentrated)
Pre-registered + frozen before any run (`PRE-REGISTRATION-incremental.md`). Reproduced at n=20k, n=50k, and a gradual trajectory: inc-50% matches full-rebuild recall@10 within ~0.2pts at ~42% of rebuild cost and Pareto-dominates `Periodic{k=2}`, the strongest BET 1 incumbent.
`(recall@10, cost)` frontier, n=20k overdriven (`<-` = Pareto-optimal):
[frontier table not preserved in this capture]
Mechanism: `Periodic{k}` rebuilds all n nodes (most didn't move) and sawtooth-decays between rebuilds; incremental repairs only displaced nodes every step, so recall never decays. Targeted repair beats lumped blind rebuilds at equal cost, in exactly the decay-tail regime ADR-202 had assigned to periodic.
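Pareto-optimality over `(recall@10, cost)` points, as used for the frontier above, can be sketched like this. The `Point` type and both function names are hypothetical and not taken from the harness; higher recall and lower cost are both treated as better.

```rust
/// One contender's measured operating point (illustrative type).
#[derive(Debug, Clone, PartialEq)]
pub struct Point {
    pub label: String,
    pub recall: f64,
    pub cost: f64,
}

/// `a` dominates `b` when it is at least as good on both axes and strictly
/// better on at least one (higher recall, lower cost).
pub fn dominates(a: &Point, b: &Point) -> bool {
    a.recall >= b.recall
        && a.cost <= b.cost
        && (a.recall > b.recall || a.cost < b.cost)
}

/// Keeps the points that no other point dominates, in input order.
pub fn pareto_frontier(points: &[Point]) -> Vec<&Point> {
    points
        .iter()
        .filter(|p| !points.iter().any(|q| dominates(q, *p)))
        .collect()
}
```

A contender that is both cheaper and at least as accurate as another removes it from the frontier, which is the sense in which one policy "Pareto-dominates" another; the quadratic scan is fine for the handful of contenders compared here.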
Honest narrowings (all measured — see ADR-204)
- Scale-sensitive: the frontier sweep was run only at n=20k/93% churn. At n=50k and moderate churn, `P k=4`/`P k=8` reclaim Pareto-optimality: incremental extends the high-recall end, it does not replace periodic.
- Degeneracy: `inc > B` at >90% churn (n=20k) is the fresh-build-on-collapsed-geometry effect (ADR-200/202); at n=50k `inc ≈ B` exactly → conservative claim is "matches rebuild."
- `f=5%` failed the ≤1.10× per-query-eval guard at n=50k; clean win regime is `f ∈ [0.2, 0.5]`. Recall margins vs periodic carry per-run build-noise; the cost and frontier-shape advantages are the robust signals.
Tests / build
`IncrementalIndex` behind the existing `reuse-under-drift` feature (default off → shipping build byte-identical), 3 new unit tests green, clippy clean. Harness measures the incremental contender on the same trajectory/queries/truth as A/B/P/C.
🤖 Generated with claude-flow