
SepRAG BET 1 (finding): incremental reindex WINS the high-recall tier vs reuse + periodic rebuild #539

Open
shaal wants to merge 15 commits into ruvnet:main from shaal:feat/seprag-bet1-incremental-baseline


shaal commented Jun 5, 2026

Addendum carried in this PR: BET 1 live serving hook SCOPED (commit 14bafab0)

One extra commit (14bafab0) appends an ADR-202 addendum (a #537 file, visible here because
this branch is stacked on #537). It records the scoping of ADR-202 next-step #1 (wire the reuse
policy into the live ruvector-gnn embedding-flush path): the production embedding→index seam
does not exist on either end. gnn has no serving module and only a dev-dep on diskann; the NAPI
serving surface is a static-index API with reuse-under-drift off; mcp-brain-server is
monitor-only, with no diskann dep. Building the loop now would mean inventing the producer, so the
minimal seam (a feature-gated DriftingDiskAnn NAPI binding) is recorded, not built. BET 1 is
mechanism-complete but consumer-blocked. See #534.
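To make the "missing producer" concrete, here is a minimal sketch of what the seam would have to connect. Both traits (`EmbeddingProducer`, `DriftConsumer`) and the `pump` function are hypothetical names introduced for illustration; they are not part of ruvector-gnn, ruvector-diskann, or the NAPI surface. `on_metric_update` mirrors the DriftingIndex hook name used in this PR, but the signature shown here is assumed.

```rust
/// Hypothetical producer side: a trainer that flushes an embedding snapshot.
/// Per the scoping above, nothing in the production path plays this role yet.
pub trait EmbeddingProducer {
    /// Returns the full embedding snapshot after one training flush.
    fn flush(&mut self) -> Vec<Vec<f32>>;
}

/// Hypothetical consumer side: an index that absorbs a drifted snapshot.
/// The method name mirrors DriftingIndex::on_metric_update; the signature is assumed.
pub trait DriftConsumer {
    fn on_metric_update(&mut self, snapshot: &[Vec<f32>]);
}

/// The seam itself: forward one flushed snapshot from producer to consumer.
/// Returns how many vectors were handed over.
pub fn pump<P: EmbeddingProducer, C: DriftConsumer>(producer: &mut P, consumer: &mut C) -> usize {
    let snapshot = producer.flush();
    consumer.on_metric_update(&snapshot);
    snapshot.len()
}
```

The design choice is push, not pull: the producer hands snapshots to the index, which matches the split described for DriftingIndex, where the caller owns the drifting vectors and passes snapshots in.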


SepRAG BET 1 adversarial check: incremental reindex vs reuse vs full rebuild

This is a research finding (a WIN that narrows BET 1), not a feature request, so there is no merge
urgency.
It is stacked on #537 (the BET 1 reuse-under-drift PR, where reuse.rs and the
real-trajectory harness live). The incremental-specific commits are the three before the addendum commit:
b388c427 (frozen pre-registration), 05ba882c (IncrementalIndex + harness contender),
5e029aba (ADR-204). Linked to #534.

The question

ADR-200/202 compared only two index-maintenance strategies under metric drift: reuse
everything (ReweightOnly: free, but recall decays) vs rebuild everything (AlwaysRebuild: full cost),
interleaved by Periodic{k}. The missing middle is repairing only the part of the graph that went
stale. Does a cheap incremental update beat both topology reuse and full rebuild?

Cheap pre-check first (per protocol)

ruvector-diskann has no faithful incremental update: insert appends and flags a full
rebuild; delete is a tombstone with no graph repair. So the baseline had to be built, and it was
built faithfully; it is not a silent FreshDiskANN. Under drift, membership is fixed (nothing is
deleted), so the operation is out-edge recompute plus back-edge refresh of the displaced subset,
with no delete-consolidation and no reverse index. The only always-compiled change is
robust_prune → pub(crate) (visibility only).
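The repair operation described above can be sketched on a toy in-memory graph. This is an illustration under stated simplifications, not the crate's IncrementalIndex: adjacency is a plain `Vec<Vec<usize>>`, out-edges are recomputed by brute-force nearest neighbours with a hard degree cap (standing in for greedy_search + robust_prune), and an over-full back-edge list simply drops its farthest edge. `repair_displaced` and `dist2` are hypothetical names.

```rust
/// Squared Euclidean distance between two equal-length vectors.
fn dist2(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Repairs only the `displaced` nodes of an adjacency-list graph after drift.
/// Membership is fixed, so no node is added or removed; only edges change.
pub fn repair_displaced(graph: &mut [Vec<usize>], vectors: &[Vec<f32>], displaced: &[usize], degree: usize) {
    for &u in displaced {
        // 1. Out-edge recompute: u's `degree` nearest nodes at its new position.
        //    (Brute force here; the crate uses greedy_search + robust_prune instead.)
        let mut cands: Vec<usize> = (0..vectors.len()).filter(|&v| v != u).collect();
        cands.sort_by(|&a, &b| dist2(&vectors[u], &vectors[a]).total_cmp(&dist2(&vectors[u], &vectors[b])));
        cands.truncate(degree);
        graph[u] = cands;

        // 2. Back-edge refresh: each new neighbour v gets an edge back to u.
        //    If v goes over the degree cap, drop v's farthest out-edge (possibly u itself).
        let neighbours = graph[u].clone();
        for v in neighbours {
            if graph[v].contains(&u) {
                continue;
            }
            graph[v].push(u);
            if graph[v].len() > degree {
                let far = graph[v]
                    .iter()
                    .enumerate()
                    .max_by(|a, b| dist2(&vectors[v], &vectors[*a.1]).total_cmp(&dist2(&vectors[v], &vectors[*b.1])))
                    .map(|(i, _)| i)
                    .unwrap();
                graph[v].remove(far);
            }
        }
    }
}
```

Nodes that did not move keep their edges untouched except where a displaced node is stitched back into their lists, which is what keeps the cost proportional to the displaced subset rather than to n.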

Result: WIN (scale-qualified, regime-concentrated)

Pre-registered and frozen before any run (PRE-REGISTRATION-incremental.md). Reproduced at
n=20k, n=50k, and on a gradual trajectory:

inc-50% matches full-rebuild recall@10 within ~0.2 pts at ~42% of rebuild cost AND
Pareto-dominates Periodic{k=2}, the strongest BET 1 incumbent.

(recall@10, cost) frontier, n=20k overdriven (<- = Pareto-optimal):

policy       recall@10    cost
inc 20%      95.7%         34s   <-
P k=4        95.0%         54s   (dominated)
inc 50%      98.1%         88s   <-
P k=2        95.9%        105s   (dominated)
B rebuild    96.3%        208s   (dominated)
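The "<-" markers can be checked mechanically: a policy is Pareto-optimal when no other policy has recall at least as high and cost at least as low, with one of the two strictly better. A minimal sketch (`Policy`, `dominates`, and `pareto_front` are illustrative names, not part of the harness):

```rust
/// One contender's measured point on the (recall@10, cost) plane.
pub struct Policy {
    pub name: &'static str,
    pub recall_pct: f64,
    pub cost_s: f64,
}

/// True if `q` is at least as good as `p` on both axes and strictly better on one.
fn dominates(q: &Policy, p: &Policy) -> bool {
    q.recall_pct >= p.recall_pct
        && q.cost_s <= p.cost_s
        && (q.recall_pct > p.recall_pct || q.cost_s < p.cost_s)
}

/// Names of the non-dominated policies, in input order.
pub fn pareto_front(policies: &[Policy]) -> Vec<&'static str> {
    policies
        .iter()
        .filter(|p| !policies.iter().any(|q| dominates(q, p)))
        .map(|p| p.name)
        .collect()
}
```

Fed the five rows of the table, it returns exactly the two incremental tiers: inc 20% dominates P k=4, and inc 50% dominates both P k=2 and B.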

Mechanism: Periodic{k} rebuilds all n nodes (most of which didn't move), and its recall
sawtooth-decays between rebuilds; incremental repairs only the displaced nodes, every step, so
recall never decays. Targeted repair beats lumped, blind rebuilds at equal cost, in exactly the
decay-tail regime ADR-202 had assigned to periodic.
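The "equal cost" comparison can be made concrete with a deliberately crude model that only counts node repairs. This assumes cost is proportional to nodes touched, which ignores that a from-scratch rebuild and a local repair differ in per-node cost (the measured 88s vs 105s in the table shows they do). Under that model, inc-50% and Periodic{k=2} do the same total work; the difference is that incremental spreads it over every step instead of lumping it. The function names are hypothetical.

```rust
/// Nodes touched by Periodic{k} over `steps` metric updates:
/// a full rebuild of all `n` nodes on every k-th step.
pub fn periodic_node_repairs(n: u64, k: u64, steps: u64) -> u64 {
    (steps / k) * n
}

/// Nodes touched by incremental repair of the top fraction `f`
/// of displaced nodes on every step.
pub fn incremental_node_repairs(n: u64, f: f64, steps: u64) -> u64 {
    ((n as f64 * f).round() as u64) * steps
}
```

For n=20k over 10 steps, both give 100,000 node repairs; but between periodic rebuilds every drifted node serves stale edges, while under incremental the most-displaced half is fresh after each step.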

Honest narrowings (all measured; see ADR-204)

  • Scale-sensitive. The frontier sweep (incremental dominating every periodic policy) held only at
    n=20k/93% churn. At n=50k and moderate churn, P k=4 and P k=8 reclaim Pareto-optimality:
    incremental extends the high-recall end; it does not replace periodic.
  • Regime-concentrated. The win lives above ~40% churn; below that, all policies cluster.
  • Degeneracy caveat. inc > B at >90% churn (n=20k) is the fresh-build-on-collapsed-geometry
    effect (ADR-200/202); at n=50k inc ≈ B exactly, so the conservative claim is "matches rebuild."
  • Eval bar. f=5% failed the ≤1.10× per-query-eval guard at n=50k; the clean win regime is
    f ∈ [0.2, 0.5]. Recall margins vs periodic carry per-run build noise; the cost and
    frontier-shape advantages are the robust signals.

Tests / build

IncrementalIndex sits behind the existing reuse-under-drift feature (default off, so the shipping
build is byte-identical); 3 new unit tests are green and clippy is clean. The harness measures the
incremental contender on the same trajectory/queries/truth as A/B/P/C.

🤖 Generated with claude-flow

shaal added 14 commits June 4, 2026 17:20
Productionize BET 1 (ADR-200 WIN under synthetic drift) by wiring
re-weight + periodic-rebuild into the ruvector-diskann loop behind a
feature flag, validated on a REAL contrastive-link-prediction embedding
trajectory on ogbn-arxiv (ADR-200 next-step #4).

Gate frozen before any contender run (prove-not-hype): WIN = ReweightOnly
within 2% recall@10 of AlwaysRebuild + Periodic{k} within 1% at <=50%
cumulative rebuild cost; KILL = no transfer from synthetic to real drift.
Minimum-drift precondition (>=15% top-10 churn) guards against a vacuous
pass. Self-contained off main; independent of PR ruvnet#535. Outcome -> ADR-202.

Linked: ruvnet#534
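The frozen gate reads as a small decision function. A sketch, assuming recall gaps are expressed in percentage points of recall@10 (rebuild minus contender) and cost as a fraction of AlwaysRebuild's cumulative cost. `GateInput`, `Verdict`, and `judge` are hypothetical names, and a failed WIN is reported as `NoWin` rather than auto-classified as KILL, since the KILL condition (no transfer to real drift) is a separate judgment.

```rust
#[derive(Debug, PartialEq)]
pub enum Verdict {
    /// Minimum-drift precondition failed: no verdict may be rendered.
    Void,
    Win,
    NoWin,
}

pub struct GateInput {
    /// End-of-trajectory top-10 churn, as a fraction (0.15 = 15%).
    pub churn: f64,
    /// AlwaysRebuild recall@10 minus ReweightOnly recall@10, in points.
    pub reuse_gap_pts: f64,
    /// AlwaysRebuild recall@10 minus Periodic{k} recall@10, in points.
    pub periodic_gap_pts: f64,
    /// Periodic{k} cumulative rebuild cost divided by AlwaysRebuild cumulative cost.
    pub periodic_cost_frac: f64,
}

pub fn judge(g: &GateInput) -> Verdict {
    if g.churn < 0.15 {
        return Verdict::Void; // guards against a vacuous pass on a no-drift trajectory
    }
    let reuse_ok = g.reuse_gap_pts <= 2.0;
    let periodic_ok = g.periodic_gap_pts <= 1.0 && g.periodic_cost_frac <= 0.5;
    if reuse_ok && periodic_ok { Verdict::Win } else { Verdict::NoWin }
}
```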
DriftingIndex wraps a VamanaGraph and owns only the rebuild decision
(RebuildPolicy: AlwaysRebuild / ReweightOnly / Periodic{k}); the consumer
owns the drifting vectors and passes snapshots to on_metric_update + search.
Native reuse hook: greedy_search takes vectors externally, so adapt-to-drift
recomputes only distances. Feature-gated (reuse-under-drift, default off) —
default build byte-identical. 5 unit tests green (cadence + search).

Refs ruvnet#534
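The cadence decision that DriftingIndex owns can be sketched as below. The three variant names come from this commit; the `rebuilds_at` method, the 1-based step counter, and rebuilding on every k-th update are assumptions about the cadence, not the crate's actual code.

```rust
/// Mirror of the RebuildPolicy variants named above (illustrative, not the crate's type).
#[derive(Debug, Clone, Copy)]
pub enum RebuildPolicy {
    AlwaysRebuild,
    ReweightOnly,
    Periodic { k: u32 },
}

impl RebuildPolicy {
    /// Whether metric update number `step` (1-based) triggers a full rebuild.
    /// Non-rebuild steps keep the topology and only recompute distances,
    /// which is possible because greedy_search takes vectors externally.
    pub fn rebuilds_at(self, step: u32) -> bool {
        match self {
            RebuildPolicy::AlwaysRebuild => true,
            RebuildPolicy::ReweightOnly => false,
            RebuildPolicy::Periodic { k } => k > 0 && step % k == 0,
        }
    }
}
```

Over 8 updates this gives 8 rebuilds for AlwaysRebuild, 0 for ReweightOnly, and 2 for Periodic{k: 4}.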
examples/diskann_real_trajectory.rs: generates a REAL learned-GNN metric
trajectory via contrastive link-prediction (InfoNCE over ogbn-arxiv
citations, ruvector-gnn Optimizer + info_nce_loss, embeddings on the unit
sphere so cosine==dot and L2 ranking agrees), then drives the diskann
reuse policy (DriftingIndex) through all four contenders step-by-step.

Result (n=20k, gradual trajectory to 67% churn):
- WIN. Reuse holds within 2% recall@10 of full rebuild up to 40% top-10
  churn (>= ADR-200's synthetic ~36% regime) -- transfer confirmed on real
  learned drift. Stale control collapses 92%->33% (teeth).
- Periodic recovers the high-churn tail: P k=4 = 98.7% (gap -0.01%) at 24%
  of rebuild cost, evals 1.00x B. ADR-200 hybrid reproduced on real drift.
- Honest caveat: pure reuse past the ceiling decays (-4.73% over the whole
  overdriven trajectory, 1.05x evals); the shippable periodic policy does not.

Refs ruvnet#534
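Why putting the embeddings on the unit sphere makes cosine, dot, and L2 interchangeable for ranking: for unit vectors a and b, ||a - b||² = ||a||² + ||b||² - 2 a·b = 2 - 2 a·b, so ascending L2 distance is exactly descending dot product. A small check of that identity (all helper names are illustrative):

```rust
fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

fn l2_sq(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Scales `v` in place to unit length (leaves an all-zero vector untouched).
fn normalize(v: &mut [f32]) {
    let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm > 0.0 {
        v.iter_mut().for_each(|x| *x /= norm);
    }
}

/// Candidate indices ranked nearest-first, once by L2 and once by dot product.
fn rankings(q: &[f32], cands: &[Vec<f32>]) -> (Vec<usize>, Vec<usize>) {
    let mut by_l2: Vec<usize> = (0..cands.len()).collect();
    by_l2.sort_by(|&i, &j| l2_sq(q, &cands[i]).total_cmp(&l2_sq(q, &cands[j])));
    let mut by_dot: Vec<usize> = (0..cands.len()).collect();
    by_dot.sort_by(|&i, &j| dot(q, &cands[j]).total_cmp(&dot(q, &cands[i])));
    (by_l2, by_dot)
}
```

Without the normalization step, a long vector can win on dot product while losing on L2, so the two rankings would disagree.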
…ectory

Outcome ADR for BET 1 productionization (closes ADR-200 next-step #4).
Fixed-topology reuse + periodic rebuild, validated on a real contrastive-
link-prediction trajectory over ogbn-arxiv (not synthetic A(t)).

WIN at n=20k AND n=50k: pure reuse holds within 2% recall@10 of full
rebuild up to a 40% top-10 churn ceiling (identical at both scales, >=
ADR-200's synthetic ~36%); Periodic{k:4} recovers the high-churn tail to
within 0.01% (20k) / above rebuild (50k) at 20-24% of rebuild cost, equal
per-query work. Stale control collapses (teeth). Honest caveat: pure reuse
past the ceiling decays -- the shippable policy is periodic, not never.

Refs ruvnet#534
…plumbing

Pre-register (frozen before any run) the ADR-200 next-step #2 bet: does a
sampled-recall rebuild trigger beat fixed Periodic{k} under VARIABLE-RATE
drift, and beat the Frobenius monitor ADR-200 found wanting? Honest test =
the (rebuilds, recall) Pareto frontier; WIN = trigger >=25% fewer rebuilds
at matched recall with probe cost counted; KILL = no frontier dominance.

Plumbing (allowed pre-freeze): DriftingIndex::force_rebuild + harness.

Refs ruvnet#534
…t run was VOID)

The first variable-rate run was VOID (0% churn): plain SGD at lr 0.002-0.03
on unit-normalized embeddings doesn't move them. Switched to Adam (real
motion in bursts), n=20k for edge density, and ENFORCED the >=15% churn
precondition (abort before rendering a verdict) so a no-drift trajectory
can't masquerade as a result. Gate criteria unchanged.

Result (n=20k, bursty trajectory, per-step Δchurn ~45 burst / ~2 calm,
89% end churn): WIN. Recall{floor=0.95} = 97.2% @ 7 rebuilds beats
Periodic{k=2} (96.8% @ 12) on BOTH axes; probe cost ~1s vs ~73s rebuild
time saved (trap passed); beats best Frobenius (97.3% @ 9) on rebuilds.

Refs ruvnet#534
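A sketch of the enforced precondition. The commit does not spell out its churn formula, so this assumes top-10 churn is the mean, over queries, of the fraction of the initial exact top-k neighbours no longer in the current exact top-k; `topk_churn` and `require_min_churn` are hypothetical names. The point is the control flow: measure first, and abort with a VOID result instead of rendering a verdict.

```rust
use std::collections::HashSet;

/// Mean fraction of each query's original top-k neighbour IDs that have left its current top-k.
/// `before[q]` and `after[q]` hold query q's exact top-k IDs at t=0 and now.
pub fn topk_churn(before: &[Vec<usize>], after: &[Vec<usize>]) -> f64 {
    assert_eq!(before.len(), after.len(), "one neighbour list per query");
    let mut total = 0.0;
    for (b, a) in before.iter().zip(after) {
        let original: HashSet<usize> = b.iter().copied().collect();
        let kept = a.iter().filter(|id| original.contains(*id)).count();
        total += 1.0 - kept as f64 / b.len() as f64;
    }
    total / before.len() as f64
}

/// Returns Err when the trajectory is too static for any verdict to mean anything.
pub fn require_min_churn(churn: f64, min: f64) -> Result<(), String> {
    if churn >= min {
        Ok(())
    } else {
        Err(format!("VOID: churn {:.1}% < required {:.0}%", churn * 100.0, min * 100.0))
    }
}
```

With `min = 0.15`, the first 0%-churn SGD run would have been rejected here instead of producing a vacuous pass.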
The sampled-recall trigger WON (ADR-200 next-step #2): under bursty drift it
uses ~42% fewer rebuilds than fixed Periodic{k} at matched recall, beats the
Frobenius monitor ADR-200 found wanting, and passes the probe-cost trap
(~1s probe vs ~73s rebuild saved). Productionized as RecallTrigger in
ruvector_diskann::reuse (DriftingIndex in ReweightOnly mode + a probe-driven
force_rebuild); its knob 'floor' IS the recall SLA, unlike k/tau. 8 reuse
tests (incl. holds-under-no-drift + fires-then-recovers). ADR-202 addendum
records the result; pre-registration carries the WIN outcome pointer.

Refs ruvnet#534
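The trigger's logic is small enough to sketch: measure recall@k on a sampled probe set against exact ground truth, and force a rebuild only when it falls below `floor`. This is an illustration, not the crate's RecallTrigger: `sampled_recall`, `TriggerSketch`, and `on_probe` are hypothetical, and the rebuild itself is reduced to a counter. For the headline number, 7 rebuilds against Periodic{k=2}'s 12 is 1 - 7/12 ≈ 41.7% fewer, the "~42%" quoted above.

```rust
/// recall@k averaged over sampled probe queries: |approx ∩ exact| / |exact| per query.
pub fn sampled_recall(approx: &[Vec<usize>], exact: &[Vec<usize>]) -> f64 {
    let per_query: f64 = approx
        .iter()
        .zip(exact)
        .map(|(a, e)| e.iter().filter(|id| a.contains(*id)).count() as f64 / e.len() as f64)
        .sum();
    per_query / exact.len() as f64
}

/// Minimal floor-based trigger: `floor` is the recall SLA itself, not a proxy knob like k or tau.
pub struct TriggerSketch {
    pub floor: f64,
    pub rebuilds: u32,
}

impl TriggerSketch {
    /// Feeds one probe measurement; returns true if a rebuild would be forced.
    pub fn on_probe(&mut self, probe_recall: f64) -> bool {
        if probe_recall < self.floor {
            self.rebuilds += 1; // stand-in for DriftingIndex::force_rebuild
            true
        } else {
            false
        }
    }
}
```

Because the probe cost is paid on every check, the trade only pays off when the rebuild time saved dwarfs it, which is the probe-cost trap the pre-registration counted (~1s vs ~73s).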
…ctory

Frozen-before-run generality check of ADR-202's 40% holding ceiling: does
it generalize beyond contrastive link-prediction to a DIFFERENT learned
objective? Adds a node-classification trajectory (real arxiv 40-class
labels, CE on a linear head, embeddings as params) selectable via an
'objective=nodeclass' arg to the existing harness — same contenders + 2%
gate, only the objective changes. CONFIRM = holding ceiling >=30% churn +
periodic recovers; CAVEAT = <20% or materially different (reportable).

Refs ruvnet#534
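The "holding ceiling" is the churn level up to which reuse stays inside the 2-point band around full rebuild. A sketch of one way to compute it from a churn-ordered trajectory, assuming the ceiling is the last churn level reached before the first breach (the commit does not state its exact rule). `holding_ceiling` is a hypothetical helper; each tuple is (churn %, reuse recall@10, rebuild recall@10).

```rust
/// Last churn level (scanning in increasing churn) at which rebuild - reuse <= tol_pts,
/// stopping at the first breach. None if reuse misses the band from the very first step.
pub fn holding_ceiling(series: &[(f64, f64, f64)], tol_pts: f64) -> Option<f64> {
    let mut ceiling = None;
    for &(churn_pct, reuse, rebuild) in series {
        if rebuild - reuse > tol_pts {
            break;
        }
        ceiling = Some(churn_pct);
    }
    ceiling
}
```

Stopping at the first breach is the conservative choice: a later step that happens to re-enter the band does not extend the ceiling.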
…y caveat

Node-classification trajectory (2nd objective) holds reuse within 2% of
rebuild up to a 54% churn ceiling (>= link-pred's 40%) -> the ADR-202
holding-ceiling result GENERALIZES across two learned objectives; the
objective-dependence caveat is resolved.

Honest finding (reported, not buried): past ~60% churn node-class CE
collapses embeddings into ~40 class blobs where recall@10 is ill-posed
(intra-blob near-ties) and the FULL-REBUILD baseline itself destabilizes
(B swings 55-96%). The trajectory-wide 'reuse > rebuild +4.3%' is a
benchmark-degeneracy artifact (ADR-200's t=0.25 dip amplified), NOT a
genuine superiority claim. Operational conclusion unaffected (reuse+periodic
never worse). ADR-202 addendum + next-step #5 (collapse-aware metric).

Refs ruvnet#534
…ing middle vs reuse/rebuild

Adversarial check on BET 1 (ADR-200/202): does cheap incremental graph repair of
the displaced subset beat BOTH topology-reuse AND full rebuild under metric drift?

Cheap pre-check recorded: ruvector-diskann has NO faithful incremental update
(insert=append+full-rebuild-flag; delete=tombstone, no graph repair). Baseline
must be built. Scoped as in-memory out-edge-recompute + back-edge-refresh of the
top-f displaced nodes (no delete-consolidation — membership is fixed under drift).

Frozen gate: WIN = incremental beats pure-reuse >2pts recall AND <=0.5x rebuild
cost AND within 2pts of rebuild in some churn band; adversarial check vs Periodic{k}
(the real BET 1 incumbent) reported regardless. NO-GO/PARTIAL are acceptable.
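The frozen WIN condition for the incremental contender, as a sketch. Assumptions not stated in the pre-registration: recall is in percentage points of recall@10, cost is in any consistent unit, and "within 2pts of rebuild" is read as "no more than 2 pts below rebuild", so an incremental result above rebuild still counts. `Band`, `band_passes`, and `incremental_win` are hypothetical names; WIN requires all three criteria to hold together in at least one churn band.

```rust
/// Measurements for one churn band.
pub struct Band {
    pub reuse_recall: f64,
    pub inc_recall: f64,
    pub rebuild_recall: f64,
    pub inc_cost: f64,
    pub rebuild_cost: f64,
}

fn band_passes(b: &Band) -> bool {
    let beats_reuse = b.inc_recall - b.reuse_recall > 2.0;
    let cheap_enough = b.inc_cost <= 0.5 * b.rebuild_cost;
    let near_rebuild = b.rebuild_recall - b.inc_recall <= 2.0;
    beats_reuse && cheap_enough && near_rebuild
}

/// WIN if every criterion holds together in some churn band. The adversarial
/// comparison against Periodic{k} is reported separately and does not gate.
pub fn incremental_win(bands: &[Band]) -> bool {
    bands.iter().any(band_passes)
}
```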
…+ harness contender

The BET-1 missing middle: repair only the DISPLACED subset of the Vamana graph
under metric drift, between ReweightOnly (repair nothing) and AlwaysRebuild
(repair everything).

ruvector-diskann (feature reuse-under-drift):
- graph.rs: expose robust_prune as pub(crate) (visibility only, no logic change)
- reuse.rs: IncrementalIndex — for each displaced node, recompute out-edges
  (greedy_search -> robust_prune at new position) + refresh back-edges; top-f
  by displacement-since-last-reindex is the cost/recall knob. No delete-
  consolidation (membership is fixed under drift; nothing is removed). 3 tests.
- lib.rs: export under feature.
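The cost/recall knob decides which nodes get repaired. A sketch of top-f selection by displacement since the last reindex, assuming displacement is squared L2 between a node's position at its last repair and its current position (the commit does not specify the distance). `top_f_displaced` is a hypothetical name. After repairing, the caller would record those nodes' current positions as their new baseline, so a node that was just fixed ranks low next time.

```rust
/// Indices of the ceil(f * n) nodes that moved most since their last repair,
/// largest displacement first. `last_repaired[i]` is node i's position when last reindexed.
pub fn top_f_displaced(last_repaired: &[Vec<f32>], current: &[Vec<f32>], f: f64) -> Vec<usize> {
    let n = current.len();
    let budget = ((n as f64 * f).ceil() as usize).min(n);
    let displacement = |i: usize| -> f32 {
        last_repaired[i].iter().zip(&current[i]).map(|(a, b)| (a - b) * (a - b)).sum()
    };
    let mut order: Vec<usize> = (0..n).collect();
    // Sort descending by displacement.
    order.sort_by(|&a, &b| displacement(b).total_cmp(&displacement(a)));
    order.truncate(budget);
    order
}
```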

harness (diskann_real_trajectory.rs): incremental contender measured on the SAME
trajectory/queries/truth as A/B/P/C; reports the full (recall,cost) Pareto frontier
+ adversarial domination vs Periodic{k}. Frozen thresholds unchanged from the
pre-registration; f* selection corrected to 'best knob' (was 'first qualifying')
to match the frozen wording.
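The f* correction in miniature: "first qualifying" returns the first knob in sweep order that clears the gate, while "best knob" scans all qualifying knobs. A sketch, assuming "best" means highest recall@10 among knobs that pass (the commit does not say which metric ranks them); `Knob`, `first_qualifying`, and `best_qualifying` are hypothetical names.

```rust
pub struct Knob {
    pub f: f64,
    pub recall: f64,
    pub passes_gate: bool,
}

/// Old rule: the first knob (in sweep order) that passes.
pub fn first_qualifying(knobs: &[Knob]) -> Option<f64> {
    knobs.iter().find(|k| k.passes_gate).map(|k| k.f)
}

/// Corrected rule: among passing knobs, the one with the highest recall.
pub fn best_qualifying(knobs: &[Knob]) -> Option<f64> {
    knobs
        .iter()
        .filter(|k| k.passes_gate)
        .max_by(|a, b| a.recall.total_cmp(&b.recall))
        .map(|k| k.f)
}
```

On a sweep ordered by increasing f, the two rules diverge whenever a larger knob also passes with higher recall, which is why the choice had to match the frozen wording.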

Gate frozen at b388c42 before any contender run.
…scale-qualified)

Adversarial check on BET 1 (ADR-200/202): does cheap incremental graph repair beat
BOTH topology-reuse AND full rebuild under metric drift? YES, at the high-recall tier.

Reproduced at n=20k, n=50k, and on a gradual trajectory: inc-50% matches full-rebuild
recall@10 within ~0.2pts at ~42% of rebuild cost AND Pareto-dominates Periodic{k=2}
(the strongest BET 1 incumbent). Targeted repair of the displaced subset beats lumped
periodic rebuilds at equal cost because it never lets recall sawtooth-decay.

Honest narrowings (all measured, in the ADR):
- Scale-sensitive: frontier SWEEP only at n=20k/93% churn; at n=50k & moderate churn
  the cheap periodic tiers (k=4,k=8) reclaim Pareto-optimality. Incremental EXTENDS the
  high-recall end, not a replacement for periodic.
- Regime-concentrated: advantage emerges above ~40% churn; below that all policies cluster.
- Degeneracy: inc>B at >90% churn is fresh-build-on-collapsed-geometry (inc==B at n=50k).
- f=5% fails the per-query-eval bar at n=50k; clean win regime is f in [0.2,0.5].

Frozen gate (b388c42) passed; outcome stamped on the pre-registration.
…(no build)

Scoped next-step #1 (wire the reuse policy into the live ruvector-gnn
embedding-flush path) before committing any integration code. Finding:
the production embedding->index seam does not exist on either end — gnn
produces embeddings but has no serving module and only a dev-dep on
diskann; the NAPI serving surface is a static-index API with
reuse-under-drift off; mcp-brain-server has a monitor-only DriftMonitor
and no diskann dep. The only place a drifted embedding meets a diskann
index is examples/. Building the loop now would mean inventing the
producer. Recorded the minimal seam (feature-gated DriftingDiskAnn NAPI
binding) instead. Honors prove-not-hype: 'the path isn't there yet,
here's the seam.'