Skip to content

fix(rust): fixed IVF_RQ indexes ignoring the frag-reuse index#7217

Open
gstamatakis95 wants to merge 1 commit into
lance-format:mainfrom
gstamatakis95:fix/ivf-rq-frag-reuse-remap
Open

fix(rust): fixed IVF_RQ indexes ignoring the frag-reuse index#7217
gstamatakis95 wants to merge 1 commit into
lance-format:mainfrom
gstamatakis95:fix/ivf-rq-frag-reuse-remap

Conversation

@gstamatakis95

@gstamatakis95 gstamatakis95 commented Jun 10, 2026

Copy link
Copy Markdown
Contributor

Closes #7216

After compaction with defer_index_remap=true, covering vector indexes are expected to remap their stored row addresses through the __lance_frag_reuse index at load time. The IVF_PQ storage does this, but
RabitQuantizationStorage::try_from_batch received the frag-reuse index as a parameter named _fri and discarded it. As a result an IVF_RQ index loaded from a debt-carrying version kept its pre-compaction row addresses, and any ANN query that fetched row content failed at the scanner take stage with take operation specified fragment id N but this fragment does not exist.

The IVF state cache reconstruction path in reconstruct_typed passed a hard-coded None for the frag-reuse index, so even with the storage fixed the remap was skipped whenever the index was rebuilt from a cached state. The cache key already includes the frag-reuse uuid, so cached states with and without remap debt stay distinct when the index is threaded through.

This PR makes the following changes:

In rust/lance-index/src/vector/bq/storage.rs, try_from_batch now builds a row id mapping for the partition from FragReuseIndex::remap_row_id, mirroring the PQ derivation, and applies the storage's existing
remap method. Reusing remap instead of porting the PQ inline rewrite is deliberate, because RQ codes live in a packed and permuted SIMD layout and remap already performs the layout-correct filter and repack,
including the multi-bit split ex code and factor columns. It is already pinned by the existing identity-remap regression tests. When no frag-reuse index is present, when its row id maps are empty, or when
no row in the partition is affected, the path stays a no-op. The generic remap_row_ids_record_batch helper used by the flat and SQ storages was considered and does not fit here, both because it assumes a two
column batch and because a plain row-wise take is not safe on the packed code layout.

In rust/lance/src/index/vector/ivf/v2.rs and rust/lance/src/index.rs, IvfStateEntry::reconstruct gains a frag-reuse parameter and the caller passes open_frag_reuse_index, which is already cached in the index
cache, closing the warm cache hole without adding load overhead.

In rust/lance/src/dataset/optimize.rs, a regression test test_read_ivf_rq_index_v3_with_defer_index_remap is added next to the PQ analog. It queries with a non-empty projection because the take stage where
this bug manifests is only exercised when row content is fetched. The existing PQ test uses an empty projection and never reaches that stage, which is why this went undetected.

Verification: with the storage remap reverted, the new test fails with the exact defect signature from scanner.rs. With the fix it passes. cargo test -p lance defer_index_remap passes (the new RQ test plus
the existing PQ tests), cargo test -p lance-index bq passes, and cargo fmt and cargo clippy -- -D warnings are clean on the touched crates.

@github-actions github-actions Bot added A-index Vector index, linalg, tokenizer bug Something isn't working labels Jun 10, 2026
@gstamatakis95 gstamatakis95 force-pushed the fix/ivf-rq-frag-reuse-remap branch from ad10713 to c9a48a9 Compare June 10, 2026 21:18
@codecov

codecov Bot commented Jun 10, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 96.96970% with 4 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
rust/lance-index/src/vector/bq/storage.rs 88.00% 2 Missing and 1 partial ⚠️
rust/lance/src/dataset/optimize.rs 99.01% 1 Missing ⚠️

📢 Thoughts on this report? Let us know!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

A-index Vector index, linalg, tokenizer bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

fix(rust): IVF_RQ indexes ignore the frag-reuse index at read time breaking queries after compaction with defer_index_remap

1 participant