fix: evaluate all list-element docs in FTS prefilter walk-the-allowlist branch by Ar-maan05 · Pull Request #7246 · lance-format/lance

Ar-maan05 · 2026-06-12T08:46:01Z

Problem

FTS search() combined with a where(...) prefilter on a list<string> / large_list<large_string> column silently drops matches when the query token sits at any position other than the last in a row's list. .postfilter() (FTS first, then filter) returns the correct rows.

Reported as lancedb#3352 with a runnable Python repro. The plan is MatchQuery > ScalarIndexQuery, and the bug only surfaces when the planner picks the small-allowlist prefilter path (index_comparisons ≈ allowlist size):

| Target row `keywords` | prefilter (default) | postfilter |
|---|---|---|
| `["needle", "synonym"]` | 0 rows (bug) | 2 rows |
| `["synonym", "needle"]` | 2 rows | 2 rows |

Root cause

A list column indexes every element as its own document, so one row_id owns several doc_ids: DocSet.inv (a Vec<(row_id, doc_id)> sorted by row_id) holds multiple entries per row.

DocSet::doc_id(row_id) resolved a row to a single doc_id via binary_search_by_key, and its only caller is Wand::flat_search, the walk-the-allowlist prefilter branch. That branch therefore evaluated just one of the row's documents against the posting lists; when the query token lived in any other element, the row became a false negative.
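
To make the failure mode concrete, here is a minimal sketch of a single-result lookup over a sorted list of (row_id, doc_id) pairs. It is not the code from wand.rs: the free function `single_doc_id`, the `u64`/`u32` types, and the sample values are assumptions chosen for illustration. It relies only on the documented behaviour of the standard library's `binary_search_by_key`, which returns one matching index when several entries share a key and does not specify which.

```rust
// Toy stand-in for DocSet.inv: (row_id, doc_id) pairs sorted by row_id.
// Types and values are illustrative assumptions, not taken from lance-index.
fn single_doc_id(inv: &[(u64, u32)], row_id: u64) -> Option<u32> {
    // binary_search_by_key yields the index of ONE matching entry; with
    // duplicate keys the standard library does not specify which one.
    inv.binary_search_by_key(&row_id, |&(r, _)| r)
        .ok()
        .map(|idx| inv[idx].1)
}

fn main() {
    // Row 7 is a two-element list, so it owns doc_ids 10 and 11.
    let inv = vec![(3, 9), (7, 10), (7, 11), (8, 12)];
    let doc = single_doc_id(&inv, 7).unwrap();
    // Only one of the row's two documents is ever returned.
    assert!(doc == 10 || doc == 11);
    println!("row 7 resolved to doc {doc}; its other document is never scored");
}
```

If the query token happens to live in the document that the lookup does not return, the row cannot match, which is consistent with the false negatives described above.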

The regular WAND path is forward-driven (document -> row_id, with a per-document mask check), so it was always correct. Only flat_search was affected, which is why the bug is specific to the prefilter branch.
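
The difference in direction can be sketched with a toy model. Everything here is hypothetical: `forward_driven`, `allowlist_single_doc`, the flat `doc_to_row` array, and the choice to resolve each row to the document of its last element (picked only to mirror the table in the Problem section) are assumptions for illustration, not the real WAND or flat_search code.

```rust
use std::collections::HashSet;

// Forward-driven: start from the documents in the posting list, map each one
// to its row, then check that row against the allowlist mask.
fn forward_driven(postings: &[u32], doc_to_row: &[u64], allowed: &HashSet<u64>) -> Vec<u64> {
    postings
        .iter()
        .map(|&doc| doc_to_row[doc as usize])
        .filter(|row| allowed.contains(row))
        .collect()
}

// Allowlist-driven with one document per row (the pre-fix shape): start from
// each allowed row, look at a single document, and test it against the postings.
fn allowlist_single_doc(postings: &[u32], one_doc: &[(u64, u32)]) -> Vec<u64> {
    one_doc
        .iter()
        .filter(|&&(_, doc)| postings.contains(&doc))
        .map(|&(row, _)| row)
        .collect()
}

fn main() {
    // Row 0 = ["needle", "synonym"] -> docs 0, 1.
    // Row 1 = ["synonym", "needle"] -> docs 2, 3.
    let doc_to_row = [0u64, 0, 1, 1];
    let postings = [0u32, 3]; // documents that contain "needle"
    let allowed = HashSet::from([0u64, 1]);

    // Every matching document is visited, so both rows are found.
    assert_eq!(forward_driven(&postings, &doc_to_row, &allowed), vec![0, 1]);

    // Assumption for illustration: each row resolves to its last element's doc.
    let one_doc = [(0u64, 1u32), (1, 3)];
    // Row 0 is lost because its "needle" document (doc 0) is never examined.
    assert_eq!(allowlist_single_doc(&postings, &one_doc), vec![1]);
}
```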

Fix

Replace DocSet::doc_id with DocSet::doc_ids(row_id) -> impl Iterator, which yields every doc_id in the contiguous equal-key run in inv (the legacy row_id == doc_id shape still resolves to a single document).
flat_search now expands each allow-listed row_id to all of its documents (flat_map over doc_ids) before sorting into doc-id order.
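
As a rough illustration of the approach described above, the sketch below resolves a row to every document in its contiguous equal-key run and then expands an allowlist into doc-id order. It is not the PR's implementation: the free functions `doc_ids` and `expand_allowlist`, the use of `partition_point` to find the start of the run, and the concrete types are all assumptions. The only property it depends on is that the pairs are sorted by row_id.

```rust
// Hypothetical stand-in for DocSet::doc_ids: yields every doc_id in the
// contiguous run of entries whose key equals `row_id`.
// Assumes `inv` is sorted by row_id.
fn doc_ids(inv: &[(u64, u32)], row_id: u64) -> impl Iterator<Item = u32> + '_ {
    // Index of the first entry whose row_id is not less than the target.
    let start = inv.partition_point(|&(r, _)| r < row_id);
    inv[start..]
        .iter()
        .take_while(move |&&(r, _)| r == row_id)
        .map(|&(_, d)| d)
}

// Expand an allowlist of row_ids to all of their documents, in doc-id order.
fn expand_allowlist(inv: &[(u64, u32)], allowed: &[u64]) -> Vec<u32> {
    let mut docs: Vec<u32> = allowed
        .iter()
        .flat_map(|&row| doc_ids(inv, row))
        .collect();
    docs.sort_unstable();
    docs
}

fn main() {
    let inv = vec![(3, 9), (7, 10), (7, 11), (8, 12)];
    // A list row yields all of its documents.
    assert_eq!(doc_ids(&inv, 7).collect::<Vec<_>>(), vec![10, 11]);
    // A row with a single entry yields exactly one document.
    assert_eq!(doc_ids(&inv, 3).collect::<Vec<_>>(), vec![9]);
    // A missing row yields nothing.
    assert_eq!(doc_ids(&inv, 42).count(), 0);
    // The allowlist order does not matter; the result is sorted by doc_id.
    assert_eq!(expand_allowlist(&inv, &[8, 7]), vec![10, 11, 12]);
}
```

Returning an iterator rather than a collected `Vec` keeps the single-document case allocation-free in this sketch; whether the real method makes the same trade-off is not stated in the description.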

This brings flat_search to parity with the WAND path, so it introduces no new duplicate-row behaviour: only documents actually present in the posting lists score.

Tests

test_doc_ids_resolves_every_document_a_row_owns: unit coverage of the multi-valued resolution (list shape, legacy shape, and a missing row).
test_flat_search_finds_list_row_with_match_at_non_last_position (rstest, compressed + plain): reproduces the bug; it fails on the previous single-doc_id resolution and passes with the fix.

All 143 scalar::inverted tests pass; cargo fmt --all --check and cargo clippy -p lance-index --tests -- -D warnings are clean.

Closes lancedb#3352

codecov · 2026-06-12T09:28:22Z

Codecov Report

❌ Patch coverage is 96.15385% with 1 line in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| rust/lance-index/src/scalar/inverted/wand.rs | 95.00% | 1 Missing ⚠️ |

fix: evaluate all list-element docs in FTS prefilter walk

0e13752

A list<string> column indexes each element as its own document sharing the row's id, but flat_search resolved each allow-listed row_id to a single doc_id, dropping matches at non-last list positions. Closes lancedb#3352

github-actions bot added the A-index (Vector index, linalg, tokenizer) and bug (Something isn't working) labels · Jun 12, 2026

Ar-maan05 wants to merge 1 commit into lance-format:main from Ar-maan05:fix/fts-list-prefilter-drops-matches
