Commit 2b5f39b
perf(planner): use USearch index.get() for low-selectivity path

Replace the parquet-native full scan with direct vector retrieval from
the USearch index. The index stores vectors alongside the HNSW graph,
so index.get(key) retrieves them in O(1) per key.

Previously, the low-selectivity path scanned the entire Parquet file
including the vector column (e.g. 6.95GB for 1.2M rows) just to
compute distances for the few rows matching the WHERE clause. Now it
retrieves vectors only for valid_keys collected during the pre-scan,
computes distances, maintains a top-k heap, then fetches result rows
from the lookup provider.
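
The retrieval loop described above can be sketched roughly as follows. This is a minimal illustration, not the code in this commit: `top_k_by_key`, `l2_distance`, and the `get_vector` callable are hypothetical names, `get_vector` stands in for the index's per-key lookup, and Euclidean distance is assumed since the commit does not name the metric.

```python
import heapq
import math
from typing import Callable, Hashable, Iterable, List, Optional, Sequence, Tuple

Vector = Sequence[float]


def l2_distance(a: Vector, b: Vector) -> float:
    """Euclidean distance (an assumption; the real metric is not stated)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_k_by_key(
    query: Vector,
    valid_keys: Iterable[Hashable],
    get_vector: Callable[[Hashable], Optional[Vector]],
    k: int,
) -> List[Tuple[float, Hashable]]:
    """Return the k nearest (distance, key) pairs among valid_keys.

    get_vector is a hypothetical stand-in for a per-key index lookup;
    it returns None when the key has no stored vector.
    """
    if k <= 0:
        return []
    # Max-heap via negated distance: heap[0] holds the worst of the best k.
    heap: List[Tuple[float, Hashable]] = []
    for key in valid_keys:
        vec = get_vector(key)
        if vec is None:
            continue  # key passed the filter but has no vector in the index
        dist = l2_distance(query, vec)
        if len(heap) < k:
            heapq.heappush(heap, (-dist, key))
        elif dist < -heap[0][0]:
            # Closer than the current worst candidate: swap it in.
            heapq.heapreplace(heap, (-dist, key))
    # Undo the negation and sort nearest-first.
    return sorted((-neg, key) for neg, key in heap)


# Usage with a plain dict standing in for the vector store.
store = {1: [0.0, 0.0], 2: [3.0, 4.0], 3: [1.0, 0.0], 4: [10.0, 10.0]}
nearest = top_k_by_key([0.0, 0.0], [1, 2, 3], store.get, k=2)
```

Only the keys that survive the filter are ever looked up, so the work scales with the number of matching rows rather than with the size of the file. The returned keys would then be used to fetch the full result rows.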

This eliminates the full_scan DataSourceExec at runtime for filtered
queries. The parquet-native code is retained but unused, pending
removal after production validation.

1 parent edb6c94 commit 2b5f39b
2 files changed
Lines changed: 213 additions & 310 deletions