Commit 2b5f39b

perf(planner): use USearch index.get() for low-selectivity path
Replace the parquet-native full scan with direct vector retrieval from the USearch index. The index stores vectors alongside the HNSW graph, so index.get(key) retrieves them in O(1) per key.

Previously, the low-selectivity path scanned the entire Parquet file, including the vector column (e.g. 6.95GB for 1.2M rows), just to compute distances for the few rows matching the WHERE clause.

Now it retrieves vectors only for the valid_keys collected during the pre-scan, computes distances, maintains a top-k heap, then fetches result rows from the lookup provider. This eliminates the full_scan DataSourceExec at runtime for filtered queries.

The parquet-native code is retained but unused, pending removal after production validation.
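The new flow (fetch vectors by key, compute distances, keep a bounded top-k heap) can be sketched roughly as follows. This is an illustrative sketch, not the code in this commit: `top_k_by_key`, `l2_distance`, and the plain dict standing in for the index are all hypothetical names, the metric is assumed to be Euclidean, and the final row fetch from the lookup provider is omitted.

```python
import heapq
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_k_by_key(get_vector, valid_keys, query, k):
    """Return the k nearest (distance, key) pairs among valid_keys.

    get_vector is a stand-in for a per-key index lookup such as index.get(key);
    it must return the stored vector, or None if the key is absent.
    """
    # heapq is a min-heap, so distances are negated: heap[0] is then the
    # current worst (farthest) candidate and is cheap to evict.
    heap = []
    for key in valid_keys:
        vec = get_vector(key)
        if vec is None:
            continue  # key passed the filter but has no vector in the index
        item = (-l2_distance(vec, query), key)
        if len(heap) < k:
            heapq.heappush(heap, item)
        elif item > heap[0]:
            heapq.heapreplace(heap, item)
    # Undo the negation and sort nearest-first.
    return sorted((-neg_dist, key) for neg_dist, key in heap)


# Hypothetical data: a dict plays the role of the vector index.
vectors = {1: [0.0, 0.0], 2: [1.0, 0.0], 3: [3.0, 4.0], 4: [0.0, 2.0]}
valid_keys = [2, 3, 4]  # keys that survived the WHERE-clause pre-scan
result = top_k_by_key(vectors.get, valid_keys, [0.0, 0.0], 2)
print(result)  # [(1.0, 2), (2.0, 4)]
```

Only the vectors for `valid_keys` are touched, so the work scales with the number of rows matching the filter rather than with the size of the Parquet file.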
1 parent edb6c94 commit 2b5f39b

2 files changed

Lines changed: 213 additions & 310 deletions
