Summary
The Milvus backend optimize() path can finish before Milvus reaches a stable post-compaction query view.
For clean VectorDBBench runs on OpenAI 500K with SVS_VAMANA_LEANVEC, the current implementation waits only for the returned compaction id to become Completed, then waits for index pending rows, then calls refresh_load(). In practice, this is not a strong enough barrier: the returned force-merge compaction can be planned over only a partial segment view while other compactions are still converging, and the first search may start with a nondeterministic number of QueryNode-visible segments.
This makes consecutive clean benchmark runs hard to compare because the benchmark can start from different post-optimize segment distributions even though optimize() has returned successfully.
Environment
- VectorDBBench commit: `0c20701725a84fbcd2a14b5d628c77cac2beb071`
- Milvus commit: `ddd76bccd9a1173e6a97221e834315dcbb55271f` (origin/3.0, built with `USE_SVS=ON`)
- PyMilvus: 2.6.8
- Deployment: Milvus standalone
- Host: AWS m6id.2xlarge, Intel Xeon Platinum 8375C, 8 vCPU, about 30 GiB RAM
- Dataset/index: OpenAI 500K, dim 1536, cosine, SVS_VAMANA_LEANVEC, topK 10
Reproduction
Run consecutive clean VectorDBBench runs:
```shell
vectordbbench milvussvsvamanaleanvec \
  --uri http://127.0.0.1:20643 \
  --svs-graph-max-degree 64 \
  --svs-construction-window-size 200 \
  --svs-storage-kind leanvec4x8 \
  --svs-search-window-size 1000 \
  --svs-search-buffer-capacity 1000 \
  --case-type Performance1536D500K \
  --k 10 \
  --svs-leanvec-dim 768 \
  --skip-search-concurrent
```
The relevant current Milvus optimize flow is:
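```python
import logging
import time

log = logging.getLogger(__name__)


class Milvus:  # excerpt of the VectorDBBench Milvus client; other members omitted
    def _wait_for_segments_sorted(self):
        segments = self.client.list_persistent_segments(self.collection_name)
        unsorted = [s for s in segments if not s.is_sorted]
        ...  # rest of the polling logic elided in the original excerpt

    def _wait_for_index(self):
        while True:
            info = self.client.describe_index(self.collection_name, self._vector_index_name)
            if info.get("pending_index_rows", -1) == 0:
                break
            time.sleep(1)  # poll interval illustrative

    def _wait_for_compaction(self, compaction_id):
        while True:
            state = self.client.get_compaction_state(compaction_id)
            if state == "Completed":
                break
            time.sleep(1)  # poll interval illustrative

    def _optimize(self):
        self.client.flush(self.collection_name)
        self._wait_for_segments_sorted()
        self._wait_for_index()
        # Force-merge request: target_size is set to the int64 maximum.
        compaction_id = self.client.compact(self.collection_name, target_size=(2**63 - 1))
        if compaction_id > 0:
            # Only this one compaction id is awaited; concurrent auto/mix compactions are not.
            self._wait_for_compaction(compaction_id)
            log.info("force merge compaction completed.")
        self._wait_for_index()
        self.client.refresh_load(self.collection_name)
```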
Actual Behavior
Across clean runs of the same workload, VectorDBBench reached serial search with different QueryNode-visible sealed segment counts:
| Run | First-search sealedSegmentNum | Recall@10 |
|-----|-------------------------------|-----------|
| 1 | 4 | 0.9929 |
| 2 | 4 | 0.9721 |
| 3 | 1 | 0.9914 |
| 4 | 4 | 0.9775 |
| 5 | 1 | 0.8933 |
| 6 | 2 | 0.9926 |
| 7 | 4 | 0.9847 |
| 8 | 1 | 0.9914 |
| 9 | 1 | 0.8933 |
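Grouping the nine runs by first-search segment count makes the spread easier to see. The numbers are copied from the table; the snippet only tabulates them and adds no new measurements.

```python
from collections import defaultdict

# (run, first-search sealedSegmentNum, Recall@10), copied from the table
RUNS = [
    (1, 4, 0.9929), (2, 4, 0.9721), (3, 1, 0.9914),
    (4, 4, 0.9775), (5, 1, 0.8933), (6, 2, 0.9926),
    (7, 4, 0.9847), (8, 1, 0.9914), (9, 1, 0.8933),
]


def recall_by_segment_count(runs):
    """Map each observed sealedSegmentNum to the sorted recalls seen with it."""
    groups = defaultdict(list)
    for _run, segments, recall in runs:
        groups[segments].append(recall)
    return {k: sorted(v) for k, v in sorted(groups.items())}


if __name__ == "__main__":
    for segments, recalls in recall_by_segment_count(RUNS).items():
        print(f"sealedSegmentNum={segments}: runs={len(recalls)} "
              f"min={min(recalls):.4f} max={max(recalls):.4f}")
```

Runs that started from a single sealed segment produced both 0.9914 and 0.8933, which is consistent with this issue's later note that segment count alone does not explain the recall drop; what the table shows is that the start state itself varies between clean runs.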
In one representative clean run, VectorDBBench's force-merge compaction completed, but its plan covered only one current segment, of about 98K rows:
```
11:41:14 DataCoord force-merge calculation:
  targetSegmentCount=1
  triggerID=466419709007132860
11:41:14 Compaction plan submitted:
  planID=466419709007132861
  inputSegments=[466419709003500393]
11:41:23 get_compaction_state(466419709007132860) returned Completed
  compactTo=466419709007132863
  numRows=98000
```
Other auto/mix compactions were still producing the rest of the 500K-row segment set around the same time. Before/during the first serial search, QueryNode was fully loaded but still had multiple sealed segments:
```
11:45:38 QueryNode query view:
  loadedRatio=1
  loadedSealedRowCount=500000
  unloadedSealedSegmentNum=0
  sealedSegmentNum=7
11:45:48 QueryNode query view:
  loadedRatio=1
  loadedSealedRowCount=500000
  unloadedSealedSegmentNum=0
  sealedSegmentNum=5
11:46:18 QueryNode query view:
  loadedRatio=1
  loadedSealedRowCount=500000
  unloadedSealedSegmentNum=0
  sealedSegmentNum=5
```
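The sealedSegmentNum values in these excerpts are what distinguishes one clean run from another. A minimal sketch for pulling those fields out of text in the `HH:MM:SS <title>:` plus `key=value` layout used in this issue; that layout is an assumption based on the excerpts, and raw Milvus log lines may be formatted differently, so the patterns would need adapting.

```python
import re

# Assumed layout (from the excerpts in this issue): a header line
# "HH:MM:SS <title>:" followed by indented "key=value" lines.
_HEADER = re.compile(r"^(\d{2}:\d{2}:\d{2}) (.*):$")
_FIELD = re.compile(r"(\w+)=(\S+)")


def parse_query_views(text):
    """Return [(timestamp, fields)] for every 'QueryNode query view' record."""
    views, current = [], None
    for raw in text.splitlines():
        line = raw.strip()
        header = _HEADER.match(line)
        if header:
            # Start a new record only for query views; other headers end the current one.
            if header.group(2) == "QueryNode query view":
                current = {}
                views.append((header.group(1), current))
            else:
                current = None
        elif current is not None:
            for key, value in _FIELD.findall(line):
                current[key] = int(value) if value.isdigit() else value
    return views
```

Applied to the excerpt above, it yields sealedSegmentNum values 7, 5 and 5, with loadedSealedRowCount 500000 in every record.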
When a manual force-merge was called later, after the system had converged to those 5 segments, Milvus planned over all five current segments and QueryNode eventually reached one loaded 500K segment:
```
11:52:37 Compaction plan submitted:
  planID=466419709007266916
  inputSegments=[five current segments]
11:57:48 QueryNode query view:
  loadedRatio=1
  loadedSealedRowCount=500000
  unloadedSealedSegmentNum=0
  sealedSegmentNum=1
```
Expected Behavior
VectorDBBench's Milvus optimize() should only return when the benchmark is ready to run a stable search phase.
At minimum, after compaction and refresh_load(), VectorDBBench should wait for a post-compaction steady state, for example:
- `describe_index(...).pending_index_rows == 0`
- persistent segments are sorted and cover the expected row count
- loaded/query segments cover the expected row count
- `unloadedSealedSegmentNum == 0`
- the loaded/query segment id set and row counts remain unchanged for several consecutive polls
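The checklist can be expressed as a predicate over a snapshot of cluster state, plus a requirement that the loaded segment map stay identical for several consecutive polls. A minimal sketch, assuming a hypothetical `Snapshot` type that the caller fills in (for example from `describe_index` and `list_persistent_segments`); how the QueryNode-loaded view and `unloadedSealedSegmentNum` are obtained is left open, and reading "cover" as an exact row-count match is an assumption that would not hold for collections with deletes.

```python
from dataclasses import dataclass
from typing import Iterable, Mapping, Optional


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical point-in-time view assembled by the caller (not a PyMilvus type)."""
    pending_index_rows: int
    persistent: Mapping[int, tuple[int, bool]]  # segment id -> (num_rows, is_sorted)
    loaded: Mapping[int, int]                   # segment id -> num_rows on QueryNode
    unloaded_sealed_segment_num: int


def meets_criteria(s: Snapshot, expected_rows: int) -> bool:
    """First four checklist items; 'cover' is assumed to mean an exact row match."""
    persistent_rows = sum(rows for rows, _ in s.persistent.values())
    all_sorted = all(is_sorted for _, is_sorted in s.persistent.values())
    return (
        s.pending_index_rows == 0
        and all_sorted
        and persistent_rows == expected_rows
        and sum(s.loaded.values()) == expected_rows
        and s.unloaded_sealed_segment_num == 0
    )


def first_ready_poll(polls: Iterable[Snapshot], expected_rows: int,
                     stable_polls: int = 3) -> Optional[int]:
    """Index of the poll completing `stable_polls` consecutive ready polls with an
    unchanged loaded segment map, or None if that never happens."""
    streak, previous = 0, None
    for i, s in enumerate(polls):
        if not meets_criteria(s, expected_rows):
            streak, previous = 0, None  # any failing poll restarts the streak
            continue
        current = dict(s.loaded)
        streak = streak + 1 if current == previous else 1
        previous = current
        if streak >= stable_polls:
            return i
    return None
```

In a real barrier the iterable would be a generator that takes a fresh snapshot, sleeps between polls, and stops at a timeout.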
For force-merge benchmarks where one compacted segment per channel is expected, VectorDBBench could optionally assert the expected compacted segment count before starting search. If the expected segment count cannot be hardcoded for all Milvus topologies, VectorDBBench should at least wait until the segment id set is stable for a configurable interval.
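For the configurable-interval variant, one option is to require that the set of loaded segment ids stay unchanged for a given number of seconds, optionally also matching an expected segment count. A minimal sketch, assuming a caller-supplied `get_segment_ids()` callable (hypothetical; it would wrap whichever PyMilvus call exposes the QueryNode-loaded segments). The default durations are placeholders, and the clock and sleep functions are injectable so the logic can be tested without real waiting.

```python
import time
from typing import Callable, FrozenSet, Optional


def wait_for_stable_segments(
    get_segment_ids: Callable[[], FrozenSet[int]],
    stable_for_s: float = 60.0,
    poll_interval_s: float = 5.0,
    timeout_s: float = 1800.0,
    expected_count: Optional[int] = None,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> FrozenSet[int]:
    """Block until the segment id set is unchanged for `stable_for_s` seconds
    (and has `expected_count` members, if given); raise TimeoutError otherwise."""
    deadline = clock() + timeout_s
    last, since = None, None
    while True:
        now = clock()
        ids = frozenset(get_segment_ids())
        if ids != last:
            last, since = ids, now  # set changed: restart the stability window
        count_ok = expected_count is None or len(ids) == expected_count
        if count_ok and now - since >= stable_for_s:
            return ids
        if now >= deadline:
            raise TimeoutError(f"segment set not stable after {timeout_s}s: {sorted(ids)}")
        sleep(poll_interval_s)
```

In VectorDBBench's `_optimize()`, a call like this would go after `refresh_load()`; the placement and what exactly is polled are suggestions, not the project's current behavior.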
Why this matters
Without a stronger post-optimize readiness barrier, consecutive clean VectorDBBench runs may benchmark different Milvus states.
In the OpenAI 500K SVS investigation, repeated search-only runs on the same already-loaded collection were stable. The instability was associated with clean runs that rebuild/reload and then immediately enter search after the current optimize() flow.
Note: the low ~0.8933 SVS recall state can also be reproduced directly with Knowhere without Milvus segments or QueryNode loading, so this issue is not claiming that segment count alone causes the SVS recall drop. The VectorDBBench issue is narrower: the current Milvus optimize path does not guarantee a stable benchmark start state.
Related
- … `target_size` overflow with PyMilvus 3.0, while this one is about optimize/readiness semantics after compaction.