Milvus optimize can start search before post-compaction query view is stable

## Summary

The Milvus backend `optimize()` path can finish before Milvus reaches a stable post-compaction query view.

For clean VectorDBBench runs on OpenAI 500K with `SVS_VAMANA_LEANVEC`, the current implementation waits only for the returned compaction ID to reach `Completed`, then waits for pending index rows to reach zero, then calls `refresh_load()`. In practice this is not a strong enough barrier. The returned force-merge compaction can be planned over a partial segment view while other compactions are still converging, so the first search may start with a nondeterministic number of QueryNode-visible segments.

This makes consecutive clean benchmark runs hard to compare because the benchmark can start from different post-optimize segment distributions even though `optimize()` has returned successfully.

## Environment

- VectorDBBench commit: `0c20701725a84fbcd2a14b5d628c77cac2beb071`
- Milvus commit: `ddd76bccd9a1173e6a97221e834315dcbb55271f` (`origin/3.0`, built with `USE_SVS=ON`)
- PyMilvus: `2.6.8`
- Deployment: Milvus standalone
- Host: AWS `m6id.2xlarge`, Intel Xeon Platinum 8375C, 8 vCPU, about 30 GiB RAM
- Dataset/index: OpenAI 500K, dim 1536, cosine, `SVS_VAMANA_LEANVEC`, topK 10

## Reproduction

Run consecutive clean VectorDBBench runs:

```bash
vectordbbench milvussvsvamanaleanvec \
  --uri http://127.0.0.1:20643 \
  --svs-graph-max-degree 64 \
  --svs-construction-window-size 200 \
  --svs-storage-kind leanvec4x8 \
  --svs-search-window-size 1000 \
  --svs-search-buffer-capacity 1000 \
  --case-type Performance1536D500K \
  --k 10 \
  --svs-leanvec-dim 768 \
  --skip-search-concurrent
```

The relevant current Milvus optimize flow is:

```python
# Abbreviated excerpt; sleeps between polls and logging are omitted.
def _wait_for_segments_sorted(self):
    segments = self.client.list_persistent_segments(self.collection_name)
    unsorted = [s for s in segments if not s.is_sorted]
    ...

def _wait_for_index(self):
    while True:  # poll until no rows are pending indexing
        info = self.client.describe_index(self.collection_name, self._vector_index_name)
        if info.get("pending_index_rows", -1) == 0:
            break

def _wait_for_compaction(self, compaction_id):
    while True:  # poll only the compaction id returned by compact()
        state = self.client.get_compaction_state(compaction_id)
        if state == "Completed":
            break

def _optimize(self):
    self.client.flush(self.collection_name)
    self._wait_for_segments_sorted()
    self._wait_for_index()
    # Force merge: request a single very large target segment size.
    compaction_id = self.client.compact(self.collection_name, target_size=(2**63 - 1))
    if compaction_id > 0:
        self._wait_for_compaction(compaction_id)
    log.info("force merge compaction completed.")
    self._wait_for_index()
    self.client.refresh_load(self.collection_name)
    # Returns here; nothing checks that the query view has stopped changing.
```

## Actual Behavior

Across clean runs of the same workload, VectorDBBench reached serial search with different QueryNode-visible sealed segment counts:

| Run | First-search `sealedSegmentNum` | Recall@10 |
|-----|---------------------------------|-----------|
| 1 | 4 | 0.9929 |
| 2 | 4 | 0.9721 |
| 3 | 1 | 0.9914 |
| 4 | 4 | 0.9775 |
| 5 | 1 | 0.8933 |
| 6 | 2 | 0.9926 |
| 7 | 4 | 0.9847 |
| 8 | 1 | 0.9914 |
| 9 | 1 | 0.8933 |

In one representative clean run, VectorDBBench's force-merge compaction completed, but its plan compacted only one of the current segments, of about 98K rows:

```text
11:41:14 DataCoord force-merge calculation:
  targetSegmentCount=1
  triggerID=466419709007132860

11:41:14 Compaction plan submitted:
  planID=466419709007132861
  inputSegments=[466419709003500393]

11:41:23 get_compaction_state(466419709007132860) returned Completed

compactTo=466419709007132863
numRows=98000
```

Other auto/mix compactions were still producing the rest of the 500K-row segment set around the same time. Before/during the first serial search, QueryNode was fully loaded but still had multiple sealed segments:

```text
11:45:38 QueryNode query view:
  loadedRatio=1
  loadedSealedRowCount=500000
  unloadedSealedSegmentNum=0
  sealedSegmentNum=7

11:45:48 QueryNode query view:
  loadedRatio=1
  loadedSealedRowCount=500000
  unloadedSealedSegmentNum=0
  sealedSegmentNum=5

11:46:18 QueryNode query view:
  loadedRatio=1
  loadedSealedRowCount=500000
  unloadedSealedSegmentNum=0
  sealedSegmentNum=5
```
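All three snapshots above pass a naive "fully loaded" check, which is why load state alone cannot serve as the readiness barrier. A minimal sketch, with the snapshots transcribed from the log as plain dicts; the `is_fully_loaded` helper is hypothetical and is not part of VectorDBBench or PyMilvus:

```python
EXPECTED_ROWS = 500_000

# Query-view snapshots transcribed from the log excerpt above.
snapshots = [
    {"time": "11:45:38", "loadedRatio": 1, "loadedSealedRowCount": 500_000,
     "unloadedSealedSegmentNum": 0, "sealedSegmentNum": 7},
    {"time": "11:45:48", "loadedRatio": 1, "loadedSealedRowCount": 500_000,
     "unloadedSealedSegmentNum": 0, "sealedSegmentNum": 5},
    {"time": "11:46:18", "loadedRatio": 1, "loadedSealedRowCount": 500_000,
     "unloadedSealedSegmentNum": 0, "sealedSegmentNum": 5},
]


def is_fully_loaded(view: dict, expected_rows: int) -> bool:
    """Hypothetical check: everything is loaded and all rows are covered."""
    return (
        view["loadedRatio"] == 1
        and view["loadedSealedRowCount"] == expected_rows
        and view["unloadedSealedSegmentNum"] == 0
    )


all_loaded = all(is_fully_loaded(v, EXPECTED_ROWS) for v in snapshots)
segment_counts = [v["sealedSegmentNum"] for v in snapshots]

# Every snapshot is "fully loaded", yet the segment count still changes.
print(all_loaded, segment_counts)
```

The load-based fields are identical in every snapshot while `sealedSegmentNum` moves from 7 to 5, so a readiness check has to look at the segment set itself.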

When a manual force-merge was called later, after the system had converged to those 5 segments, Milvus planned over all five current segments and QueryNode eventually reached one loaded 500K segment:

```text
11:52:37 Compaction plan submitted:
  planID=466419709007266916
  inputSegments=[five current segments]

11:57:48 QueryNode query view:
  loadedRatio=1
  loadedSealedRowCount=500000
  unloadedSealedSegmentNum=0
  sealedSegmentNum=1
```

## Expected Behavior

VectorDBBench's Milvus `optimize()` should only return when the benchmark is ready to run a stable search phase.

At minimum, after compaction and `refresh_load()`, VectorDBBench should wait for a post-compaction steady state, for example:

- `describe_index(...).pending_index_rows == 0`
- persistent segments are sorted and cover the expected row count
- loaded/query segments cover the expected row count
- `unloadedSealedSegmentNum == 0`
- the loaded/query segment id set and row counts remain unchanged for several consecutive polls

For force-merge benchmarks where one compacted segment per channel is expected, VectorDBBench could optionally assert the expected compacted segment count before starting search. If the expected segment count cannot be hardcoded for all Milvus topologies, VectorDBBench should at least wait until the segment id set is stable for a configurable interval.
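The "unchanged for several consecutive polls" condition could be sketched roughly as follows. This is an illustration under assumptions, not a proposed patch: `wait_for_stable_view`, its parameters, and the `poll` callable are hypothetical names, and how `poll` obtains the QueryNode-visible segment IDs and row counts from Milvus is deliberately left open.

```python
import time
from typing import Callable, Dict, Optional

Snapshot = Dict[int, int]  # segment id -> row count (assumed shape)


def wait_for_stable_view(
    poll: Callable[[], Snapshot],
    expected_rows: int,
    stable_polls: int = 3,
    interval_s: float = 5.0,
    timeout_s: float = 1800.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Snapshot:
    """Return once poll() yields the same full-coverage snapshot
    `stable_polls` times in a row; raise TimeoutError otherwise."""
    deadline = time.monotonic() + timeout_s
    previous: Optional[Snapshot] = None
    streak = 0
    while True:
        current = poll()
        covers = sum(current.values()) == expected_rows
        if covers and current == previous:
            streak += 1   # same segment ids and row counts as last poll
        elif covers:
            streak = 1    # full coverage, but the segment set changed
        else:
            streak = 0    # rows missing; not a candidate steady state
        if streak >= stable_polls:
            return current
        if time.monotonic() >= deadline:
            raise TimeoutError(f"query view not stable after {timeout_s}s")
        previous = current
        sleep(interval_s)


# Usage with a fake poller that converges the way the logs above did.
fake_views = iter([
    {1: 98_000, 2: 402_000},
    {3: 250_000, 4: 250_000},
    {3: 250_000, 4: 250_000},
    {5: 500_000},
    {5: 500_000},
    {5: 500_000},
])
stable = wait_for_stable_view(
    poll=lambda: next(fake_views),
    expected_rows=500_000,
    stable_polls=3,
    sleep=lambda _seconds: None,  # skip real sleeping in the demo
)
print(stable)
```

Comparing whole snapshots rather than only the segment count means a compaction that swaps one segment for another of the same size still resets the streak. The defaults for `stable_polls`, `interval_s`, and `timeout_s` are placeholders and would need tuning against real compaction times.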

## Why this matters

Without a stronger post-optimize readiness barrier, consecutive clean VectorDBBench runs may benchmark different Milvus states.

In the OpenAI 500K SVS investigation, repeated search-only runs on the same already-loaded collection were stable. The instability was associated with clean runs that rebuild/reload and then immediately enter search after the current `optimize()` flow.

Note: the low `~0.8933` SVS recall state can also be reproduced directly with Knowhere without Milvus segments or QueryNode loading, so this issue is not claiming that segment count alone causes the SVS recall drop. The VectorDBBench issue is narrower: the current Milvus optimize path does not guarantee a stable benchmark start state.
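Grouping the nine runs from the table in "Actual Behavior" by first-search segment count illustrates both points: clean runs start from three different segment counts, and recall still varies within a single segment count. The snippet below only re-tabulates the reported numbers:

```python
from collections import defaultdict

# (run, first-search sealedSegmentNum, Recall@10) from the table in "Actual Behavior".
runs = [
    (1, 4, 0.9929), (2, 4, 0.9721), (3, 1, 0.9914),
    (4, 4, 0.9775), (5, 1, 0.8933), (6, 2, 0.9926),
    (7, 4, 0.9847), (8, 1, 0.9914), (9, 1, 0.8933),
]

by_segments = defaultdict(list)
for _run, segment_count, recall in runs:
    by_segments[segment_count].append(recall)

# One line per segment count: number of runs, lowest and highest recall.
for segment_count in sorted(by_segments):
    recalls = by_segments[segment_count]
    print(segment_count, len(recalls), min(recalls), max(recalls))
```

The one-segment group contains both the 0.9914 and the 0.8933 runs, which is consistent with the note above that segment count alone does not explain the recall drop.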

## Related

- A Milvus issue was initially filed for this investigation and should be closed/replaced by this VectorDBBench-side tracking issue: https://github.com/milvus-io/milvus/issues/49972
- Related but different VectorDBBench issue: https://github.com/zilliztech/VectorDBBench/issues/779. That issue was about `target_size` overflow with PyMilvus 3.0, while this one is about optimize/readiness semantics after compaction.

