Skip to content

fix for the issue#42#44

Merged
srimon12 merged 2 commits into
mainfrom
qql22
May 24, 2026
Merged

fix for the issue#42#44
srimon12 merged 2 commits into
mainfrom
qql22

Conversation

@pavanjava
Copy link
Copy Markdown
Owner

@pavanjava pavanjava commented May 24, 2026

Summary by CodeRabbit

  • Refactor
    • Reduced redundant backend metadata calls and streamlined collection validation, improving performance for search, update, and collection-alter operations and ensuring vector-dimension checks use resolved topology.
  • Tests
    • Added regression tests to assert a single metadata fetch per operation and to cover dimension-mismatch, creation-on-404, insert, and search scenarios.

Review Change Stack

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 24, 2026

📝 Walkthrough

Walkthrough

This PR caches dense vector sizes in CollectionTopology, consolidates topology resolution to a single fetch per operation, and updates executor flows and validation to use the cached topology; tests assert exactly one metadata fetch per high-level operation.

Changes

Topology Metadata Caching

Layer / File(s) Summary
CollectionTopology contract and dense size caching
src/qql/executor.py
CollectionTopology adds dense_sizes field and dense_size_map() method to expose cached dense vector sizes as {name: size}, with "" representing the unnamed single-vector case.
Single-call fetch and topology parsing
src/qql/executor.py
Introduces _fetch_collection_info() to fetch collection info once (404→None) and _topology_from_collection_info() to parse Qdrant vector configs into a CollectionTopology including dense_sizes for named and unnamed vectors.
Consolidate collection existence checks across execution flows
src/qql/executor.py
_execute_alter_collection(), _execute_search(), and _execute_update_vector() now resolve topology and check topology.exists instead of calling collection_exists() separately.
_execute_show_collection() single fetch
src/qql/executor.py
_execute_show_collection() uses _fetch_collection_info() and treats a None result (404) as "collection does not exist" instead of a separate exists+get sequence.
Dimension validation using cached topology sizes
src/qql/executor.py
_ensure_collection() validates existing-collection vector sizes against topology.dense_size_map() (named via dense_using() and unnamed via "") and raises QQLRuntimeError on mismatch; creation for non-existent collections still creates the correct dense config.
Tests: client stub and fetch-count regression
tests/test_executor.py
Updates imports, makes mock_client.get_collection raise UnexpectedResponse(404) when missing, and adds TestCollectionMetadataFetchCount to assert one get_collection() call per operation across several scenarios.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

A rabbit scurries, bytes in tow, 🐇
One fetch to learn what vectors know,
Sizes cached and calls made neat,
No repeats to slow its feet,
Victory nibble — one-call flow!

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 52.17% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Title check ❓ Inconclusive The title is vague and non-descriptive; it only references an issue number without explaining the actual change or improvement being made. Update the title to be more descriptive. For example: 'Optimize topology resolution to reduce Qdrant API calls' or 'Cache dense vector sizes and consolidate collection metadata fetches' to clearly convey the main change.
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch qql22

Comment @coderabbitai help to get the list of available commands and usage tips.

@srimon12
Copy link
Copy Markdown
Collaborator

The change is in the right direction: caching dense vector sizes in CollectionTopology removes the extra get_collection() from _ensure_collection().

Before merging, I think we should complete the scope of issue #42 in this PR:

  1. Make _resolve_topology() the single metadata fetch path.

    • For existing collections, avoid collection_exists() + get_collection().
    • Prefer one get_collection() call and handle only the collection-not-found case as exists=False.
  2. Split API access from topology parsing.

    • Keep _resolve_topology() responsible for fetching.
    • Move vector/sparse parsing into a small helper like _topology_from_collection_info(info).
  3. Add regression tests for the actual API-call behavior.

    • Existing dense INSERT should fetch collection metadata once.
    • Existing hybrid auto-detect INSERT should fetch collection metadata once.
    • Dimension mismatch should still raise without a second metadata fetch.
    • Nonexistent collection flow should still create or fail as before.
  4. Link the PR to the issue.

    • Add Fixes #42 once the above is complete.

This would turn the current partial optimization into the full fix: one resolved topology object reused for both routing and validation.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tests/test_executor.py`:
- Around line 3524-3613: The tests (e.g.,
test_insert_existing_dense_collection_fetches_metadata_once,
test_insert_existing_hybrid_collection_fetches_metadata_once,
test_dimension_mismatch_fetches_metadata_once,
test_search_existing_collection_fetches_metadata_once,
test_insert_nonexistent_collection_issues_one_get_collection_call) currently
assert get_collection() was called once but do not assert collection_exists()
was not called; update each single-fetch-path test to also assert
mock_client.collection_exists.assert_not_called() (or the equivalent negative
assertion) after executor.execute(...) so the tests enforce the Issue `#42`
contract that no extra collection_exists() round-trip occurs when metadata is
fetched once.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 45760e95-f3fe-4a6b-839f-dcab74eb54ff

📥 Commits

Reviewing files that changed from the base of the PR and between ff19012 and 0275e85.

📒 Files selected for processing (2)
  • src/qql/executor.py
  • tests/test_executor.py

Comment thread tests/test_executor.py
Comment on lines +3524 to +3613
def test_insert_existing_dense_collection_fetches_metadata_once(
self, executor, mock_client
):
"""Dense INSERT to an existing collection must call get_collection() exactly once."""
mock_client.collection_exists.return_value = True
mock_client.get_collection.return_value.config.params.vectors.size = 384
# Use a plain unnamed-dense mock (non-dict vectors)
mock_client.get_collection.return_value.config.params.sparse_vectors = None

node = InsertStmt(collection="docs", values={"text": "hello"}, model=None)
executor.execute(node)

mock_client.get_collection.assert_called_once()

def test_insert_existing_hybrid_collection_fetches_metadata_once(
self, executor, mock_client
):
"""Hybrid auto-detect INSERT must call get_collection() exactly once."""
from qdrant_client.models import Distance, SparseVectorParams, VectorParams

mock_client.collection_exists.return_value = True
mock_client.get_collection.return_value.config.params.vectors = {
"dense": VectorParams(size=384, distance=Distance.COSINE)
}
mock_client.get_collection.return_value.config.params.sparse_vectors = {
"sparse": SparseVectorParams()
}

node = InsertStmt(collection="docs", values={"text": "hello"}, model=None)
executor.execute(node)

mock_client.get_collection.assert_called_once()

def test_dimension_mismatch_fetches_metadata_once(self, executor, mock_client):
"""A dimension-mismatch error must be raised without a second metadata fetch."""
from qdrant_client.models import Distance, VectorParams

mock_client.collection_exists.return_value = True
mock_client.get_collection.return_value.config.params.vectors = {
"dense": VectorParams(size=768, distance=Distance.COSINE)
}
mock_client.get_collection.return_value.config.params.sparse_vectors = None

node = InsertStmt(
collection="docs",
values={"text": "hello"},
model=None,
dense_vector="dense",
)
from qql.exceptions import QQLRuntimeError

with pytest.raises(QQLRuntimeError, match="dimension mismatch"):
executor.execute(node)

# Only one metadata call despite the error
mock_client.get_collection.assert_called_once()

def test_insert_nonexistent_collection_issues_one_get_collection_call(
self, executor, mock_client
):
"""Inserting into a brand-new collection calls get_collection() once
(which returns 404 and is handled as exists=False) then creates the collection."""
mock_client.collection_exists.return_value = False

node = InsertStmt(collection="new_col", values={"text": "hello"}, model=None)
executor.execute(node)

# _fetch_collection_info() must have been called once (got 404 → new collection)
mock_client.get_collection.assert_called_once()
mock_client.create_collection.assert_called_once()

def test_search_existing_collection_fetches_metadata_once(
self, executor, mock_client
):
"""SEARCH on an existing dense collection must call get_collection() exactly once."""
from qdrant_client.models import Distance, VectorParams

mock_client.collection_exists.return_value = True
mock_client.get_collection.return_value.config.params.vectors = {
"dense": VectorParams(size=384, distance=Distance.COSINE)
}
mock_client.get_collection.return_value.config.params.sparse_vectors = None
mock_client.query_points.return_value.points = []

node = SearchStmt(
collection="docs", query_text="hello", limit=5, model=None
)
executor.execute(node)

mock_client.get_collection.assert_called_once()
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Coverage gap: assert that collection_exists() is not called in the single-fetch path tests.

These tests currently prove one get_collection() call, but they still allow an extra collection_exists() round-trip. That misses the explicit Issue #42 contract.

✅ Suggested test hardening
 class TestCollectionMetadataFetchCount:
@@
     def test_insert_existing_dense_collection_fetches_metadata_once(
         self, executor, mock_client
     ):
@@
         executor.execute(node)

         mock_client.get_collection.assert_called_once()
+        mock_client.collection_exists.assert_not_called()
@@
     def test_insert_existing_hybrid_collection_fetches_metadata_once(
         self, executor, mock_client
     ):
@@
         executor.execute(node)

         mock_client.get_collection.assert_called_once()
+        mock_client.collection_exists.assert_not_called()
@@
     def test_dimension_mismatch_fetches_metadata_once(self, executor, mock_client):
@@
         with pytest.raises(QQLRuntimeError, match="dimension mismatch"):
             executor.execute(node)

         # Only one metadata call despite the error
         mock_client.get_collection.assert_called_once()
+        mock_client.collection_exists.assert_not_called()
@@
     def test_insert_nonexistent_collection_issues_one_get_collection_call(
         self, executor, mock_client
     ):
@@
         # _fetch_collection_info() must have been called once (got 404 → new collection)
         mock_client.get_collection.assert_called_once()
+        mock_client.collection_exists.assert_not_called()
         mock_client.create_collection.assert_called_once()
@@
     def test_search_existing_collection_fetches_metadata_once(
         self, executor, mock_client
     ):
@@
         executor.execute(node)

         mock_client.get_collection.assert_called_once()
+        mock_client.collection_exists.assert_not_called()
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_executor.py` around lines 3524 - 3613, The tests (e.g.,
test_insert_existing_dense_collection_fetches_metadata_once,
test_insert_existing_hybrid_collection_fetches_metadata_once,
test_dimension_mismatch_fetches_metadata_once,
test_search_existing_collection_fetches_metadata_once,
test_insert_nonexistent_collection_issues_one_get_collection_call) currently
assert get_collection() was called once but do not assert collection_exists()
was not called; update each single-fetch-path test to also assert
mock_client.collection_exists.assert_not_called() (or the equivalent negative
assertion) after executor.execute(...) so the tests enforce the Issue `#42`
contract that no extra collection_exists() round-trip occurs when metadata is
fetched once.

Comment thread src/qql/executor.py
return None
raise QQLRuntimeError(
f"Qdrant error fetching collection '{name}': {e}"
) from e
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

        except ValueError as e:
            if f"Collection {name} not found" in str(e):
                return None
            raise

add something like this .. so that get_collection() raises ValueError instead of HTTP UnexpectedResponse(404)

Comment thread tests/test_executor.py
executor.execute(node)

mock_client.get_collection.assert_called_once()

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mock_client.collection_exists.assert_not_called()

Comment thread tests/test_executor.py
executor.execute(node)

mock_client.get_collection.assert_called_once()

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mock_client.collection_exists.assert_not_called()

Comment thread tests/test_executor.py

# Only one metadata call despite the error
mock_client.get_collection.assert_called_once()

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mock_client.collection_exists.assert_not_called()

Comment thread tests/test_executor.py
mock_client.get_collection.assert_called_once()

def test_insert_nonexistent_collection_issues_one_get_collection_call(
self, executor, mock_client
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change it to
self, executor, mock_client, mocker

Comment thread tests/test_executor.py
"""Inserting into a brand-new collection calls get_collection() once
(which returns 404 and is handled as exists=False) then creates the collection."""
mock_client.collection_exists.return_value = False

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

create_collection = mocker.patch.object(executor, "_create_collection_and_wait")

Comment thread tests/test_executor.py
# _fetch_collection_info() must have been called once (got 404 → new collection)
mock_client.get_collection.assert_called_once()
mock_client.create_collection.assert_called_once()

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mock_client.collection_exists.assert_not_called()

Comment thread tests/test_executor.py
)
executor.execute(node)

mock_client.get_collection.assert_called_once()
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mock_client.collection_exists.assert_not_called()

Comment thread tests/test_executor.py
)
executor.execute(node)

mock_client.get_collection.assert_called_once()
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

def test_local_qdrant_missing_collection_resolves_as_absent(self, cfg):
"""Qdrant local mode raises ValueError for missing collections, not HTTP 404."""
from qdrant_client import QdrantClient

    topology = Executor(QdrantClient(":memory:"), cfg)._resolve_topology("ghost")

    assert topology.exists is False

@srimon12 srimon12 merged commit dbb6713 into main May 24, 2026
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants