Skip to content

feat: add async execution and batched programmatic APIs#45

Open
srimon12 wants to merge 1 commit into
pavanjava:mainfrom
srimon12:feat/refractor-batch-grpc-async-support
Open

feat: add async execution and batched programmatic APIs#45
srimon12 wants to merge 1 commit into
pavanjava:mainfrom
srimon12:feat/refractor-batch-grpc-async-support

Conversation

@srimon12
Copy link
Copy Markdown
Collaborator

@srimon12 srimon12 commented May 24, 2026

Summary

This PR upgrades QQL from a primarily sync, one-query-at-a-time execution layer into a stronger application-facing client API.

It adds async execution, sync/async batching, parameterized query helpers, optional gRPC transport, and a shared utility layer that reduces duplicated executor/parser logic.

What Changed

Programmatic API

  • Added AsyncConnection for async applications using AsyncQdrantClient
  • Added AsyncExecutor to mirror the sync executor behavior with async / await
  • Added QQLBatch and QQLAsyncBatch context managers
  • Added batch helpers:
    • run_queries_batch()
    • run_parameterized_batch()
  • Added parameterized query helpers:
    • run_parameterized_query()
    • run_parameterized_batch()
  • Added optional gRPC connection support:
    • prefer_grpc
    • grpc_port

QQL Language

  • Added BEGIN BATCH ... END BATCH syntax
  • Added semicolon token support for batch blocks
  • Updated script splitting so batch blocks are treated as one executable statement

Executor / Parser Internals

  • Added BatchBlockStmt
  • Added grouped batch execution for compatible statements
  • Combined compatible inserts into bulk insert execution
  • Combined compatible search/recommend statements through Qdrant batch query APIs
  • Moved shared pure logic into qql.utils, including:
    • filter conversion
    • dense query shaping
    • vector payload shaping
    • topology parsing helpers
    • batch grouping
    • search parser helpers
    • parameter rendering
  • Integrated the latest collection topology improvements from PR fix for the issue#42 #44

Tests and Docs

  • Added async connection/executor coverage
  • Added tests for batching, parameterized queries, gRPC wiring, and async lifecycle behavior
  • Updated docs for async usage, batching, parameterized queries, gRPC, and batch-block syntax

Why

These changes make QQL more useful inside real Python applications, not only as a CLI.

Before this, repeated programmatic usage was mostly centered around one statement at a time. This PR adds the pieces needed for higher-throughput and async workloads:

  • reuse one connection across many queries
  • run multiple compatible queries with fewer Qdrant round trips
  • use QQL from async services
  • build query templates with parameters
  • opt into gRPC transport where appropriate
  • keep sync and async behavior aligned through shared helpers

Validation

  • uv run pytest -q
  • Result: 635 passed

Summary by CodeRabbit

  • New Features

    • Batch query execution with BEGIN BATCH...END BATCH syntax for grouped operations.
    • Async Python API with AsyncConnection for asyncio-based applications.
    • gRPC transport support via prefer_grpc option.
    • Parameterized queries with template-based named parameters.
  • Documentation

    • Expanded programmatic API guide covering sync/async clients, batching, and parameterized queries.
    • Added gRPC configuration and batching examples.
  • Tests

    • Increased test coverage to 635 tests (from 549).

Review Change Stack

Add async QQL support with AsyncConnection and AsyncExecutor, plus sync
and async batching helpers for running multiple statements through one
programmatic API. Introduce BEGIN BATCH syntax, parameterized query
helpers, and optional gRPC connection settings.

Refactor shared parser and executor logic into qql.utils so sync and
async paths can reuse filter conversion, vector shaping, topology parsing,
batch grouping, and search parsing helpers.

Update tests and docs for async usage, batching, parameterized queries,
gRPC configuration, and batch block execution.
@srimon12 srimon12 requested a review from pavanjava May 24, 2026 14:02
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 24, 2026

📝 Walkthrough

Walkthrough

This pull request implements comprehensive batch query execution and asynchronous support for QQL. The changes enable BEGIN BATCH ... END BATCH syntax for grouped operations, introduce AsyncConnection and AsyncExecutor for async workloads with optional gRPC transport, add parameterized query templating, extract shared utility functions used across both sync and async paths, and expand the public API surface with all new client types.

Changes

Batch Execution & Async Support

Layer / File(s) Summary
Language Foundation: Lexer, Parser, AST, and Script Splitting
src/qql/lexer.py, src/qql/parser.py, src/qql/ast_nodes.py, src/qql/script.py
Lexer emits BEGIN/BATCH/END/SEMICOLON tokens; parser routes BEGIN to _parse_batch_block which parses semicolon-separated statements until END BATCH; AST includes new BatchBlockStmt node; script splitting tracks batch_depth to handle nested batch blocks correctly.
Shared Utilities: Batching, Filtering, and Search Parsing
src/qql/utils.py
New module provides BatchGroup and search option dataclasses, batch statement grouping logic, parameterized query rendering, filter/Qdrant conversion, hybrid fusion and MMR handling, and search clause parsing (LOOKUP, USING, WITH, GROUP BY) extracted for reuse by both executors.
Sync Executor: Refactoring and Batch Support
src/qql/executor.py
Executor refactored to delegate vector/filter/search helpers to utils; supports BatchBlockStmt via _execute_batch_block and _execute_query_batch; INSERT BULK returns {ids: [...]} instead of single count; collection/topology handling simplified via imported utilities.
Sync Connection & Batch APIs: gRPC, Batching, and Proxy Objects
src/qql/connection.py
Connection constructor adds prefer_grpc and grpc_port for optional gRPC transport; implements run_queries_batch, run_parameterized_query, and run_parameterized_batch methods; introduces QQLBatch context manager and OperationProxy for deferred batch execution.
Async Executor: Full Async Implementation for All Operations
src/qql/async_executor.py
Complete AsyncExecutor supporting all collection and data operations (insert/delete/update/search/recommend/batch) with async Qdrant client, hybrid/dense/sparse search modes, GROUP BY support, optional reranking, and concurrent collection creation coordination via async lock.
Async Connection: Async Context Manager and Batch APIs
src/qql/async_connection.py
AsyncConnection constructs AsyncQdrantClient and AsyncExecutor, exposes async run_query and batching methods, supports async context manager lifecycle, and provides config/executor properties. QQLAsyncBatch context manager queues and executes batches via async run_queries_batch, resolving AsyncOperationProxy instances.
Public API Exports
src/qql/__init__.py
Expands all to export AsyncConnection, AsyncExecutor, QQLBatch, QQLAsyncBatch, OperationProxy, async variants, error types, parser/lexer, and run_query convenience function.
Documentation: User Guides and API Reference
README.md, docs/getting-started.md, docs/programmatic.md, docs/reference.md, docs/scripts.md
Updates badges (549→635 tests), README examples showing parameterized queries and QQLBatch, Getting Started batch example, Programmatic Usage documentation for AsyncConnection and gRPC, Reference section for public API, and Scripts documentation for batch block execution semantics.
Tests: Async Connection and Batching Validation
tests/test_async_connection.py, tests/test_connection.py
New comprehensive AsyncConnection test suite covering initialization, lifecycle, batching, parameterized rendering, and proxy behavior. New TestConnectionBatching suite validates gRPC parameters, run_queries_batch, parameterized batch/query, and QQLBatch context manager; includes architectural tests for batch grammar strictness, ID propagation, and concurrent collection creation.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Suggested reviewers

  • pavanjava

Poem

🐰 Hark! Batch blocks now bloom in QQL's domain,
With BEGIN...END, async queries reign—
Async/sync twins dance with gRPC grace,
Parameterized queries find their place,
One 635-test suite shows the gain! 🚀

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 26.52% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'feat: add async execution and batched programmatic APIs' accurately and concisely summarizes the main additions (async execution, batching, and programmatic APIs) without being misleading or vague.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/reference.md`:
- Line 174: Add a language identifier to the opening fenced code block so the
markdown linter (MD040) passes: change the opening ``` to ```text (i.e., update
the fenced block that currently starts with ``` before the listing "qql/" to
begin with ```text) so the code block has an explicit language.

In `@src/qql/async_connection.py`:
- Line 193: The batch resolver currently pairs proxies and results with
zip(self._proxies, results) in async_connection.py (and the analogous loop in
connection.py), which silently drops unmatched proxies or results; update the
post-run handling in the async method that calls run_queries_batch(...) to
explicitly compare len(results) to len(self._proxies) and, if they differ, raise
a clear RuntimeError (or set exception on each unmatched Proxy via
Proxy.set_exception) instead of silently leaving proxies unresolved; for matched
entries continue to call proxy.set_result(res) as before, and mirror the same
length-check and unmatched-proxy exception-handling logic in the sync batch
context manager in connection.py so all proxies are either resolved or fail
deterministically.

In `@src/qql/async_executor.py`:
- Around line 293-301: The code currently calls build_dense_point_vector (and
similarly build sparse vectors/PointStruct.vector) using a topology captured
before calling self._ensure_collection, which can race with another coroutine
that creates the collection with different vector names; change the flow so that
you first call self._ensure_collection(...) (and any hybrid creation lock), then
refresh/re-resolve the topology from the collection metadata that was actually
created/returned, and only after that call build_dense_point_vector (and the
corresponding sparse/vector construction) using the refreshed topology; apply
the same pattern for the other occurrences that build vectors (the blocks using
build_dense_point_vector and setting PointStruct.vector at the other mentioned
sites).

In `@src/qql/parser.py`:
- Around line 73-76: The parser currently calls parse() ->
_parse_single_statement() and immediately expects TokenKind.EOF, rejecting
trailing SEMICOLON tokens; update parse() to allow and consume any number of
trailing TokenKind.SEMICOLON tokens after the node is parsed (e.g., loop while
next token is TokenKind.SEMICOLON call the existing token-consumption helper
such as self._expect/_advance/_consume), then call self._expect(TokenKind.EOF)
and return the node so top-level statements like "SHOW COLLECTIONS;" succeed.

In `@src/qql/utils.py`:
- Around line 80-93: The function render_parameterized_query currently
serializes parameters via str() and inadequate escaping, causing None to become
"None" and backslashes/control chars to be unescaped; update
render_parameterized_query to: map None to the QQL NULL literal, escape strings
for single quotes and backslashes and encode common control sequences (e.g., \n,
\t, \\) so they are preserved as literals, format booleans as "true"/"false" and
numbers via their canonical representation, and then replace placeholders (keep
the current placeholder logic and sorted key order) with these properly
serialized QQL literals.
- Around line 160-170: The batching logic currently treats every SearchStmt as a
plain "query" which strips group semantics; change the branch so a SearchStmt is
only added to the "query" batch when it has no grouping (i.e., stmt.group_by and
stmt.group_size are falsy). For SearchStmt instances with group_by/group_size,
flush the current group via _append_batch_group and start a new group that is
not the plain "query" path (e.g., set current_type to a distinct token like
"grouped_search" and current_group = [stmt]) so
_execute_query_batch/run_queries_batch won’t treat it as a flat QueryRequest;
keep the existing behavior for RecommendStmt and non-grouped SearchStmt.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 20d846a7-d3f5-4169-a380-a106ae17a1cb

📥 Commits

Reviewing files that changed from the base of the PR and between dbb6713 and 8a066e1.

📒 Files selected for processing (17)
  • README.md
  • docs/getting-started.md
  • docs/programmatic.md
  • docs/reference.md
  • docs/scripts.md
  • src/qql/__init__.py
  • src/qql/ast_nodes.py
  • src/qql/async_connection.py
  • src/qql/async_executor.py
  • src/qql/connection.py
  • src/qql/executor.py
  • src/qql/lexer.py
  • src/qql/parser.py
  • src/qql/script.py
  • src/qql/utils.py
  • tests/test_async_connection.py
  • tests/test_connection.py

Comment thread docs/reference.md

## Project Structure

```
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add language to fenced code block for lint compliance.

Line 174 opens a fenced block without a language, which trips MD040 and can fail docs linting pipelines.

Proposed fix
-```
+```text
 qql/
 ├── pyproject.toml          # Package config; installs the `qql` CLI command
 ...
-```
+```
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
```
🧰 Tools
🪛 markdownlint-cli2 (0.22.1)

[warning] 174-174: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/reference.md` at line 174, Add a language identifier to the opening
fenced code block so the markdown linter (MD040) passes: change the opening ```
to ```text (i.e., update the fenced block that currently starts with ``` before
the listing "qql/" to begin with ```text) so the code block has an explicit
language.

if not self._queries:
return
results = await self.connection.run_queries_batch(self._queries)
for proxy, res in zip(self._proxies, results):
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

# 1) Inspect the relevant section in src/qql/async_connection.py
sed -n '150,230p' src/qql/async_connection.py | nl -ba | sed -n '1,120p'

# 2) Find where `_proxies` is populated/used and where `results` comes from
rg -n "_proxies|results" src/qql/async_connection.py

# 3) Check Python version / ruff config for zip strict support expectations
ls -la
fd -t f "pyproject.toml" . -x printf "%s\n" {}
fd -t f "*ruff*" . -x printf "%s\n" {}
# Show python requires and ruff settings if present
if [ -f pyproject.toml ]; then
  rg -n "requires-python|python|target-version|ruff" pyproject.toml || true
  sed -n '1,200p' pyproject.toml
fi

Repository: pavanjava/qql

Length of output: 100


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# 1) Inspect the relevant section in src/qql/async_connection.py
# Prefer cat -n (available) over nl
if [ ! -f src/qql/async_connection.py ]; then
  echo "Missing src/qql/async_connection.py"
  exit 1
fi

sed -n '170,220p' src/qql/async_connection.py | cat -n

# 2) Locate the exact zip usage and surrounding context (more precise)
rg -n "zip\(self\._proxies,\s*results\)" src/qql/async_connection.py
rg -n "zip\(" src/qql/async_connection.py

# 3) Find where `_proxies` is defined/populated and where `results` comes from
rg -n "_proxies" src/qql/async_connection.py
rg -n "results" src/qql/async_connection.py

# 4) Check Python version / ruff config for strict zip expectations
ls -la
fd -t f "pyproject.toml" . -x printf "%s\n" {} || true
fd -t f "*ruff*" . -x printf "%s\n" {} || true

if [ -f pyproject.toml ]; then
  echo "---- pyproject.toml (head) ----"
  sed -n '1,220p' pyproject.toml | cat -n
  echo "---- ruff/python target hints ----"
  rg -n "requires-python|target-version|python-version|ruff" pyproject.toml || true
fi

Repository: pavanjava/qql

Length of output: 6113


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Find AsyncConnection and its run_queries_batch implementation
rg -n "class AsyncConnection|def run_queries_batch" -S src

# Show the implementation of run_queries_batch (and any helpers it calls)
rg -n "run_queries_batch" -S src

# Inspect AsyncConnection around the discovered definition
# (Use file/line reporting first, then print a local window.)
matches=$(rg -n "def run_queries_batch" -S src | head -n 5)
echo "$matches"
file=$(echo "$matches" | awk -F: '{print $1}' | head -n 1)
line=$(echo "$matches" | awk -F: '{print $2}' | head -n 1)
echo "Printing $file around line $line"
sed -n "$((line-40)),$((line+120))p" "$file" | cat -n

# Check ruff config for B905 (zip without strict)
if [ -f ruff.toml ]; then
  rg -n "B905|zip|strict" ruff.toml pyproject.toml || true
  sed -n '1,200p' ruff.toml | cat -n
fi

Repository: pavanjava/qql

Length of output: 9098


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Locate BatchBlockStmt definition and executor handling
rg -n "BatchBlockStmt" -S src/qql

# Inspect AST node definition
files=$(rg -l "class BatchBlockStmt" -S src/qql)
echo "$files"
for f in $files; do
  echo "---- $f ----"
  # print small window around class
  line=$(rg -n "class BatchBlockStmt" "$f" | head -n 1 | awk -F: '{print $2}')
  start=$((line-30))
  end=$((line+120))
  sed -n "${start},${end}p" "$f" | cat -n
done

# Inspect Executor.execute branch/handling for BatchBlockStmt
executor_files=$(rg -l "def execute\(" -S src/qql)
echo "Executor-related files:"
echo "$executor_files" | head -n 20

# Search specifically for BatchBlockStmt inside executor implementation
for f in $executor_files; do
  if rg -q "BatchBlockStmt" "$f"; then
    echo "---- executor handling in $f ----"
    # show relevant region
    rg -n "BatchBlockStmt" "$f"
    # print around first occurrence
    occ_line=$(rg -n "BatchBlockStmt" "$f" | head -n 1 | awk -F: '{print $2}')
    sed -n "$((occ_line-80)),$((occ_line+160))p" "$f" | cat -n
  fi
done

# Also inspect ExecutionResult.data definition / type hints
res_files=$(rg -l "class ExecutionResult|ExecutionResult" -S src/qql)
echo "---- ExecutionResult files ----"
echo "$res_files" | head -n 20
for f in $(echo "$res_files" | head -n 5); do
  echo "---- $f (head) ----"
  sed -n '1,220p' "$f" | cat -n
done

Repository: pavanjava/qql

Length of output: 1243


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# 1) Inspect BatchBlockStmt
sed -n '320,430p' src/qql/ast_nodes.py | cat -n

# 2) Inspect sync executor batch execution
sed -n '1600,1760p' src/qql/executor.py | cat -n
rg -n "def _execute_batch_block" src/qql/executor.py
rg -n "BatchBlockStmt" src/qql/executor.py | head -n 20

# 3) Inspect async executor batch execution
sed -n '1160,1260p' src/qql/async_executor.py | cat -n
rg -n "def _execute_batch_block" src/qql/async_executor.py
rg -n "BatchBlockStmt" src/qql/async_executor.py | head -n 20

# 4) Inspect ExecutionResult structure (especially `.data`)
rg -n "class ExecutionResult" -S src/qql
res_file=$(rg -l "class ExecutionResult" -S src/qql | head -n 1 || true)
echo "ExecutionResult file: ${res_file:-<none>}"
if [ -n "${res_file:-}" ]; then
  sed -n '1,220p' "$res_file" | cat -n
fi

Repository: pavanjava/qql

Length of output: 23074


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# 1) Find group_batch_statements + inserted_point_results in utils
rg -n "def group_batch_statements|def inserted_point_results" -S src/qql

# 2) Print those helper implementations
utils_file=$(rg -l "def group_batch_statements" -S src/qql | head -n 1)
echo "utils_file=$utils_file"
sed -n '1,260p' "$utils_file" | cat -n

# narrower windows around the functions (avoid huge output)
gline=$(rg -n "def group_batch_statements" "$utils_file" | head -n 1 | awk -F: '{print $2}')
iline=$(rg -n "def inserted_point_results" "$utils_file" | head -n 1 | awk -F: '{print $2}')
sed -n "$((gline-80)),$((gline+200))p" "$utils_file" | cat -n
sed -n "$((iline-80)),$((iline+240))p" "$utils_file" | cat -n

# 3) Inspect _execute_query_batch return shape
rg -n "def _execute_query_batch" -S src/qql/executor.py src/qql/async_executor.py

# Print _execute_query_batch in sync executor enough to see how it builds its result list
sync_file="src/qql/executor.py"
qline=$(rg -n "def _execute_query_batch" "$sync_file" | head -n 1 | awk -F: '{print $2}')
sed -n "$((qline-40)),$((qline+240))p" "$sync_file" | cat -n

# 4) Inspect async executor _execute_query_batch similarly (shorter window)
async_file="src/qql/async_executor.py"
aline=$(rg -n "async def _execute_query_batch" "$async_file" | head -n 1 | awk -F: '{print $2}')
sed -n "$((aline-40)),$((aline+260))p" "$async_file" | cat -n

Repository: pavanjava/qql

Length of output: 10007


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Sync: inspect full _execute_query_batch body boundaries and key return points
sync_file="src/qql/executor.py"
sync_start=$(rg -n "def _execute_query_batch" "$sync_file" | head -n 1 | cut -d: -f1)
sync_next=$(awk -v start="$sync_start" 'NR>start && /^    def / {print NR; exit}' "$sync_file" || true)
echo "sync_start=$sync_start sync_next=${sync_next:-<none>}"

# Print from start through (start..start+260) and also around first return/all_results usage
sed -n "${sync_start},$((sync_start+320))p" "$sync_file" | cat -n | sed -n '1,220p'
rg -n "all_results|return .*res|return .*results|return .*ExecutionResult" -S "$sync_file" | head -n 50

# Async: inspect full _execute_query_batch boundaries
async_file="src/qql/async_executor.py"
async_start=$(rg -n "async def _execute_query_batch" "$async_file" | head -n 1 | cut -d: -f1)
async_next=$(awk -v start="$async_start" 'NR>start && /^    async def / {print NR; exit}' "$async_file" || true)
echo "async_start=$async_start async_next=${async_next:-<none>}"

sed -n "${async_start},$((async_start+320))p" "$async_file" | cat -n | sed -n '1,220p'
rg -n "all_results|return .*res|return .*results|return .*ExecutionResult" -S "$async_file" | head -n 50

Repository: pavanjava/qql

Length of output: 23181


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Check ruff B905 config (whether it's selected and whether any ignores exist)
sed -n '1,120p' ruff.toml | cat -n
rg -n "B905" ruff.toml pyproject.toml || true

# Find other zip usages to see if strict zip is commonly used elsewhere
rg -n "zip\(" src/qql | head -n 50

# Specifically find other proxy resolution zips
rg -n "zip\(self\._proxies" -S src/qql

Repository: pavanjava/qql

Length of output: 1799


Guard against silent proxy/result mismatch in batch resolution.

src/qql/async_connection.py resolves proxies with zip(self._proxies, results) without length checking; if run_queries_batch(...) yields fewer results than queued statements, trailing proxies never get resolved (their .result will later raise). The query-batch execution path also doesn’t validate response length vs requests.

Also present in src/qql/connection.py for the sync batch context manager.

Proposed fix
-        for proxy, res in zip(self._proxies, results):
+        for proxy, res in zip(self._proxies, results, strict=True):
             proxy._resolve(res)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
for proxy, res in zip(self._proxies, results):
for proxy, res in zip(self._proxies, results, strict=True):
proxy._resolve(res)
🧰 Tools
🪛 Ruff (0.15.13)

[warning] 193-193: zip() without an explicit strict= parameter

Add explicit value for parameter strict=

(B905)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/qql/async_connection.py` at line 193, The batch resolver currently pairs
proxies and results with zip(self._proxies, results) in async_connection.py (and
the analogous loop in connection.py), which silently drops unmatched proxies or
results; update the post-run handling in the async method that calls
run_queries_batch(...) to explicitly compare len(results) to len(self._proxies)
and, if they differ, raise a clear RuntimeError (or set exception on each
unmatched Proxy via Proxy.set_exception) instead of silently leaving proxies
unresolved; for matched entries continue to call proxy.set_result(res) as
before, and mirror the same length-check and unmatched-proxy exception-handling
logic in the sync batch context manager in connection.py so all proxies are
either resolved or fail deterministically.

Comment thread src/qql/async_executor.py
Comment on lines +293 to +301
await self._ensure_collection(
node.collection, len(vector), topology, node.dense_vector
)
point_vector = build_dense_point_vector(
topology,
vector,
node.dense_vector,
self._default_dense_vector_name(),
)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Re-resolve topology before shaping vectors after the create race.

These branches build point_vector / PointStruct.vector from the pre-lock topology, then _ensure_collection() or the hybrid creation lock can discover that another coroutine already created the collection with a different single named vector. In that race, the upsert still uses the stale vector key and will write to the wrong name or fail.

A safer pattern is: resolve or create the collection first, refresh the topology that actually won, and only then build the point vectors from the resolved dense/sparse names.

Also applies to: 355-367, 370-386, 413-418, 423-425

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/qql/async_executor.py` around lines 293 - 301, The code currently calls
build_dense_point_vector (and similarly build sparse vectors/PointStruct.vector)
using a topology captured before calling self._ensure_collection, which can race
with another coroutine that creates the collection with different vector names;
change the flow so that you first call self._ensure_collection(...) (and any
hybrid creation lock), then refresh/re-resolve the topology from the collection
metadata that was actually created/returned, and only after that call
build_dense_point_vector (and the corresponding sparse/vector construction)
using the refreshed topology; apply the same pattern for the other occurrences
that build vectors (the blocks using build_dense_point_vector and setting
PointStruct.vector at the other mentioned sites).

Comment thread src/qql/parser.py
Comment on lines 73 to +76
def parse(self) -> ASTNode:
node = self._parse_single_statement()
self._expect(TokenKind.EOF)
return node
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Allow trailing semicolons after top-level statements.

SEMICOLON is now tokenized, but parse() still requires EOF immediately after the first statement. That means SHOW COLLECTIONS; and script chunks ending with ; still fail, so semicolons only work inside BEGIN BATCH.

Suggested fix
 def parse(self) -> ASTNode:
     node = self._parse_single_statement()
+    while self._peek().kind == TokenKind.SEMICOLON:
+        self._advance()
     self._expect(TokenKind.EOF)
     return node
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/qql/parser.py` around lines 73 - 76, The parser currently calls parse()
-> _parse_single_statement() and immediately expects TokenKind.EOF, rejecting
trailing SEMICOLON tokens; update parse() to allow and consume any number of
trailing TokenKind.SEMICOLON tokens after the node is parsed (e.g., loop while
next token is TokenKind.SEMICOLON call the existing token-consumption helper
such as self._expect/_advance/_consume), then call self._expect(TokenKind.EOF)
and return the node so top-level statements like "SHOW COLLECTIONS;" succeed.

Comment thread src/qql/utils.py
Comment on lines +80 to +93
def render_parameterized_query(template: str, params: dict[str, Any]) -> str:
query_str = template
for k in sorted(params.keys(), key=len, reverse=True):
val = params[k]
placeholder = f":{k}"
if isinstance(val, str):
escaped_val = val.replace("'", "\\'")
repr_val = f"'{escaped_val}'"
elif isinstance(val, bool):
repr_val = "true" if val else "false"
else:
repr_val = str(val)
query_str = query_str.replace(placeholder, repr_val)
return query_str
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Render real QQL literals here instead of raw str() output.

This helper currently mangles valid parameter values: None becomes None instead of NULL, and string backslashes are not escaped, so values containing \n, \t, or \\ are reinterpreted by the lexer. Parameterized queries can therefore change user input or fail to parse.

Suggested direction
+def _render_qql_literal(val: Any) -> str:
+    if val is None:
+        return "NULL"
+    if isinstance(val, str):
+        escaped = val.replace("\\", "\\\\").replace("'", "\\'")
+        return f"'{escaped}'"
+    if isinstance(val, bool):
+        return "true" if val else "false"
+    if isinstance(val, list):
+        return "[" + ", ".join(_render_qql_literal(v) for v in val) + "]"
+    if isinstance(val, dict):
+        return (
+            "{"
+            + ", ".join(f"'{k}': {_render_qql_literal(v)}" for k, v in val.items())
+            + "}"
+        )
+    return str(val)
+
 def render_parameterized_query(template: str, params: dict[str, Any]) -> str:
     query_str = template
     for k in sorted(params.keys(), key=len, reverse=True):
-        val = params[k]
         placeholder = f":{k}"
-        if isinstance(val, str):
-            escaped_val = val.replace("'", "\\'")
-            repr_val = f"'{escaped_val}'"
-        elif isinstance(val, bool):
-            repr_val = "true" if val else "false"
-        else:
-            repr_val = str(val)
-        query_str = query_str.replace(placeholder, repr_val)
+        query_str = query_str.replace(placeholder, _render_qql_literal(params[k]))
     return query_str
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/qql/utils.py` around lines 80 - 93, The function
render_parameterized_query currently serializes parameters via str() and
inadequate escaping, causing None to become "None" and backslashes/control chars
to be unescaped; update render_parameterized_query to: map None to the QQL NULL
literal, escape strings for single quotes and backslashes and encode common
control sequences (e.g., \n, \t, \\) so they are preserved as literals, format
booleans as "true"/"false" and numbers via their canonical representation, and
then replace placeholders (keep the current placeholder logic and sorted key
order) with these properly serialized QQL literals.

Comment thread src/qql/utils.py
Comment on lines +160 to +170
for stmt in statements:
if isinstance(stmt, (SearchStmt, RecommendStmt)):
coll = stmt.collection
if current_type == "query" and current_collection == coll:
current_group.append(stmt)
continue
_append_batch_group(groups, current_type, current_collection, current_group)
current_type = "query"
current_collection = coll
current_group = [stmt]
continue
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Exclude SEARCH ... GROUP BY ... from the flat query batching path.

This batches every SearchStmt into a "query" group, but both _execute_query_batch() implementations only build plain QueryRequests and read response.points. A grouped search run via BEGIN BATCH or run_queries_batch() will therefore lose group_by/group_size semantics and return a flat result list instead.

Suggested fix
-        if isinstance(stmt, (SearchStmt, RecommendStmt)):
+        if isinstance(stmt, RecommendStmt) or (
+            isinstance(stmt, SearchStmt) and stmt.group_by is None
+        ):
             coll = stmt.collection
             if current_type == "query" and current_collection == coll:
                 current_group.append(stmt)
                 continue
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/qql/utils.py` around lines 160 - 170, The batching logic currently treats
every SearchStmt as a plain "query" which strips group semantics; change the
branch so a SearchStmt is only added to the "query" batch when it has no
grouping (i.e., stmt.group_by and stmt.group_size are falsy). For SearchStmt
instances with group_by/group_size, flush the current group via
_append_batch_group and start a new group that is not the plain "query" path
(e.g., set current_type to a distinct token like "grouped_search" and
current_group = [stmt]) so _execute_query_batch/run_queries_batch won’t treat it
as a flat QueryRequest; keep the existing behavior for RecommendStmt and
non-grouped SearchStmt.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant