pavanjava · srimon12 · May 24, 2026 · coderabbitai · May 24, 2026
diff --git a/README.md b/README.md
@@ -5,9 +5,9 @@
 [![PyPI version](https://img.shields.io/pypi/v/qql-cli?color=blue&label=PyPI)](https://pypi.org/project/qql-cli/)
 [![Python 3.12+](https://img.shields.io/pypi/pyversions/qql-cli)](https://pypi.org/project/qql-cli/)
 [![MIT License](https://img.shields.io/badge/license-MIT-green)](LICENSE)
-[![Tests](https://img.shields.io/badge/tests-549%20passing-brightgreen)](tests/)
+[![Tests](https://img.shields.io/badge/tests-635%20passing-brightgreen)](tests/)
 
-Write `INSERT`, `SELECT`, `SEARCH`, `SCROLL`, `RECOMMEND`, `UPDATE`, `DELETE`, and `CREATE COLLECTION` statements instead of Python SDK calls. Supports hybrid dense+sparse vector search, grouped search (GROUP BY), cross-encoder reranking, quantization (scalar, turbo, binary, product), SQL-style `WHERE` filters, script execution, and collection dump/restore.
+Write `INSERT`, `SELECT`, `SEARCH`, `SCROLL`, `RECOMMEND`, `UPDATE`, `DELETE`, and `CREATE COLLECTION` statements instead of Python SDK calls. Supports hybrid dense+sparse vector search, grouped search (GROUP BY), cross-encoder reranking, quantization (scalar, turbo, binary, product), SQL-style `WHERE` filters, script execution, collection dump/restore, async execution, gRPC transport, parameterized queries, and batched query execution.
 
 ```
 qql> INSERT INTO COLLECTION notes VALUES {'text': 'Qdrant is a vector database', 'author': 'alice', 'year': 2024}
@@ -50,16 +50,23 @@ Your query string
 
 When you run `INSERT`, the `text` field is automatically converted into a dense vector using [Fastembed](https://github.com/qdrant/fastembed). In **hybrid mode** (`USING HYBRID`), a sparse BM25 vector is also generated alongside the dense vector, and searches use Qdrant's Reciprocal Rank Fusion (RRF) by default to merge the results of both retrieval methods. You can switch hybrid search to DBSF with `FUSION 'dbsf'`.
 
-QQL also exposes a **programmatic API** for use inside Python applications — no CLI required:
+QQL also exposes a **programmatic API** for use inside Python applications — no CLI required. Use `Connection` for sync code, `AsyncConnection` for async apps, and batch helpers when you want QQL to combine compatible operations into fewer Qdrant requests:
 
 ```python
-from qql import Connection
+from qql import Connection, QQLBatch
 
 with Connection("http://localhost:6333") as conn:
     conn.run_query("INSERT INTO COLLECTION notes VALUES {'text': 'Qdrant is fast'}")
-    result = conn.run_query("SEARCH notes SIMILAR TO 'vector database' LIMIT 5")
-    for hit in result.data:
-        print(hit["score"], hit["payload"])
+    result = conn.run_parameterized_query(
+        "SEARCH notes SIMILAR TO :query LIMIT 5",
+        {"query": "vector database"},
+    )
+
+    with QQLBatch(conn) as batch:
+        neurology = batch.add("SEARCH notes SIMILAR TO 'neurology' LIMIT 5")
+        cardiology = batch.add("SEARCH notes SIMILAR TO 'cardiology' LIMIT 5")
+
+    print(neurology.result.data, cardiology.result.data)
 ```
 
 ---
@@ -97,8 +104,8 @@ Full documentation lives in the [`docs/`](docs/) folder and at **[pavanjava.gith
 | [SEARCH / SELECT / SCROLL / RECOMMEND / Hybrid / GROUP BY / RERANK](docs/search.md) | Semantic search, grouped search, point retrieval, pagination, hybrid, reranking, recommendations |
 | [WHERE Filters](docs/filters.md) | Full SQL-style filter operators |
 | [Collections & Quantization](docs/collections.md) | SHOW, CREATE, DROP, QUANTIZE (scalar/turbo/binary/product), CREATE INDEX, UPDATE VECTOR, UPDATE PAYLOAD |
-| [Scripts: EXECUTE / DUMP](docs/scripts.md) | Script files, collection backup/restore |
-| [Programmatic Usage](docs/programmatic.md) | Use QQL as a Python library via `Connection` or `run_query()` |
+| [Scripts: EXECUTE / DUMP](docs/scripts.md) | Script files, `BEGIN BATCH` blocks, collection backup/restore |
+| [Programmatic Usage](docs/programmatic.md) | Sync/async Python APIs, parameterized queries, batching, gRPC |
 | [Reference: Models / Config / Errors](docs/reference.md) | Embedding models, config file, error reference |
 
 ---
@@ -170,6 +177,12 @@ DELETE FROM articles WHERE year < 2020
 -- Scripts
 EXECUTE /path/to/script.qql
 DUMP articles /path/to/backup.qql
+
+-- Batch block
+BEGIN BATCH;
+  SEARCH articles SIMILAR TO 'query one' LIMIT 5;
+  SEARCH articles SIMILAR TO 'query two' LIMIT 5;
+END BATCH
 ```
 
 ---
@@ -182,7 +195,7 @@ Tests do not require a running Qdrant instance — the Qdrant client is mocked.
 pytest tests/ -v
 ```
 
-Expected: **549 tests passing**.
+Expected: **635 tests passing**.
 
 ---
 

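The README text above says the batch helpers combine compatible operations into fewer Qdrant requests. The sketch below is one way such grouping could work, not QQL's actual executor: it treats consecutive statements with the same leading keyword as compatible, which is a deliberate simplification. `statement_kind`, `group_compatible`, and `BATCHABLE` are hypothetical names introduced only for illustration.

```python
from itertools import groupby

# Assumption: these statement kinds are the ones worth merging into one request.
BATCHABLE = {"SEARCH", "RECOMMEND", "INSERT"}


def statement_kind(query: str) -> str:
    """Return the leading keyword of a non-empty QQL string, uppercased."""
    return query.split(None, 1)[0].upper()


def group_compatible(queries: list[str]) -> list[tuple[str, list[str]]]:
    """Group consecutive statements of the same batchable kind, preserving order."""
    groups: list[tuple[str, list[str]]] = []
    for kind, run in groupby(queries, key=statement_kind):
        statements = list(run)
        if kind in BATCHABLE:
            # One group stands for one combined request.
            groups.append((kind, statements))
        else:
            # Anything else still runs one statement at a time, in order.
            groups.extend((kind, [stmt]) for stmt in statements)
    return groups


plan = group_compatible([
    "SEARCH articles SIMILAR TO 'query one' LIMIT 5",
    "SEARCH articles SIMILAR TO 'query two' LIMIT 5",
    "DELETE FROM articles WHERE year < 2020",
    "INSERT INTO COLLECTION notes VALUES {'text': 'a'}",
    "INSERT INTO COLLECTION notes VALUES {'text': 'b'}",
])
for kind, statements in plan:
    print(kind, len(statements))  # SEARCH 2, DELETE 1, INSERT 2
```

A real implementation would likely also compare the target collection and query options before merging; this sketch only shows the order-preserving grouping idea.
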
diff --git a/docs/getting-started.md b/docs/getting-started.md
@@ -5,7 +5,7 @@ title: "Getting Started"
 
 # Getting Started with QQL
 
-QQL is a SQL-like query language and CLI for [Qdrant](https://qdrant.tech). Instead of writing Python SDK calls you write natural query statements to insert, search, manage, and delete vector data.
+QQL is a SQL-like query language and CLI for [Qdrant](https://qdrant.tech). Instead of writing Python SDK calls, you write natural query statements to insert, search, manage, and delete vector data. It can also be used as a sync or async Python library with batching, parameterized queries, and optional gRPC transport.
 
 ---
 
@@ -154,6 +154,12 @@ SHOW COLLECTION notes
 
 -- Retrieve a point by ID
 SELECT * FROM notes WHERE id = 1
+
+-- Run compatible queries as one batch
+BEGIN BATCH;
+  SEARCH notes SIMILAR TO 'vector databases' LIMIT 5;
+  SEARCH notes SIMILAR TO 'semantic search' LIMIT 5;
+END BATCH
 ```
 
 ---
@@ -164,5 +170,6 @@ SELECT * FROM notes WHERE id = 1
 - [SEARCH / SELECT / SCROLL / RECOMMEND / Hybrid / RERANK](search.md) — querying
 - [WHERE Filters](filters.md) — payload filtering
 - [Collections & Quantization](collections.md) — managing collections
-- [Scripts: EXECUTE / DUMP](scripts.md) — automating with script files
+- [Scripts: EXECUTE / DUMP](scripts.md) — automating with script files and batch blocks
+- [Programmatic Usage](programmatic.md) — sync/async APIs, batching, parameterized queries, gRPC
 - [Embedding Models](reference.md#embedding-models) — model reference
diff --git a/docs/programmatic.md b/docs/programmatic.md
@@ -16,6 +16,8 @@ single connection to Qdrant once and reuses it for every `run_query()` call —
 more efficient than the legacy `run_query()` function, which creates a new
 client on every invocation.
 
+Use `AsyncConnection` when your application already runs on `asyncio`.
+
 ### Basic usage
 
 ```python
@@ -70,6 +72,22 @@ with Connection("https://<your-cluster>.qdrant.io", secret="<your-api-key>") as
     print(result.data)
 ```
 
+### gRPC transport
+
+QQL can ask the Qdrant client to prefer gRPC for lower request overhead:
+
+```python
+from qql import Connection
+
+with Connection(
+    "http://localhost:6333",
+    prefer_grpc=True,
+    grpc_port=6334,
+) as conn:
+    result = conn.run_query("SHOW COLLECTIONS")
+    print(result.data)
+```
+
 ### Custom embedding model
 
 ```python
@@ -155,9 +173,117 @@ with Connection("http://localhost:6333") as conn:
 | `url` | `str` | `"http://localhost:6333"` | Qdrant instance URL |
 | `secret` | `str \| None` | `None` | API key; `None` for unauthenticated |
 | `default_model` | `str \| None` | `None` → `sentence-transformers/all-MiniLM-L6-v2` | Dense embedding model used when no `USING MODEL` clause is given |
+| `prefer_grpc` | `bool` | `False` | When `True`, asks the Qdrant client to prefer gRPC over REST |
+| `grpc_port` | `int` | `6334` | gRPC port used when `prefer_grpc=True` |
 | `default_dense_vector_name` | `str` | `"dense"` | Dense vector name used when QQL creates a collection and no explicit `USING VECTOR` name is given |
 | `default_sparse_vector_name` | `str` | `"sparse"` | Sparse vector name used when QQL creates a hybrid collection and no explicit sparse vector name is given |
 
+---
+
+## Parameterized Queries
+
+Parameterized helpers render `:name` placeholders before parsing the QQL statement. String values are quoted and escaped; booleans are rendered as `true` / `false`.
+
+```python
+from qql import Connection
+
+with Connection("http://localhost:6333") as conn:
+    result = conn.run_parameterized_query(
+        "SEARCH notes SIMILAR TO :query LIMIT 5 WHERE author = :author",
+        {"query": "vector database", "author": "alice"},
+    )
+
+    results = conn.run_parameterized_batch(
+        "SEARCH notes SIMILAR TO :query LIMIT 5 WHERE category = :category",
+        [
+            {"query": "brain stroke", "category": "Neurology"},
+            {"query": "heart attack", "category": "Cardiology"},
+        ],
+    )
+```
+
+Parameterized queries are a convenience for building QQL strings safely in application code; they are not sent to Qdrant as server-side prepared statements.
+
+---
+
+## Batch Execution
+
+`run_queries_batch()` parses multiple QQL strings into a `BatchBlockStmt`. The executor groups compatible statements:
+
+- compatible `SEARCH` / `RECOMMEND` statements use Qdrant `query_batch_points`
+- compatible `INSERT` statements become one `INSERT BULK`
+- mixed or incompatible statements still execute in order
+
+```python
+from qql import Connection
+
+with Connection("http://localhost:6333") as conn:
+    results = conn.run_queries_batch([
+        "SEARCH docs SIMILAR TO 'neurology' LIMIT 5",
+        "SEARCH docs SIMILAR TO 'cardiology' LIMIT 5",
+    ])
+
+    for result in results:
+        print(result.message)
+```
+
+For ergonomic batching in application code, use `QQLBatch`:
+
+```python
+from qql import Connection, QQLBatch
+
+with Connection("http://localhost:6333") as conn:
+    with QQLBatch(conn) as batch:
+        neuro = batch.add("SEARCH docs SIMILAR TO 'neurology' LIMIT 5")
+        cardio = batch.add("SEARCH docs SIMILAR TO 'cardiology' LIMIT 5")
+
+    print(neuro.result.data)
+    print(cardio.result.data)
+```
+
+Each proxy's `.result` becomes available only after the context manager exits; reading it earlier fails with `Batch has not been executed yet.`
+
+---
+
+## Async API
+
+`AsyncConnection` mirrors the sync API for `asyncio` applications and uses `AsyncQdrantClient` under the hood.
+
+```python
+import asyncio
+from qql import AsyncConnection
+
+async def main() -> None:
+    async with AsyncConnection("http://localhost:6333") as conn:
+        await conn.run_query("INSERT INTO COLLECTION notes VALUES {'text': 'async QQL'}")
+        result = await conn.run_query("SEARCH notes SIMILAR TO 'async vector search' LIMIT 5")
+        print(result.data)
+
+asyncio.run(main())  # `async with` must run inside a coroutine
+```
+
+Async batching and parameterized helpers are also available:
+
+```python
+import asyncio
+from qql import AsyncConnection, QQLAsyncBatch
+
+async def main() -> None:
+    async with AsyncConnection("http://localhost:6333", prefer_grpc=True) as conn:
+        params = {"query": "clinical notes"}
+        result = await conn.run_parameterized_query("SEARCH docs SIMILAR TO :query LIMIT 5", params)
+        async with QQLAsyncBatch(conn) as batch:
+            first = batch.add("SEARCH docs SIMILAR TO 'neurology' LIMIT 5")
+            second = batch.add("SEARCH docs SIMILAR TO 'cardiology' LIMIT 5")
+        print(result.data, first.result.data, second.result.data)
+
+asyncio.run(main())  # `async with` must run inside a coroutine
+```
+
+The async executor preserves the same `ExecutionResult` shape as the sync executor.
+
+---
+
 ### Power-user: `executor` property
 
 For low-level access to the pipeline, use `conn.executor` directly:
@@ -250,7 +376,8 @@ class ExecutionResult:
 |---|---|
 | INSERT (dense) | `{"id": int \| "<uuid>", "collection": "<name>"}` |
 | INSERT (hybrid) | `{"id": int \| "<uuid>", "collection": "<name>"}` |
-| INSERT BULK | `None` (count in `result.message`) |
+| INSERT BULK | `{"ids": [int \| "<uuid>", ...]}` |
+| BEGIN BATCH / programmatic batch | `[ExecutionResult, ...]` |
 | SELECT | `{"id": str, "payload": dict}` or `None` when not found |
 | SEARCH | `[{"id": str, "score": float, "payload": dict}, ...]` |
 | SCROLL | `{"points": [{"id": str, "payload": dict}, ...], "next_offset": str \| int \| None}` |

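The Parameterized Queries section above says that `:name` placeholders are rendered before parsing, with strings quoted and escaped and booleans written as `true` / `false`. The following is a minimal sketch of that idea under stated assumptions, not QQL's implementation: `render_value` and `render_template` are hypothetical helpers, and backslash escaping of single quotes is assumed because the source does not specify the escape rule.

```python
import re

# A placeholder is a colon followed by an identifier, e.g. :query
_PLACEHOLDER = re.compile(r":([A-Za-z_][A-Za-z0-9_]*)")


def render_value(value: object) -> str:
    """Render one Python value as a literal for the query string."""
    if isinstance(value, bool):  # test bool first: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return repr(value)
    if isinstance(value, str):
        # Assumption: quotes are escaped with a backslash.
        escaped = value.replace("\\", "\\\\").replace("'", "\\'")
        return f"'{escaped}'"
    raise TypeError(f"unsupported parameter type: {type(value).__name__}")


def render_template(template: str, params: dict[str, object]) -> str:
    """Replace every :name placeholder with its rendered value."""
    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in params:
            raise KeyError(f"missing parameter: {name}")
        return render_value(params[name])

    return _PLACEHOLDER.sub(substitute, template)


rendered = render_template(
    "SEARCH notes SIMILAR TO :query LIMIT 5 WHERE author = :author",
    {"query": "vector database", "author": "alice"},
)
print(rendered)
```

This sketch does not skip placeholders that appear inside quoted literals in the template itself; a production renderer would need to tokenize the template first.
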
diff --git a/docs/reference.md b/docs/reference.md
@@ -5,7 +5,7 @@ title: "Reference"
 
 # Reference — Models, Config, Project Structure, Errors
 
-Default embedding models, configuration parameters, project layout, and common error codes for troubleshooting.
+Default embedding models, configuration parameters, public APIs, project layout, and common error codes for troubleshooting.
 
 ---
 
@@ -147,30 +147,56 @@ You can edit this file directly to change the default model without reconnecting
 
 ---
 
+## Public Python API
+
+| API | Description |
+|---|---|
+| `Connection` | Stateful sync QQL client backed by `QdrantClient` |
+| `AsyncConnection` | Stateful async QQL client backed by `AsyncQdrantClient` |
+| `QQLBatch` | Sync context manager for collecting statements and resolving per-statement results after execution |
+| `QQLAsyncBatch` | Async context manager equivalent of `QQLBatch` |
+| `Executor` | Low-level sync AST executor |
+| `AsyncExecutor` | Low-level async AST executor |
+| `ExecutionResult` | Standard result object returned by all operations |
+
+Both sync and async connections support:
+
+- `run_query(query)`
+- `run_queries_batch([query, ...])`
+- `run_parameterized_query(template, params)`
+- `run_parameterized_batch(template, [params, ...])`
+- `prefer_grpc=True` and `grpc_port=<port>` connection options
+
+---
+
 ## Project Structure
 
 ```
-```
-```
 qql/
 ├── pyproject.toml          # Package config; installs the `qql` CLI command
 ├── src/
 │   └── qql/
-│       ├── __init__.py     # Public API: Connection, run_query()
+│       ├── __init__.py     # Public API exports: sync, async, batching, parser/executor
 │       ├── cli.py          # CLI entry point: connect, disconnect, execute, dump, REPL
 │       ├── config.py       # QQLConfig dataclass + ~/.qql/config.json I/O
-│       ├── connection.py   # Connection class — stateful programmatic API
+│       ├── connection.py   # Sync Connection, QQLBatch, parameterized query helpers
+│       ├── async_connection.py # AsyncConnection and QQLAsyncBatch
 │       ├── exceptions.py   # QQLError, QQLSyntaxError, QQLRuntimeError
 │       ├── lexer.py        # Tokenizer: string → List[Token]
 │       ├── ast_nodes.py    # Frozen dataclasses for each statement and filter type
 │       ├── parser.py       # Recursive descent parser: tokens → AST node
 │       ├── embedder.py     # Embedder (dense) + SparseEmbedder (BM25) + CrossEncoderEmbedder (rerank)
-│       ├── executor.py     # AST node → Qdrant client call + filter + hybrid search
+│       ├── executor.py     # Sync AST node → Qdrant client call
+│       ├── async_executor.py # Async AST node → AsyncQdrantClient call
+│       ├── utils.py        # Shared pure helpers for parsing, filters, batching, vectors
 │       ├── script.py       # Script runner: parse and execute .qql files statement by statement
 │       └── dumper.py       # Collection exporter: scroll all points → .qql INSERT BULK script
 └── tests/
     ├── test_lexer.py       # Tokenizer unit tests
     ├── test_parser.py      # Parser unit tests
     ├── test_executor.py    # Executor unit tests (mocked Qdrant client)
     ├── test_connection.py  # Connection class unit tests (mocked Qdrant client)
+    ├── test_async_connection.py # AsyncConnection / AsyncExecutor tests
     ├── test_script.py      # Script runner unit tests
     └── test_dumper.py      # Dumper unit tests
 ```
@@ -185,7 +211,7 @@ Tests do not require a running Qdrant instance — the Qdrant client is mocked.
 pytest tests/ -v
 ```
 
-Expected output: **604 tests passing**.
+Expected output: **635 tests passing**.
 
 ---
 
@@ -218,3 +244,6 @@ Expected output: **604 tests passing**.
 | `Unknown index type '...'` | Invalid schema type in CREATE INDEX | Use one of: `keyword`, `integer`, `float`, `bool`, `text`, `geo`, `datetime`, `uuid` |
 | `Unknown CREATE INDEX option '...'` | Unsupported advanced option for the chosen payload index type | Check which `WITH { ... }` keys are supported for `keyword`, `uuid`, or `text` |
 | `Qdrant error during CREATE INDEX: ...` | Qdrant rejected the index creation | Check field name and collection state |
+| `Unterminated batch block; expected END BATCH` | A `BEGIN BATCH` block was not closed | Add `END BATCH` at the end of the block |
+| `Batch has not been executed yet.` | A `QQLBatch` proxy's `.result` was read before the context manager exited | Access `.result` only after the `with QQLBatch(...)` block exits |
+| `AsyncBatch has not been executed yet.` | A `QQLAsyncBatch` proxy's `.result` was read before the async context manager exited | Access `.result` only after the `async with QQLAsyncBatch(...)` block exits |
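
The two batch errors in the table above follow from the deferred-result pattern: `batch.add()` returns a proxy, and the proxy is filled in only when the context manager exits. The sketch below illustrates that pattern in isolation. `PendingResult` and `DeferredBatch` are hypothetical stand-ins rather than QQL classes, `run_all` is an assumed callable that executes all queued statements, and raising `RuntimeError` is an assumption because the source does not name the exception type.

```python
from typing import Any, Callable


class PendingResult:
    """Proxy whose value is filled in after the batch has run."""

    def __init__(self) -> None:
        self._value: Any = None
        self._ready = False

    @property
    def result(self) -> Any:
        if not self._ready:
            # Assumption: exception type chosen for illustration only.
            raise RuntimeError("Batch has not been executed yet.")
        return self._value


class DeferredBatch:
    """Collect statements, run them together on exit, then resolve the proxies."""

    def __init__(self, run_all: Callable[[list[str]], list[Any]]) -> None:
        self._run_all = run_all
        self._queries: list[str] = []
        self._proxies: list[PendingResult] = []

    def add(self, query: str) -> PendingResult:
        proxy = PendingResult()
        self._queries.append(query)
        self._proxies.append(proxy)
        return proxy

    def __enter__(self) -> "DeferredBatch":
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        if exc_type is None:  # execute only if the block finished cleanly
            results = self._run_all(self._queries)
            for proxy, value in zip(self._proxies, results):
                proxy._value = value
                proxy._ready = True
        return False  # never swallow exceptions from the block


# Stand-in executor: returns the length of each statement instead of calling Qdrant.
with DeferredBatch(lambda queries: [len(q) for q in queries]) as batch:
    first = batch.add("SEARCH docs SIMILAR TO 'neurology' LIMIT 5")

print(first.result)  # safe here: the block has exited
```
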