[Tracking] Embedding Generation in the Python SDK (V0)
Summary
V0 plan (parity with .NET tracking issue Azure/azure-cosmos-dotnet-v3#5830) for adding automatic embedding generation for hybrid / vector queries in the Python Cosmos SDK (azure-cosmos).
Customers running queries like:
SELECT TOP 10 *
FROM c
ORDER BY VectorDistance(c.text, 'big brown cat')
today must compute the embedding for 'big brown cat' themselves before issuing the SDK call. After this change, the SDK will:
- Send the raw user query to the Gateway for a query plan.
- Gateway returns a plan with the rewritten query and a new embeddingParameterMap. The rewritten query becomes:
SELECT TOP 10 *
FROM c
ORDER BY VectorDistance(c.embedding, @documentdb-hybridsearchquery-embedding-0)
…and embeddingParameterMap carries @documentdb-hybridsearchquery-embedding-0 → 'big brown cat'.
- SDK extracts every entry from the map and calls a customer-supplied EmbeddingGenerator (in practice the EGS team's implementation, which talks to Microsoft Azure Foundry models) once with the full batch of texts.
- SDK injects the returned vectors as ordinary parameters on the rewritten query and proceeds to per-partition execution.
- Diagnostics expose the embedding-generation step as a first-class OpenTelemetry span.
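The extract, batch, and inject steps above can be sketched as follows. This is a minimal illustration, not the SDK's implementation: `resolve_embeddings`, `FakeGenerator`, and the shape of the diagnostics dictionary are hypothetical names chosen for the example, and the map is assumed to arrive as a plain name-to-text object.

```python
import time
from typing import Dict, List, Sequence, Tuple


class FakeGenerator:
    """Hypothetical stand-in for a customer-supplied generator."""

    def generate_embeddings(self, texts: Sequence[str]) -> Sequence[Sequence[float]]:
        # Tiny deterministic "embedding": text length and a character checksum.
        return [[float(len(t)), float(sum(map(ord, t)) % 97)] for t in texts]


def resolve_embeddings(
    embedding_parameter_map: Dict[str, str],
    generator: FakeGenerator,
) -> Tuple[List[Dict[str, object]], Dict[str, object]]:
    """Turn {parameter name -> text} into query parameters with one batched call."""
    # Fix an order so returned vectors can be matched back to parameter names.
    names = list(embedding_parameter_map)
    texts = [embedding_parameter_map[name] for name in names]

    start = time.perf_counter()
    vectors = generator.generate_embeddings(texts)  # single call for the whole batch
    latency_ms = (time.perf_counter() - start) * 1000.0

    if len(vectors) != len(texts):
        raise ValueError(f"expected {len(texts)} vectors, got {len(vectors)}")

    # Ordinary query parameters, in the usual name/value form.
    parameters = [{"name": n, "value": list(v)} for n, v in zip(names, vectors)]

    # Assumed diagnostics fields: count, latency, generator type. No raw text or vectors.
    diagnostics = {
        "count": len(texts),
        "latency_ms": latency_ms,
        "generator_type": type(generator).__name__,
    }
    return parameters, diagnostics


plan_map = {"@documentdb-hybridsearchquery-embedding-0": "big brown cat"}
parameters, diagnostics = resolve_embeddings(plan_map, FakeGenerator())
```

The resulting `parameters` list would be attached to the rewritten query before per-partition execution; the diagnostics dictionary only hints at what the span might record.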
Customer-facing API (V0)
from typing import Protocol, Sequence


class EmbeddingGenerator(Protocol):
    def generate_embeddings(self, texts: Sequence[str]) -> Sequence[Sequence[float]]: ...


class AsyncEmbeddingGenerator(Protocol):
    async def generate_embeddings_async(self, texts: Sequence[str]) -> Sequence[Sequence[float]]: ...


container.query_items(
    query="SELECT TOP 10 * FROM c ORDER BY VectorDistance(c.text, 'big brown cat')",
    embedding_generator=my_generator,  # NEW – sync container
    enable_cross_partition_query=True,
)

# Async equivalent (aio). query_items on the aio container returns an async
# iterable, so the result is consumed with `async for` rather than awaited.
results = async_container.query_items(
    query=...,
    embedding_generator=my_async_generator,  # NEW – async container, AsyncEmbeddingGenerator
    enable_cross_partition_query=True,
)
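For illustration, a customer-side implementation of the two protocols might look like the sketch below. The `ToyGenerator` and `ToyAsyncGenerator` classes and the hash-based `_toy_vector` helper are hypothetical; a real generator would call an embedding model. Marking the protocols `@runtime_checkable` is also an assumption made here so that `isinstance` can be demonstrated; the proposal does not say whether they will be.

```python
import asyncio
import hashlib
from typing import List, Protocol, Sequence, runtime_checkable


@runtime_checkable  # assumption: not stated in the proposal
class EmbeddingGenerator(Protocol):
    def generate_embeddings(self, texts: Sequence[str]) -> Sequence[Sequence[float]]: ...


@runtime_checkable  # assumption: not stated in the proposal
class AsyncEmbeddingGenerator(Protocol):
    async def generate_embeddings_async(self, texts: Sequence[str]) -> Sequence[Sequence[float]]: ...


def _toy_vector(text: str, dims: int = 4) -> List[float]:
    """Deterministic placeholder vector derived from a SHA-256 digest."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [byte / 255.0 for byte in digest[:dims]]


class ToyGenerator:
    """Satisfies EmbeddingGenerator structurally; no inheritance needed."""

    def generate_embeddings(self, texts: Sequence[str]) -> Sequence[Sequence[float]]:
        return [_toy_vector(t) for t in texts]


class ToyAsyncGenerator:
    """Satisfies AsyncEmbeddingGenerator structurally."""

    async def generate_embeddings_async(self, texts: Sequence[str]) -> Sequence[Sequence[float]]:
        await asyncio.sleep(0)  # stands in for a network round trip
        return [_toy_vector(t) for t in texts]


sync_vectors = ToyGenerator().generate_embeddings(["big brown cat", "small dog"])
async_vectors = asyncio.run(
    ToyAsyncGenerator().generate_embeddings_async(["big brown cat"])
)
```

Because the protocols are structural, neither class has to import anything from the SDK to be accepted as a generator.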
Wire / DTO additions
- New embeddingParameterMap field on the query-plan response (under both queryInfo and hybridSearchQueryInfo).
- New supported-feature token EmbeddingGeneration, advertised in the supportedQueryFeatures header only when embedding_generator is set on the request.
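A rough sketch of how the SDK side might read the new field and build the advertised feature list is shown below. Everything here is an assumption for illustration: the helper names are hypothetical, the wire shape of embeddingParameterMap is still an open item (so both candidate shapes are accepted), and the comma-joined feature string and the literal token "EmbeddingGeneration" are placeholders until confirmed.

```python
from typing import Any, Dict, Iterable, Mapping


def normalize_embedding_map(raw: Any) -> Dict[str, str]:
    """Accept either an object or an array of {key, value} entries."""
    if raw is None:
        return {}
    if isinstance(raw, Mapping):
        return {str(k): str(v) for k, v in raw.items()}
    # Otherwise assume a list of {"key": ..., "value": ...} dictionaries.
    return {str(entry["key"]): str(entry["value"]) for entry in raw}


def extract_embedding_map(plan: Mapping[str, Any]) -> Dict[str, str]:
    """Collect entries from both places the field may appear in the plan."""
    merged: Dict[str, str] = {}
    for section in ("queryInfo", "hybridSearchQueryInfo"):
        info = plan.get(section) or {}
        merged.update(normalize_embedding_map(info.get("embeddingParameterMap")))
    return merged


def supported_features(base: Iterable[str], has_generator: bool) -> str:
    """Advertise the new token only when a generator was passed on the request."""
    features = list(base)
    if has_generator:
        features.append("EmbeddingGeneration")  # literal string still to be confirmed
    return ",".join(features)


name = "@documentdb-hybridsearchquery-embedding-0"
plan_as_object = {"queryInfo": {"embeddingParameterMap": {name: "big brown cat"}}}
plan_as_array = {
    "hybridSearchQueryInfo": {
        "embeddingParameterMap": [{"key": name, "value": "big brown cat"}]
    }
}

map_from_object = extract_embedding_map(plan_as_object)
map_from_array = extract_embedding_map(plan_as_array)
header_with = supported_features(["Aggregate", "OrderBy"], has_generator=True)
header_without = supported_features(["Aggregate", "OrderBy"], has_generator=False)
```

Once the Gateway team confirms the wire shape, the branch for the other shape can simply be dropped.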
Notes on Python specifics
- Hard dependency on the Gateway change. Unlike .NET (which has a Windows-x64 serviceinterop DLL fast path for query plans), the Python SDK obtains query plans only through the Gateway, so the V0 work cannot be exercised end-to-end until the Gateway change is live. The SDK changes themselves can land behind the new feature flag without breaking existing customers.
- Sync + async dual paths. The SDK has parallel sync (azure.cosmos.container) and async (azure.cosmos.aio._container) surfaces, with parallel _cosmos_client_connection.py / aio/_cosmos_client_connection_async.py and hybrid_search_aggregator.py / aio/hybrid_search_aggregator.py. Each sub-issue below covers both unless stated otherwise.
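One way to keep the sync and async paths from drifting apart is to put the shared logic in a plain function and keep the two entry points as thin wrappers. The sketch below assumes this layout; `_check_vectors`, `resolve_sync`, `resolve_async`, and the two stub generators are hypothetical names, not existing SDK code.

```python
import asyncio
from typing import List, Sequence


def _check_vectors(texts: Sequence[str], vectors: Sequence[Sequence[float]]) -> List[List[float]]:
    """Shared validation: one vector per text, values coerced to float."""
    if len(vectors) != len(texts):
        raise ValueError(f"expected {len(texts)} vectors, got {len(vectors)}")
    return [[float(x) for x in vector] for vector in vectors]


def resolve_sync(generator, texts: Sequence[str]) -> List[List[float]]:
    # Sync wrapper: call the generator directly.
    return _check_vectors(texts, generator.generate_embeddings(texts))


async def resolve_async(generator, texts: Sequence[str]) -> List[List[float]]:
    # Async wrapper: only the call site differs.
    return _check_vectors(texts, await generator.generate_embeddings_async(texts))


class StubGenerator:
    def generate_embeddings(self, texts):
        return [[1, 2] for _ in texts]


class StubAsyncGenerator:
    async def generate_embeddings_async(self, texts):
        return [[1, 2] for _ in texts]


sync_result = resolve_sync(StubGenerator(), ["big brown cat"])
async_result = asyncio.run(resolve_async(StubAsyncGenerator(), ["big brown cat"]))
```

With this split, a fix to the validation rules lands once and applies to both surfaces.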
Sub-issues
- EmbeddingGenerator / AsyncEmbeddingGenerator protocols + embedding_generator keyword on query_items (sync + async)
- _QueryFeature.EmbeddingGeneration + supported-features advertisement
- _resolve_embeddings helper on the hybrid-search aggregator (sync + async)
- _run_hybrid_search / async sibling and plumb generator through dispatcher options
- cosmos.embedding_generation (count, latency_ms, generator type)
Out of scope (V0)
- Caching of (text → embedding) across queries inside the SDK (the customer's generator can do that).
- Streaming / chunked embeddings (single batched call per query attempt).
- A built-in Azure Foundry implementation of EmbeddingGenerator; that ships in the EGS package.
- Java SDK changes (parity item, tracked in its own repo).
Open items / risks
- Confirm the wire shape of embeddingParameterMap (object vs array of {key,value}) with the Gateway team.
- Confirm the literal string for the new supported-feature token (Python advertises it by name, so it must match exactly).
- Decide: should EmbeddingGenerator ship as preview-only first, or behind the existing _QueryFeature flag (which already provides per-request opt-in)?
- Cancellation / timeout: the protocols don't take a token; document that customers must apply asyncio.timeout(...) themselves.
- Telemetry policy: confirm count + latency + generator type are OK to log; raw text / vectors explicitly excluded.
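Since the protocols carry no cancellation token, a customer could bound the call inside their own generator. The sketch below shows one way to do that; `SlowAsyncGenerator` and `TimeoutBoundGenerator` are hypothetical classes. It uses `asyncio.wait_for` so that it also runs on Python versions older than 3.11, where `asyncio.timeout(...)` is not available.

```python
import asyncio
from typing import Sequence


class SlowAsyncGenerator:
    """Hypothetical generator that simulates a slow embedding backend."""

    def __init__(self, delay_s: float) -> None:
        self._delay_s = delay_s

    async def generate_embeddings_async(self, texts: Sequence[str]) -> Sequence[Sequence[float]]:
        await asyncio.sleep(self._delay_s)
        return [[0.0] for _ in texts]


class TimeoutBoundGenerator:
    """Wraps another async generator and enforces a per-call deadline."""

    def __init__(self, inner, timeout_s: float) -> None:
        self._inner = inner
        self._timeout_s = timeout_s

    async def generate_embeddings_async(self, texts: Sequence[str]) -> Sequence[Sequence[float]]:
        # wait_for cancels the inner call and raises asyncio.TimeoutError on expiry.
        return await asyncio.wait_for(
            self._inner.generate_embeddings_async(texts),
            timeout=self._timeout_s,
        )


async def demo() -> str:
    bounded = TimeoutBoundGenerator(SlowAsyncGenerator(delay_s=0.2), timeout_s=0.01)
    try:
        await bounded.generate_embeddings_async(["big brown cat"])
    except asyncio.TimeoutError:
        return "timed out"
    return "completed"


outcome = asyncio.run(demo())
```

How the SDK should surface such a timeout to the caller of query_items is not covered by this sketch and remains part of the open item.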
Spec / references
- .NET tracking issue: Azure/azure-cosmos-dotnet-v3#5830 (this work is the Python sibling)
- Python files touched (V0):
  - sdk/cosmos/azure-cosmos/azure/cosmos/documents.py – _QueryFeature token
  - sdk/cosmos/azure-cosmos/azure/cosmos/_cosmos_client_connection.py – _GetQueryPlanThroughGateway (sync)
  - sdk/cosmos/azure-cosmos/azure/cosmos/aio/_cosmos_client_connection_async.py – _GetQueryPlanThroughGateway (async)
  - sdk/cosmos/azure-cosmos/azure/cosmos/_execution_context/query_execution_info.py – plan accessor
  - sdk/cosmos/azure-cosmos/azure/cosmos/_execution_context/hybrid_search_aggregator.py – sync aggregator
  - sdk/cosmos/azure-cosmos/azure/cosmos/_execution_context/aio/hybrid_search_aggregator.py – async aggregator
  - sdk/cosmos/azure-cosmos/azure/cosmos/_execution_context/execution_dispatcher.py (+ async sibling) – option threading
  - sdk/cosmos/azure-cosmos/azure/cosmos/container.py – query_items keyword
  - sdk/cosmos/azure-cosmos/azure/cosmos/aio/_container.py – async query_items keyword
  - sdk/cosmos/azure-cosmos/CHANGELOG.md – changelog entry