
[Cosmos] [Tracking] Embedding Generation in the Python SDK (V0) #46729

@ananth7592


Summary

V0 plan for adding automatic embedding generation for hybrid and vector queries to the Python Cosmos SDK (azure-cosmos), at parity with the .NET tracking issue Azure/azure-cosmos-dotnet-v3#5830.

Customers running queries like:

SELECT TOP 10 *
FROM c
ORDER BY VectorDistance(c.text, 'big brown cat')

today must compute the embedding for 'big brown cat' themselves before issuing the SDK call. After this change, the SDK will:

  1. Send the raw user query to the Gateway for a query plan.
  2. Gateway returns a plan with the rewritten query and a new embeddingParameterMap. The rewritten query becomes:
    SELECT TOP 10 *
    FROM c
    ORDER BY VectorDistance(c.embedding, @documentdb-hybridsearchquery-embedding-0)
    …and embeddingParameterMap carries @documentdb-hybridsearchquery-embedding-0 → 'big brown cat'.
  3. SDK extracts every entry from the map and calls a customer-supplied EmbeddingGenerator (in practice the EGS team's implementation, which talks to Microsoft Azure Foundry models) once with the full batch of texts.
  4. SDK injects the returned vectors as ordinary parameters on the rewritten query and proceeds to per-partition execution.
  5. Diagnostics expose the embedding-generation step as a first-class OpenTelemetry span.
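Steps 3 and 4 can be sketched as a small pure function. This is a minimal sketch, not the SDK's implementation: `resolve_embedding_parameters` and `LengthEmbedder` are hypothetical names, and the sketch assumes the plan's map has already been parsed into a plain `{parameter name: text}` dict. It calls the generator once with the whole batch, checks that one vector came back per text, and emits parameters in the `{"name": ..., "value": ...}` shape that `query_items(parameters=...)` already accepts.

```python
from typing import Any, Dict, List, Mapping, Protocol, Sequence


class EmbeddingGenerator(Protocol):
    def generate_embeddings(self, texts: Sequence[str]) -> Sequence[Sequence[float]]: ...


def resolve_embedding_parameters(
    embedding_parameter_map: Mapping[str, str],
    generator: EmbeddingGenerator,
) -> List[Dict[str, Any]]:
    """Hypothetical helper: {parameter name: text} -> query parameters."""
    names = list(embedding_parameter_map)
    texts = [embedding_parameter_map[name] for name in names]
    if not texts:
        return []  # nothing to embed, so skip the generator call entirely
    # Step 3: one batched call with every text from the map.
    vectors = generator.generate_embeddings(texts)
    if len(vectors) != len(texts):
        raise ValueError(f"generator returned {len(vectors)} vectors for {len(texts)} texts")
    # Step 4: pair each vector with its parameter name, in the same
    # {"name": ..., "value": ...} shape that query_items(parameters=...) accepts.
    return [
        {"name": name, "value": [float(x) for x in vector]}
        for name, vector in zip(names, vectors)
    ]


class LengthEmbedder:
    """Toy generator for the usage example: embeds a text as [len(text), 1.0]."""

    def generate_embeddings(self, texts: Sequence[str]) -> List[List[float]]:
        return [[float(len(t)), 1.0] for t in texts]


params = resolve_embedding_parameters(
    {"@documentdb-hybridsearchquery-embedding-0": "big brown cat"}, LengthEmbedder()
)
print(params)
# [{'name': '@documentdb-hybridsearchquery-embedding-0', 'value': [13.0, 1.0]}]
```

Keeping this step free of I/O other than the single generator call makes it easy to unit-test without a Gateway.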

Customer-facing API (V0)

from typing import Protocol, Sequence

class EmbeddingGenerator(Protocol):
    def generate_embeddings(self, texts: Sequence[str]) -> Sequence[Sequence[float]]: ...

class AsyncEmbeddingGenerator(Protocol):
    async def generate_embeddings_async(self, texts: Sequence[str]) -> Sequence[Sequence[float]]: ...

# Sync container (azure.cosmos): query_items returns a lazily evaluated ItemPaged.
for item in container.query_items(
    query="SELECT TOP 10 * FROM c ORDER BY VectorDistance(c.text, 'big brown cat')",
    embedding_generator=my_generator,           # NEW – EmbeddingGenerator
    enable_cross_partition_query=True,
):
    print(item)

# Async equivalent (aio): query_items returns an AsyncItemPaged, so iterate it
# with `async for` instead of awaiting the call. Omitting partition_key makes
# the aio query cross-partition.
async for item in async_container.query_items(
    query=...,
    embedding_generator=my_async_generator,     # NEW – AsyncEmbeddingGenerator
):
    print(item)
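Because `EmbeddingGenerator` and `AsyncEmbeddingGenerator` are `typing.Protocol` classes, any object with a matching method satisfies them structurally, without inheriting from them. A deterministic stand-in like the hypothetical `HashingEmbeddingGenerator` below could back unit tests of the sync and aio paths before a real (EGS/Foundry-backed) generator is wired in. The SHA-256 scheme is an arbitrary assumption and the vectors carry no semantic meaning.

```python
import asyncio
import hashlib
from typing import List, Sequence


class HashingEmbeddingGenerator:
    """Hypothetical test helper: equal texts always map to equal vectors."""

    def __init__(self, dimensions: int = 8) -> None:
        if not 1 <= dimensions <= 32:
            raise ValueError("dimensions must be between 1 and 32 (one digest byte each)")
        self.dimensions = dimensions

    def _embed(self, text: str) -> List[float]:
        digest = hashlib.sha256(text.encode("utf-8")).digest()
        # Map each of the first `dimensions` bytes (0..255) into [-1.0, 1.0].
        return [b / 127.5 - 1.0 for b in digest[: self.dimensions]]

    # Satisfies EmbeddingGenerator (sync containers).
    def generate_embeddings(self, texts: Sequence[str]) -> List[List[float]]:
        return [self._embed(t) for t in texts]

    # Satisfies AsyncEmbeddingGenerator (aio containers).
    async def generate_embeddings_async(self, texts: Sequence[str]) -> List[List[float]]:
        await asyncio.sleep(0)  # stand-in for a real network call
        return self.generate_embeddings(texts)


gen = HashingEmbeddingGenerator(dimensions=4)
sync_vectors = gen.generate_embeddings(["big brown cat", "small dog"])
async_vectors = asyncio.run(gen.generate_embeddings_async(["big brown cat", "small dog"]))
print(len(sync_vectors), len(sync_vectors[0]))  # 2 4
```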

Wire / DTO additions

  • embeddingParameterMap field on the query-plan response (under both queryInfo and hybridSearchQueryInfo).
  • New supported-feature token EmbeddingGeneration advertised in the supportedQueryFeatures header only when embedding_generator is set on the request.
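Until open items 1 and 2 below are settled, a tolerant parser keeps the DTO change small. The sketch is hypothetical (all function names are illustrative): it accepts both candidate wire shapes for embeddingParameterMap, merges entries found under queryInfo and hybridSearchQueryInfo, and appends the EmbeddingGeneration token only when a generator is present. It assumes the supported-features header is a comma-separated list of token names; the base tokens in the usage line are illustrative only.

```python
from typing import Any, Dict, Mapping

# Token literal as named in this issue; exact spelling still to be confirmed.
EMBEDDING_GENERATION_FEATURE = "EmbeddingGeneration"


def _normalize_map(raw: Any) -> Dict[str, str]:
    """Accept either {"@p": "text"} or [{"key": "@p", "value": "text"}]."""
    if raw is None:
        return {}
    if isinstance(raw, Mapping):
        return {str(k): str(v) for k, v in raw.items()}
    if isinstance(raw, list):
        return {str(entry["key"]): str(entry["value"]) for entry in raw}
    raise TypeError(f"unexpected embeddingParameterMap shape: {type(raw).__name__}")


def extract_embedding_parameter_map(plan: Mapping[str, Any]) -> Dict[str, str]:
    """Merge embeddingParameterMap entries from queryInfo and hybridSearchQueryInfo."""
    merged: Dict[str, str] = {}
    for section in ("queryInfo", "hybridSearchQueryInfo"):
        info = plan.get(section) or {}  # a section may be absent or null
        merged.update(_normalize_map(info.get("embeddingParameterMap")))
    return merged


def supported_query_features(base_features: str, has_embedding_generator: bool) -> str:
    """Advertise EmbeddingGeneration only when the request carries an embedding_generator."""
    if not has_embedding_generator:
        return base_features
    if not base_features:
        return EMBEDDING_GENERATION_FEATURE
    return f"{base_features},{EMBEDDING_GENERATION_FEATURE}"


plan = {
    "queryInfo": {"embeddingParameterMap": {"@documentdb-hybridsearchquery-embedding-0": "big brown cat"}},
    "hybridSearchQueryInfo": None,
}
print(extract_embedding_parameter_map(plan))
# {'@documentdb-hybridsearchquery-embedding-0': 'big brown cat'}
print(supported_query_features("Aggregate,OrderBy", has_embedding_generator=True))
# Aggregate,OrderBy,EmbeddingGeneration
```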

Notes on Python specifics

  • Hard dependency on the Gateway change. Unlike .NET (which has a Windows-x64 serviceinterop DLL fast path for query plans), the Python SDK obtains query plans only through the Gateway, so Python's V0 work cannot be exercised end-to-end until the Gateway change is live. The SDK changes themselves can land behind the new feature flag without breaking existing customers.
  • Sync + async dual paths. The SDK has parallel sync (azure.cosmos.container) and async (azure.cosmos.aio._container) surfaces, with parallel _cosmos_client_connection.py / aio/_cosmos_client_connection_async.py and hybrid_search_aggregator.py / aio/hybrid_search_aggregator.py modules. Each sub-issue below covers both paths unless stated otherwise.
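On the aio side only the generator call changes. One way to keep the two paths from drifting, sketched here with hypothetical names (`build_embedding_parameters`, `resolve_embedding_parameters_async`, `AsyncLengthEmbedder`), is to put the I/O-free validation and parameter building in a shared helper, so the sync and async aggregators differ only in whether they `await` the generator.

```python
import asyncio
from typing import Any, Dict, List, Mapping, Protocol, Sequence


class AsyncEmbeddingGenerator(Protocol):
    async def generate_embeddings_async(self, texts: Sequence[str]) -> Sequence[Sequence[float]]: ...


def build_embedding_parameters(
    names: Sequence[str], texts: Sequence[str], vectors: Sequence[Sequence[float]]
) -> List[Dict[str, Any]]:
    """Shared, I/O-free step usable by both the sync and aio aggregators."""
    if len(vectors) != len(texts):
        raise ValueError(f"generator returned {len(vectors)} vectors for {len(texts)} texts")
    return [{"name": n, "value": [float(x) for x in v]} for n, v in zip(names, vectors)]


async def resolve_embedding_parameters_async(
    embedding_parameter_map: Mapping[str, str],
    generator: AsyncEmbeddingGenerator,
) -> List[Dict[str, Any]]:
    """aio counterpart: identical except that the single batched call is awaited."""
    names = list(embedding_parameter_map)
    texts = [embedding_parameter_map[n] for n in names]
    if not texts:
        return []
    vectors = await generator.generate_embeddings_async(texts)
    return build_embedding_parameters(names, texts, vectors)


class AsyncLengthEmbedder:
    """Toy async generator for the usage example."""

    async def generate_embeddings_async(self, texts: Sequence[str]) -> List[List[float]]:
        await asyncio.sleep(0)  # stand-in for a network call
        return [[float(len(t))] for t in texts]


params = asyncio.run(
    resolve_embedding_parameters_async({"@e0": "big brown cat", "@e1": "dog"}, AsyncLengthEmbedder())
)
print(params)
# [{'name': '@e0', 'value': [13.0]}, {'name': '@e1', 'value': [3.0]}]
```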

Sub-issues

Out of scope (V0)

  • Caching of (text → embedding) across queries inside the SDK (the customer's generator can do that).
  • Streaming / chunked embeddings (single batched call per query attempt).
  • A built-in Azure Foundry implementation of EmbeddingGenerator — that ships in the EGS package.
  • Java SDK changes (parity item, tracked in its own repo).
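Since cross-query caching is left to the customer's generator, a wrapper like the one below is one way to provide it. It is a hypothetical sketch, not part of azure-cosmos or the EGS package: it still makes at most one batched call per query, requests only texts it has not seen before, and returns vectors in the caller's order. The cache is unbounded and not thread-safe; a production version would likely want an eviction policy and a lock.

```python
from typing import Dict, List, Protocol, Sequence, Tuple


class EmbeddingGenerator(Protocol):
    def generate_embeddings(self, texts: Sequence[str]) -> Sequence[Sequence[float]]: ...


class CachingEmbeddingGenerator:
    """Hypothetical customer-side wrapper that memoizes text -> vector across queries."""

    def __init__(self, inner: EmbeddingGenerator) -> None:
        self._inner = inner
        self._cache: Dict[str, Tuple[float, ...]] = {}

    def generate_embeddings(self, texts: Sequence[str]) -> List[List[float]]:
        # Distinct, not-yet-cached texts in first-seen order (dict.fromkeys de-duplicates).
        missing = [t for t in dict.fromkeys(texts) if t not in self._cache]
        if missing:
            vectors = self._inner.generate_embeddings(missing)  # still one batched call
            if len(vectors) != len(missing):
                raise ValueError("inner generator returned the wrong number of vectors")
            for text, vector in zip(missing, vectors):
                self._cache[text] = tuple(vector)
        # Answer in the caller's order, duplicates included.
        return [list(self._cache[t]) for t in texts]


class CountingGenerator:
    """Toy inner generator that records every batch it receives."""

    def __init__(self) -> None:
        self.calls: List[List[str]] = []

    def generate_embeddings(self, texts: Sequence[str]) -> List[List[float]]:
        self.calls.append(list(texts))
        return [[float(len(t))] for t in texts]


inner = CountingGenerator()
cached = CachingEmbeddingGenerator(inner)
first = cached.generate_embeddings(["big brown cat", "dog"])
second = cached.generate_embeddings(["dog", "big brown cat", "bird"])
print(inner.calls)  # [['big brown cat', 'dog'], ['bird']]
print(second)       # [[3.0], [13.0], [4.0]]
```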

Open items / risks

  1. Confirm wire shape of embeddingParameterMap (object vs array of {key,value}) with the GW team.
  2. Confirm the literal string for the new supported-feature token (Python advertises by name, must match exactly).
  3. Decide: should EmbeddingGenerator ship as preview-only first, or behind the existing _QueryFeature flag (which already provides per-request opt-in)?
  4. Cancellation / timeout: protocols don't take a token; document that customers must honour asyncio.timeout(...) themselves.
  5. Telemetry policy: confirm count + latency + generator type are OK to log; raw text / vectors explicitly excluded.
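For items 4 and 5, a customer-side wrapper could bound each batch call and record only the telemetry fields under discussion (count, latency, generator type). This is a hypothetical sketch: the class name and the `last_stats` field names are illustrative, and it uses `asyncio.wait_for` rather than `asyncio.timeout(...)` so it also runs on Python versions before 3.11.

```python
import asyncio
import time
from typing import Any, Dict, List, Protocol, Sequence, Tuple


class AsyncEmbeddingGenerator(Protocol):
    async def generate_embeddings_async(self, texts: Sequence[str]) -> Sequence[Sequence[float]]: ...


class TimeoutEmbeddingGenerator:
    """Hypothetical wrapper: bounds each batch call and records privacy-safe stats."""

    def __init__(self, inner: AsyncEmbeddingGenerator, timeout_s: float) -> None:
        self._inner = inner
        self._timeout_s = timeout_s
        self.last_stats: Dict[str, Any] = {}

    async def generate_embeddings_async(self, texts: Sequence[str]) -> Sequence[Sequence[float]]:
        start = time.perf_counter()
        try:
            # wait_for cancels the inner coroutine and raises TimeoutError on overrun.
            return await asyncio.wait_for(
                self._inner.generate_embeddings_async(texts), timeout=self._timeout_s
            )
        finally:
            # Count, latency and generator type only; never the raw texts or vectors.
            self.last_stats = {
                "text_count": len(texts),
                "latency_ms": (time.perf_counter() - start) * 1000.0,
                "generator_type": type(self._inner).__name__,
            }


class SlowGenerator:
    """Toy generator that takes far longer than the configured timeout."""

    async def generate_embeddings_async(self, texts: Sequence[str]) -> List[List[float]]:
        await asyncio.sleep(1.0)
        return [[0.0] for _ in texts]


async def demo() -> Tuple[str, Dict[str, Any]]:
    gen = TimeoutEmbeddingGenerator(SlowGenerator(), timeout_s=0.05)
    try:
        await gen.generate_embeddings_async(["big brown cat"])
        return "completed", gen.last_stats
    except asyncio.TimeoutError:
        return "timed out", gen.last_stats


outcome, stats = asyncio.run(demo())
print(outcome, stats["text_count"], stats["generator_type"])  # timed out 1 SlowGenerator
```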

Spec / references

  • .NET tracking issue: Azure/azure-cosmos-dotnet-v3#5830 (this work is the Python sibling)
  • Python files touched (V0):
    • sdk/cosmos/azure-cosmos/azure/cosmos/documents.py – _QueryFeature token
    • sdk/cosmos/azure-cosmos/azure/cosmos/_cosmos_client_connection.py – _GetQueryPlanThroughGateway (sync)
    • sdk/cosmos/azure-cosmos/azure/cosmos/aio/_cosmos_client_connection_async.py – _GetQueryPlanThroughGateway (async)
    • sdk/cosmos/azure-cosmos/azure/cosmos/_execution_context/query_execution_info.py – plan accessor
    • sdk/cosmos/azure-cosmos/azure/cosmos/_execution_context/hybrid_search_aggregator.py – sync aggregator
    • sdk/cosmos/azure-cosmos/azure/cosmos/_execution_context/aio/hybrid_search_aggregator.py – async aggregator
    • sdk/cosmos/azure-cosmos/azure/cosmos/_execution_context/execution_dispatcher.py (+ async sibling) – option threading
    • sdk/cosmos/azure-cosmos/azure/cosmos/container.py – query_items keyword
    • sdk/cosmos/azure-cosmos/azure/cosmos/aio/_container.py – async query_items keyword
    • sdk/cosmos/azure-cosmos/CHANGELOG.md – changelog entry

Labels: Cosmos, feature-request
