[feature] add zeroentropy embeddings provider by flamerged · Pull Request #1770 · vectorize-io/hindsight

flamerged · 2026-05-26T23:09:24Z

Summary

This adds first-class ZeroEntropy zembed-1 embeddings support to Hindsight.

zembed-1 is a state-of-the-art retrieval embedder, and this integration lets Hindsight use it natively instead of forcing users through an OpenAI-compatible shim or LiteLLM proxy. It also preserves ZeroEntropy's asymmetric retrieval semantics: retained memory content is embedded as document input, while recall/search text is embedded as query input.

Why this matters

Better retrieval stack: Hindsight already supports ZeroEntropy reranking; this completes the native ZeroEntropy retrieval path with zembed-1 embeddings.
No proxy tax: Users can point Hindsight directly at ZeroEntropy's /v1/models/embed endpoint, with no shim layer translating away provider-specific features.
Correct asymmetric embeddings: ZeroEntropy exposes separate query/document modes. Native support lets Hindsight use those modes directly instead of treating all text the same.
Production-friendly dimensions: Hindsight defaults zembed-1 to 1280 dimensions so it works with pgvector HNSW's 2000-dimension index limit out of the box, while still allowing 2560/1280/640/320/160/80/40 for deployments that want different tradeoffs.
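
As a rough illustration of the dimension tradeoff above, the sketch below validates a requested size against the allowed set and checks it against the 2000-dimension HNSW limit cited in this description. The function and constant names are hypothetical and are not taken from the Hindsight codebase; only the numeric values come from the text above.

```python
from typing import Optional

# Values taken from the PR description; all names here are hypothetical.
ALLOWED_DIMENSIONS = frozenset({2560, 1280, 640, 320, 160, 80, 40})
DEFAULT_DIMENSIONS = 1280
PGVECTOR_HNSW_MAX_DIMENSIONS = 2000  # index limit cited in the description


def resolve_dimensions(requested: Optional[int] = None) -> int:
    """Return a validated dimension, falling back to the default."""
    dims = DEFAULT_DIMENSIONS if requested is None else requested
    if dims not in ALLOWED_DIMENSIONS:
        raise ValueError(
            f"Unsupported zembed-1 dimensions {dims!r}; "
            f"expected one of {sorted(ALLOWED_DIMENSIONS, reverse=True)}"
        )
    return dims


def fits_hnsw_index(dims: int) -> bool:
    """True when vectors of this size stay within the HNSW index limit."""
    return dims <= PGVECTOR_HNSW_MAX_DIMENSIONS
```

Under these assumptions the default of 1280 fits the index limit, while 2560 is accepted by validation but would need a different indexing strategy.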

What changed

Added a native ZeroEntropyEmbeddings provider for zembed-1.
Added HINDSIGHT_API_EMBEDDINGS_ZEROENTROPY_* config/env settings for API key, model, base URL, dimensions, encoding format, latency, and batch size.
Routed retained content through encode_documents() and recall/search text through encode_query() when the embeddings backend supports asymmetric modes.
Added support for both float and base64 ZeroEntropy embedding responses, normalizing both to float vectors before storage.
Fixed the existing ZeroEntropy reranker factory to honor its configured base URL.
Updated env examples, docs, and generated docs-skill references.
Added focused coverage for config parsing, provider factory wiring, request payloads, response decoding, query/document routing, and reranker base URL wiring.
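
The float/base64 normalization described above could look roughly like the following. This is a minimal sketch, assuming the base64 form carries packed little-endian float32 values (a common convention for embedding APIs, not confirmed here for ZeroEntropy); decode_embedding is a hypothetical helper name.

```python
import base64
import struct
from typing import List, Sequence, Union


def decode_embedding(raw: Union[str, Sequence[float]]) -> List[float]:
    """Normalize one embedding to a list of Python floats.

    Assumption: a str value is base64-encoded little-endian float32 data.
    """
    if isinstance(raw, str):
        data = base64.b64decode(raw)
        if len(data) % 4 != 0:
            raise ValueError("base64 payload is not a whole number of float32 values")
        count = len(data) // 4
        return list(struct.unpack(f"<{count}f", data))
    # Already a float list: copy it and coerce ints to floats.
    return [float(x) for x in raw]


# Round trip with values that float32 represents exactly.
packed = base64.b64encode(struct.pack("<3f", 0.5, -1.25, 2.0)).decode("ascii")
assert decode_embedding(packed) == decode_embedding([0.5, -1.25, 2.0])
```

Values that float32 cannot represent exactly will differ slightly between the two forms, so comparisons against real responses need a tolerance.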

Validation

uv run pytest tests/test_zeroentropy_embeddings.py -q -> 12 passed
uv run ruff check hindsight_api tests/test_zeroentropy_embeddings.py -> passed
uv run ty check hindsight_api/ -> passed
./scripts/hooks/lint.sh -> passed
CodeRabbit CLI review final pass -> 0 findings
Live ZeroEntropy API smoke test: document and query embeddings returned 1280-dimensional float vectors
Live retain/recall smoke test through MemoryEngine against a scratch pgvector database: retained 1 memory unit and recalled 1 result with embedding_dim=1280

nicoloboschi

LGTM. Solid integration — Pydantic-typed request/response, asymmetric query/document split with a safe default on the base Embeddings class so existing providers don't need touching, and base64 response decoding handled. Bonus catch on the reranker base_url not being threaded through the factory.

Cleanups (encoding_format fallback dead branch, duplicated dimension validation, unused usage field, getattr-vs-Protocol mismatch in embedding_utils, aligning URL handling with the existing ZeroEntropyCrossEncoder) will land in a follow-up PR.
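
To make the "safe default on the base Embeddings class" concrete, here is a minimal sketch of the pattern. It does not reproduce the Hindsight classes: the class bodies, the dispatch_encode helper, and the toy vectors are all hypothetical, and only the method names encode_query() and encode_documents() come from the PR.

```python
from abc import ABC, abstractmethod
from typing import List

Vectors = List[List[float]]


class Embeddings(ABC):
    """Hypothetical stand-in for the base embeddings interface."""

    @abstractmethod
    def encode(self, texts: List[str]) -> Vectors:
        ...

    # Defaults: symmetric providers inherit these and need no changes.
    def encode_documents(self, texts: List[str]) -> Vectors:
        return self.encode(texts)

    def encode_query(self, texts: List[str]) -> Vectors:
        return self.encode(texts)


class SymmetricProvider(Embeddings):
    def encode(self, texts: List[str]) -> Vectors:
        return [[float(len(t))] for t in texts]


class AsymmetricProvider(Embeddings):
    def _embed(self, texts: List[str], input_type: str) -> Vectors:
        # Placeholder for an HTTP call; the marker makes the mode visible.
        marker = 1.0 if input_type == "query" else 0.0
        return [[float(len(t)), marker] for t in texts]

    def encode(self, texts: List[str]) -> Vectors:
        return self._embed(texts, "document")

    def encode_documents(self, texts: List[str]) -> Vectors:
        return self._embed(texts, "document")

    def encode_query(self, texts: List[str]) -> Vectors:
        return self._embed(texts, "query")


def dispatch_encode(backend: Embeddings, texts: List[str], input_type: str = "document") -> Vectors:
    """Route to the query or document method; no getattr fallback is needed."""
    if input_type == "query":
        return backend.encode_query(texts)
    return backend.encode_documents(texts)
```

Because the base class supplies both methods, the caller can dispatch unconditionally, and only asymmetric providers override them.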

Three integration tests that hit the real ZeroEntropy API. Skipped unless ZEROENTROPY_LIVE_API_KEY is set, so default and CI runs are unaffected.

- Embeddings: encode_documents + encode_query against zembed-1 (1280-dim), verifies the same text yields different vectors for document vs query input type (asymmetric encoder).
- Embeddings transport parity: base64 and float encoding_format decode to the same vector within float32 tolerance.
- Reranker: zerank-2 ranks a relevant passage above unrelated ones, exercising the base_url wiring fixed in #1770.

Placed in a dedicated test file so the autouse env-clearing fixture in test_zeroentropy_embeddings.py does not interfere with the live key gate.
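
The key gate described in this commit message can be sketched as follows. The repository's tests run under pytest; this version uses the standard library's unittest.skipUnless as a stand-in so it runs without extra dependencies. The class and helper names are hypothetical, and only the environment variable name comes from the commit message.

```python
import os
import unittest

LIVE_KEY_ENV = "ZEROENTROPY_LIVE_API_KEY"


def live_api_key():
    """Return the live key, or None when it is unset or empty."""
    return os.environ.get(LIVE_KEY_ENV) or None


# The condition is evaluated once, when the class is defined.
@unittest.skipUnless(live_api_key(), f"{LIVE_KEY_ENV} is not set")
class LiveZeroEntropyTests(unittest.TestCase):
    def test_key_is_available(self):
        # A real test would call the API here; this one only checks the gate.
        self.assertTrue(live_api_key())
```

Because the gate reads the environment at import time, a fixture that clears environment variables in the same module could hide the key, which is consistent with the stated reason for using a dedicated test file.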

…nker

Follow-up to #1770:

- Hoist the ZeroEntropy host out of cross_encoder.py into a shared DEFAULT_ZEROENTROPY_BASE_URL constant in config.py; reranker and embeddings now both reference it (was duplicated as an inline literal).
- Drop ZeroEntropyEmbeddings._embed_url() fuzzy matching; compute self.embed_url once in __init__ via f"{base_url}{EMBED_PATH}", matching the ZeroEntropyCrossEncoder pattern.
- Remove the duplicated dimension allowlist check from HindsightConfig.validate() - ZeroEntropyEmbeddings.__init__ already validates with the same set and a clearer error that includes the offending value.
- Drop the dead "or DEFAULT_..." fallback after _parse_optional_choice for encoding_format; the helper never returned None in the surrounding code.
- Drop the unused _ZeroEntropyEmbedUsage / response usage field.
- Simplify _encode_with_input_type in embedding_utils.py to a direct encode_query / encode_documents dispatch; the base Embeddings ABC already supplies defaults, so the getattr-on-type defensive check is moot.
- Add a regression test that latency=None is omitted from the outbound payload (relies on exclude_none=True).
- Regenerate skills/hindsight-docs/ references to match canonical sources.
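
The latency regression test mentioned above guards against sending an explicit null for an unset option. The PR relies on Pydantic's exclude_none=True for this; the sketch below shows the same idea with a plain dataclass so it runs on the standard library alone. The EmbedRequest fields, the to_payload helper, and the "fast" latency value are assumptions for illustration, not the provider's actual request schema.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, List, Optional


@dataclass
class EmbedRequest:
    """Hypothetical request shape; field names are illustrative only."""

    model: str
    input: List[str]
    input_type: str
    dimensions: int
    encoding_format: str = "float"
    latency: Optional[str] = None


def to_payload(request: EmbedRequest) -> Dict[str, Any]:
    """Serialize the request, dropping every field whose value is None."""
    return {key: value for key, value in asdict(request).items() if value is not None}


request = EmbedRequest(model="zembed-1", input=["hello"], input_type="query", dimensions=1280)
assert "latency" not in to_payload(request)
```

Omitting the key lets the server apply its own default, whereas a serialized null may be rejected or interpreted differently.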

…nker (#1773)

* chore(api): clean up zeroentropy embeddings, dedup base URL with reranker

* test(zeroentropy): add gated live API tests for embeddings + reranker

* test: stub encode_documents on the alignment-guard mocks

The TestEmbeddingsBatchLengthGuarantee tests stubbed `encode` on a MagicMock, but after the embedding_utils.generate_embeddings_batch dispatch was simplified to call encode_documents()/encode_query() directly (no getattr fallback to encode), the stub on `encode` no longer satisfies the default input_type="document" path. The Mock's unstubbed encode_documents returned a fresh Mock whose len() is 0, which then tripped the alignment guard with "returned 0 vectors" instead of the expected mismatched length.

Stub `encode_documents` to match the method the function actually invokes. The tests still exercise the same code (the length-mismatch guard in generate_embeddings_batch), just through the correct mock attribute.
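
The mock behavior described in this commit message can be reproduced with the standard library alone. In the sketch below, generate_embeddings_batch is a simplified stand-in for the real function (its signature and error text are assumptions), and guard_error is a hypothetical helper that captures the guard's message.

```python
from typing import List, Optional
from unittest.mock import MagicMock


def generate_embeddings_batch(backend, texts: List[str]) -> List[List[float]]:
    """Simplified stand-in: call encode_documents and check alignment."""
    vectors = backend.encode_documents(texts)
    if len(vectors) != len(texts):
        raise ValueError(f"backend returned {len(vectors)} vectors for {len(texts)} texts")
    return vectors


def guard_error(backend, texts: List[str]) -> Optional[str]:
    """Return the alignment guard's message, or None when lengths match."""
    try:
        generate_embeddings_batch(backend, texts)
    except ValueError as exc:
        return str(exc)
    return None


texts = ["a", "b", "c"]

# Stale stub: only `encode` is configured, so encode_documents() returns an
# auto-created MagicMock, and len() of a MagicMock defaults to 0.
stale = MagicMock()
stale.encode.return_value = [[0.1], [0.2]]
stale_message = guard_error(stale, texts)

# Fixed stub: configure the method the function actually calls.
fixed = MagicMock()
fixed.encode_documents.return_value = [[0.1], [0.2]]
fixed_message = guard_error(fixed, texts)
```

Both cases raise, but only the fixed stub reports the intended two-versus-three mismatch; the stale stub reports zero vectors, so a test asserting on the message would fail for the wrong reason.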

add zeroentropy embeddings provider

4a9902d

flamerged changed the title ~~[codex] add zeroentropy embeddings provider~~ add zeroentropy embeddings provider May 26, 2026

flamerged changed the title ~~add zeroentropy embeddings provider~~ [feature] add zeroentropy embeddings provider May 26, 2026

cdbartholomew requested a review from nicoloboschi May 26, 2026 23:42

nicoloboschi approved these changes May 27, 2026

nicoloboschi merged commit ec49175 into vectorize-io:main May 27, 2026

nicoloboschi mentioned this pull request May 27, 2026

chore(api): clean up zeroentropy embeddings, dedup base URL with reranker #1773

Merged

4 tasks

nicoloboschi merged 1 commit into vectorize-io:main from flamerged:feature/zeroentropy-embeddings
