[feature] add zeroentropy embeddings provider #1770
Merged
nicoloboschi merged 1 commit on May 27, 2026
nicoloboschi (Collaborator) approved these changes on May 27, 2026 and left a comment:
LGTM. Solid integration — Pydantic-typed request/response, asymmetric query/document split with a safe default on the base Embeddings class so existing providers don't need touching, and base64 response decoding handled. Bonus catch on the reranker base_url not being threaded through the factory.
Cleanups (encoding_format fallback dead branch, duplicated dimension validation, unused usage field, getattr-vs-Protocol mismatch in embedding_utils, aligning URL handling with the existing ZeroEntropyCrossEncoder) will land in a follow-up PR.
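The "safe default on the base Embeddings class" mentioned in the review can be pictured roughly as follows. This is a minimal sketch, not the code from the PR: the method signatures, the `SymmetricProvider` and `AsymmetricProvider` classes, and the toy vectors are all assumptions made for illustration. Only the names `Embeddings`, `encode`, `encode_documents`, and `encode_query` come from the discussion.

```python
from abc import ABC, abstractmethod


class Embeddings(ABC):
    """Hypothetical stand-in for the base embeddings class (signatures assumed)."""

    @abstractmethod
    def encode(self, texts: list[str]) -> list[list[float]]:
        """Symmetric encoding: one vector per input text."""

    def encode_documents(self, texts: list[str]) -> list[list[float]]:
        # Safe default: providers without asymmetric modes fall back to encode().
        return self.encode(texts)

    def encode_query(self, texts: list[str]) -> list[list[float]]:
        # Same fallback, so existing providers need no changes.
        return self.encode(texts)


class SymmetricProvider(Embeddings):
    """Toy provider that only implements encode()."""

    def encode(self, texts: list[str]) -> list[list[float]]:
        return [[float(len(t))] for t in texts]


class AsymmetricProvider(Embeddings):
    """Toy provider that embeds documents and queries differently."""

    def _embed(self, texts: list[str], input_type: str) -> list[list[float]]:
        # Fake "model": the offset stands in for the input-type-specific encoding.
        offset = 0.0 if input_type == "document" else 1.0
        return [[float(len(t)) + offset] for t in texts]

    def encode(self, texts: list[str]) -> list[list[float]]:
        return self._embed(texts, "document")

    def encode_documents(self, texts: list[str]) -> list[list[float]]:
        return self._embed(texts, "document")

    def encode_query(self, texts: list[str]) -> list[list[float]]:
        return self._embed(texts, "query")
```

With this shape, callers can always invoke `encode_documents()` or `encode_query()`; a symmetric provider returns identical vectors for both, while an asymmetric one does not.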
nicoloboschi added a commit that referenced this pull request on May 27, 2026
Three integration tests that hit the real ZeroEntropy API. Skipped unless ZEROENTROPY_LIVE_API_KEY is set, so default and CI runs are unaffected.

- Embeddings: encode_documents + encode_query against zembed-1 (1280-dim), verifies the same text yields different vectors for document vs query input type (asymmetric encoder).
- Embeddings transport parity: base64 and float encoding_format decode to the same vector within float32 tolerance.
- Reranker: zerank-2 ranks a relevant passage above unrelated ones, exercising the base_url wiring fixed in #1770.

Placed in a dedicated test file so the autouse env-clearing fixture in test_zeroentropy_embeddings.py does not interfere with the live key gate.
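The transport-parity check can be illustrated offline. The sketch below assumes the base64 form is a packed array of little-endian float32 values, which is a common convention but is not confirmed by this thread; `decode_embedding` and `same_vector` are hypothetical helper names, not functions from the PR.

```python
import base64
import math
import struct


def decode_embedding(value):
    """Normalize a float list or a base64 string to a list of Python floats.

    Assumption: base64 payloads are packed little-endian float32 values.
    """
    if isinstance(value, str):
        raw = base64.b64decode(value)
        if len(raw) % 4 != 0:
            raise ValueError("payload is not a whole number of float32 values")
        count = len(raw) // 4
        return list(struct.unpack(f"<{count}f", raw))
    return [float(x) for x in value]


def same_vector(a, b, tol=1e-6):
    """Compare two vectors element-wise within a float32-sized tolerance."""
    return len(a) == len(b) and all(
        math.isclose(x, y, rel_tol=tol, abs_tol=tol) for x, y in zip(a, b)
    )


# Build both transport forms of the same vector locally (no network call).
floats = [0.1, -0.25, 3.5]
encoded = base64.b64encode(struct.pack("<3f", *floats)).decode("ascii")
```

Decoding `encoded` does not reproduce `0.1` bit-for-bit, because float32 cannot represent it exactly; that is why the comparison uses a tolerance rather than equality.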
nicoloboschi added a commit that referenced this pull request on May 27, 2026
…nker

Follow-up to #1770:

- Hoist the ZeroEntropy host out of cross_encoder.py into a shared DEFAULT_ZEROENTROPY_BASE_URL constant in config.py; reranker and embeddings now both reference it (was duplicated as an inline literal).
- Drop ZeroEntropyEmbeddings._embed_url() fuzzy matching; compute self.embed_url once in __init__ via f"{base_url}{EMBED_PATH}", matching the ZeroEntropyCrossEncoder pattern.
- Remove the duplicated dimension allowlist check from HindsightConfig.validate() - ZeroEntropyEmbeddings.__init__ already validates with the same set and a clearer error that includes the offending value.
- Drop the dead "or DEFAULT_..." fallback after _parse_optional_choice for encoding_format; the helper never returned None in the surrounding code.
- Drop the unused _ZeroEntropyEmbedUsage / response usage field.
- Simplify _encode_with_input_type in embedding_utils.py to a direct encode_query / encode_documents dispatch; the base Embeddings ABC already supplies defaults, so the getattr-on-type defensive check is moot.
- Add a regression test that latency=None is omitted from the outbound payload (relies on exclude_none=True).
- Regenerate skills/hindsight-docs/ references to match canonical sources.
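Several of these cleanups (URL computed once in `__init__`, a single dimension check whose error names the offending value, and `None` fields left out of the payload) can be sketched together. Everything here is illustrative: the class name, the placeholder base URL, and the payload field names are assumptions, and the dict comprehension only imitates what Pydantic's `model_dump(exclude_none=True)` would do in the real code.

```python
# Dimension set listed in the PR description; the endpoint path is also from the PR.
ALLOWED_DIMENSIONS = frozenset({2560, 1280, 640, 320, 160, 80, 40})
EMBED_PATH = "/v1/models/embed"
# Placeholder host for illustration only; not the real default base URL.
PLACEHOLDER_BASE_URL = "https://zeroentropy.example"


class EmbeddingsClientSketch:
    """Hypothetical, simplified stand-in for the embeddings provider."""

    def __init__(self, base_url=PLACEHOLDER_BASE_URL, dimensions=1280, latency=None):
        # Validate once, and include the offending value in the error message.
        if dimensions not in ALLOWED_DIMENSIONS:
            raise ValueError(
                f"unsupported dimensions {dimensions!r}; "
                f"expected one of {sorted(ALLOWED_DIMENSIONS)}"
            )
        self.dimensions = dimensions
        self.latency = latency
        # Compute the endpoint URL once instead of re-deriving it per request.
        self.embed_url = f"{base_url}{EMBED_PATH}"

    def build_payload(self, texts, input_type):
        # Field names are assumed, not taken from the ZeroEntropy API schema.
        payload = {
            "input": texts,
            "input_type": input_type,
            "dimensions": self.dimensions,
            "latency": self.latency,
        }
        # Imitates exclude_none=True: unset optional fields never reach the wire.
        return {key: value for key, value in payload.items() if value is not None}
```

Note that plain concatenation means a `base_url` with a trailing slash would yield a double slash; the sketch follows the f-string quoted in the commit message and does not normalize it.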
nicoloboschi added a commit that referenced this pull request on May 27, 2026
…nker (#1773)

* chore(api): clean up zeroentropy embeddings, dedup base URL with reranker

Follow-up to #1770:

- Hoist the ZeroEntropy host out of cross_encoder.py into a shared DEFAULT_ZEROENTROPY_BASE_URL constant in config.py; reranker and embeddings now both reference it (was duplicated as an inline literal).
- Drop ZeroEntropyEmbeddings._embed_url() fuzzy matching; compute self.embed_url once in __init__ via f"{base_url}{EMBED_PATH}", matching the ZeroEntropyCrossEncoder pattern.
- Remove the duplicated dimension allowlist check from HindsightConfig.validate() - ZeroEntropyEmbeddings.__init__ already validates with the same set and a clearer error that includes the offending value.
- Drop the dead "or DEFAULT_..." fallback after _parse_optional_choice for encoding_format; the helper never returned None in the surrounding code.
- Drop the unused _ZeroEntropyEmbedUsage / response usage field.
- Simplify _encode_with_input_type in embedding_utils.py to a direct encode_query / encode_documents dispatch; the base Embeddings ABC already supplies defaults, so the getattr-on-type defensive check is moot.
- Add a regression test that latency=None is omitted from the outbound payload (relies on exclude_none=True).
- Regenerate skills/hindsight-docs/ references to match canonical sources.

* test(zeroentropy): add gated live API tests for embeddings + reranker

Three integration tests that hit the real ZeroEntropy API. Skipped unless ZEROENTROPY_LIVE_API_KEY is set, so default and CI runs are unaffected.

- Embeddings: encode_documents + encode_query against zembed-1 (1280-dim), verifies the same text yields different vectors for document vs query input type (asymmetric encoder).
- Embeddings transport parity: base64 and float encoding_format decode to the same vector within float32 tolerance.
- Reranker: zerank-2 ranks a relevant passage above unrelated ones, exercising the base_url wiring fixed in #1770.

Placed in a dedicated test file so the autouse env-clearing fixture in test_zeroentropy_embeddings.py does not interfere with the live key gate.

* test: stub encode_documents on the alignment-guard mocks

The TestEmbeddingsBatchLengthGuarantee tests stubbed `encode` on a MagicMock, but after the embedding_utils.generate_embeddings_batch dispatch was simplified to call encode_documents()/encode_query() directly (no getattr fallback to encode), the stub on `encode` no longer satisfies the default input_type="document" path. The Mock's unstubbed encode_documents returned a fresh Mock whose len() is 0, which then tripped the alignment guard with "returned 0 vectors" instead of the expected mismatched length.

Stub `encode_documents` to match the method the function actually invokes. The tests still exercise the same code (the length-mismatch guard in generate_embeddings_batch), just through the correct mock attribute.
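The mock pitfall described in the last commit is easy to reproduce in isolation. The function below is a simplified, hypothetical version of `generate_embeddings_batch` (the real signature and error text may differ); the `MagicMock` behavior it relies on is standard `unittest.mock`.

```python
from unittest.mock import MagicMock


def generate_embeddings_batch(embeddings, texts, input_type="document"):
    """Simplified sketch: direct dispatch plus a length-alignment guard."""
    if input_type == "query":
        vectors = embeddings.encode_query(texts)
    else:
        vectors = embeddings.encode_documents(texts)
    if len(vectors) != len(texts):
        raise ValueError(f"returned {len(vectors)} vectors for {len(texts)} texts")
    return vectors


def guard_message(mock_embeddings, texts):
    """Return the guard's error message, or None if the guard did not fire."""
    try:
        generate_embeddings_batch(mock_embeddings, texts)
    except ValueError as exc:
        return str(exc)
    return None


texts = ["a", "b", "c"]

# Stubbing `encode` has no effect: the function never calls it. The unstubbed
# encode_documents returns another MagicMock, and len() of a MagicMock is 0.
stale_stub = MagicMock()
stale_stub.encode.return_value = [[0.0], [0.0]]

# Stubbing the method that is actually invoked yields the intended mismatch.
correct_stub = MagicMock()
correct_stub.encode_documents.return_value = [[0.0], [0.0]]
```

Both mocks trip the guard, but only the second one does so for the reason the test intends (two vectors for three texts rather than zero).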
Summary

This adds first-class ZeroEntropy `zembed-1` embeddings support to Hindsight. `zembed-1` is a state-of-the-art retrieval embedder, and this integration lets Hindsight use it natively instead of forcing users through an OpenAI-compatible shim or LiteLLM proxy. It also preserves ZeroEntropy's asymmetric retrieval semantics: retained memory content is embedded as `document` input, while recall/search text is embedded as `query` input.

Why this matters

- … `zembed-1` embeddings.
- … `/v1/models/embed` endpoint, with no shim layer translating away provider-specific features.
- … `zembed-1` to 1280 dimensions so it works with pgvector HNSW's 2000-dimension index limit out of the box, while still allowing 2560/1280/640/320/160/80/40 for deployments that want different tradeoffs.

What changed

- … `ZeroEntropyEmbeddings` provider for `zembed-1`.
- … `HINDSIGHT_API_EMBEDDINGS_ZEROENTROPY_*` config/env settings for API key, model, base URL, dimensions, encoding format, latency, and batch size.
- … `encode_documents()` and recall/search text through `encode_query()` when the embeddings backend supports asymmetric modes.
- … `float` and `base64` ZeroEntropy embedding responses, normalizing both to float vectors before storage.

Validation

- `uv run pytest tests/test_zeroentropy_embeddings.py -q` -> 12 passed
- `uv run ruff check hindsight_api tests/test_zeroentropy_embeddings.py` -> passed
- `uv run ty check hindsight_api/` -> passed
- `./scripts/hooks/lint.sh` -> passed
- … `document` and `query` embeddings returned 1280-dimensional float vectors
- … `MemoryEngine` against a scratch pgvector database: retained 1 memory unit and recalled 1 result with `embedding_dim=1280`