chore(api): clean up zeroentropy embeddings, dedup base URL with reranker #1773
Merged
…nker

Follow-up to #1770:

- Hoist the ZeroEntropy host out of cross_encoder.py into a shared DEFAULT_ZEROENTROPY_BASE_URL constant in config.py; reranker and embeddings now both reference it (was duplicated as an inline literal).
- Drop ZeroEntropyEmbeddings._embed_url() fuzzy matching; compute self.embed_url once in __init__ via f"{base_url}{EMBED_PATH}", matching the ZeroEntropyCrossEncoder pattern.
- Remove the duplicated dimension allowlist check from HindsightConfig.validate(); ZeroEntropyEmbeddings.__init__ already validates with the same set and a clearer error that includes the offending value.
- Drop the dead "or DEFAULT_..." fallback after _parse_optional_choice for encoding_format; the helper never returned None in the surrounding code.
- Drop the unused _ZeroEntropyEmbedUsage / response usage field.
- Simplify _encode_with_input_type in embedding_utils.py to a direct encode_query / encode_documents dispatch; the base Embeddings ABC already supplies defaults, so the getattr-on-type defensive check is moot.
- Add a regression test that latency=None is omitted from the outbound payload (relies on exclude_none=True).
- Regenerate skills/hindsight-docs/ references to match canonical sources.
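The shared-constant and compute-once URL pattern described above can be sketched roughly as follows. This is only an illustration: the class names are hypothetical stand-ins, `EMBED_PATH` is assumed from the `/v1/models/embed` suffix the old helper matched, and the `RERANK_PATH` value is a guess made purely for the example.

```python
# In the PR this constant lives in config.py; the value is the one quoted in the summary.
DEFAULT_ZEROENTROPY_BASE_URL = "https://api.zeroentropy.dev"

EMBED_PATH = "/v1/models/embed"    # assumption: inferred from the suffix the old helper matched
RERANK_PATH = "/v1/models/rerank"  # assumption: hypothetical value for illustration


class EmbeddingsSketch:
    """Hypothetical stand-in for the embeddings provider."""

    def __init__(self, base_url: str = DEFAULT_ZEROENTROPY_BASE_URL) -> None:
        # Computed once at construction; no per-request suffix matching.
        self.embed_url = f"{base_url}{EMBED_PATH}"


class CrossEncoderSketch:
    """Hypothetical stand-in for the reranker, using the same pattern."""

    def __init__(self, base_url: str = DEFAULT_ZEROENTROPY_BASE_URL) -> None:
        self.rerank_url = f"{base_url}{RERANK_PATH}"
```

Under these assumptions, a `base_url` that already ends in `/v1` would produce a doubled path segment, which is the trade-off of plain concatenation over fuzzy matching.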
Three integration tests that hit the real ZeroEntropy API. Skipped unless ZEROENTROPY_LIVE_API_KEY is set, so default and CI runs are unaffected.

- Embeddings: encode_documents + encode_query against zembed-1 (1280-dim), verifies the same text yields different vectors for document vs query input type (asymmetric encoder).
- Embeddings transport parity: base64 and float encoding_format decode to the same vector within float32 tolerance.
- Reranker: zerank-2 ranks a relevant passage above unrelated ones, exercising the base_url wiring fixed in #1770.

Placed in a dedicated test file so the autouse env-clearing fixture in test_zeroentropy_embeddings.py does not interfere with the live key gate.
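The transport-parity check can be illustrated offline with a minimal sketch. It assumes the base64 payload is a packed array of little-endian float32 values, which is a common convention but is not confirmed by this PR; the helper names and tolerances are hypothetical.

```python
import base64
import math
import struct


def decode_base64_floats(payload: str) -> list[float]:
    """Decode base64 text into floats, assuming packed little-endian float32."""
    raw = base64.b64decode(payload)
    count = len(raw) // 4  # 4 bytes per float32
    return list(struct.unpack(f"<{count}f", raw))


def vectors_match(a: list[float], b: list[float],
                  rel_tol: float = 1e-6, abs_tol: float = 1e-7) -> bool:
    """Compare two vectors element-wise within a float32-sized tolerance."""
    return len(a) == len(b) and all(
        math.isclose(x, y, rel_tol=rel_tol, abs_tol=abs_tol) for x, y in zip(a, b)
    )


# Simulated responses: the same vector as JSON floats and as base64 text.
float_vector = [0.125, -0.5, 0.3333333, 1e-3]
b64_payload = base64.b64encode(
    struct.pack(f"<{len(float_vector)}f", *float_vector)
).decode("ascii")

parity_ok = vectors_match(decode_base64_floats(b64_payload), float_vector)
```

Exact equality is avoided on purpose: values such as 0.3333333 are rounded when stored as float32, so the decoded vector only agrees with the float transport up to that rounding.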
The TestEmbeddingsBatchLengthGuarantee tests stubbed `encode` on a MagicMock, but after the embedding_utils.generate_embeddings_batch dispatch was simplified to call encode_documents()/encode_query() directly (no getattr fallback to encode), the stub on `encode` no longer satisfies the default input_type="document" path. The Mock's unstubbed encode_documents returned a fresh Mock whose len() is 0, which then tripped the alignment guard with "returned 0 vectors" instead of the expected mismatched length. Stub `encode_documents` to match the method the function actually invokes. The tests still exercise the same code (the length-mismatch guard in generate_embeddings_batch), just through the correct mock attribute.
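The failure mode is easy to reproduce in isolation with `unittest.mock`. The snippet below is a sketch with made-up inputs, not the actual test code:

```python
from unittest.mock import MagicMock

texts = ["a", "b", "c"]

# Old stub: only `encode` is configured.
stale = MagicMock()
stale.encode.return_value = [[0.1, 0.2]]

# The simplified dispatch calls encode_documents(), which was never stubbed,
# so the call returns an auto-created MagicMock whose len() defaults to 0.
unstubbed_result = stale.encode_documents(texts)
unstubbed_len = len(unstubbed_result)

# Fixed stub: configure the method that is actually invoked.
fixed = MagicMock()
fixed.encode_documents.return_value = [[0.1, 0.2]]  # 1 vector for 3 texts
stubbed_len = len(fixed.encode_documents(texts))
```

With the stale stub the guard sees 0 vectors; with the fixed stub it sees the intended mismatch of 1 vector for 3 texts.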
ae1e906 to 84f7d72
Summary
Follow-up to #1770. Small cleanups discovered during review. It also aligns the embeddings provider with the existing `ZeroEntropyCrossEncoder` so we stop duplicating the host URL and URL-construction logic between the two.

- `cross_encoder.py` had `DEFAULT_BASE_URL = "https://api.zeroentropy.dev"` inline while embeddings imported `DEFAULT_EMBEDDINGS_ZEROENTROPY_BASE_URL` from config. Promoted to a single `DEFAULT_ZEROENTROPY_BASE_URL` constant in `config.py`; both providers now reference it.
- `ZeroEntropyEmbeddings._embed_url()` did fuzzy matching for base URLs that already ended in `/v1` or `/v1/models/embed`. The reranker doesn't; it just does `f"{base_url}{RERANK_PATH}"`. Embeddings now follows the same pattern: compute `self.embed_url` once in `__init__`, drop the helper.
- Remove the duplicated dimension allowlist check from `HindsightConfig.validate()`. `ZeroEntropyEmbeddings.__init__` already enforces the same allowlist with a clearer error message that includes the offending value.
- Drop the dead `or DEFAULT_...` fallback after `_parse_optional_choice` for encoding_format: the helper could never return `None` because the surrounding code already coalesced via `or DEFAULT_...` before the call.
- Drop the unused `_ZeroEntropyEmbedUsage` model / `usage` response field; it was never read.
- Simplify `_encode_with_input_type` in `embedding_utils.py` to a direct `encode_query` / `encode_documents` dispatch. The base `Embeddings` ABC supplies defaults, so the `getattr(type(...), ...)` defensive check is moot.
- Add a regression test that `latency=None` is omitted from the outbound payload (relies on `exclude_none=True`).
- Regenerate `skills/hindsight-docs/` references to match canonical sources (pre-commit hook).

No public API or env-var changes.
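The simplified dispatch can be pictured with a minimal sketch. It assumes the base class defaults `encode_query` and `encode_documents` to `encode`; the signatures and the two example providers are hypothetical and not taken from the codebase.

```python
from abc import ABC, abstractmethod


class Embeddings(ABC):
    """Hypothetical base class; the defaults route both input types to encode()."""

    @abstractmethod
    def encode(self, texts: list[str]) -> list[list[float]]: ...

    def encode_query(self, texts: list[str]) -> list[list[float]]:
        return self.encode(texts)

    def encode_documents(self, texts: list[str]) -> list[list[float]]:
        return self.encode(texts)


def encode_with_input_type(embeddings: Embeddings, texts: list[str],
                           input_type: str = "document") -> list[list[float]]:
    # Direct dispatch: every subclass has both methods, so no getattr check is needed.
    if input_type == "query":
        return embeddings.encode_query(texts)
    return embeddings.encode_documents(texts)


class SymmetricSketch(Embeddings):
    """Provider that only implements encode(); inherits both defaults."""

    def encode(self, texts):
        return [[float(len(t))] for t in texts]


class AsymmetricSketch(SymmetricSketch):
    """Provider that encodes queries differently from documents."""

    def encode_query(self, texts):
        return [[-float(len(t))] for t in texts]
```

Because the base class guarantees both methods exist, a symmetric provider needs no extra code and an asymmetric one overrides only the method that differs.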
Test plan
- `uv run pytest tests/test_zeroentropy_embeddings.py -q` -> 13 passed (existing 12 + new latency-omission test)
- `uv run ruff check hindsight_api tests/test_zeroentropy_embeddings.py` -> passed
- `uv run ty check hindsight_api/` -> passed
- `pytest tests/ -k "embedding or zeroentropy or retain or recall"` -> exit 0