chore(api): clean up zeroentropy embeddings, dedup base URL with reranker by nicoloboschi · Pull Request #1773 · vectorize-io/hindsight

nicoloboschi · 2026-05-27T07:34:30Z

Summary

Follow-up to #1770. Small cleanups discovered during review, and aligns the embeddings provider with the existing ZeroEntropyCrossEncoder so we stop duplicating the host URL and URL-construction logic between the two.

Dedup the host. cross_encoder.py had DEFAULT_BASE_URL = "https://api.zeroentropy.dev" inline while embeddings imported DEFAULT_EMBEDDINGS_ZEROENTROPY_BASE_URL from config. Promoted to a single DEFAULT_ZEROENTROPY_BASE_URL constant in config.py; both providers now reference it.
Align URL construction. ZeroEntropyEmbeddings._embed_url() did fuzzy matching for base URLs that already ended in /v1 or /v1/models/embed. The reranker doesn't — it just does f"{base_url}{RERANK_PATH}". Embeddings now follows the same pattern: compute self.embed_url once in __init__, drop the helper.
Remove duplicate dimension validation from HindsightConfig.validate(). ZeroEntropyEmbeddings.__init__ already enforces the same allowlist with a clearer error message that includes the offending value.
Drop the dead "or DEFAULT_..." fallback after _parse_optional_choice for encoding_format: the helper could never return None there, because the surrounding code already coalesced via "or DEFAULT_..." before the call.
Drop unused _ZeroEntropyEmbedUsage model / usage response field — never read.
Simplify _encode_with_input_type in embedding_utils.py to a direct encode_query / encode_documents dispatch. The base Embeddings ABC supplies defaults, so the getattr(type(...), ...) defensive check is moot.
Add regression test that latency=None is omitted from the outbound payload (relies on exclude_none=True).
Regenerate skills/hindsight-docs/ references to match canonical sources (pre-commit hook).
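For readers skimming the first two items, the shared-constant and URL-construction pattern can be sketched roughly as follows. This is an illustration rather than the code in the diff: the class bodies are reduced to their constructors, the EMBED_PATH value is inferred from the /v1/models/embed suffix mentioned above, and the RERANK_PATH value is a hypothetical placeholder.

```python
# Illustrative sketch only; not the actual hindsight_api source.

# Single shared host constant (lives in config.py according to the description).
DEFAULT_ZEROENTROPY_BASE_URL = "https://api.zeroentropy.dev"

EMBED_PATH = "/v1/models/embed"    # assumed from the suffix named in the description
RERANK_PATH = "/v1/models/rerank"  # hypothetical value, for illustration


class ZeroEntropyCrossEncoder:
    def __init__(self, base_url: str = DEFAULT_ZEROENTROPY_BASE_URL) -> None:
        # Plain concatenation, no normalisation of the base URL.
        self.rerank_url = f"{base_url}{RERANK_PATH}"


class ZeroEntropyEmbeddings:
    def __init__(self, base_url: str = DEFAULT_ZEROENTROPY_BASE_URL) -> None:
        # Computed once in __init__; no fuzzy handling of bases ending in /v1.
        self.embed_url = f"{base_url}{EMBED_PATH}"


default_embed_url = ZeroEntropyEmbeddings().embed_url
custom_embed_url = ZeroEntropyEmbeddings("http://localhost:8080").embed_url
```

Under this pattern a caller passes the bare host as base_url; a base that already ends in /v1 would simply get the path appended again.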

No public API or env-var changes.

Test plan

uv run pytest tests/test_zeroentropy_embeddings.py -q -> 13 passed (existing 12 + new latency-omission test)
uv run ruff check hindsight_api tests/test_zeroentropy_embeddings.py -> passed
uv run ty check hindsight_api/ -> passed
Broader sweep pytest tests/ -k "embedding or zeroentropy or retain or recall" -> exit 0

…nker
Follow-up to #1770:
- Hoist the ZeroEntropy host out of cross_encoder.py into a shared DEFAULT_ZEROENTROPY_BASE_URL constant in config.py; reranker and embeddings now both reference it (was duplicated as an inline literal).
- Drop ZeroEntropyEmbeddings._embed_url() fuzzy matching; compute self.embed_url once in __init__ via f"{base_url}{EMBED_PATH}", matching the ZeroEntropyCrossEncoder pattern.
- Remove the duplicated dimension allowlist check from HindsightConfig.validate(); ZeroEntropyEmbeddings.__init__ already validates with the same set and a clearer error that includes the offending value.
- Drop the dead "or DEFAULT_..." fallback after _parse_optional_choice for encoding_format; the helper never returned None in the surrounding code.
- Drop the unused _ZeroEntropyEmbedUsage / response usage field.
- Simplify _encode_with_input_type in embedding_utils.py to a direct encode_query / encode_documents dispatch; the base Embeddings ABC already supplies defaults, so the getattr-on-type defensive check is moot.
- Add a regression test that latency=None is omitted from the outbound payload (relies on exclude_none=True).
- Regenerate skills/hindsight-docs/ references to match canonical sources.

Three integration tests that hit the real ZeroEntropy API. Skipped unless ZEROENTROPY_LIVE_API_KEY is set, so default and CI runs are unaffected.
- Embeddings: encode_documents + encode_query against zembed-1 (1280-dim), verifies the same text yields different vectors for document vs query input type (asymmetric encoder).
- Embeddings transport parity: base64 and float encoding_format decode to the same vector within float32 tolerance.
- Reranker: zerank-2 ranks a relevant passage above unrelated ones, exercising the base_url wiring fixed in #1770.
Placed in a dedicated test file so the autouse env-clearing fixture in test_zeroentropy_embeddings.py does not interfere with the live key gate.
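The transport-parity check can be illustrated offline with the standard library. This sketch assumes the base64 encoding_format carries packed little-endian float32 values, which is a common convention but is not confirmed by this PR; the helper names and the sample vector are hypothetical.

```python
import base64
import math
import struct


def decode_base64_embedding(payload: str) -> list[float]:
    """Decode a base64 string of packed little-endian float32 values (assumed layout)."""
    raw = base64.b64decode(payload)
    count = len(raw) // 4  # 4 bytes per float32
    return list(struct.unpack(f"<{count}f", raw))


def vectors_match(a: list[float], b: list[float], rel_tol: float = 1e-6) -> bool:
    """Compare two vectors element-wise within a float32-sized tolerance."""
    return len(a) == len(b) and all(
        math.isclose(x, y, rel_tol=rel_tol, abs_tol=1e-7) for x, y in zip(a, b)
    )


# Simulate the two transports for one made-up vector.
float_vector = [0.1, -0.25, 0.5]
encoded = base64.b64encode(struct.pack("<3f", *float_vector)).decode("ascii")
decoded = decode_base64_embedding(encoded)
parity_ok = vectors_match(decoded, float_vector)
```

The tolerance matters because a value such as 0.1 is not exactly representable in float32, so the decoded vector differs from the float-transport values in the low bits.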

The TestEmbeddingsBatchLengthGuarantee tests stubbed `encode` on a MagicMock, but after the embedding_utils.generate_embeddings_batch dispatch was simplified to call encode_documents()/encode_query() directly (no getattr fallback to encode), the stub on `encode` no longer satisfies the default input_type="document" path. The Mock's unstubbed encode_documents returned a fresh Mock whose len() is 0, which then tripped the alignment guard with "returned 0 vectors" instead of the expected mismatched length. Stub `encode_documents` to match the method the function actually invokes. The tests still exercise the same code (the length-mismatch guard in generate_embeddings_batch), just through the correct mock attribute.
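The mock behaviour described in this commit can be reproduced in isolation. In the sketch below, generate_embeddings_batch is a deliberately simplified stand-in for the real function (document path only, hypothetical error wording), and guard_message is a helper invented for the example.

```python
from unittest.mock import MagicMock


def generate_embeddings_batch(model, texts):
    """Simplified stand-in: calls encode_documents and checks alignment."""
    vectors = model.encode_documents(texts)
    if len(vectors) != len(texts):
        raise ValueError(f"returned {len(vectors)} vectors for {len(texts)} texts")
    return vectors


def guard_message(model, texts):
    """Return the alignment-guard error message, or None if no error was raised."""
    try:
        generate_embeddings_batch(model, texts)
    except ValueError as exc:
        return str(exc)
    return None


texts = ["a", "b"]

# Old stub: only `encode` is configured, so encode_documents() returns a
# fresh MagicMock, and len() of a MagicMock defaults to 0.
stale = MagicMock()
stale.encode.return_value = [[0.1]]
stale_message = guard_message(stale, texts)

# Fixed stub: configure the method the function actually invokes.
fixed = MagicMock()
fixed.encode_documents.return_value = [[0.1]]
fixed_message = guard_message(fixed, texts)
```

With the stale stub the guard reports zero vectors; with the fixed stub it reports the intended one-versus-two mismatch.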

nicoloboschi added 3 commits May 27, 2026 13:17

nicoloboschi force-pushed the chore/zeroentropy-embeddings-cleanup branch from ae1e906 to 84f7d72 Compare May 27, 2026 11:17

nicoloboschi merged commit fbbc7a5 into main May 27, 2026
72 checks passed

nicoloboschi merged 3 commits into main from chore/zeroentropy-embeddings-cleanup
