chore(api): clean up zeroentropy embeddings, dedup base URL with reranker #1773

Merged: nicoloboschi merged 3 commits into main from chore/zeroentropy-embeddings-cleanup on May 27, 2026

@nicoloboschi
Collaborator

Summary

Follow-up to #1770. Small cleanups discovered during review, and aligns the embeddings provider with the existing ZeroEntropyCrossEncoder so we stop duplicating the host URL and URL-construction logic between the two.

  • Dedup the host. cross_encoder.py had DEFAULT_BASE_URL = "https://api.zeroentropy.dev" inline while embeddings imported DEFAULT_EMBEDDINGS_ZEROENTROPY_BASE_URL from config. Promoted to a single DEFAULT_ZEROENTROPY_BASE_URL constant in config.py; both providers now reference it.
  • Align URL construction. ZeroEntropyEmbeddings._embed_url() did fuzzy matching for base URLs that already ended in /v1 or /v1/models/embed. The reranker doesn't — it just does f"{base_url}{RERANK_PATH}". Embeddings now follows the same pattern: compute self.embed_url once in __init__, drop the helper.
  • Remove duplicate dimension validation from HindsightConfig.validate(). ZeroEntropyEmbeddings.__init__ already enforces the same allowlist with a clearer error message that includes the offending value.
  • Remove the dead "or DEFAULT_..." fallback after _parse_optional_choice for encoding_format. The helper could never return None there because the surrounding code already coalesced the value via "or DEFAULT_..." before the call.
  • Drop unused _ZeroEntropyEmbedUsage model / usage response field — never read.
  • Simplify _encode_with_input_type in embedding_utils.py to a direct encode_query / encode_documents dispatch. The base Embeddings ABC supplies defaults, so the getattr(type(...), ...) defensive check is moot.
  • Add regression test that latency=None is omitted from the outbound payload (relies on exclude_none=True).
  • Regenerate skills/hindsight-docs/ references to match canonical sources (pre-commit hook).
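
The first three bullets can be sketched roughly as follows. This is an illustration, not the code in the PR: the class names carry a Sketch suffix, the RERANK_PATH value and the ALLOWED_DIMENSIONS contents are placeholders, and the EMBED_PATH value is inferred from the suffix the old fuzzy matching looked for.

```python
# Illustrative sketch only. Names and values marked "assumed" or "placeholder"
# are not taken from the PR.

# Single shared host constant, as described for config.py.
DEFAULT_ZEROENTROPY_BASE_URL = "https://api.zeroentropy.dev"

EMBED_PATH = "/v1/models/embed"    # inferred from the old fuzzy-match suffix
RERANK_PATH = "/v1/models/rerank"  # assumed value
ALLOWED_DIMENSIONS = frozenset({1280})  # placeholder; the real allowlist is not shown here


class ZeroEntropyCrossEncoderSketch:
    """Stand-in for the reranker: plain concatenation, no suffix matching."""

    def __init__(self, base_url: str = DEFAULT_ZEROENTROPY_BASE_URL) -> None:
        self.rerank_url = f"{base_url}{RERANK_PATH}"


class ZeroEntropyEmbeddingsSketch:
    """Stand-in for the embeddings provider after the cleanup."""

    def __init__(
        self,
        base_url: str = DEFAULT_ZEROENTROPY_BASE_URL,
        dimensions: int = 1280,
    ) -> None:
        # Validate once, here, and name the offending value in the error.
        if dimensions not in ALLOWED_DIMENSIONS:
            allowed = sorted(ALLOWED_DIMENSIONS)
            raise ValueError(
                f"Unsupported dimensions {dimensions!r}; expected one of {allowed}"
            )
        self.dimensions = dimensions
        # Computed once in __init__, replacing the per-call _embed_url() helper.
        self.embed_url = f"{base_url}{EMBED_PATH}"
```

With plain concatenation, a base URL that already ends in /v1 is no longer special-cased, so callers are expected to pass the bare host.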

No public API or env-var changes.

Test plan

  • uv run pytest tests/test_zeroentropy_embeddings.py -q -> 13 passed (existing 12 + new latency-omission test)
  • uv run ruff check hindsight_api tests/test_zeroentropy_embeddings.py -> passed
  • uv run ty check hindsight_api/ -> passed
  • Broader sweep pytest tests/ -k "embedding or zeroentropy or retain or recall" -> exit 0

…nker

Follow-up to #1770:

- Hoist the ZeroEntropy host out of cross_encoder.py into a shared
  DEFAULT_ZEROENTROPY_BASE_URL constant in config.py; reranker and
  embeddings now both reference it (was duplicated as an inline literal).
- Drop ZeroEntropyEmbeddings._embed_url() fuzzy matching; compute
  self.embed_url once in __init__ via f"{base_url}{EMBED_PATH}", matching
  the ZeroEntropyCrossEncoder pattern.
- Remove the duplicated dimension allowlist check from
  HindsightConfig.validate() - ZeroEntropyEmbeddings.__init__ already
  validates with the same set and a clearer error that includes the
  offending value.
- Drop the dead "or DEFAULT_..." fallback after _parse_optional_choice for
  encoding_format; the helper never returned None in the surrounding code.
- Drop the unused _ZeroEntropyEmbedUsage / response usage field.
- Simplify _encode_with_input_type in embedding_utils.py to a direct
  encode_query / encode_documents dispatch; the base Embeddings ABC already
  supplies defaults, so the getattr-on-type defensive check is moot.
- Add a regression test that latency=None is omitted from the outbound
  payload (relies on exclude_none=True).
- Regenerate skills/hindsight-docs/ references to match canonical sources.

Three integration tests that hit the real ZeroEntropy API. Skipped unless
ZEROENTROPY_LIVE_API_KEY is set, so default and CI runs are unaffected.

- Embeddings: encode_documents + encode_query against zembed-1 (1280-dim),
  verifies the same text yields different vectors for document vs query input
  type (asymmetric encoder).
- Embeddings transport parity: base64 and float encoding_format decode to
  the same vector within float32 tolerance.
- Reranker: zerank-2 ranks a relevant passage above unrelated ones,
  exercising the base_url wiring fixed in #1770.
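
The transport parity check can be sketched offline as below. It assumes the base64 payload is a packed array of little-endian float32 values, which is a common convention but is not confirmed here for the ZeroEntropy API. The helper names are hypothetical.

```python
import base64
import math
import struct


def decode_base64_embedding(payload: str) -> list[float]:
    """Decode base64 text into floats, assuming packed little-endian float32."""
    raw = base64.b64decode(payload)
    if len(raw) % 4 != 0:
        raise ValueError(f"payload length {len(raw)} is not a multiple of 4 bytes")
    count = len(raw) // 4
    return list(struct.unpack(f"<{count}f", raw))


def vectors_match(a: list[float], b: list[float], tol: float = 1e-6) -> bool:
    """True when both vectors have equal length and agree within float32 tolerance."""
    return len(a) == len(b) and all(
        math.isclose(x, y, rel_tol=tol, abs_tol=tol) for x, y in zip(a, b)
    )


# Simulate the two transports for one vector instead of calling the live API.
float_vector = [0.1, -0.25, 3.5]
base64_payload = base64.b64encode(struct.pack("<3f", *float_vector)).decode("ascii")
decoded_vector = decode_base64_embedding(base64_payload)
```

The comparison uses a tolerance rather than equality because a value such as 0.1 is not exactly representable in float32, so the round trip differs from the float form in the low bits.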

Placed in a dedicated test file so the autouse env-clearing fixture in
test_zeroentropy_embeddings.py does not interfere with the live key gate.

The TestEmbeddingsBatchLengthGuarantee tests stubbed `encode` on a
MagicMock, but after the embedding_utils.generate_embeddings_batch dispatch
was simplified to call encode_documents()/encode_query() directly (no
getattr fallback to encode), the stub on `encode` no longer satisfies the
default input_type="document" path. The Mock's unstubbed encode_documents
returned a fresh Mock whose len() is 0, which then tripped the alignment
guard with "returned 0 vectors" instead of the expected mismatched length.

Stub `encode_documents` to match the method the function actually invokes.
The tests still exercise the same code (the length-mismatch guard in
generate_embeddings_batch), just through the correct mock attribute.
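
The failure mode can be reproduced in isolation. The generate_embeddings_batch below is a simplified stand-in with an assumed signature and error wording, not the real function. It only mirrors the two facts described above: the call goes to encode_documents, and a guard compares lengths.

```python
from unittest.mock import MagicMock


def generate_embeddings_batch(embeddings, texts):
    """Simplified stand-in: call encode_documents, then guard the length."""
    vectors = embeddings.encode_documents(texts)
    if len(vectors) != len(texts):
        raise ValueError(
            f"embeddings backend returned {len(vectors)} vectors for {len(texts)} texts"
        )
    return list(vectors)


texts = ["a", "b", "c"]
short_batch = [[0.0], [0.0]]  # deliberately one vector short

# Old test setup: the stub sits on `encode`, which is never called. The
# unstubbed encode_documents returns a fresh MagicMock whose len() is 0.
wrong_stub = MagicMock()
wrong_stub.encode.return_value = short_batch
wrong_message = ""
try:
    generate_embeddings_batch(wrong_stub, texts)
except ValueError as exc:
    wrong_message = str(exc)

# Fixed test setup: stub the method the function actually invokes.
right_stub = MagicMock()
right_stub.encode_documents.return_value = short_batch
right_message = ""
try:
    generate_embeddings_batch(right_stub, texts)
except ValueError as exc:
    right_message = str(exc)
```

Both setups trip the guard, but only the second reports the intended mismatch of 2 vectors for 3 texts. The first reports 0 vectors because MagicMock supports len() and returns 0 by default.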
@nicoloboschi nicoloboschi force-pushed the chore/zeroentropy-embeddings-cleanup branch from ae1e906 to 84f7d72 Compare May 27, 2026 11:17
@nicoloboschi nicoloboschi merged commit fbbc7a5 into main May 27, 2026
72 checks passed