[feature] add zeroentropy embeddings provider #1770

Merged
nicoloboschi merged 1 commit into vectorize-io:main from flamerged:feature/zeroentropy-embeddings on May 27, 2026

flamerged (Contributor) commented May 26, 2026

Summary

This adds first-class ZeroEntropy zembed-1 embeddings support to Hindsight.

zembed-1 is a state-of-the-art retrieval embedder, and this integration lets Hindsight use it natively instead of forcing users through an OpenAI-compatible shim or LiteLLM proxy. It also preserves ZeroEntropy's asymmetric retrieval semantics: retained memory content is embedded as document input, while recall/search text is embedded as query input.

Why this matters

  • Better retrieval stack: Hindsight already supports ZeroEntropy reranking; this completes the native ZeroEntropy retrieval path with zembed-1 embeddings.
  • No proxy tax: Users can point Hindsight directly at ZeroEntropy's /v1/models/embed endpoint, with no shim layer translating away provider-specific features.
  • Correct asymmetric embeddings: ZeroEntropy exposes separate query/document modes. Native support lets Hindsight use those modes directly instead of treating all text the same.
  • Production-friendly dimensions: Hindsight defaults zembed-1 to 1280 dimensions so it works with pgvector HNSW's 2000-dimension index limit out of the box, while still allowing 2560/1280/640/320/160/80/40 for deployments that want different tradeoffs.
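
The dimension tradeoff above can be illustrated with a small validation sketch. The constant and function names here are hypothetical and are not taken from the Hindsight codebase; only the allowed sizes, the 1280 default, and the 2000-dimension HNSW limit come from the description above.

```python
# Hypothetical names for illustration; values are the ones quoted in this PR.
ALLOWED_DIMENSIONS = frozenset({2560, 1280, 640, 320, 160, 80, 40})
DEFAULT_DIMENSIONS = 1280
PGVECTOR_HNSW_MAX_DIMENSIONS = 2000


def validate_dimensions(dimensions: int = DEFAULT_DIMENSIONS) -> int:
    """Return `dimensions` if it is an allowed size, else raise naming the bad value."""
    if dimensions not in ALLOWED_DIMENSIONS:
        allowed = ", ".join(str(d) for d in sorted(ALLOWED_DIMENSIONS, reverse=True))
        raise ValueError(
            f"Unsupported dimensions {dimensions!r}; expected one of: {allowed}"
        )
    return dimensions


def fits_hnsw_index(dimensions: int) -> bool:
    """True when vectors of this size stay within the HNSW index limit."""
    return dimensions <= PGVECTOR_HNSW_MAX_DIMENSIONS
```

Under these assumptions the default passes both checks, while 2560 is a valid model output size that would need a different indexing strategy.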

What changed

  • Added a native ZeroEntropyEmbeddings provider for zembed-1.
  • Added HINDSIGHT_API_EMBEDDINGS_ZEROENTROPY_* config/env settings for API key, model, base URL, dimensions, encoding format, latency, and batch size.
  • Routed retained content through encode_documents() and recall/search text through encode_query() when the embeddings backend supports asymmetric modes.
  • Added support for both float and base64 ZeroEntropy embedding responses, normalizing both to float vectors before storage.
  • Fixed the existing ZeroEntropy reranker factory to honor its configured base URL.
  • Updated env examples, docs, and generated docs-skill references.
  • Added focused coverage for config parsing, provider factory wiring, request payloads, response decoding, query/document routing, and reranker base URL wiring.
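
The query/document routing described above could look roughly like the following. This is a minimal sketch, not the merged code: the method signatures, the toy backends, and the `encode_with_input_type` helper are assumptions made for illustration. Only the method names `encode_documents()` and `encode_query()` and the idea of a safe default on the base class come from this PR.

```python
from abc import ABC, abstractmethod


class Embeddings(ABC):
    """Base class sketch: symmetric providers only implement encode()."""

    @abstractmethod
    def encode(self, texts: list[str]) -> list[list[float]]:
        ...

    # Safe defaults: providers without asymmetric modes need no changes.
    def encode_documents(self, texts: list[str]) -> list[list[float]]:
        return self.encode(texts)

    def encode_query(self, texts: list[str]) -> list[list[float]]:
        return self.encode(texts)


class SymmetricToy(Embeddings):
    """Toy backend that treats all text the same."""

    def encode(self, texts: list[str]) -> list[list[float]]:
        return [[float(len(t))] for t in texts]


class AsymmetricToy(Embeddings):
    """Toy backend that tags vectors by input type (0.0 = document, 1.0 = query)."""

    def encode(self, texts: list[str]) -> list[list[float]]:
        return self.encode_documents(texts)

    def encode_documents(self, texts: list[str]) -> list[list[float]]:
        return [[float(len(t)), 0.0] for t in texts]

    def encode_query(self, texts: list[str]) -> list[list[float]]:
        return [[float(len(t)), 1.0] for t in texts]


def encode_with_input_type(
    backend: Embeddings, texts: list[str], input_type: str = "document"
) -> list[list[float]]:
    """Dispatch retained content to document mode and recall text to query mode."""
    if input_type == "query":
        return backend.encode_query(texts)
    return backend.encode_documents(texts)
```

With this shape, the same text produces different vectors on an asymmetric backend depending on `input_type`, while a symmetric backend returns identical vectors for both.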

Validation

  • uv run pytest tests/test_zeroentropy_embeddings.py -q -> 12 passed
  • uv run ruff check hindsight_api tests/test_zeroentropy_embeddings.py -> passed
  • uv run ty check hindsight_api/ -> passed
  • ./scripts/hooks/lint.sh -> passed
  • CodeRabbit CLI review final pass -> 0 findings
  • Live ZeroEntropy API smoke test: document and query embeddings returned 1280-dimensional float vectors
  • Live retain/recall smoke test through MemoryEngine against a scratch pgvector database: retained 1 memory unit and recalled 1 result with embedding_dim=1280

flamerged changed the title from "[codex] add zeroentropy embeddings provider" to "add zeroentropy embeddings provider" on May 26, 2026
flamerged changed the title from "add zeroentropy embeddings provider" to "[feature] add zeroentropy embeddings provider" on May 26, 2026

nicoloboschi (Collaborator) left a comment

LGTM. Solid integration — Pydantic-typed request/response, asymmetric query/document split with a safe default on the base Embeddings class so existing providers don't need touching, and base64 response decoding handled. Bonus catch on the reranker base_url not being threaded through the factory.

Cleanups (encoding_format fallback dead branch, duplicated dimension validation, unused usage field, getattr-vs-Protocol mismatch in embedding_utils, aligning URL handling with the existing ZeroEntropyCrossEncoder) will land in a follow-up PR.

nicoloboschi merged commit ec49175 into vectorize-io:main on May 27, 2026
nicoloboschi added a commit that referenced this pull request May 27, 2026
Three integration tests that hit the real ZeroEntropy API. Skipped unless
ZEROENTROPY_LIVE_API_KEY is set, so default and CI runs are unaffected.

- Embeddings: encode_documents + encode_query against zembed-1 (1280-dim),
  verifies the same text yields different vectors for document vs query input
  type (asymmetric encoder).
- Embeddings transport parity: base64 and float encoding_format decode to
  the same vector within float32 tolerance.
- Reranker: zerank-2 ranks a relevant passage above unrelated ones,
  exercising the base_url wiring fixed in #1770.

Placed in a dedicated test file so the autouse env-clearing fixture in
test_zeroentropy_embeddings.py does not interfere with the live key gate.
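
The transport-parity check in this commit message can be sketched offline, without calling the API. The wire layout used here (base64 over packed little-endian float32 values) is an assumption about the response format, and all function names are hypothetical.

```python
import base64
import math
import struct


def decode_embedding(value) -> list[float]:
    """Normalize a float list or a base64 string to a list of floats.

    Assumes base64 payloads are packed little-endian float32 values.
    """
    if isinstance(value, str):
        raw = base64.b64decode(value)
        if len(raw) % 4 != 0:
            raise ValueError(f"payload of {len(raw)} bytes is not a float32 array")
        return list(struct.unpack(f"<{len(raw) // 4}f", raw))
    return [float(x) for x in value]


def encode_embedding_base64(vector: list[float]) -> str:
    """Test helper: pack floats the way the assumed base64 transport would."""
    packed = struct.pack(f"<{len(vector)}f", *vector)
    return base64.b64encode(packed).decode("ascii")


def vectors_close(a: list[float], b: list[float], rel_tol: float = 1e-6) -> bool:
    """Compare two vectors within a float32-sized tolerance."""
    return len(a) == len(b) and all(
        math.isclose(x, y, rel_tol=rel_tol, abs_tol=1e-7) for x, y in zip(a, b)
    )
```

A tolerance-based comparison is needed because a value such as 0.1 is not exactly representable in float32, so the base64 path cannot be expected to match the float path bit for bit.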
nicoloboschi added a commit that referenced this pull request May 27, 2026
…nker

Follow-up to #1770:

- Hoist the ZeroEntropy host out of cross_encoder.py into a shared
  DEFAULT_ZEROENTROPY_BASE_URL constant in config.py; reranker and
  embeddings now both reference it (was duplicated as an inline literal).
- Drop ZeroEntropyEmbeddings._embed_url() fuzzy matching; compute
  self.embed_url once in __init__ via f"{base_url}{EMBED_PATH}", matching
  the ZeroEntropyCrossEncoder pattern.
- Remove the duplicated dimension allowlist check from
  HindsightConfig.validate() - ZeroEntropyEmbeddings.__init__ already
  validates with the same set and a clearer error that includes the
  offending value.
- Drop the dead "or DEFAULT_..." fallback after _parse_optional_choice for
  encoding_format; the helper never returned None in the surrounding code.
- Drop the unused _ZeroEntropyEmbedUsage / response usage field.
- Simplify _encode_with_input_type in embedding_utils.py to a direct
  encode_query / encode_documents dispatch; the base Embeddings ABC already
  supplies defaults, so the getattr-on-type defensive check is moot.
- Add a regression test that latency=None is omitted from the outbound
  payload (relies on exclude_none=True).
- Regenerate skills/hindsight-docs/ references to match canonical sources.
nicoloboschi added a commit that referenced this pull request May 27, 2026
…nker (#1773)

* chore(api): clean up zeroentropy embeddings, dedup base URL with reranker

* test(zeroentropy): add gated live API tests for embeddings + reranker

* test: stub encode_documents on the alignment-guard mocks

The TestEmbeddingsBatchLengthGuarantee tests stubbed `encode` on a
MagicMock, but after the embedding_utils.generate_embeddings_batch dispatch
was simplified to call encode_documents()/encode_query() directly (no
getattr fallback to encode), the stub on `encode` no longer satisfies the
default input_type="document" path. The Mock's unstubbed encode_documents
returned a fresh Mock whose len() is 0, which then tripped the alignment
guard with "returned 0 vectors" instead of the expected mismatched length.

Stub `encode_documents` to match the method the function actually invokes.
The tests still exercise the same code (the length-mismatch guard in
generate_embeddings_batch), just through the correct mock attribute.