BE-471, BE-621: Add dedicated embedding search endpoints for entities and entity types #8910
…ity types Add specialized `searchEntities` / `searchEntityTypes` endpoints (REST, GraphQL, and TypeScript SDK) for cosine-distance semantic search, replacing the generic `cosineDistance` filter path. The interface is intentionally narrow: a required maximum semantic distance bounded to [0, 2], a capped limit, an exclusive embedding/semanticString input, and a typed filter (entity type ids, web ids, drafts). Results are pinned to the current time and never include archived entities. Make `Filter::CosineDistance` non-deserializable (`#[serde(skip)]`) so it can no longer be injected through the generic query endpoints. Migrate the frontend search bar and the AI worker's find-existing-entity to the new endpoints. Semantic search in the GPT query endpoints is disabled for now (TODO BE-624). Optimizing the query shape and indexing is deferred to BE-618.
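The narrow interface described above (an exclusive embedding/semanticString input and a capped limit) can be sketched in Rust roughly as follows. This is an illustration only: `SearchInput`, `InputError`, `parse_input`, `check_limit`, and the `MAX_LIMIT` value of 100 are hypothetical names and numbers, not taken from the PR.

```rust
/// Hypothetical cap: the PR says the limit is capped but does not state the value.
pub const MAX_LIMIT: usize = 100;

/// Exactly one of the two inputs is allowed, so an enum models the choice directly.
#[derive(Debug, Clone, PartialEq)]
pub enum SearchInput {
    Embedding(Vec<f32>),
    SemanticString(String),
}

/// Hypothetical error cases for a malformed search request.
#[derive(Debug, Clone, PartialEq)]
pub enum InputError {
    BothProvided,
    NeitherProvided,
    LimitTooLarge { limit: usize, max: usize },
}

/// Turns two optional request fields into the exclusive input,
/// rejecting requests that supply both or neither.
pub fn parse_input(
    embedding: Option<Vec<f32>>,
    semantic_string: Option<String>,
) -> Result<SearchInput, InputError> {
    match (embedding, semantic_string) {
        (Some(_), Some(_)) => Err(InputError::BothProvided),
        (None, None) => Err(InputError::NeitherProvided),
        (Some(embedding), None) => Ok(SearchInput::Embedding(embedding)),
        (None, Some(text)) => Ok(SearchInput::SemanticString(text)),
    }
}

/// Rejects (rather than clamps) a limit above the cap.
pub fn check_limit(limit: usize) -> Result<usize, InputError> {
    if limit > MAX_LIMIT {
        Err(InputError::LimitTooLarge { limit, max: MAX_LIMIT })
    } else {
        Ok(limit)
    }
}
```

Modelling the input as an enum means the "both" and "neither" cases cannot reach the store layer at all; only the request-parsing boundary has to deal with them.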
endpoints Follow-up to the embedding search endpoints, addressing review feedback:
- Add a `SemanticDistance` newtype (finite, within [0, 2]) in the graph store and use it on the search param types, so the invariant holds at the store layer regardless of the caller.
- Validate the distance and limit when converting the REST request bodies in `into_params`, surfacing a typed `SearchRequestError` (422) instead of an ad-hoc pre-check, and add `SearchEntityTypesRequest::into_params` for symmetry with the entity endpoint.
- Clarify the GPT `query` TODO: the field is currently accepted but ignored (BE-624).
- Add unit tests for the distance boundary (including NaN/±inf) and a serialize/deserialize round-trip for the search response.
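A validated newtype of the kind described here might look like the sketch below. It is not the PR's code: the `f64` representation, the error enum, and the `new` / `get` method names are assumptions made for illustration.

```rust
/// A cosine-distance threshold that is guaranteed to be finite and within [0, 2].
/// Assumption: the value is stored as `f64`; the PR does not state the inner type.
#[derive(Debug, Clone, Copy, PartialEq, PartialOrd)]
pub struct SemanticDistance(f64);

/// Hypothetical error type; the PR does not show how rejections are represented.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum SemanticDistanceError {
    NotFinite,
    OutOfRange,
}

impl SemanticDistance {
    pub const MIN: f64 = 0.0;
    pub const MAX: f64 = 2.0;

    /// The only way to construct a value, so the invariant holds for every instance.
    pub fn new(value: f64) -> Result<Self, SemanticDistanceError> {
        // NaN and ±inf are rejected first so they get their own error case.
        if !value.is_finite() {
            return Err(SemanticDistanceError::NotFinite);
        }
        if !(Self::MIN..=Self::MAX).contains(&value) {
            return Err(SemanticDistanceError::OutOfRange);
        }
        Ok(Self(value))
    }

    /// Returns the validated inner value.
    pub fn get(self) -> f64 {
        self.0
    }
}
```

Because the field is private, code that receives a `SemanticDistance` never needs to re-check the range, which is the point of enforcing the invariant at the store layer.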
…ed-search-queries
Pull request overview
Adds first-class embedding-similarity search for entities and entity types across the Graph REST API, store layer, GraphQL surface area, and TypeScript SDK/frontend, replacing the prior client-side cosineDistance filter rewriting approach.
Changes:
- Introduces `POST /entities/search` and `POST /entity-types/search` plus corresponding store trait methods and Postgres implementations using an internal `CosineDistance` filter.
- Adds a validated `SemanticDistance` newtype (range `[0, 2]`) and removes public exposure of the `cosineDistance` filter from OpenAPI/serde.
- Updates GraphQL + TS SDK + frontend search bar + AI worker to use the new dedicated search endpoints; temporarily disables semantic search in GPT query endpoints.
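For readers unfamiliar with the `[0, 2]` bound: cosine distance is one minus cosine similarity, and similarity lies in [-1, 1]. The in-memory sketch below illustrates the search semantics summarized above (skip archived items, keep those within the threshold, order by ascending distance, apply the limit). It is not the Postgres implementation from this PR, and `Candidate`, `cosine_distance`, and `search` are hypothetical names.

```rust
/// Cosine distance = 1 - cosine similarity, so the result lies in [0, 2].
/// Returns `None` for empty, mismatched-length, or zero-norm vectors.
pub fn cosine_distance(a: &[f32], b: &[f32]) -> Option<f32> {
    if a.is_empty() || a.len() != b.len() {
        return None;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return None;
    }
    Some(1.0 - dot / (norm_a * norm_b))
}

/// Hypothetical stand-in for a stored entity with a precomputed embedding.
pub struct Candidate {
    pub id: &'static str,
    pub embedding: Vec<f32>,
    pub archived: bool,
}

/// Returns up to `limit` non-archived candidates within `max_distance`,
/// ordered by ascending cosine distance to `query`.
pub fn search(
    query: &[f32],
    candidates: &[Candidate],
    max_distance: f32,
    limit: usize,
) -> Vec<(&'static str, f32)> {
    let mut hits: Vec<(&'static str, f32)> = candidates
        .iter()
        .filter(|candidate| !candidate.archived)
        .filter_map(|candidate| {
            cosine_distance(query, &candidate.embedding).map(|distance| (candidate.id, distance))
        })
        .filter(|(_, distance)| *distance <= max_distance)
        .collect();
    // `total_cmp` gives a total order on floats, so sorting cannot panic.
    hits.sort_by(|left, right| left.1.total_cmp(&right.1));
    hits.truncate(limit);
    hits
}
```

A threshold of `2` admits every candidate with a defined distance, while `0` admits only candidates pointing in exactly the same direction as the query.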
Reviewed changes
Copilot reviewed 32 out of 33 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| tests/graph/integration/postgres/lib.rs | Wires new store search methods through the integration test DB API wrapper. |
| libs/@local/hash-isomorphic-utils/src/graphql/type-defs/ontology/entity-type.typedef.ts | Adds GraphQL scalar + query field for searchEntityTypes. |
| libs/@local/hash-isomorphic-utils/src/graphql/type-defs/knowledge/entity.typedef.ts | Adds GraphQL scalar + query field for searchEntities. |
| libs/@local/hash-isomorphic-utils/src/graphql/scalar-mapping.ts | Maps new GraphQL scalars to SDK request/response types. |
| libs/@local/hash-isomorphic-utils/src/graphql/queries/entity.queries.ts | Adds searchEntities GraphQL query document. |
| libs/@local/hash-backend-utils/src/flows.ts | Removes semantic-filter rewriting now that generic queries no longer support it. |
| libs/@local/graph/type-fetcher/src/store.rs | Exposes new search methods through the type-fetcher store wrapper. |
| libs/@local/graph/store/src/filter/semantic_distance.rs | Adds validated SemanticDistance and unit tests for boundaries/non-finite values. |
| libs/@local/graph/store/src/filter/mod.rs | Exports SemanticDistance and hides CosineDistance from serde. |
| libs/@local/graph/store/src/entity/store.rs | Adds EntityStore::search_entities params/response/filter types and trait method. |
| libs/@local/graph/store/src/entity/mod.rs | Re-exports the new entity search types. |
| libs/@local/graph/store/src/entity_type/store.rs | Adds EntityTypeStore::search_entity_types params/response types and trait method. |
| libs/@local/graph/store/src/entity_type/mod.rs | Re-exports the new entity-type search types. |
| libs/@local/graph/sdk/typescript/tests/entity.test.ts | Adds serialization round-trip test for SearchEntitiesResponse. |
| libs/@local/graph/sdk/typescript/src/entity.ts | Adds searchEntities, request/response types, and serialize/deserialize helpers; removes semantic rewrite in query paths. |
| libs/@local/graph/sdk/typescript/src/entity-type.ts | Adds searchEntityTypes and request/response types, with optional embedding computation via Temporal. |
| libs/@local/graph/sdk/typescript/src/embeddings.ts | Replaces filter rewriting with calculateEmbedding(semanticString, temporalClient). |
| libs/@local/graph/postgres-store/src/store/postgres/ontology/entity_type.rs | Implements search_entity_types via CosineDistance filter and current-time query. |
| libs/@local/graph/postgres-store/src/store/postgres/knowledge/entity/mod.rs | Implements search_entities via CosineDistance + scope filters + archived exclusion. |
| libs/@local/graph/api/src/rest/mod.rs | Adds SearchRequestError and removes CosineDistanceFilter from OpenAPI filter schema addon. |
| libs/@local/graph/api/src/rest/entity.rs | Adds /entities/search REST route + handler and improves invalid-argument status attachment. |
| libs/@local/graph/api/src/rest/entity_type.rs | Adds /entity-types/search REST route + handler and request-to-params conversion. |
| libs/@local/graph/api/src/rest/entity_query_request.rs | Adds REST request struct for entity search and conversion to store params. |
| libs/@local/graph/api/openapi/openapi.json | Adds OpenAPI paths/schemas for the new search endpoints and removes cosineDistance filter schema. |
| apps/hash-frontend/src/shared/layout/layout-with-header/search-bar.tsx | Migrates the frontend search bar to the new GraphQL search queries and closed-type map usage. |
| apps/hash-frontend/src/graphql/queries/ontology/entity-type.queries.ts | Adds searchEntityTypes GraphQL query document for the frontend. |
| apps/hash-api/src/graphql/resolvers/ontology/entity-type.ts | Adds GraphQL resolver for searchEntityTypes calling the SDK. |
| apps/hash-api/src/graphql/resolvers/knowledge/entity/entity.ts | Adds GraphQL resolver for searchEntities calling the SDK and serializing the response. |
| apps/hash-api/src/graphql/resolvers/index.ts | Registers new GraphQL resolvers for entity/entity-type search. |
| apps/hash-api/src/ai/gpt/gpt-query-types.ts | Temporarily disables semantic search on GPT type querying (accepts but ignores query). |
| apps/hash-api/src/ai/gpt/gpt-query-entities.ts | Temporarily disables semantic search on GPT entity querying (accepts but ignores query). |
| apps/hash-ai-worker-ts/src/activities/shared/find-existing-entity.ts | Migrates semantic matching to searchEntities instead of cosineDistance filters. |
Codecov Report
@@ Coverage Diff @@
## main #8910 +/- ##
==========================================
- Coverage 59.77% 59.75% -0.02%
==========================================
Files 1348 1349 +1
Lines 131787 132019 +232
Branches 5941 5935 -6
==========================================
+ Hits 78772 78892 +120
- Misses 52107 52220 +113
+ Partials 908 907 -1
Benchmark results
| Function | Value | Mean | Flame graphs |
|---|---|---|---|
| resolve_policies_for_actor | user: empty, selectivity: high, policies: 2002 | Flame Graph | |
| resolve_policies_for_actor | user: empty, selectivity: low, policies: 1 | Flame Graph | |
| resolve_policies_for_actor | user: empty, selectivity: medium, policies: 1002 | Flame Graph | |
| resolve_policies_for_actor | user: seeded, selectivity: high, policies: 3314 | Flame Graph | |
| resolve_policies_for_actor | user: seeded, selectivity: low, policies: 1 | Flame Graph | |
| resolve_policies_for_actor | user: seeded, selectivity: medium, policies: 1527 | Flame Graph | |
| resolve_policies_for_actor | user: system, selectivity: high, policies: 2078 | Flame Graph | |
| resolve_policies_for_actor | user: system, selectivity: low, policies: 1 | Flame Graph | |
| resolve_policies_for_actor | user: system, selectivity: medium, policies: 1033 | Flame Graph |
policy_resolution_medium
| Function | Value | Mean | Flame graphs |
|---|---|---|---|
| resolve_policies_for_actor | user: empty, selectivity: high, policies: 102 | Flame Graph | |
| resolve_policies_for_actor | user: empty, selectivity: low, policies: 1 | Flame Graph | |
| resolve_policies_for_actor | user: empty, selectivity: medium, policies: 52 | Flame Graph | |
| resolve_policies_for_actor | user: seeded, selectivity: high, policies: 269 | Flame Graph | |
| resolve_policies_for_actor | user: seeded, selectivity: low, policies: 1 | Flame Graph | |
| resolve_policies_for_actor | user: seeded, selectivity: medium, policies: 108 | Flame Graph | |
| resolve_policies_for_actor | user: system, selectivity: high, policies: 133 | Flame Graph | |
| resolve_policies_for_actor | user: system, selectivity: low, policies: 1 | Flame Graph | |
| resolve_policies_for_actor | user: system, selectivity: medium, policies: 63 | Flame Graph |
policy_resolution_none
| Function | Value | Mean | Flame graphs |
|---|---|---|---|
| resolve_policies_for_actor | user: empty, selectivity: high, policies: 2 | Flame Graph | |
| resolve_policies_for_actor | user: empty, selectivity: low, policies: 1 | Flame Graph | |
| resolve_policies_for_actor | user: empty, selectivity: medium, policies: 2 | Flame Graph | |
| resolve_policies_for_actor | user: system, selectivity: high, policies: 8 | Flame Graph | |
| resolve_policies_for_actor | user: system, selectivity: low, policies: 1 | Flame Graph | |
| resolve_policies_for_actor | user: system, selectivity: medium, policies: 3 | Flame Graph |
policy_resolution_small
| Function | Value | Mean | Flame graphs |
|---|---|---|---|
| resolve_policies_for_actor | user: empty, selectivity: high, policies: 52 | Flame Graph | |
| resolve_policies_for_actor | user: empty, selectivity: low, policies: 1 | Flame Graph | |
| resolve_policies_for_actor | user: empty, selectivity: medium, policies: 26 | Flame Graph | |
| resolve_policies_for_actor | user: seeded, selectivity: high, policies: 94 | Flame Graph | |
| resolve_policies_for_actor | user: seeded, selectivity: low, policies: 1 | Flame Graph | |
| resolve_policies_for_actor | user: seeded, selectivity: medium, policies: 27 | Flame Graph | |
| resolve_policies_for_actor | user: system, selectivity: high, policies: 66 | Flame Graph | |
| resolve_policies_for_actor | user: system, selectivity: low, policies: 1 | Flame Graph | |
| resolve_policies_for_actor | user: system, selectivity: medium, policies: 29 | Flame Graph |
read_scaling_complete
| Function | Value | Mean | Flame graphs |
|---|---|---|---|
| entity_by_id;one_depth | 1 entities | Flame Graph | |
| entity_by_id;one_depth | 10 entities | Flame Graph | |
| entity_by_id;one_depth | 25 entities | Flame Graph | |
| entity_by_id;one_depth | 5 entities | Flame Graph | |
| entity_by_id;one_depth | 50 entities | Flame Graph | |
| entity_by_id;two_depth | 1 entities | Flame Graph | |
| entity_by_id;two_depth | 10 entities | Flame Graph | |
| entity_by_id;two_depth | 25 entities | Flame Graph | |
| entity_by_id;two_depth | 5 entities | Flame Graph | |
| entity_by_id;two_depth | 50 entities | Flame Graph | |
| entity_by_id;zero_depth | 1 entities | Flame Graph | |
| entity_by_id;zero_depth | 10 entities | Flame Graph | |
| entity_by_id;zero_depth | 25 entities | Flame Graph | |
| entity_by_id;zero_depth | 5 entities | Flame Graph | |
| entity_by_id;zero_depth | 50 entities | Flame Graph |
read_scaling_linkless
| Function | Value | Mean | Flame graphs |
|---|---|---|---|
| entity_by_id | 1 entities | Flame Graph | |
| entity_by_id | 10 entities | Flame Graph | |
| entity_by_id | 100 entities | Flame Graph | |
| entity_by_id | 1000 entities | Flame Graph | |
| entity_by_id | 10000 entities | Flame Graph |
representative_read_entity
| Function | Value | Mean | Flame graphs |
|---|---|---|---|
| entity_by_id | entity type ID: https://blockprotocol.org/@alice/types/entity-type/block/v/1 | Flame Graph | |
| entity_by_id | entity type ID: https://blockprotocol.org/@alice/types/entity-type/book/v/1 | Flame Graph | |
| entity_by_id | entity type ID: https://blockprotocol.org/@alice/types/entity-type/building/v/1 | Flame Graph | |
| entity_by_id | entity type ID: https://blockprotocol.org/@alice/types/entity-type/organization/v/1 | Flame Graph | |
| entity_by_id | entity type ID: https://blockprotocol.org/@alice/types/entity-type/page/v/2 | Flame Graph | |
| entity_by_id | entity type ID: https://blockprotocol.org/@alice/types/entity-type/person/v/1 | Flame Graph | |
| entity_by_id | entity type ID: https://blockprotocol.org/@alice/types/entity-type/playlist/v/1 | Flame Graph | |
| entity_by_id | entity type ID: https://blockprotocol.org/@alice/types/entity-type/song/v/1 | Flame Graph | |
| entity_by_id | entity type ID: https://blockprotocol.org/@alice/types/entity-type/uk-address/v/1 | Flame Graph |
representative_read_entity_type
| Function | Value | Mean | Flame graphs |
|---|---|---|---|
| get_entity_type_by_id | Account ID: bf5a9ef5-dc3b-43cf-a291-6210c0321eba | Flame Graph |
representative_read_multiple_entities
| Function | Value | Mean | Flame graphs |
|---|---|---|---|
| entity_by_property | traversal_paths=0 | 0 | |
| entity_by_property | traversal_paths=255 | 1,resolve_depths=inherit:1;values:255;properties:255;links:127;link_dests:126;type:true | |
| entity_by_property | traversal_paths=2 | 1,resolve_depths=inherit:0;values:0;properties:0;links:0;link_dests:0;type:false | |
| entity_by_property | traversal_paths=2 | 1,resolve_depths=inherit:0;values:0;properties:0;links:1;link_dests:0;type:true | |
| entity_by_property | traversal_paths=2 | 1,resolve_depths=inherit:0;values:0;properties:2;links:1;link_dests:0;type:true | |
| entity_by_property | traversal_paths=2 | 1,resolve_depths=inherit:0;values:2;properties:2;links:1;link_dests:0;type:true | |
| link_by_source_by_property | traversal_paths=0 | 0 | |
| link_by_source_by_property | traversal_paths=255 | 1,resolve_depths=inherit:1;values:255;properties:255;links:127;link_dests:126;type:true | |
| link_by_source_by_property | traversal_paths=2 | 1,resolve_depths=inherit:0;values:0;properties:0;links:0;link_dests:0;type:false | |
| link_by_source_by_property | traversal_paths=2 | 1,resolve_depths=inherit:0;values:0;properties:0;links:1;link_dests:0;type:true | |
| link_by_source_by_property | traversal_paths=2 | 1,resolve_depths=inherit:0;values:0;properties:2;links:1;link_dests:0;type:true | |
| link_by_source_by_property | traversal_paths=2 | 1,resolve_depths=inherit:0;values:2;properties:2;links:1;link_dests:0;type:true |
scenarios
| Function | Value | Mean | Flame graphs |
|---|---|---|---|
| full_test | query-limited | Flame Graph | |
| full_test | query-unlimited | Flame Graph | |
| linked_queries | query-limited | Flame Graph | |
| linked_queries | query-unlimited | Flame Graph |
🌟 What is the purpose of this PR?
Adds dedicated embedding-similarity search endpoints for entities and entity types. Previously, semantic search was done by passing a `cosineDistance` filter through the generic query endpoints, rewritten client-side (`rewriteSemanticFilter`). That approach is replaced by first-class `search` endpoints that compute the embedding (from a `semanticString`), apply a validated maximum-distance threshold, and return results ordered by ascending cosine distance.
🔗 Related links
🚫 Blocked by
🔍 What does this change?
- `POST /entities/search` and `POST /entity-types/search` on the graph API, with `SearchEntitiesRequest` / `SearchEntityTypesRequest` bodies converted via `into_params`.
- `EntityStore::search_entities` / `EntityTypeStore::search_entity_types` + Postgres impls (build a `CosineDistance` filter, run against current time, exclude archived entities; delegate to the existing query path).
- `SemanticDistance` newtype (finite, within `[0, 2]`) enforcing the threshold invariant at the store layer; an invalid distance or over-limit request surfaces a typed `SearchRequestError` → `422`.
- Removes the public `cosineDistance` filter (`#[serde(skip)]` + OpenAPI) and the client-side `rewriteSemanticFilter` (replaced by `calculateEmbedding`); `queryEntities` / `queryEntitySubgraph` no longer take a `temporalClient`.
- `searchEntities` / `searchEntityTypes` in the TypeScript SDK (+ serialize/deserialize helpers), GraphQL resolvers/typedefs/queries, and the frontend search bar migrated to the new endpoints.
- GPT `query-entities` / `query-types` endpoints: semantic search is temporarily disabled; the `query` field is accepted but ignored (tracked in BE-624).
Pre-Merge Checklist 🚀
🚢 Has this modified a publishable library?
This PR:
📜 Does this require a change to the docs?
The changes in this PR:
🕸️ Does this require a change to the Turbo Graph?
The changes in this PR:
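- Optimizing the query shape and indexing (the `MIN(<=> ) GROUP BY` shape) is deferred; tracked in BE-618.
- Semantic search on the GPT `query-entities` / `query-types` endpoints, and on data-type / property-type queries, is removed for now (no dedicated `search` endpoint exists for those).
🐾 Next steps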
🛡 What tests cover this?
- Unit tests for the `SemanticDistance` boundary (`0`, `2`, out-of-range, `NaN`, `±inf`).
- `friendship.http` migrated to the new endpoints.
- Not yet covered directly: the typed filter (`entityTypeIds` / `webIds` / `includeDrafts`) and archived exclusion. The underlying query path is covered transitively, but a focused integration test is a sensible follow-up.
❓ How to test this?
- Call `searchEntities` / `searchEntityTypes` (SDK/GraphQL) with a `semanticString`, `maximumSemanticDistance`, and `limit`.
- Confirm that a `maximumSemanticDistance` outside `[0, 2]` returns a `422`.
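To make the expected `422` concrete, here is a rough Rust sketch of how a typed request error could map to an HTTP status. The PR names `SearchRequestError` but does not show its variants, so the variants, `validate`, `status_for`, and the `max_limit` parameter below are all assumptions.

```rust
use std::fmt;

/// Hypothetical variants: the PR names the type but not its cases.
#[derive(Debug, Clone, PartialEq)]
pub enum SearchRequestError {
    InvalidDistance(f64),
    LimitTooLarge { limit: usize, max: usize },
}

impl SearchRequestError {
    /// Every validation failure maps to 422 Unprocessable Entity.
    pub fn status_code(&self) -> u16 {
        422
    }
}

impl fmt::Display for SearchRequestError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::InvalidDistance(distance) => write!(
                f,
                "maximumSemanticDistance {distance} is not a finite value in [0, 2]"
            ),
            Self::LimitTooLarge { limit, max } => {
                write!(f, "limit {limit} exceeds the maximum of {max}")
            }
        }
    }
}

/// Checks the raw request values before any store call is made.
pub fn validate(distance: f64, limit: usize, max_limit: usize) -> Result<(), SearchRequestError> {
    if !distance.is_finite() || !(0.0..=2.0).contains(&distance) {
        return Err(SearchRequestError::InvalidDistance(distance));
    }
    if limit > max_limit {
        return Err(SearchRequestError::LimitTooLarge { limit, max: max_limit });
    }
    Ok(())
}

/// The status a handler would respond with for the given request values.
pub fn status_for(distance: f64, limit: usize, max_limit: usize) -> u16 {
    match validate(distance, limit, max_limit) {
        Ok(()) => 200,
        Err(error) => error.status_code(),
    }
}
```

Under these assumptions, `status_for(2.5, 10, 100)` yields `422`, while `status_for(2.0, 10, 100)` yields `200`.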