feat(search)!: engine- and domain-agnostic query model, Typesense compiler, and GraphQL surface#529
Open
ddeboer wants to merge 35 commits into
…d result types - replace FieldSpec and Projection with one SearchField/SearchSchema model - add SearchQuery, Filter, Sort and the filter-operator semantics - add the SearchEngine port and result types (SearchResult/SearchHit/ResultDocument/Reference) - add physicalFields (the shared fanout convention) and schema selectors - rewrite projectDocument and projectGraph onto the unified model; projection output unchanged - remove FieldSpec, Projection and the discriminated FieldKind (breaking)
… and SearchEngine - buildCollectionSchema derives a Typesense collection from the unified SearchField model - buildSearchParams compiles SearchQuery into Typesense params (filter_by/sort_by/facet_by/query_by) - createTypesenseSearchEngine implements the SearchEngine port: compile, search, reconstruct - resolve reference and reference-facet labels from the sidecar labels collection in one lookup - add a testcontainer integration test and a generator-stability snapshot
- buildSearchSchema builds an executable GraphQLSchema from any SearchSchema at runtime (no codegen) - one generic resolver maps args to SearchQuery, calls the engine, and maps the result back - derive output, where, orderBy and facet types plus nullability from the field model - best-first Accept-Language output ordering; nullable facet label for reference facets - add printSearchSchema for a consumer SDL snapshot, plus a generator-stability snapshot
- state the decisions directly as the reconciled architecture, not deviations from a draft - remove the deviation/reconcile framing and the deviations-to-reconcile lists - align wording with the stack platform layer
- number fields now project as floats (not truncated like integer) - closes the step-1 gap so an int64-magnitude field mapped to number (Float) indexes
Replace the repo-path breadcrumb with a direct link to the docs site, so the status note points readers at the rendered page rather than a source file path.
… the group companion - Keyed per-type facets object on the GraphQL surface (ValueBucket / RangeBucket), selection-is-the-request with skip-own-filter. - Numeric range facets and an opt-in label cache in the Typesense adapter. - Reconcile ADRs 0003 and 0004 with the implementation. BREAKING CHANGE: remove SearchField.group and its *_group companion field, collection column and query split. Deployments denormalize group tokens into the field values instead, so a group is an ordinary facet value with no engine mechanism.
…@lde/* pins npm ci failed because the lockfile lacked the new @lde/search-api-graphql workspace. Regenerating against npmjs adds it and brings ~24 @lde/* internal deps up to their latest in-range patches; no third-party or duplicate-version changes.
… search-engine test `result.facets` is a `Partial` record, so a facet is `FacetBucket[] | undefined`; guard the two spreads with `?? []` so the `typecheck` target passes (it never ran in CI before the lockfile fix).
…ations Fold the unified-field-model blockquote and the dated Consequences bullet into running text, so the ADR reads as the current design rather than a change log.
- SearchType is one root type declaration (one SHACL NodeShape, one GraphQL object type); SearchSchema now names the whole search declaration: a ReadonlyMap of SearchTypes keyed by type IRI, built with the new searchSchema() factory - projectGraph now consumes a SearchSchema instead of a SearchType array - rename buildSearchSchema / printSearchSchema / BuildSearchSchemaOptions to buildGraphQLSchema / printGraphQLSchema / BuildGraphQLSchemaOptions: they construct a GraphQLSchema rather than the SearchSchema the old names implied - rename schema parameters to searchType where they take one type, and the FacetFieldsOf/OutputFieldsOf/EngineFor/ResultFor generic from Schema to Type - add a Terminology section to the @lde/search README mapping SearchField / SearchType / SearchSchema onto SHACL and GraphQL; update ADRs 3 and 4, the package READMEs and npm descriptions - drop section-divider comments in build-schema.ts and stale grouped-facet mentions in the READMEs BREAKING CHANGE: the per-type interface SearchSchema is renamed to SearchType, and SearchSchema now denotes the type-keyed map built with searchSchema(). projectGraph(quads, types[]) becomes projectGraph(quads, searchSchema(...types)). In @lde/search-api-graphql, buildSearchSchema, printSearchSchema and BuildSearchSchemaOptions are renamed to buildGraphQLSchema, printGraphQLSchema and BuildGraphQLSchemaOptions.
- add the missing @lde/search-api-graphql row to the packages table - add the search, search-typesense, search-api-graphql and text-normalization dependency edges to the architecture diagram, which lacked the search family entirely
…ilters - Add referenceFields, fieldNamed, isRangeFacet, pageForOffset and the date storage codec (isoToUnixSeconds/unixSecondsToIso) to @lde/search, replacing local re-derivations in the Typesense adapter and the GraphQL surface - Route the adapter's localized display and sort field names through physicalFields instead of hand-built name interpolation - Compile a date field's ISO range bounds to the stored Unix seconds; they were previously interpolated verbatim into the int64 filter and could never match - Project boolean fields from a path (xsd:boolean lexical space) instead of silently skipping them - Resolve reference labels in a single multi_search POST and start the cached label load alongside the main search - Remove dead API introduced on this branch: acceptsFilter, filterOperator, ResultFor, PhysicalFields.value, the Sort re-export and the toLanguageStrings package export; drop resolvers that duplicate graphql-js defaults - Trim ADR 4 to the shipped surface, deferring the TS mirror and extension hooks
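The date fix in this commit (ISO range bounds compiled to the stored Unix seconds) can be illustrated with a minimal sketch. The names `isoToUnixSeconds` and `unixSecondsToIso` come from the commit message; the function bodies and the `rangeClause` helper are assumptions for illustration, not the code in `@lde/search`. The filter string uses Typesense's `field:[min..max]` range syntax.

```typescript
// Sketch only: the names mirror the commit message, the bodies are assumed.

/** Convert an ISO 8601 timestamp to whole Unix seconds (the int64 storage form). */
function isoToUnixSeconds(iso: string): number {
  const millis = Date.parse(iso);
  if (Number.isNaN(millis)) {
    throw new RangeError(`Not an ISO 8601 date: ${iso}`);
  }
  return Math.floor(millis / 1000);
}

/** Convert stored Unix seconds back to an ISO 8601 string in UTC. */
function unixSecondsToIso(seconds: number): string {
  return new Date(seconds * 1000).toISOString();
}

/** Hypothetical helper: compile ISO range bounds into a Typesense range filter. */
function rangeClause(field: string, minIso: string, maxIso: string): string {
  return `${field}:[${isoToUnixSeconds(minIso)}..${isoToUnixSeconds(maxIso)}]`;
}
```

Interpolating the ISO strings verbatim would produce a clause such as `issued:[2024-01-01T00:00:00Z..…]`, which cannot match an int64 column; converting the bounds first keeps the filter in the same unit as the stored values.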
…archSchema
- buildGraphQLSchema(schema, { types }) emits one root query field per
SearchType, so a single API serves multiple types (e.g. datasets and
people), each searchable in its own way; per-type typeName, queryField and
queryDefaults move into a types record keyed by type IRI, languageOrder
stays global
- create the shared types (LanguageString, buckets, filter inputs) once and
dedupe reference types across root types: Person and CreativeWork both
referencing Agent yield a single Agent type
- omit the where arg for a type with no filterable fields and the facets
field for a type with no facetable fields, which would be invalid empty
GraphQL types
- throw on a type without options, on options naming an unknown type, and on
two types deriving the same root query field
- test multiple root types: per-type derived types, the shared reference
type, per-root-field engine routing, and the build-time errors
- update ADR 4, the READMEs and the npm description accordingly
BREAKING CHANGE: buildGraphQLSchema and printGraphQLSchema take the whole
SearchSchema plus a per-type options record. Migrate
buildGraphQLSchema(searchType, { typeName }) to
buildGraphQLSchema(searchSchema(searchType), { types: { [searchType.type]:
{ typeName } } }).
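The create-once rule for shared and reference types, and the build-time error for clashing root query fields, can be sketched without graphql-js. Everything in this block (`TypeRegistry`, `NamedType`, `getOrCreate`, `registerRootField`) is hypothetical and only illustrates the idea; it is not the code in `build-schema.ts`.

```typescript
// Hypothetical sketch: plain objects stand in for GraphQL named types.
interface NamedType {
  readonly name: string;
  readonly fields: readonly string[];
}

class TypeRegistry {
  private readonly types = new Map<string, NamedType>();
  private readonly rootFields = new Set<string>();

  /** Return the existing type for a name, or create it on first use. */
  getOrCreate(name: string, create: () => NamedType): NamedType {
    const existing = this.types.get(name);
    if (existing !== undefined) {
      return existing;
    }
    const created = create();
    this.types.set(name, created);
    return created;
  }

  /** Reject two root types that derive the same root query field. */
  registerRootField(queryField: string): void {
    if (this.rootFields.has(queryField)) {
      throw new Error(`Duplicate root query field: ${queryField}`);
    }
    this.rootFields.add(queryField);
  }

  /** Number of distinct named types created so far. */
  get size(): number {
    return this.types.size;
  }
}

const registry = new TypeRegistry();
const makeAgent = (): NamedType => ({ name: 'Agent', fields: ['id', 'label'] });

// Person and CreativeWork both reference Agent; the second call reuses the first instance.
const agentForPerson = registry.getOrCreate('Agent', makeAgent);
const agentForCreativeWork = registry.getOrCreate('Agent', makeAgent);
```

Keying the registry by type name is what makes the duplicate harmless: a GraphQL schema may contain only one type per name, so handing back the same instance avoids a schema-construction error rather than merely saving work.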
- describe the unified field model directly rather than by contrast with pre-unification per-field configurations - drop the carried-through consequence bullet, keeping the folding contract (index and query normalize identically via @lde/text-normalization) as a direct claim
…ot its definition - define FieldKind, SearchField, SearchType and SearchSchema on their own terms; state the SHACL mapping as one possible source (a generator can emit declarations from NodeShapes + search: annotations) rather than defining the model as the runtime form of shapes - drop per-property SHACL parentheticals (sh:path, sh:maxCount, sh:minCount) from the SearchField members - align the README terminology intro and the ADR 3 field-model lead with the same framing
- engineFor(searchType, engine) returns the same instance typed as EngineFor<Type>: typo-safe facet and document keys with no generics at the call site (the const type parameter captures the literal) - SearchEngine gains a third Type parameter (default SearchType) so an EngineFor-typed engine also rejects a mismatched search type passed to search() at compile time - point the literal-capture guidance at defineSearchType alongside as const satisfies SearchType
… compiler target - 'compiler target' read as the final output (the engine query), while SearchQuery is the middle: surfaces compile into it, engine adapters compile out of it - reword README, the SearchQuery JSDoc and ADR 3 accordingly
- add a short list under the intro linking the packages that sit on the core's ports: the search-typesense engine adapter and the search-api-graphql surface, with a REST surface to follow
- intro now leads with the family-level value: one declarative SearchSchema, and the projection, collection schema, query semantics and API surface are all derived from it - state the core as engine-, API- and domain-agnostic (API was missing) - adapters plug into the ports (hexagonal parlance) instead of sitting on them
…cles - name the exact port per tier: engine adapters implement SearchEngine; API surfaces drive it, parsing client input into the SearchQuery IR - drop the mixed articles in the four-things list
… as next adapter - the one-field-four-consumers sentence repeated the intro, the unified-field-model bullet and the diagram above it; the capability flags are explained in the Field model section - name OpenSearch as the engine adapter to follow and tighten the derived-artifacts sentence
- align the facetRanges JSDoc with the README, which names OpenSearch as the engine adapter to follow
- types options are an exact join with the schema (build-time errors in both directions), so partial exposure goes through a narrower schema argument, not through omitting options
- SearchType gains a required name (PascalCase, e.g. 'Dataset'), mirroring SearchField.name: the declaration itself names the type in every API surface, so surface config no longer has to - buildGraphQLSchema derives all GraphQL type names and the default root query field from it; the per-type options lose typeName and become optional fine-tuning (queryField, queryDefaults) - document the pipeline as pure data transformations (three chains meeting at the engine) in the search README BREAKING CHANGE: every SearchType declaration must add a name; the GraphQL surface's per-type options no longer accept typeName and the types option is now optional.
…nored-filter reporting - rebuild(client, searchType, documents, options) derives the collection schema internally (buildCollectionSchema); the logical index name is the explicit options.name; options exported as RebuildOptions - buildCollectionSchema no longer assumes Dutch: defaultLocale is a pure opt-in, and without it non-localized search fields stay folded but unstemmed, so no language is silently applied - buildSearchParams now skips a where clause whose operator does not match the field's kind (it previously reached the engine as garbage) and reports every skipped clause via the new onIgnoredFilter callback, also exposed on TypesenseSearchEngineOptions BREAKING CHANGE: rebuild takes a SearchType plus options.name instead of a prebuilt CollectionCreateSchema; buildCollectionSchema no longer defaults defaultLocale to 'nl'.
- add validateQuery/assertValidQuery to the core: structural validation of where (declared, filterable, operator matches kind), facets (declared, facetable) and orderBy (declared or relevance) against the SearchType; vacuous clauses (empty in, boundless range) are no-ops, not issues - the port contract now requires every adapter to reject a structurally invalid query; the Typesense engine enforces it on every search, so validation holds for every caller (queryDefaults policies, in-process callers, weaker-typed surfaces), not only GraphQL-validated input - onIgnoredFilter consequently narrows to vacuous clauses at the engine level; share filterOperator from the core instead of a compiler copy BREAKING CHANGE: TypesenseSearchEngine.search now throws on a structurally invalid query instead of silently dropping the offending clauses.
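The three structural checks named in this commit (`where`, facets, `orderBy`) can be sketched as follows. The shapes `Field`, `WhereClause` and `Query`, the operator names and the operator-to-kind table are simplified assumptions, and the functions carry a `Sketch` suffix to make clear they are not the `validateQuery`/`assertValidQuery` shipped in `@lde/search`. Vacuous clauses are left out because this sketch does not model clause values.

```typescript
// Hypothetical, simplified shapes: the real SearchType and SearchQuery are richer.
type Kind = 'string' | 'number' | 'date' | 'boolean';
type Operator = 'in' | 'range' | 'is';

interface Field {
  readonly name: string;
  readonly kind: Kind;
  readonly filterable?: boolean;
  readonly facetable?: boolean;
}

interface WhereClause {
  readonly field: string;
  readonly operator: Operator;
}

interface Query {
  readonly where?: readonly WhereClause[];
  readonly facets?: readonly string[];
  readonly orderBy?: string; // a declared field name or 'relevance'
}

// Assumed operator-to-kind pairing, for illustration only.
const operatorsByKind: Record<Kind, readonly Operator[]> = {
  string: ['in'],
  number: ['range'],
  date: ['range'],
  boolean: ['is'],
};

/** Collect every structural issue instead of stopping at the first one. */
function validateQuerySketch(query: Query, fields: readonly Field[]): string[] {
  const byName = new Map(fields.map((field) => [field.name, field] as const));
  const issues: string[] = [];

  for (const clause of query.where ?? []) {
    const field = byName.get(clause.field);
    if (field === undefined) {
      issues.push(`where: unknown field "${clause.field}"`);
    } else if (!field.filterable) {
      issues.push(`where: field "${clause.field}" is not filterable`);
    } else if (!operatorsByKind[field.kind].includes(clause.operator)) {
      issues.push(`where: operator "${clause.operator}" does not match kind "${field.kind}"`);
    }
  }

  for (const name of query.facets ?? []) {
    const field = byName.get(name);
    if (field === undefined) {
      issues.push(`facets: unknown field "${name}"`);
    } else if (!field.facetable) {
      issues.push(`facets: field "${name}" is not facetable`);
    }
  }

  const orderBy = query.orderBy;
  if (orderBy !== undefined && orderBy !== 'relevance' && !byName.has(orderBy)) {
    issues.push(`orderBy: unknown field "${orderBy}"`);
  }
  return issues;
}

/** Throwing variant, as an engine adapter might call before compiling a query. */
function assertValidQuerySketch(query: Query, fields: readonly Field[]): void {
  const issues = validateQuerySketch(query, fields);
  if (issues.length > 0) {
    throw new Error(`Invalid query: ${issues.join('; ')}`);
  }
}
```

Returning a list from the non-throwing function and wrapping it in a throwing one lets a surface report all problems at once while an engine adapter fails fast on the same rules.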
- link GraphQLSchema to graphql-js and SearchSchema to its definition in the @lde/search terminology table - describe the resolver precisely: one shared implementation, one instance per root field bound to its SearchType - consolidate the no-drift story into @lde/search (the family entry point); keep only the surface-specific frozen-contract guard here - explain the SDL snapshot guard with a code sample and the accept-a-diff workflow
- one convention across the family: a function takes the value it operates on first and the SearchType right after (search(query, type), projectDocument(node, type)); engineFor(engine, type) now complies, and the README states the rule BREAKING CHANGE: engineFor's parameters swapped from (searchType, engine) to (engine, searchType).
…ameter order - export the query-compiler options as BuildSearchParamsOptions (they were public-by-signature but unnameable) and have TypesenseSearchEngineOptions extend them, so maxFacetValues and onIgnoredFilter are declared once and the engine forwards its options wholesale - rebuild(client, documents, searchType, options) now follows the family-wide value-first, declaration-second parameter convention - turn the engine's search-steps sentence into a bulleted list in the README BREAKING CHANGE: rebuild's documents and searchType parameters swapped places.
…e consumer - no codegen step, no generated files to commit and review, no stale artifact drifting from the declaration; name the trade-off and point at the snapshot guard that restores it
- gather the parameter-order rule and the factory-verb vocabulary (define captures a declaration, build is pure data-to-data, create makes a stateful instance) into one API conventions section
What

Reworks `@lde/search` and `@lde/search-typesense` into a unified, engine- and domain-agnostic search API, and adds `@lde/search-api-graphql` (the new GraphQL surface). `@lde/search` and `@lde/search-typesense` already existed in the repo; this PR reworks them (breaking), it does not introduce them. One declarative search schema drives projection, the engine collection schema, the query semantics, and the GraphQL surface, so they cannot drift. The domain types (`Dataset`, `Person`, …) and the engine choice (Typesense, …) are the consumer’s, configured at the seams; the libraries never name a domain.

Terminology

The model has three levels (see the Terminology section in the `@lde/search` README):
- `SearchField`: `kind`, IR `path`, capability flags
- `SearchType`: `type` IRI + fields + derivations
- `SearchSchema`: the map of `SearchType`s, keyed by `type` IRI; built with `searchSchema(…)`

`projectGraph` and the GraphQL surface consume a `SearchSchema`; the engine port executes one `SearchType` at a time.

Review guide

Three tiers by stability; spend review effort accordingly.

1. Stable API Contract (the emitted GraphQL SDL): highest scrutiny. The consumer-facing surface a Presentation Layer couples to. Its stability is independent of `@lde` package versions and is guarded by the `printGraphQLSchema` SDL snapshot, so this is the one part that must stay right.
   `@lde/search-api-graphql` `build-schema.ts`: output types, `where`/`orderBy`/facet inputs, named reference types, nullability.
2. `@lde` library API (0.x, still stabilizing): review the design, not for permanence. Developer-facing package APIs; pre-1.0 a breaking change is a routine minor bump (our nx `adjustSemverBumpsForZeroMajorVersion`), so review for correctness and shape, not as frozen.
   `@lde/search` `engine.ts` (`SearchEngine` port + result types), `query.ts` (`SearchQuery` IR / filter operators), `schema.ts` (`SearchField`/`SearchType`/`SearchSchema` model).
3. Internal / swappable: lower scrutiny. Behind the port; changeable without consumer impact (ADR 0003).
   `@lde/search` `project.ts` / `frame-by-type.ts`; `@lde/search-typesense` `query-compiler.ts` / `collection-schema.ts` / `search.ts`.

The neutral-fixture snapshot tests pin each generator; a snapshot diff flags a generated-shape change, so start there.

Not in this PR: the consumer (Dataset Register) side, including the hand-written `dr:*` CONSTRUCTs, lands in a separate DR PR after these packages publish. Those CONSTRUCTs are provisional (slated for SHACL-driven replacement) and will be guarded there by a schema/CONSTRUCT contract test, so none of that review burden is here.

Packages

@lde/search (core): breaking
- `SearchField`/`SearchType` replaces the projection `FieldSpec`/`Projection` and the discriminated `FieldKind`. A `SearchType` declares its own logical API `name` (`Dataset`), mirroring `SearchField.name`, so surfaces derive their type names from the declaration rather than per-surface config. `SearchSchema` is the map of `SearchType`s keyed by `type` IRI, built with the new `searchSchema()` factory; `projectGraph(quads, schema)` consumes it.
- The query model (`SearchQuery`/`Filter`/`Sort`) and filter-operator semantics, plus always-on structural query validation (`validateQuery`/`assertValidQuery`): the port contract requires every engine adapter to reject a query referencing unknown or non-filterable fields, mismatched operators or non-facetable facets. That enforcement holds for every caller (deployment `queryDefaults`, in-process callers, weaker-typed surfaces), not only GraphQL-validated input.
- The `SearchEngine` port and logical result types (`SearchResult`/`SearchHit`/`ResultDocument`/`Reference`/`LocalizedValue`/`FacetBucket`).
- `physicalFields` (the shared physical-fanout convention), the field selectors (`searchableFields`, `facetableFields`, …) and shared helpers (`isRangeFacet`, `pageForOffset`, date conversion).
- `defineSearchType` (captures a declaration as a literal without `as const satisfies`) and `engineFor` (narrows a `SearchEngine` to one type’s facet/output keys, and to that type as the only accepted `search()` argument, at zero runtime cost).
- SHACL is one possible source (a generator can emit declarations from NodeShapes + `search:` annotations), not a dependency.
- Rewrites `projectDocument`/`projectGraph` onto the unified model; projection output is unchanged: the guardrail test was ported field-for-field.
- `FieldSpec`, `Projection`, and the discriminated `FieldKind` are removed. The per-type declaration is `SearchType` (formerly named `SearchSchema`).

@lde/search-typesense (engine adapter): breaking
- `buildCollectionSchema` derives a Typesense collection from the field model (kind→type, the physical fanout via
  `physicalFields`, per-locale stemming, required / default-sorting-field). Stemming for non-localized fields is opt-in via `defaultLocale` (no more silent `nl` default); unset leaves those fields folded but unstemmed.
- `buildSearchParams` compiles `SearchQuery` into Typesense params: `filter_by`/`sort_by`/`facet_by`/`query_by` with active-locale weighting and exact membership for non-facet fields (grouped facets are ordinary denormalized values, not a special clause). The engine validates every query up front (`assertValidQuery`, the port contract) and throws on structural invalidity; a vacuous clause (empty `in` list, boundless `range`) is skipped as a no-op and reported via the `onIgnoredFilter` callback.
- `createTypesenseSearchEngine` implements the `SearchEngine` port end to end: it reconstructs logical documents and resolves reference (and reference-facet) labels from the sidecar `labels` collection in a single lookup.
- `rebuild(client, documents, searchType, { name, … })` blue/green-rebuilds an index straight from the declaration: it derives the collection schema internally, so declaration → live index is one call, with the logical index name explicit in the options (`RebuildOptions`).
- `TypesenseSearchEngineOptions` extends the exported `BuildSearchParamsOptions`, so each compiler knob (`maxFacetValues`, `onIgnoredFilter`) is declared once. Parameter order follows the family-wide value-first, declaration-second convention documented in the `@lde/search` README.

@lde/search-api-graphql (GraphQL surface): new
- `buildGraphQLSchema(schema)` builds an executable `GraphQLSchema` at runtime from the whole
  `SearchSchema` (no codegen, no SDL artifact): one root query field per `SearchType` (e.g. `datasets` and `people` in one API), each searchable in its own way through its own output/`where`/`orderBy`/facet types. Type names come from each `SearchType`’s `name`; the per-type options (keyed by type IRI) are optional fine-tuning (`queryField`, `queryDefaults`), and `languageOrder` is global.
- Shared types (`LanguageString`, buckets, filter inputs) are created once; reference types dedupe across root types (two types referencing `Agent` yield one `Agent` type). A type with no filterable fields gets no `where` arg, and one with no facetable fields gets no `facets` field (empty GraphQL types are invalid).
- Derives output types, `where`/`orderBy`/facet inputs, named reference types, and nullability (from `required`/`array`/`kind`) from the field model; best-first `Accept-Language` output ordering; a nullable facet `label` resolved for reference facets only.
- One resolver instance per root field, bound to its `SearchType`, over any `SearchEngine`; clear build-time errors for options naming an unknown type and for duplicate root query fields.
- `printGraphQLSchema` for a consumer-side SDL snapshot guard.

Notes
- … (`SearchType` per NodeShape, `SearchSchema` as the type-keyed map), the whole-schema GraphQL build (compose-before-build), the `SearchEngine` port/adapter naming, `size`→`Float` (int64 overflow), the typed-surface design, and facet labels.
- The packages table gains the `@lde/search-api-graphql` row and the architecture diagram gains the search family (`search`, `search-typesense`, `search-api-graphql`, `text-normalization`).
- … `idOnly`/`inline` reference strategies, the `OutputOf<S>` typed-surface overlay, and a REST surface.
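
The vacuous-clause handling described for `buildSearchParams` under Packages (an empty `in` list or a boundless `range` is skipped and reported via `onIgnoredFilter`) can be sketched as below. The `Clause` shape, `compileClause` and `compileFilterBy` are hypothetical stand-ins rather than the adapter's real types or functions; only the `onIgnoredFilter` name and the Typesense `filter_by` operators (`:=[…]`, `:[min..max]`, `:>=`, `:<=`, `&&`) are taken as given.

```typescript
// Hypothetical clause shapes; the real Filter type in @lde/search may differ.
type Clause =
  | { readonly field: string; readonly in: readonly string[] }
  | {
      readonly field: string;
      readonly range: { readonly min?: number; readonly max?: number };
    };

/** Compile one clause, or return undefined when the clause is vacuous. */
function compileClause(clause: Clause): string | undefined {
  if ('in' in clause) {
    // An empty membership list constrains nothing.
    // Values are joined verbatim: escaping is deliberately left out of this sketch.
    return clause.in.length === 0
      ? undefined
      : `${clause.field}:=[${clause.in.join(',')}]`;
  }
  const { min, max } = clause.range;
  if (min !== undefined && max !== undefined) {
    return `${clause.field}:[${min}..${max}]`;
  }
  if (min !== undefined) {
    return `${clause.field}:>=${min}`;
  }
  if (max !== undefined) {
    return `${clause.field}:<=${max}`;
  }
  // A range with neither bound constrains nothing.
  return undefined;
}

/** Join the non-vacuous clauses with && and report each skipped clause. */
function compileFilterBy(
  clauses: readonly Clause[],
  onIgnoredFilter?: (clause: Clause) => void,
): string {
  const parts: string[] = [];
  for (const clause of clauses) {
    const compiled = compileClause(clause);
    if (compiled === undefined) {
      onIgnoredFilter?.(clause);
    } else {
      parts.push(compiled);
    }
  }
  return parts.join(' && ');
}
```

Reporting through a callback rather than throwing keeps the distinction the PR draws: a vacuous clause is a harmless no-op worth logging, while a structurally invalid query is rejected before compilation starts.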