feat(search)!: engine- and domain-agnostic query model, Typesense compiler, and GraphQL surface#529

Open
ddeboer wants to merge 35 commits into main from feat/search-core

ddeboer (Member) commented Jun 28, 2026

What

Reworks @lde/search and @lde/search-typesense into a unified, engine- and domain-agnostic
search API, and adds @lde/search-api-graphql (the new GraphQL surface).
@lde/search and @lde/search-typesense already existed in the repo; this PR reworks them
(a breaking change) rather than introducing them. One declarative search schema drives
projection, the engine collection schema, the query semantics, and the GraphQL surface, so
they cannot drift. The domain types (Dataset, Person, …) and the engine choice (Typesense, …)
are the consumer’s, configured at the seams; the libraries never name a domain.

Terminology

The model has three levels (see the Terminology section in the @lde/search README):

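| Term         | What it is                                                      | SHACL          | GraphQL     |
|--------------|-----------------------------------------------------------------|----------------|-------------|
| SearchField  | One queryable field: kind, IR path, capability flags            | property shape | field       |
| SearchType   | One root type’s declaration: type IRI + fields + derivations    | NodeShape      | object type |
| SearchSchema | Every SearchType, keyed by type IRI; built with searchSchema(…) | shapes graph   | schema      |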

projectGraph and the GraphQL surface consume a SearchSchema; the engine port executes
one SearchType at a time.
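
To make the three levels concrete, here is a minimal sketch of how they might nest. The member names on SearchField and SearchType, the FieldKind values, and the duplicate-IRI check are assumptions for illustration; only the names SearchField, SearchType, SearchSchema and the variadic searchSchema(…) factory come from this PR.

```typescript
// Illustrative shapes only: the real @lde/search types carry more members.
type FieldKind = 'text' | 'keyword' | 'number' | 'date' | 'boolean' | 'reference'; // assumed kinds

interface SearchField {
  readonly name: string;
  readonly kind: FieldKind;
  readonly path: string; // where the value is read from (assumed representation)
  readonly filterable?: boolean;
  readonly facetable?: boolean;
}

interface SearchType {
  readonly type: string; // type IRI
  readonly name: string; // logical API name, e.g. 'Dataset'
  readonly fields: readonly SearchField[];
}

// The whole declaration: every SearchType keyed by its type IRI.
type SearchSchema = ReadonlyMap<string, SearchType>;

function searchSchema(...types: SearchType[]): SearchSchema {
  const schema = new Map<string, SearchType>();
  for (const searchType of types) {
    if (schema.has(searchType.type)) {
      // Rejecting duplicates is an assumption of this sketch.
      throw new Error(`Duplicate type IRI: ${searchType.type}`);
    }
    schema.set(searchType.type, searchType);
  }
  return schema;
}

// Hypothetical declarations with example.org IRIs.
const dataset: SearchType = {
  type: 'https://example.org/Dataset',
  name: 'Dataset',
  fields: [{ name: 'title', kind: 'text', path: 'title' }],
};
const person: SearchType = {
  type: 'https://example.org/Person',
  name: 'Person',
  fields: [{ name: 'name', kind: 'text', path: 'name' }],
};

const schema = searchSchema(dataset, person);
```

A surface would iterate the whole map, while an engine call would pick a single entry from it.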

Review guide

Three tiers by stability; spend review effort accordingly.

1. Stable API Contract (the emitted GraphQL SDL): highest scrutiny. This is the consumer-facing
surface that a Presentation Layer couples to. Its stability is independent of @lde package versions
and is guarded by the printGraphQLSchema SDL snapshot, so this is the one part that must stay
right.

  • @lde/search-api-graphql build-schema.ts: output types, where/orderBy/facet inputs,
    named reference types, nullability.

2. @lde library API (0.x, still stabilizing): review the design, not its permanence.
These are developer-facing package APIs. Pre-1.0, a breaking change is a routine minor bump (our nx
adjustSemverBumpsForZeroMajorVersion), so review for correctness and shape rather than treating
the API as frozen.

  • @lde/search engine.ts (SearchEngine port + result types), query.ts (SearchQuery IR /
    filter operators), schema.ts (SearchField/SearchType/SearchSchema model).

3. Internal / swappable: lower scrutiny. Behind the port; changeable without consumer impact
(ADR 0003).

  • @lde/search project.ts / frame-by-type.ts; @lde/search-typesense query-compiler.ts /
    collection-schema.ts / search.ts.

The neutral-fixture snapshot tests pin each generator; a snapshot diff flags a generated-shape
change, so start there.

Not in this PR: the consumer (Dataset Register) side, including the hand-written dr:*
CONSTRUCTs, lands in a separate DR PR after these packages publish. Those CONSTRUCTs are
provisional (slated for SHACL-driven replacement) and will be guarded there by a
schema/CONSTRUCT contract test, so none of that review burden is here.

Packages

@lde/search (core) — breaking

  • Unifies the field model: one SearchField / SearchType replaces the projection
    FieldSpec/Projection and the discriminated FieldKind. A SearchType declares its own
    logical API name (Dataset), mirroring SearchField.name, so surfaces derive their type
    names from the declaration rather than per-surface config.
  • Names the whole declaration: SearchSchema is the map of SearchTypes keyed by type IRI,
    built with the new searchSchema() factory; projectGraph(quads, schema) consumes it.
  • Adds the neutral query IR (SearchQuery / Filter / Sort) and filter-operator semantics,
    plus always-on structural query validation (validateQuery / assertValidQuery): the port
    contract requires every engine adapter to reject a query referencing unknown or
    non-filterable fields, mismatched operators or non-facetable facets — enforcement that holds
    for every caller (deployment queryDefaults, in-process callers, weaker-typed surfaces),
    not only GraphQL-validated input.
  • Adds the SearchEngine port and logical result types (SearchResult / SearchHit /
    ResultDocument / Reference / LocalizedValue / FacetBucket).
  • Adds physicalFields (the shared physical-fanout convention), the field selectors
    (searchableFields, facetableFields, …) and shared helpers (isRangeFacet,
    pageForOffset, date conversion).
  • Adds defineSearchType (captures a declaration as a literal without
    as const satisfies) and engineFor (narrows a SearchEngine to one type’s facet/output
    keys — and to that type as the only accepted search() argument — at zero runtime cost).
  • Docs define the model on its own terms; SHACL is an optional source (a generator can emit
    declarations from NodeShapes + search: annotations), not a dependency.
  • Rewrites projectDocument/projectGraph onto the unified model; projection output is
    unchanged — the guardrail test was ported field-for-field.
  • BREAKING: FieldSpec, Projection, and the discriminated FieldKind are removed. The
    per-type declaration is SearchType (formerly named SearchSchema).
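
The structural validation described in the list above could look roughly like the following. This is a simplified sketch, not the shipped validateQuery: the Filter shape, the two field kinds, and the rule that `in` belongs to keyword fields and `range` to number fields are assumptions made to keep the example small.

```typescript
// Simplified, hypothetical model for illustration.
type FieldKind = 'keyword' | 'number';

interface SearchField {
  readonly name: string;
  readonly kind: FieldKind;
  readonly filterable?: boolean;
  readonly facetable?: boolean;
}

interface SearchType {
  readonly name: string;
  readonly fields: readonly SearchField[];
}

type Filter =
  | { field: string; in: readonly string[] }
  | { field: string; range: { min?: number; max?: number } };

interface SearchQuery {
  where?: readonly Filter[];
  facets?: readonly string[];
}

// Returns one message per structural issue; an empty array means the query is valid.
function validateQuery(query: SearchQuery, searchType: SearchType): string[] {
  const issues: string[] = [];
  const byName = new Map(searchType.fields.map((f) => [f.name, f] as const));

  for (const clause of query.where ?? []) {
    const field = byName.get(clause.field);
    if (!field) {
      issues.push(`where: unknown field "${clause.field}"`);
    } else if (!field.filterable) {
      issues.push(`where: field "${clause.field}" is not filterable`);
    } else if (('range' in clause) !== (field.kind === 'number')) {
      issues.push(`where: operator does not match kind of "${clause.field}"`);
    }
    // A vacuous clause (empty `in`, boundless range) is deliberately not an issue.
  }

  for (const name of query.facets ?? []) {
    const field = byName.get(name);
    if (!field) issues.push(`facets: unknown field "${name}"`);
    else if (!field.facetable) issues.push(`facets: field "${name}" is not facetable`);
  }
  return issues;
}

function assertValidQuery(query: SearchQuery, searchType: SearchType): void {
  const issues = validateQuery(query, searchType);
  if (issues.length > 0) {
    throw new Error(`Invalid query for ${searchType.name}: ${issues.join('; ')}`);
  }
}

// Hypothetical declaration used to exercise the validator.
const datasetType: SearchType = {
  name: 'Dataset',
  fields: [
    { name: 'title', kind: 'keyword' },
    { name: 'publisher', kind: 'keyword', filterable: true, facetable: true },
    { name: 'size', kind: 'number', filterable: true },
  ],
};
```

Running the check inside the engine adapter, rather than only at the GraphQL boundary, is what lets it hold for in-process callers and deployment defaults as well.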

@lde/search-typesense (engine adapter) — breaking

  • buildCollectionSchema derives a Typesense collection from the field model (kind→type, the
    physical fanout via physicalFields, per-locale stemming, required / default-sorting-field).
    Stemming for non-localized fields is opt-in via defaultLocale (no more silent nl
    default); unset leaves those fields folded but unstemmed.
  • buildSearchParams compiles SearchQuery into Typesense params — filter_by / sort_by /
    facet_by / query_by with active-locale weighting and exact membership for non-facet
    fields (grouped facets are ordinary denormalized values, not a special clause). The engine
    validates every query up front (assertValidQuery, the port contract) and throws on
    structural invalidity; a vacuous clause (empty in list, boundless range) is skipped as a
    no-op and reported via the onIgnoredFilter callback.
  • createTypesenseSearchEngine implements the SearchEngine port end to end: it reconstructs
    logical documents and resolves reference (and reference-facet) labels from the sidecar
    labels collection in a single lookup.
  • rebuild(client, documents, searchType, { name, … }) blue/green-rebuilds an index straight
    from the declaration: it derives the collection schema internally, so declaration → live
    index is one call, with the logical index name explicit in the options (RebuildOptions).
  • One options bag per concern: TypesenseSearchEngineOptions extends the exported
    BuildSearchParamsOptions, so each compiler knob (maxFacetValues, onIgnoredFilter) is
    declared once. Parameter order follows the family-wide value-first, declaration-second
    convention documented in the @lde/search README.
  • Covered by unit tests plus a Typesense testcontainer integration test.
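
As a rough illustration of the where-to-filter_by step, the sketch below compiles a list of already-validated filters into a Typesense filter_by string and reports vacuous clauses. The Filter shape, the function name compileFilterBy, and the onIgnoredFilter signature are assumptions; the real buildSearchParams also handles sorting, facets, query_by weighting and date conversion.

```typescript
// Hypothetical filter shape; assumes the query already passed structural validation.
type Filter =
  | { field: string; in: readonly string[] }
  | { field: string; range: { min?: number; max?: number } };

interface CompileOptions {
  // Assumed signature: called once per clause skipped as a no-op.
  onIgnoredFilter?: (filter: Filter, reason: string) => void;
}

function compileFilterBy(where: readonly Filter[], options: CompileOptions = {}): string {
  const clauses: string[] = [];
  for (const filter of where) {
    if ('in' in filter) {
      if (filter.in.length === 0) {
        options.onIgnoredFilter?.(filter, 'empty in list');
        continue;
      }
      // Backticks quote each value for Typesense; a value containing a backtick
      // would need extra handling that this sketch omits.
      const values = filter.in.map((value) => '`' + value + '`').join(',');
      clauses.push(`${filter.field}:=[${values}]`);
      continue;
    }
    const { min, max } = filter.range;
    if (min !== undefined && max !== undefined) {
      clauses.push(`${filter.field}:[${min}..${max}]`);
    } else if (min !== undefined) {
      clauses.push(`${filter.field}:>=${min}`);
    } else if (max !== undefined) {
      clauses.push(`${filter.field}:<=${max}`);
    } else {
      options.onIgnoredFilter?.(filter, 'boundless range');
    }
  }
  return clauses.join(' && ');
}

const ignored: string[] = [];
const filterBy = compileFilterBy(
  [
    { field: 'publisher', in: ['lde', 'nde'] },
    { field: 'size', range: { min: 10 } },
    { field: 'keyword', in: [] }, // vacuous: skipped and reported
  ],
  { onIgnoredFilter: (filter, reason) => ignored.push(`${filter.field}: ${reason}`) },
);
// filterBy is "publisher:=[`lde`,`nde`] && size:>=10"
```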

@lde/search-api-graphql (GraphQL surface) — new

  • buildGraphQLSchema(schema) builds an executable GraphQLSchema at runtime from the
    whole SearchSchema (no codegen, no SDL artifact): one root query field per
    SearchType (e.g. datasets and people in one API), each searchable in its own way
    through its own output/where/orderBy/facet types. Type names come from each
    SearchType’s name; the per-type options (keyed by type IRI) are optional fine-tuning
    (queryField, queryDefaults), and languageOrder is global.
  • Shared types (LanguageString, buckets, filter inputs) are created once; reference types
    dedupe across root types (two types referencing Agent yield one Agent type). A type
    with no filterable fields gets no where arg, and one with no facetable fields gets no
    facets field (empty GraphQL types are invalid).
  • Derives output types, where/orderBy/facet inputs, named reference types, and nullability
    (from required / array / kind) from the field model; best-first Accept-Language
    output ordering; a nullable facet label resolved for reference facets only.
  • One generic resolver per root field, routing to its own SearchType over any
    SearchEngine; clear build-time errors for options naming an unknown type and for
    duplicate root query fields.
  • Exports printGraphQLSchema for a consumer-side SDL snapshot guard.

Notes

  • ADRs 0003/0004 state the design as status quo: the unified model and its terminology
    (SearchType per NodeShape, SearchSchema as the type-keyed map), the whole-schema
    GraphQL build (compose-before-build), the SearchEngine port/adapter naming, size
    Float (int64 overflow), the typed-surface design, and facet labels.
  • Root README: the packages table gains the @lde/search-api-graphql row and the architecture
    diagram gains the search family (search, search-typesense, search-api-graphql,
    text-normalization).
  • Each generator ships a neutral-fixture snapshot to pin its output across versions.
  • Deferred: the idOnly/inline reference strategies, the OutputOf<S> typed-surface
    overlay, and a REST surface.

ddeboer force-pushed the feat/search-core branch from ae639ea to 66969a2 on June 28, 2026 18:51
ddeboer changed the title from "feat: unified engine- and domain-agnostic search API (@lde/search* family)" to "feat(search)!: engine- and domain-agnostic query model, Typesense compiler, and GraphQL surface" on Jul 1, 2026
ddeboer added 14 commits July 3, 2026 09:30
…d result types

- replace FieldSpec and Projection with one SearchField/SearchSchema model

- add SearchQuery, Filter, Sort and the filter-operator semantics

- add the SearchEngine port and result types (SearchResult/SearchHit/ResultDocument/Reference)

- add physicalFields (the shared fanout convention) and schema selectors

- rewrite projectDocument and projectGraph onto the unified model; projection output unchanged

- remove FieldSpec, Projection and the discriminated FieldKind (breaking)
… and SearchEngine

- buildCollectionSchema derives a Typesense collection from the unified SearchField model

- buildSearchParams compiles SearchQuery into Typesense params (filter_by/sort_by/facet_by/query_by)

- createTypesenseSearchEngine implements the SearchEngine port: compile, search, reconstruct

- resolve reference and reference-facet labels from the sidecar labels collection in one lookup

- add a testcontainer integration test and a generator-stability snapshot
- buildSearchSchema builds an executable GraphQLSchema from any SearchSchema at runtime (no codegen)

- one generic resolver maps args to SearchQuery, calls the engine, and maps the result back

- derive output, where, orderBy and facet types plus nullability from the field model

- best-first Accept-Language output ordering; nullable facet label for reference facets

- add printSearchSchema for a consumer SDL snapshot, plus a generator-stability snapshot
- state the decisions directly as the reconciled architecture, not deviations from a draft

- remove the deviation/reconcile framing and the deviations-to-reconcile lists

- align wording with the stack platform layer
- number fields now project as floats (not truncated like integer)

- closes the step-1 gap so an int64-magnitude field mapped to number (Float) indexes
Replace the repo-path breadcrumb with a direct link to the docs site, so the
status note points readers at the rendered page rather than a source file path.
… the group companion

- Keyed per-type facets object on the GraphQL surface (ValueBucket / RangeBucket),
  selection-is-the-request with skip-own-filter.
- Numeric range facets and an opt-in label cache in the Typesense adapter.
- Reconcile ADRs 0003 and 0004 with the implementation.

BREAKING CHANGE: remove SearchField.group and its *_group companion field, collection
column and query split. Deployments denormalize group tokens into the field values
instead, so a group is an ordinary facet value with no engine mechanism.
…@lde/* pins

npm ci failed because the lockfile lacked the new @lde/search-api-graphql workspace.
Regenerating against npmjs adds it and brings ~24 @lde/* internal deps up to their latest
in-range patches; no third-party or duplicate-version changes.
… search-engine test

`result.facets` is a `Partial` record, so a facet is `FacetBucket[] | undefined`; guard the two
spreads with `?? []` so the `typecheck` target passes (it never ran in CI before the lockfile fix).
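
A minimal sketch of the guard described in this commit, using hypothetical facet names and a hypothetical FacetBucket shape: because the record is Partial, each lookup may be undefined, and spreading undefined into an array does not type-check.

```typescript
// Hypothetical bucket shape and facet keys for illustration.
interface FacetBucket {
  value: string;
  count: number;
}

type Facets = Partial<Record<'publisher' | 'format', FacetBucket[]>>;

const facets: Facets = {
  publisher: [{ value: 'lde', count: 2 }],
  // 'format' was not requested, so its key is absent.
};

// Without `?? []` the compiler rejects the spread of `FacetBucket[] | undefined`.
const allBuckets: FacetBucket[] = [...(facets.publisher ?? []), ...(facets.format ?? [])];
```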
…ations

Fold the unified-field-model blockquote and the dated Consequences bullet into running
text, so the ADR reads as the current design rather than a change log.
- SearchType is one root type declaration (one SHACL NodeShape, one GraphQL
  object type); SearchSchema now names the whole search declaration: a
  ReadonlyMap of SearchTypes keyed by type IRI, built with the new
  searchSchema() factory
- projectGraph now consumes a SearchSchema instead of a SearchType array
- rename buildSearchSchema / printSearchSchema / BuildSearchSchemaOptions to
  buildGraphQLSchema / printGraphQLSchema / BuildGraphQLSchemaOptions: they
  construct a GraphQLSchema rather than the SearchSchema the old names implied
- rename schema parameters to searchType where they take one type, and the
  FacetFieldsOf/OutputFieldsOf/EngineFor/ResultFor generic from Schema to Type
- add a Terminology section to the @lde/search README mapping SearchField /
  SearchType / SearchSchema onto SHACL and GraphQL; update ADRs 3 and 4, the
  package READMEs and npm descriptions
- drop section-divider comments in build-schema.ts and stale grouped-facet
  mentions in the READMEs

BREAKING CHANGE: the per-type interface SearchSchema is renamed to SearchType,
and SearchSchema now denotes the type-keyed map built with searchSchema().
projectGraph(quads, types[]) becomes projectGraph(quads, searchSchema(...types)).
In @lde/search-api-graphql, buildSearchSchema, printSearchSchema and
BuildSearchSchemaOptions are renamed to buildGraphQLSchema, printGraphQLSchema
and BuildGraphQLSchemaOptions.
- add the missing @lde/search-api-graphql row to the packages table
- add the search, search-typesense, search-api-graphql and text-normalization
  dependency edges to the architecture diagram, which lacked the search family
  entirely
…ilters

- Add referenceFields, fieldNamed, isRangeFacet, pageForOffset and the date
  storage codec (isoToUnixSeconds/unixSecondsToIso) to @lde/search, replacing
  local re-derivations in the Typesense adapter and the GraphQL surface
- Route the adapter's localized display and sort field names through
  physicalFields instead of hand-built name interpolation
- Compile a date field's ISO range bounds to the stored Unix seconds; they were
  previously interpolated verbatim into the int64 filter and could never match
- Project boolean fields from a path (xsd:boolean lexical space) instead of
  silently skipping them
- Resolve reference labels in a single multi_search POST and start the cached
  label load alongside the main search
- Remove dead API introduced on this branch: acceptsFilter, filterOperator,
  ResultFor, PhysicalFields.value, the Sort re-export and the toLanguageStrings
  package export; drop resolvers that duplicate graphql-js defaults
- Trim ADR 4 to the shipped surface, deferring the TS mirror and extension hooks
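
The date-range fix in this commit can be sketched as follows. The codec names isoToUnixSeconds and unixSecondsToIso come from the commit message; their bodies here, and the helper compileDateRange, are assumptions showing why the bounds must be converted before they reach an int64 filter.

```typescript
// Assumed implementation of the storage codec: dates are stored as Unix seconds.
function isoToUnixSeconds(iso: string): number {
  const milliseconds = Date.parse(iso);
  if (Number.isNaN(milliseconds)) {
    throw new RangeError(`Not a parseable ISO 8601 date: ${iso}`);
  }
  return Math.floor(milliseconds / 1000);
}

function unixSecondsToIso(seconds: number): string {
  return new Date(seconds * 1000).toISOString();
}

// Hypothetical helper: compile ISO bounds into a Typesense numeric range clause.
// Interpolating the ISO strings verbatim would compare text against an int64
// column and never match.
function compileDateRange(field: string, fromIso: string, toIso: string): string {
  return `${field}:[${isoToUnixSeconds(fromIso)}..${isoToUnixSeconds(toIso)}]`;
}

const oneDay = compileDateRange('issued', '1970-01-01T00:00:00Z', '1970-01-02T00:00:00Z');
// oneDay is 'issued:[0..86400]'
```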
ddeboer force-pushed the feat/search-core branch from 0b6234c to 0c31d67 on July 3, 2026 07:32
ddeboer added 11 commits July 3, 2026 11:11
…archSchema

- buildGraphQLSchema(schema, { types }) emits one root query field per
  SearchType, so a single API serves multiple types (e.g. datasets and
  people), each searchable in its own way; per-type typeName, queryField and
  queryDefaults move into a types record keyed by type IRI, languageOrder
  stays global
- create the shared types (LanguageString, buckets, filter inputs) once and
  dedupe reference types across root types: Person and CreativeWork both
  referencing Agent yield a single Agent type
- omit the where arg for a type with no filterable fields and the facets
  field for a type with no facetable fields, which would be invalid empty
  GraphQL types
- throw on a type without options, on options naming an unknown type, and on
  two types deriving the same root query field
- test multiple root types: per-type derived types, the shared reference
  type, per-root-field engine routing, and the build-time errors
- update ADR 4, the READMEs and the npm description accordingly

BREAKING CHANGE: buildGraphQLSchema and printGraphQLSchema take the whole
SearchSchema plus a per-type options record. Migrate
buildGraphQLSchema(searchType, { typeName }) to
buildGraphQLSchema(searchSchema(searchType), { types: { [searchType.type]:
{ typeName } } }).
- describe the unified field model directly rather than by contrast with
  pre-unification per-field configurations
- drop the carried-through consequence bullet, keeping the folding contract
  (index and query normalize identically via @lde/text-normalization) as a
  direct claim
…ot its definition

- define FieldKind, SearchField, SearchType and SearchSchema on their own
  terms; state the SHACL mapping as one possible source (a generator can emit
  declarations from NodeShapes + search: annotations) rather than defining the
  model as the runtime form of shapes
- drop per-property SHACL parentheticals (sh:path, sh:maxCount, sh:minCount)
  from the SearchField members
- align the README terminology intro and the ADR 3 field-model lead with the
  same framing
- engineFor(searchType, engine) returns the same instance typed as
  EngineFor<Type>: typo-safe facet and document keys with no generics at the
  call site (the const type parameter captures the literal)
- SearchEngine gains a third Type parameter (default SearchType) so an
  EngineFor-typed engine also rejects a mismatched search type passed to
  search() at compile time
- point the literal-capture guidance at defineSearchType alongside
  as const satisfies SearchType
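
The literal-capture idea behind defineSearchType and engineFor can be sketched with a TypeScript 5.0 const type parameter. Everything below is a reduced, hypothetical model: the real SearchEngine has more type parameters and a richer search() signature, and the parameter order shown (engine first, then the type) is the one this PR settles on in a later commit.

```typescript
// Reduced, hypothetical model for illustration.
interface SearchField {
  readonly name: string;
  readonly facetable?: boolean;
}

interface SearchType {
  readonly type: string;
  readonly name: string;
  readonly fields: readonly SearchField[];
}

// The const type parameter keeps field names as string literals,
// so no `as const satisfies SearchType` is needed at the call site.
function defineSearchType<const T extends SearchType>(declaration: T): T {
  return declaration;
}

// Names of the fields declared with facetable: true.
type FacetFieldsOf<T extends SearchType> = Extract<
  T['fields'][number],
  { facetable: true }
>['name'];

interface SearchEngine<Facet extends string = string> {
  search(query: { facets?: readonly Facet[] }): Promise<{
    facets: Partial<Record<Facet, string[]>>;
  }>;
}

type EngineFor<T extends SearchType> = SearchEngine<FacetFieldsOf<T>>;

// Same instance, narrower static type: no runtime work is done here.
function engineFor<const T extends SearchType>(
  engine: SearchEngine,
  _searchType: T,
): EngineFor<T> {
  return engine as unknown as EngineFor<T>;
}

const datasetType = defineSearchType({
  type: 'https://example.org/Dataset', // hypothetical IRI
  name: 'Dataset',
  fields: [{ name: 'title' }, { name: 'publisher', facetable: true }],
});

// Stand-in engine for the sketch.
const engine: SearchEngine = { search: async () => ({ facets: {} }) };
const datasetEngine = engineFor(engine, datasetType);

// Accepted: 'publisher' is a facetable field of the declaration.
const pending = datasetEngine.search({ facets: ['publisher'] });
// datasetEngine.search({ facets: ['publsher'] }) would be a compile-time error.
```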
… compiler target

- 'compiler target' read as the final output (the engine query), while
  SearchQuery is the middle: surfaces compile into it, engine adapters
  compile out of it
- reword README, the SearchQuery JSDoc and ADR 3 accordingly
- add a short list under the intro linking the packages that sit on the
  core's ports: the search-typesense engine adapter and the
  search-api-graphql surface, with a REST surface to follow
- intro now leads with the family-level value: one declarative
  SearchSchema, and the projection, collection schema, query semantics
  and API surface are all derived from it
- state the core as engine-, API- and domain-agnostic (API was missing)
- adapters plug into the ports (hexagonal parlance) instead of
  sitting on them
…cles

- name the exact port per tier: engine adapters implement SearchEngine;
  API surfaces drive it, parsing client input into the SearchQuery IR
- drop the mixed articles in the four-things list
… as next adapter

- the one-field-four-consumers sentence repeated the intro, the
  unified-field-model bullet and the diagram above it; the capability
  flags are explained in the Field model section
- name OpenSearch as the engine adapter to follow and tighten the
  derived-artifacts sentence
ddeboer added 10 commits July 3, 2026 13:06
- align the facetRanges JSDoc with the README, which names OpenSearch
  as the engine adapter to follow
- types options are an exact join with the schema (build-time errors in
  both directions), so partial exposure goes through a narrower schema
  argument, not through omitting options
- SearchType gains a required name (PascalCase, e.g. 'Dataset'),
  mirroring SearchField.name: the declaration itself names the type in
  every API surface, so surface config no longer has to
- buildGraphQLSchema derives all GraphQL type names and the default
  root query field from it; the per-type options lose typeName and
  become optional fine-tuning (queryField, queryDefaults)
- document the pipeline as pure data transformations (three chains
  meeting at the engine) in the search README

BREAKING CHANGE: every SearchType declaration must add a name; the
GraphQL surface's per-type options no longer accept typeName and the
types option is now optional.
…nored-filter reporting

- rebuild(client, searchType, documents, options) derives the collection
  schema internally (buildCollectionSchema); the logical index name is
  the explicit options.name; options exported as RebuildOptions
- buildCollectionSchema no longer assumes Dutch: defaultLocale is a pure
  opt-in, and without it non-localized search fields stay folded but
  unstemmed, so no language is silently applied
- buildSearchParams now skips a where clause whose operator does not
  match the field's kind (it previously reached the engine as garbage)
  and reports every skipped clause via the new onIgnoredFilter callback,
  also exposed on TypesenseSearchEngineOptions

BREAKING CHANGE: rebuild takes a SearchType plus options.name instead
of a prebuilt CollectionCreateSchema; buildCollectionSchema no longer
defaults defaultLocale to 'nl'.
- add validateQuery/assertValidQuery to the core: structural validation
  of where (declared, filterable, operator matches kind), facets
  (declared, facetable) and orderBy (declared or relevance) against the
  SearchType; vacuous clauses (empty in, boundless range) are no-ops,
  not issues
- the port contract now requires every adapter to reject a structurally
  invalid query; the Typesense engine enforces it on every search, so
  validation holds for every caller (queryDefaults policies, in-process
  callers, weaker-typed surfaces), not only GraphQL-validated input
- onIgnoredFilter consequently narrows to vacuous clauses at the engine
  level; share filterOperator from the core instead of a compiler copy

BREAKING CHANGE: TypesenseSearchEngine.search now throws on a
structurally invalid query instead of silently dropping the offending
clauses.
- link GraphQLSchema to graphql-js and SearchSchema to its definition
  in the @lde/search terminology table
- describe the resolver precisely: one shared implementation, one
  instance per root field bound to its SearchType
- consolidate the no-drift story into @lde/search (the family entry
  point); keep only the surface-specific frozen-contract guard here
- explain the SDL snapshot guard with a code sample and the
  accept-a-diff workflow
- one convention across the family: a function takes the value it
  operates on first and the SearchType right after (search(query,
  type), projectDocument(node, type)); engineFor(engine, type) now
  complies, and the README states the rule

BREAKING CHANGE: engineFor's parameters swapped from (searchType,
engine) to (engine, searchType).
…ameter order

- export the query-compiler options as BuildSearchParamsOptions (they
  were public-by-signature but unnameable) and have
  TypesenseSearchEngineOptions extend them, so maxFacetValues and
  onIgnoredFilter are declared once and the engine forwards its options
  wholesale
- rebuild(client, documents, searchType, options) now follows the
  family-wide value-first, declaration-second parameter convention
- turn the engine's search-steps sentence into a bulleted list in the
  README

BREAKING CHANGE: rebuild's documents and searchType parameters swapped
places.
…e consumer

- no codegen step, no generated files to commit and review, no stale
  artifact drifting from the declaration; name the trade-off and point
  at the snapshot guard that restores it
- gather the parameter-order rule and the factory-verb vocabulary
  (define captures a declaration, build is pure data-to-data, create
  makes a stateful instance) into one API conventions section