SHACL-driven extraction generator: retire the hand-written dr:* CONSTRUCTs #548

Description

@ddeboer

Retire the provisional hand-written dr:* CONSTRUCTs (and the dr: intermediate namespace) by generating extraction from the search schema. Epic: #534.

Problem

DR's search extraction uses hand-written dr:* CONSTRUCTs (register-source.ts / dkg-source.ts) that emit a DR-private dr: intermediate vocabulary, which the projection then frames against. This is provisional debt: every new record type needs another hand-written CONSTRUCT, and the CONSTRUCTs can silently drift from the SearchSchema. The object grain (#540) is the forcing function — without this, indexing SCHEMA-AP objects means writing a fresh pile of dr:-style CONSTRUCTs.

Scope

  • Generate the per-target-class extraction CONSTRUCT from the SearchSchema's sh:paths (a read-side adapter, @lde/search-sparql-style), so the projection reads the SCHEMA-AP / SHACL vocabulary directly — no dr: intermediate.
  • Store pragmatics as generator config (e.g. the QLever two-CONSTRUCT split, single-subject VALUES binding), not hand-tuned per query.
  • A schema↔CONSTRUCT contract test guarding drift (the generated CONSTRUCT stays in sync with the schema).
  • Retire register-source.ts / dkg-source.ts dr:* CONSTRUCTs once the generator covers them.
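A minimal sketch of what generating the per-target-class CONSTRUCT from single-hop paths could look like. The `SearchSchema`, `SearchField` and `GeneratorConfig` shapes and the `generateConstruct` function are hypothetical and only illustrate the idea; the real types in the codebase may differ, and the QLever two-CONSTRUCT split is not modelled here.

```typescript
// Hypothetical shapes for illustration; the real SearchSchema may differ.
interface SearchField {
  name: string;
  path: string; // single predicate IRI (single-hop sh:path)
}

interface SearchSchema {
  targetClass: string;
  fields: SearchField[];
}

// Hypothetical store pragmatics, kept as config rather than tuned per query.
interface GeneratorConfig {
  subject?: string; // when set, bind one subject through VALUES
}

function generateConstruct(
  schema: SearchSchema,
  config: GeneratorConfig = {},
): string {
  const template = [`?s a <${schema.targetClass}> .`];
  const where = [`?s a <${schema.targetClass}> .`];
  if (config.subject) {
    where.unshift(`VALUES ?s { <${config.subject}> }`);
  }
  schema.fields.forEach((field, i) => {
    // The schema's own predicate is emitted as-is: no intermediate vocabulary.
    const triple = `?s <${field.path}> ?o${i} .`;
    template.push(triple);
    where.push(`OPTIONAL { ${triple} }`);
  });
  return [
    'CONSTRUCT {',
    ...template.map((line) => `  ${line}`),
    '} WHERE {',
    ...where.map((line) => `  ${line}`),
    '}',
  ].join('\n');
}
```

Because each field is wrapped in OPTIONAL, a record that lacks one property still yields the triples for the others.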

Relates to

Acceptance

  • Object + catalog extraction generated from SearchSchema sh:paths; no dr: namespace in the pipeline.
  • Contract test fails on schema / CONSTRUCT drift.
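The drift check could be sketched roughly as follows. `findDrift` and the `Drift` shape are hypothetical names, and the regex only handles simple `?var <iri> ?var` triples in the CONSTRUCT template, so this is an illustration under those assumptions rather than a complete SPARQL parser.

```typescript
// Hypothetical result shape for a schema/CONSTRUCT comparison.
interface Drift {
  missing: string[]; // schema paths absent from the CONSTRUCT template
  unexpected: string[]; // template predicates the schema does not declare
}

function findDrift(schemaPaths: string[], construct: string): Drift {
  // Look only at the template: the text between the first "{" and "WHERE".
  const template = construct.slice(
    construct.indexOf('{') + 1,
    construct.indexOf('WHERE'),
  );
  const predicates = new Set<string>();
  const triplePattern = /\?\w+\s+<([^>]+)>\s+\?\w+/g;
  let match: RegExpExecArray | null;
  while ((match = triplePattern.exec(template)) !== null) {
    predicates.add(match[1]);
  }
  const expected = new Set(schemaPaths);
  return {
    missing: schemaPaths.filter((path) => !predicates.has(path)),
    unexpected: Array.from(predicates).filter((p) => !expected.has(p)),
  };
}
```

A contract test would then assert that both arrays are empty for every schema and its generated CONSTRUCT, so a stale or missing predicate fails the build.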

v1 limitation: single-hop SearchField.path + references

SearchField.path is a single predicate IRI today (single-hop sh:path), while SHACL sh:path allows multi-hop sequence paths. v1 accepts this limitation:

  • Option A (v1): the generator maps single-hop sh:paths and follows references for nesting. A multi-hop extraction such as creator/name is composed from single-hop segments across referenced schemas: the creator field's path (schema:creator) plus the referenced Person schema's name-field path (schema:name). The reference strategy bounds how far a reference is followed (idOnly: stop at the IRI; labelOnly: add the referenced schema's label field; inline: add its projected fields). The generated CONSTRUCT therefore still traverses ?work schema:creator ?c . ?c schema:name ?name; the depth comes from following references, not from a sequence sh:path. This matches the Stack's phase-2 configurable depth and rules out unbounded recursion, so references alone are enough to generate the CONSTRUCTs.
  • Option B (deferred): extend SearchField.path to a sequence path (string | string[], or a small path expression) for flat fields that flatten a deep value without an intermediate reference. Only needed for pure-flatten-deep cases; deferred until a shape requires it.
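To illustrate Option A, here is a rough sketch of composing multi-hop triple patterns from single-hop paths by following references. All names (`RefField`, `RefSchema`, `patternsFor`, the `labelField` property and the `depth` parameter) are hypothetical, and the handling of each strategy is one reading of the description above, not the actual implementation.

```typescript
type ReferenceStrategy = 'idOnly' | 'labelOnly' | 'inline';

// Hypothetical shapes: a field is single-hop and may point at another schema.
interface RefField {
  name: string;
  path: string;
  reference?: { schema: string; strategy: ReferenceStrategy };
}

interface RefSchema {
  labelField?: string; // name of the field used as the label
  fields: RefField[];
}

function patternsFor(
  schemaName: string,
  schemas: Record<string, RefSchema>,
  subject = '?s',
  depth = 2, // assumed cap so inline references cannot recurse forever
): string[] {
  const schema = schemas[schemaName];
  if (!schema) return [];
  const out: string[] = [];
  schema.fields.forEach((field, i) => {
    const object = `${subject}_${i}`;
    out.push(`${subject} <${field.path}> ${object} .`);
    const ref = field.reference;
    // idOnly stops at the IRI; so does any reference once depth runs out.
    if (!ref || ref.strategy === 'idOnly' || depth === 0) return;
    const target = schemas[ref.schema];
    if (!target) return;
    if (ref.strategy === 'labelOnly') {
      const label = target.fields.find((f) => f.name === target.labelField);
      if (label) out.push(`${object} <${label.path}> ${object}_label .`);
      return;
    }
    // inline: add the referenced schema's projected fields, one level deeper.
    out.push(...patternsFor(ref.schema, schemas, object, depth - 1));
  });
  return out;
}
```

With a Work schema whose creator field references a Person schema using labelOnly, this yields one pattern for schema:creator and one for the Person's label path, which mirrors the creator/name traversal described above.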

Design context: #534.
