Recurse into nested class_derivations in validate-spec#231
Recurse into nested class_derivations in validate-spec#231amc-corey-cox wants to merge 9 commits into
Conversation
Walk nested class_derivations declared on slot_derivations and validate them like top-level CDs, threading the parent CD through so cross-table boundaries can be diagnosed. Adds three checks: - Cross-table mismatch: when a nested CD's populated_from differs from its parent's, emit INFO when an implicit join would synthesize at runtime (#212) and WARNING when no implicit join is possible. Closes #211. - Joined-class attribute refs: expressions like {alias.field} are now validated against the joined class's slot list — for both explicit joins: aliases and nested-CD implicit-join targets. Replaces the previous "drop join aliases from check" behavior. Closes #213. - is_a / mixins resolution: string references resolve against the spec-internal class_derivation name pool first (matches the runtime's _find_class_derivation_by_name), then against target schema classes via SchemaView, else error. Closes #219. ValidationMessage.severity widens to Literal["error", "warning", "info"] to carry the implicit-join INFO.
There was a problem hiding this comment.
Pull request overview
This PR enhances validate-spec semantic validation to traverse nested class_derivations under slot_derivations, enabling earlier detection of cross-table nesting issues, validating {alias.field} expression references against joined classes, and resolving is_a/mixins references against spec-internal derivations and/or the target schema.
Changes:
- Added recursive semantic validation of nested
class_derivations, including a new cross-table joinability diagnostic for parent→childpopulated_frommismatches. - Added
extract_expr_attribute_references()and extended expression validation to check{alias.field}patterns against joined-class slot sets. - Added
is_a/mixinsreference resolution via a top-level class-derivation name pool with fallback to target-schema classes; widenedValidationMessage.severityto include"info".
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
src/linkml_map/validator.py |
Implements nested-CD recursion, cross-table join diagnostics, joined-class expression attribute validation, and inheritance reference resolution; adds "info" severity. |
tests/test_validator.py |
Adds unit tests covering attribute reference extraction, nested cross-table diagnostics, joined-class attribute validation, and is_a/mixins resolution behavior. |
- Cross-table join: when an explicit joins: entry covers the nested
source, also verify the spec actually has 'join_on' or both
'source_key' and 'lookup_key'. An empty entry (structurally valid
but missing keys) now warns at validate-spec time instead of failing
at transform time with ValueError.
- is_a / mixins: skip the resolution check entirely when target_sv is
None. The "not in pool" half is genuinely ambiguous without the
target schema — the reference could still resolve there. Erroring
on half-checked pools was a false-positive risk.
- Joined-class error wording: include both the resolved class name
and the alias when they differ ('joined class X (alias Y)') so users
can tell which schema class was actually checked when class_named is
set.
- _build_joined_class_map now returns
dict[alias, (class_name, slot_set | None)] so the consumer can
surface the resolved class name in error messages.
Added five new regression tests covering each scenario.
When an implicit join would synthesize at runtime (#212), additionally check whether parent and child share non-join columns. Runtime behavior on those columns is data-dependent (#217 Bug 2): rows with a join match merge and mark shared columns _AMBIGUOUS (unqualified access errors), rows without a match leave the bare parent row in place and silently return the parent's value. Same spec, same dataset, can both error and silently succeed depending on which row is being processed. The hazard is fully determined by the schemas — surface it at validate-spec time instead of waiting for a sparse row to expose it. This addresses the static half of #217 Bug 2. The runtime behavior itself (consistent _AMBIGUOUS insertion regardless of join hit, and nested-object shape on no-match) remains a follow-up.
|
Scope refinement after handover from a runtime-side session: Re #213: The PR body says "partial coverage of #213 — runtime-side eval-time validation is separately needed for the bare Re #217 (new in d4a7ca2): Added a static warning for the column-overlap hazard described in #217 Bug 2. When |
The overlap warning added in d4a7ca2 was unreachable given how pick_join_key works: it returns a non-None key only when exactly one non-identifier column is common (or when only identifier columns are common, as a fallback). Any remaining overlap after subtracting the join key is therefore identifier-only. Filtering identifiers (the only sensible refinement) would make the check fire on zero realistic inputs. The genuine "two non-identifier columns in common, one is the join key, the other is a real data column that resolves ambiguously" hazard is already covered by the existing 'multiple candidate join columns' WARNING — pick_join_key refuses to pick in that scenario, so we fall through to the ambiguous-candidates path, not the implicit-join INFO branch. Dropping the warning and the three tests rather than carrying dead code on speculation that pick_join_key might one day get smarter.
|
Correction to the previous comment re #217: Reverted the column-overlap warning in 481d461. On a second look it was unreachable in practice: The genuine #217 Bug 2 hazard — two non-identifier columns in common where one is the join key and the other is a real data column — is already caught: So the practical position is:
Net: this PR no longer touches #217 with its own diagnostic; the issue's static hazard is incidentally covered by the cross-table check we already had. |
Both messages referred to "joins: block" wholesale, but the path fires whenever a specific entry for nested_source is absent — the parent CD may already have a joins: mapping for other tables. Reword to point at the specific missing entry instead of implying the whole block needs adding.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
Comments suppressed due to low confidence (1)
src/linkml_map/validator.py:768
_check_cross_table_join()derivesparent_sourcefromparent_cd.get("populated_from")only. If the parent CD omitspopulated_from(identity mapping), the function returns early and skips cross-table mismatch diagnostics, but the runtime usesparent_class_deriv.populated_from or parent_class_deriv.namewhen resolving nested objects. To avoid missing #211-style warnings/infos, compute the effective parent source the same way (fallback to the parent CD’sname).
nested_source = nested_cd.get("populated_from")
parent_source = parent_cd.get("populated_from")
if not nested_source or not parent_source or nested_source == parent_source:
return
_derive_nested_objects in the runtime uses ``parent_class_deriv.populated_from or parent_class_deriv.name``, so a CD that omits ``populated_from`` falls back to its own name as the source class (the "identity" case). _build_joined_class_map and _check_cross_table_join previously only consulted ``populated_from``, so an identity-style outer CD with a nested cross-table reference silently slipped past validation. Mirror the runtime fallback in both places and add a regression test.
…l-map into validate-nested-cross-table
Closes #211. Closes #213. Closes #219.
Summary
Walk nested
class_derivationsdeclared onslot_derivationsand validate them like top-level CDs, threading the parent CD through so cross-table boundaries can be diagnosed at validate-spec time. Adds three checks plus the recursion shape that enables them.Checks added
Cross-table mismatch — closes #211
When a nested CD's
populated_fromdiffers from its parent's:joins:covers the nested source → no message.pick_join_keywould synthesize a join at runtime (Implicit cross-table join resolution for nested and regular class derivations #212) → INFO naming the column.no columns are sharedormultiple candidate join columns).The warning surfaces the silent-null hazard from the original issue at validate-spec time, instead of waiting until transform.
Joined-class attribute refs — closes #213
Expression refs like
{alias.field}are validated against the joined class's slot list. Previously, the validator dropped{alias.*}patterns from expression-reference checking entirely (a workaround for false positives on join aliases); now those refs are checked against the actual joined class. Covers both:joins:aliases (withclass_namedhonored)populated_fromThis is partial coverage of #213 — runtime-side eval-time validation is separately needed for the bare
{scroe}typo case. With #230 + this PR landed, the silent-null mechanism the issue describes is closed at the validate-spec layer.is_a / mixins resolution — closes #219 (Option C)
is_a(string) and each entry ofmixins(list) resolve in this order:class_derivationname pool — matchesTransformer._find_class_derivation_by_nameSchemaView— matches LinkML's plain-schemais_aconvention'is_a: TotallyMadeUpName' does not resolve to ...)Composes cleanly with #230's structural change: that PR makes the JSON Schema layer accept string references; this PR makes the semantic layer resolve them. They can land in any order; once both are in, the full chain works end-to-end.
Other changes
ValidationMessage.severitywidened toLiteral["error", "warning", "info"]to carry the implicit-join INFO.extract_expr_attribute_references()helper — returns{base: {attr, ...}}for{base.attr}patterns, mirroringextract_expr_slot_references()._build_joined_class_map(),_check_cross_table_join(),_collect_class_derivation_pool(),_check_class_inheritance_refs()helpers._validate_class_derivationnow takes optionalparent_class_deriv,parent_path, andderivation_poolkwargs; recurses into nested CDs._validate_slot_derivation'sjoined_aliases: set[str]becomesjoined_class_slots: dict[str, set[str] | None].Test plan
uv run pytest tests/test_validator.py— 86 passed (62 prior + 18 cross-table + 6 inheritance)uv run pytest— 828 passed, 4 skippeduv run ruff check .+uv run ruff format --check .— cleanpht006467/pht012814shape — emits the expected INFOAgent.is_a: Entityresolves via pool-1 (the emptyEntity:CD)