Forward schema_path/target_class to linkml's delimited file loader #235

Open
Sigfried wants to merge 1 commit into linkml:main from Sigfried:forward-schema-path-to-delimited-loader

@Sigfried

Summary

Forward schema_path and target_class through TsvFileLoader, CsvFileLoader, DataLoader, and get_file_loader() so they reach linkml's underlying TsvLoader / CsvLoader. This enables schema-aware type coercion for delimited files, preventing string-ranged and enum-ranged columns from being silently coerced to int/float.

Without these parameters, a column like subject_id containing numeric-looking strings (e.g., "00123") gets loaded as 123 (int) by pandas' default inference, losing the leading zero and breaking downstream lookups.
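
To make the failure mode concrete, here is a minimal sketch using only the standard library. It does not reproduce linkml's loader or pandas; `infer_scalar`, `load_rows`, and the `string_columns` argument are hypothetical names used only to illustrate how value-based type inference drops a leading zero, and how knowing a column's declared range avoids that.

```python
import csv
import io


def infer_scalar(text):
    """Naive inference: try int, then float, otherwise keep the string."""
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            continue
    return text


def load_rows(tsv_text, string_columns=frozenset()):
    """Parse TSV text; columns listed in string_columns skip inference.

    string_columns is a hypothetical stand-in for what a schema would
    tell the loader about string-ranged (or enum-ranged) slots.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [
        {
            name: value if name in string_columns else infer_scalar(value)
            for name, value in row.items()
        }
        for row in reader
    ]


SAMPLE = "subject_id\tage\n00123\t42\n"

# Without schema knowledge, the identifier is coerced to an int.
naive = load_rows(SAMPLE)

# With the column declared as a string, the leading zeros survive.
schema_aware = load_rows(SAMPLE, string_columns={"subject_id"})
```

In this sketch `naive[0]["subject_id"]` is the integer `123`, while `schema_aware[0]["subject_id"]` stays the string `"00123"`; the `age` column is still converted to an integer in both cases.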

Background

linkml/linkml#3289 ("Make delimited file loader schema-aware to preserve string/enum columns") added schema-awareness to linkml's _DelimitedFileLoader. It was released in linkml v1.11.0. This PR exposes those parameters through linkml-map's loader API so users of linkml-map can benefit from the same fix.

Changes

  • TsvFileLoader.__init__ / CsvFileLoader.__init__: accept optional schema_path and target_class
  • iter_instances(): forward both params to the underlying linkml loader
  • get_file_loader(): kwargs-only schema_path / target_class, passed only to TSV/CSV loader classes (other formats already do their own type handling)
  • DataLoader.__init__: accept and store both params
  • Tests: 11 new `TestSchemaAware*` tests in `tests/test_loaders/test_data_loader.py` covering string-range preservation, enum-range preservation, and the no-schema coercion fallback for each entry point (`TsvFileLoader`, `CsvFileLoader`, `get_file_loader`, `DataLoader`)

The change is additive — existing callers that don't pass the new params get current behavior. Annotations use PEP 604 `X | Y | None` syntax to match the rest of the file's style (no new `typing` imports needed).
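
The forwarding pattern described above can be sketched roughly as follows. This is not the code in this PR: `StubDelimitedLoader`, `SketchTsvFileLoader`, `SketchJsonFileLoader`, and `sketch_get_file_loader` are hypothetical stand-ins, and omitting unset parameters from the underlying call is an assumption made here to keep the default path unchanged.

```python
from __future__ import annotations

from pathlib import Path


class StubDelimitedLoader:
    """Hypothetical stand-in for the underlying loader; records each call."""

    def __init__(self):
        self.calls = []

    def load(self, source, **kwargs):
        self.calls.append((source, kwargs))
        return []


class SketchTsvFileLoader:
    """Wrapper that accepts the optional params and forwards them."""

    def __init__(
        self,
        source: str,
        schema_path: str | Path | None = None,
        target_class: str | None = None,
    ):
        self.source = source
        self.schema_path = schema_path
        self.target_class = target_class
        self.backend = StubDelimitedLoader()

    def iter_instances(self):
        # Assumption: only forward what the caller actually supplied, so
        # callers that pass nothing hit the same code path as before.
        kwargs = {}
        if self.schema_path is not None:
            kwargs["schema_path"] = str(self.schema_path)
        if self.target_class is not None:
            kwargs["target_class"] = self.target_class
        yield from self.backend.load(self.source, **kwargs)


class SketchJsonFileLoader:
    """Stand-in for a non-delimited format that ignores the schema params."""

    def __init__(self, source: str):
        self.source = source


def sketch_get_file_loader(source, *, schema_path=None, target_class=None):
    """Keyword-only schema params, passed only to delimited loaders."""
    if Path(source).suffix.lower() in {".tsv", ".csv"}:
        return SketchTsvFileLoader(
            source, schema_path=schema_path, target_class=target_class
        )
    return SketchJsonFileLoader(source)


aware = sketch_get_file_loader(
    "people.tsv", schema_path="schema.yaml", target_class="Person"
)
list(aware.iter_instances())

plain = sketch_get_file_loader("people.tsv")
list(plain.iter_instances())
```

Making the two parameters keyword-only in the factory keeps existing positional call sites valid, which is one way to keep a change like this additive.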

Test plan

  • 42 tests pass in `tests/test_loaders/test_data_loader.py` (31 existing + 11 new)
  • Full `pytest` run: 831 passed, 4 skipped, 1 pre-existing failure (`test_graphviz_compiler` — fails on `main` too due to missing local `dot` binary, unrelated)
  • CI passes

🤖 Generated with Claude Code

Pass through the new schema-aware loading params so that string-ranged
and enum-ranged columns in TSV/CSV files are not coerced to int/float.

Requires linkml >=1.11 (PR linkml/linkml#3289 added schema-awareness to
the underlying _DelimitedFileLoader; released in v1.11.0).

Also imports nothing new from `typing` — the new annotations use PEP 604
`X | Y | None` syntax to match the rest of the file's style.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Sigfried added a commit to linkml/dm-bip that referenced this pull request May 14, 2026
linkml/linkml#3289 was released in linkml v1.11.0; schema-automator/#188
was released in v0.5.5. Switch both from git URL pins to PyPI version
specifiers.

linkml-map fix is still unreleased (PR linkml/linkml-map#235 open) — its
git pin stays in place until that ships.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>