feat: OSI (Open Semantic Interchange) v0.1.1 compatibility #229

Draft
hachej wants to merge 4 commits into main from feat/osi-yaml-compat

hachej (Collaborator) commented Apr 2, 2026

Summary

Adds bidirectional conversion between BSL's YAML format and the OSI (Open Semantic Interchange) v0.1.1 spec, addressing #226.

New module osi.py with:

  • to_osi() / to_osi_yaml() — export BSL models to OSI-compliant YAML
  • from_osi() / from_osi_yaml() — import OSI YAML into BSL SemanticModel instances

Key design decisions:

  • BSL Ibis Deferred expressions (_.col.sum()) are translated to SQL strings (SUM(col)) for OSI
  • BSL-specific metadata (is_entity, is_event_timestamp, smallest_time_grain, derived_dimensions) preserved via OSI custom_extensions for round-trip fidelity
  • ai_context field added to Dimension and Measure classes — supports both string and structured object (instructions/synonyms/examples) as per OSI spec
  • Entity dimensions (is_entity=True) automatically map to OSI primary_key
  • Time dimensions map to OSI dimension.is_time
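
To illustrate the first design decision, here is a minimal sketch of the kind of translation described. It works on the textual form of an expression rather than on a real Ibis Deferred object, and the names deferred_to_sql, _AGGREGATES and _SIMPLE_AGG are hypothetical, not taken from osi.py. It assumes only the simplest shape (_.column.method()) is translated and that anything else is reported as untranslatable.

```python
import re
from typing import Optional

# Hypothetical mapping from Ibis-style method names to SQL aggregate names.
_AGGREGATES = {
    "sum": "SUM",
    "mean": "AVG",
    "count": "COUNT",
    "min": "MIN",
    "max": "MAX",
}

# Matches only the simplest shape: _.<column>.<method>()
_SIMPLE_AGG = re.compile(r"^_\.([A-Za-z_]\w*)\.([A-Za-z_]\w*)\(\)$")


def deferred_to_sql(expr: str) -> Optional[str]:
    """Translate text like '_.col.sum()' to 'SUM(col)'.

    Returns None when the expression is not a recognised simple
    aggregate, so callers never receive half-translated SQL.
    """
    match = _SIMPLE_AGG.match(expr.strip())
    if match is None:
        return None
    column, method = match.groups()
    sql_func = _AGGREGATES.get(method)
    if sql_func is None:
        return None
    return f"{sql_func}({column})"


print(deferred_to_sql("_.col.sum()"))
print(deferred_to_sql("_.a.sum() / _.b.sum()"))
```

Returning None for unrecognised input, instead of stripping the "_." prefix, keeps method syntax from leaking into the exported SQL.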

What's included

| File | Description |
| --- | --- |
| src/boring_semantic_layer/osi.py | Core converter module (export + import) |
| src/boring_semantic_layer/ops.py | Added ai_context to Dimension/Measure; threaded through _extract_measure_metadata/_make_base_measure |
| src/boring_semantic_layer/yaml.py | Parse ai_context from BSL YAML configs |
| src/boring_semantic_layer/__init__.py | Export new functions |
| src/boring_semantic_layer/tests/test_osi.py | 48 tests: expression conversion, export, import, round-trips |
| examples/flights_osi.yaml | Flights example in OSI format |
| docs/osi-compatibility.md | Gap analysis document |

Gaps remaining for full OSI parity

| OSI Feature | Status | Notes |
| --- | --- | --- |
| ai_context at all levels | Done | On dimensions, measures, and top-level model |
| primary_key / unique_keys | Partial | is_entity maps to PK; unique_keys not yet modeled |
| Multi-dialect expressions | Partial | Exports as ANSI_SQL; multi-dialect input supported on import |
| custom_extensions | Done | Used for BSL-specific metadata round-trip |
| label on fields | Not yet | Low priority |
| Relationship join column extraction | Partial | Lambda predicates are hard to introspect |
| Complex expression translation | Partial | Handles common patterns (SUM, AVG, COUNT, etc.) |

Test plan

  • 48 unit tests covering expression conversion, export, import, and round-trips
  • Existing test suite passes (no regressions from ai_context or _extract_measure_metadata changes)
  • Manual test with real flights data loading the OSI example

🤖 Generated with Claude Code

boringdata and others added 4 commits April 2, 2026 07:22
Add bidirectional converter between BSL and OSI YAML format:
- to_osi() / to_osi_yaml(): Export BSL models to OSI-compliant YAML
- from_osi() / from_osi_yaml(): Import OSI YAML into BSL models
- ai_context field on Dimension and Measure for LLM metadata
- Expression translation between Ibis Deferred and SQL strings
- BSL-specific metadata preserved via OSI custom_extensions
- Round-trip tested: BSL->OSI->BSL and OSI->BSL->OSI

Closes #226

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Instead of a separate from_osi() conversion layer, from_config() now
auto-detects OSI format (version + semantic_model keys) and parses it
directly. This means from_yaml("model.osi.yaml") just works — BSL
natively speaks OSI.

- OSI parsing logic moved from osi.py into yaml.py
- osi.py slimmed to export-only (to_osi/to_osi_yaml) + expression helpers
- from_osi/from_osi_yaml kept as thin aliases to from_config/from_yaml
- Removed from_osi/from_osi_yaml from top-level __init__.py exports
- Tests updated to use from_config for OSI import (the native path)
- Added format detection tests

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
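
The auto-detection described in this commit can be sketched as a simple key check. This is an illustration only: looks_like_osi is a hypothetical name, and the two sample documents are made-up shapes, not the real BSL or OSI schemas. The only detail taken from the commit message is that the version and semantic_model keys signal OSI.

```python
from typing import Any, Mapping


def looks_like_osi(config: Mapping[str, Any]) -> bool:
    """Return True when the top-level keys suggest an OSI document."""
    return "version" in config and "semantic_model" in config


# Hypothetical sample documents, for illustration only.
osi_doc = {"version": "0.1.1", "semantic_model": []}
bsl_doc = {"flights": {"table": "flights_tbl"}}

print(looks_like_osi(osi_doc))
print(looks_like_osi(bsl_doc))
```

Requiring both keys, not just one, reduces the chance that a BSL config with a model that happens to be named version is misread as OSI.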
…olumn joins

Four fixes to make OSI import near-lossless:

1. primary_key -> is_entity: fields matching dataset.primary_key are
   automatically marked is_entity=True on import (no custom_extensions
   needed for standard OSI files)

2. Dataset-level ai_context: added to SemanticTableOp, threaded through
   to_semantic_table/SemanticModel/with_dimensions/with_measures. Stored
   as JSON string internally for ibis hashability, deserialized via
   get_ai_context(). Round-trips through to_osi export.

3. label on Dimension: new optional field, parsed on OSI import, emitted
   on export. Supports the OSI field.label categorization concept.

4. Multi-column relationship joins: all from_columns/to_columns pairs
   are now used to build compound join predicates, not just the first.

All 56 OSI tests pass. Export validates against official OSI JSON schema.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
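
Fix 2 stores the dataset-level ai_context as a JSON string so that the owning object stays hashable. A minimal sketch of that idea follows. TableMeta, encode_ai_context and decode_ai_context are hypothetical stand-ins, not the actual SemanticTableOp code; only the get_ai_context() accessor name comes from the commit message.

```python
import json
from dataclasses import dataclass
from typing import Any, Optional


def encode_ai_context(value: Optional[Any]) -> Optional[str]:
    """Serialize a string or structured ai_context to a JSON string."""
    if value is None:
        return None
    # sort_keys gives a stable string, so equal dicts hash equally.
    return json.dumps(value, sort_keys=True)


def decode_ai_context(raw: Optional[str]) -> Optional[Any]:
    """Inverse of encode_ai_context."""
    return None if raw is None else json.loads(raw)


@dataclass(frozen=True)
class TableMeta:
    """Hypothetical stand-in for a hashable table node."""

    name: str
    ai_context: Optional[str] = None  # JSON string, never a dict

    def get_ai_context(self) -> Optional[Any]:
        return decode_ai_context(self.ai_context)


ctx = {"instructions": "Use for delay analysis", "synonyms": ["flights"]}
meta = TableMeta("flights", encode_ai_context(ctx))
print(hash(meta) == hash(TableMeta("flights", encode_ai_context(ctx))))
print(meta.get_ai_context() == ctx)
```

A dict field would make the frozen dataclass unhashable at hash time, which is why the structured form is kept as a string and only deserialized on access.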
P1 fixes:
- Calculated measures: extract formula from closure/original_expr
  instead of emitting the metric name as a self-reference
- Join key export: introspect lambda predicates by evaluating against
  mock tables and walking the Equals expression tree to extract column
  names, instead of hardcoding ["unknown"]
- Unqualified metrics (COUNT(*)): only assign to the first dataset
  instead of duplicating across all datasets in multi-dataset imports

P2 fixes:
- Relationship cardinality: read from custom_extensions and use
  join_many() when cardinality is "many" instead of always join_one()
- Expression fallback: return None for non-trivial Ibis expressions
  instead of stripping "_." prefix which leaks method syntax as
  invalid SQL

P3 fixes:
- BSL YAML measure ai_context: pass extra_kwargs["ai_context"] through
  to Measure() constructor instead of silently dropping it

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
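
The unqualified-metric rule from the P1 fixes can be sketched as follows. assign_metric and _QUALIFIED_REF are hypothetical names, and the rule that a qualified metric goes to every dataset it references is an assumption made for this illustration. Only the fallback (an unqualified metric such as COUNT(*) goes to the first dataset only) comes from the commit message.

```python
import re
from typing import List

# Captures the qualifier in references of the form dataset.column
_QUALIFIED_REF = re.compile(r"\b([A-Za-z_]\w*)\.[A-Za-z_]\w*")


def assign_metric(expression: str, dataset_names: List[str]) -> List[str]:
    """Return the datasets that should own a metric expression."""
    qualifiers = set(_QUALIFIED_REF.findall(expression))
    referenced = [name for name in dataset_names if name in qualifiers]
    if referenced:
        return referenced
    # Unqualified metric: attach to the first dataset only, not all of them.
    return dataset_names[:1]


print(assign_metric("COUNT(*)", ["flights", "carriers"]))
print(assign_metric("SUM(flights.distance)", ["flights", "carriers"]))
```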
hachej marked this pull request as draft April 7, 2026 11:40