Problem
When joining two fact tables at different granularities (e.g. monthly revenue + daily hours), BSL requires manual pre-aggregation to avoid fan-out / double-counting. This is a common star schema pattern that tools like Power BI handle automatically through relationship definitions.
Example:
financials table: year/month grain (revenue)
hours table: year/month/day grain (hours worked)
- Goal: compute
hourly_fee = total_revenue / total_hours per month
Currently, join_many causes fan-out (revenue multiplied by number of days), and join_one expects the right side to already be at the correct grain. The user must manually pre-aggregate:
# Manual pre-aggregation required today
monthly_hours = hours_model.group_by('year', 'month').aggregate('total_hours').to_untagged()
monthly_hours_tbl = con.create_table('monthly_hours', monthly_hours.execute())
monthly_hours_model = to_semantic_table(monthly_hours_tbl, name='monthly_hours')...
combined = financials_model.join_one(monthly_hours_model, on=...)
Proposal
BSL should support grain-aware joins that automatically pre-aggregate fact tables to align granularity before joining. Something like:
# Option A: explicit grain declaration on the join
combined = financials_model.join_one(
hours_model,
on=lambda f, h: (f.year == h.year) & (f.month == h.month),
grain="month", # or inferred from join keys
)
# Option B: grain metadata on the model itself
hours_model = (
to_semantic_table(hours_tbl, name='hours')
.with_grain('year', 'month', 'day')
.with_dimensions(...)
.with_measures(...)
)
# BSL knows hours is day-grain, financials is month-grain,
# and auto-aggregates hours to month before joining
Related
Context
From a user building BI-as-a-Service who previously used Power BI's star schema with multiple fact tables at different granularities. This is a common pattern in financial/operational reporting (e.g. monthly revenue + daily hours → hourly fee).
Problem
When joining two fact tables at different granularities (e.g. monthly revenue + daily hours), BSL requires manual pre-aggregation to avoid fan-out / double-counting. This is a common star schema pattern that tools like Power BI handle automatically through relationship definitions.
Example:
financialstable: year/month grain (revenue)hourstable: year/month/day grain (hours worked)hourly_fee = total_revenue / total_hoursper monthCurrently,
join_manycauses fan-out (revenue multiplied by number of days), andjoin_oneexpects the right side to already be at the correct grain. The user must manually pre-aggregate:Proposal
BSL should support grain-aware joins that automatically pre-aggregate fact tables to align granularity before joining. Something like:
Related
has-oneRelationships between Dimensions #64 —has-onedimension relationships / deferred join_one (post-aggregation join optimization — the inverse of this: join after aggregation for lookups vs aggregate before join for multi-fact)Context
From a user building BI-as-a-Service who previously used Power BI's star schema with multiple fact tables at different granularities. This is a common pattern in financial/operational reporting (e.g. monthly revenue + daily hours → hourly fee).