Skip to content

Multi-fact star schema: auto-aggregate to align grains before join #218

@hachej

Description

@hachej

Problem

When joining two fact tables at different granularities (e.g. monthly revenue + daily hours), BSL requires manual pre-aggregation to avoid fan-out / double-counting. This is a common star schema pattern that tools like Power BI handle automatically through relationship definitions.

Example:

  • financials table: year/month grain (revenue)
  • hours table: year/month/day grain (hours worked)
  • Goal: compute hourly_fee = total_revenue / total_hours per month

Currently, join_many causes fan-out (revenue multiplied by number of days), and join_one expects the right side to already be at the correct grain. The user must manually pre-aggregate:

# Manual pre-aggregation required today
monthly_hours = hours_model.group_by('year', 'month').aggregate('total_hours').to_untagged()
monthly_hours_tbl = con.create_table('monthly_hours', monthly_hours.execute())
monthly_hours_model = to_semantic_table(monthly_hours_tbl, name='monthly_hours')...

combined = financials_model.join_one(monthly_hours_model, on=...)

Proposal

BSL should support grain-aware joins that automatically pre-aggregate fact tables to align granularity before joining. Something like:

# Option A: explicit grain declaration on the join
combined = financials_model.join_one(
    hours_model,
    on=lambda f, h: (f.year == h.year) & (f.month == h.month),
    grain="month",  # or inferred from join keys
)

# Option B: grain metadata on the model itself
hours_model = (
    to_semantic_table(hours_tbl, name='hours')
    .with_grain('year', 'month', 'day')
    .with_dimensions(...)
    .with_measures(...)
)
# BSL knows hours is day-grain, financials is month-grain,
# and auto-aggregates hours to month before joining

Related

Context

From a user building BI-as-a-Service who previously used Power BI's star schema with multiple fact tables at different granularities. This is a common pattern in financial/operational reporting (e.g. monthly revenue + daily hours → hourly fee).

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions