feat: prune unused dimension joins from queries #228

Merged
hussainsultan merged 4 commits into main from fix/227-lazy-join-pruning
Apr 14, 2026

@hachej (Collaborator) commented Apr 2, 2026

Summary

  • Joins to dimension tables that are not referenced by the query's dimensions or measures are now skipped
  • A pure measure query like model.aggregate("facts.total_sales") no longer joins unused dimension tables
  • Recursive pruning: nested join trees are pruned at every level

Closes #227

How it works

  1. SemanticAggregateOp.to_untagged() extracts table prefixes from query keys/aggs (e.g. stores.store_name → stores)
  2. Passes the needed table set as parent_requirements to SemanticJoinOp.to_untagged()
  3. SemanticJoinOp.to_untagged() checks whether the right-side leaf table is in the needed set; if not, it skips the join entirely
  4. Recurses into nested joins for multi-level pruning
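
The four steps above can be sketched on a toy join tree. This is only an illustration of the idea, not the code in this PR: `Table`, `Join`, `needed_tables`, and `prune` are hypothetical names, and the sketch leaves out the join-type guard added in a later commit.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class Table:
    """Hypothetical leaf node: a single named table."""
    name: str


@dataclass(frozen=True)
class Join:
    """Hypothetical join node: left subtree joined to a right subtree."""
    left: "Node"
    right: "Node"


Node = Union[Table, Join]


def needed_tables(fields):
    """Step 1: collect table prefixes, e.g. 'stores.store_name' -> 'stores'."""
    return {f.split(".", 1)[0] for f in fields if "." in f}


def prune(node: Node, needed: Optional[set]) -> Node:
    """Steps 2-4: drop right-side leaf tables that are not in `needed`.

    `needed=None` means "no requirements known", so nothing is pruned.
    """
    if needed is None or isinstance(node, Table):
        return node
    left = prune(node.left, needed)  # step 4: recurse into nested joins
    if isinstance(node.right, Table) and node.right.name not in needed:
        return left  # step 3: skip the join entirely
    return Join(left, prune(node.right, needed))


# facts joined to stores, then to products
tree = Join(Join(Table("facts"), Table("stores")), Table("products"))

measure_only = prune(tree, needed_tables(["facts.total_sales"]))
one_dimension = prune(
    tree, needed_tables(["stores.store_name", "facts.total_sales"])
)
```

Here `measure_only` collapses to the bare `facts` table, while `one_dimension` keeps only the `facts`-to-`stores` join.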

Test plan

  • Measure-only query: no dimension tables joined
  • SQL verification: compiled SQL excludes unused tables
  • Single dimension: only that dim table joined
  • Multiple dimensions: only referenced tables joined
  • All dimensions used: no pruning, full join
  • Correctness: pruned results match manual join results
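
The correctness check can be illustrated outside the library with the standard-library `sqlite3` module. The tables and values below are made up for the example; the point is only that dropping a left join to a unique-key dimension leaves a measure unchanged, while dropping an inner join would not when the fact table has orphan rows.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript(
    """
    CREATE TABLE facts (store_id INTEGER, sales REAL);
    CREATE TABLE stores (store_id INTEGER PRIMARY KEY, store_name TEXT);
    -- store_id 99 is an orphan: it has no matching row in stores
    INSERT INTO facts VALUES (1, 10.0), (2, 20.0), (99, 5.0);
    INSERT INTO stores VALUES (1, 'North'), (2, 'South');
    """
)


def total(sql):
    """Run a single-value aggregate query and return the value."""
    return con.execute(sql).fetchone()[0]


# Pruned form: the dimension table is not joined at all.
pruned = total("SELECT SUM(sales) FROM facts")

# Manual left join on a unique key: every fact row is kept exactly once.
left_joined = total(
    "SELECT SUM(f.sales) FROM facts f "
    "LEFT JOIN stores s ON f.store_id = s.store_id"
)

# Manual inner join: the orphan row is filtered out, so the sum differs.
inner_joined = total(
    "SELECT SUM(f.sales) FROM facts f "
    "JOIN stores s ON f.store_id = s.store_id"
)

print(pruned, left_joined, inner_joined)  # 35.0 35.0 30.0
```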

🤖 Generated with Claude Code

boringdata and others added 4 commits April 2, 2026 07:13
When a query only references a subset of the joined tables (via dimension
keys or measure names), joins to unreferenced tables are now skipped.
This avoids expensive joins to dimension tables that contribute nothing
to the result — e.g. a pure SUM() on a fact table no longer joins 4
dimension tables.

The implementation extracts table prefixes from the query's keys and
aggs, passes them as parent_requirements to SemanticJoinOp.to_untagged(),
which recursively prunes right-side leaf tables not in the needed set.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The pre-agg path needs all tables for dimension bridges; passing
parent_requirements=None ensures no pruning happens there.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Guard pruning by cardinality and join type: only prune join_one with
how="left". Inner joins act as row filters and must not be removed.
join_many/join_cross are never pruned (join_many is intercepted by
the pre-agg path; join_cross changes row counts).
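
A minimal sketch of the guard described above, assuming the join's cardinality and `how` are available as plain strings. `can_prune` and its parameters are hypothetical names used for illustration, not the library's API.

```python
from typing import Optional, Set


def can_prune(
    cardinality: str,
    how: str,
    right_table: str,
    needed: Optional[Set[str]],
) -> bool:
    """Return True only when dropping the join cannot change the result."""
    if needed is None:
        # No requirements were passed (e.g. the pre-agg path): never prune.
        return False
    if cardinality != "one" or how != "left":
        # Inner joins filter rows; many/cross joins change row counts.
        return False
    # A left join_one is safe to drop when nothing references its table.
    return right_table not in needed


print(can_prune("one", "left", "stores", {"facts"}))   # True
print(can_prune("one", "inner", "stores", {"facts"}))  # False
```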

Adds edge case tests:
- Inner join with orphan rows: pruning correctly preserves inner join
- Filter on dimension table: pruning correctly disabled
- SQL verification: pruned tables absent from compiled SQL

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@hussainsultan hussainsultan merged commit 52acdc9 into main Apr 14, 2026
3 checks passed
Development

Successfully merging this pull request may close these issues.

Eager join of all dimension tables regardless of which dimensions are used

3 participants