-
Notifications
You must be signed in to change notification settings - Fork 2k
Add case-heavy LEFT JOIN benchmark and debug timing/logging for PushDownFilter hot paths #20664
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
+276
−6
Merged
Changes from all commits
Commits
Show all changes
6 commits
Select commit
Hold shift + click to select a range
054607b
feat: add benchmarking for case-heavy left join and improve debug log…
kosiew b927fe7
feat: enhance benchmarking for case-heavy left join with push down fi…
kosiew 8ea18ed
Align benchmark helper setup
kosiew 7f6512b
cargo fmt
kosiew 72ebc0e
Add bench comments for push down
kosiew 330ea04
Add benchmarks for non-case left join with push down filter
kosiew File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we really want a benchmark for the case expression.
We want to optimize for the evaluation cost of the filter during pushdown, so perhaps it could be written not using a large case expression as is done currently or adaptive removing filters, etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the TPC-H/TPC-DS one is already a good one to optimize for.
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe that you fear that by using a huge CASE expression we might be tuning for an unrealistic “case expression” workload instead of the more common cost of pushing filters through joins.
The benchmark uses CASE because that form triggered a profiler hotspot in
PushDownFilter— the nullability/type‑inference codepath for filters on non‑inner joins. I don’t believe real‑world queries typically look like this, so the presence of CASE is purely a convenient way to exercise that particular expensive planner path, not the target of optimization.To make this clear and avoid overfitting, I’m going to treat the CASE variant as a narrowly scoped micro‑benchmark and add a companion LEFT JOIN query with a simple predicate instead of a CASE.
With both in place we can separate:
That way the benchmark remains useful for optimization while still reflecting more general planner behaviour.
Agreed. TPC-H/TPC-DS should remain the primary goal for optimization value and regression detection.
The intent here is to complement those suites with a deterministic micro-benchmark that isolates known planner hotspot; macro benchmarks are still required to verify end-to-end relevance and prevent narrow wins.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm I didn't realize the issue is about planning (not about evaluation cost / pushdown per se), sorry about that!
Perhaps this would make sense to move into the planning benchmarks instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh wait - it already is 🙈