Add case-heavy LEFT JOIN benchmark and debug timing/logging for PushDownFilter hot paths#20664

Merged
kosiew merged 6 commits into apache:main from kosiew:push_down-20002
Mar 7, 2026

Conversation

@kosiew
Contributor

@kosiew kosiew commented Mar 3, 2026

Which issue does this PR close?

Rationale for this change

The PushDownFilter optimizer rule shows a severe planner-time performance pathology in the sql_planner_extended benchmark, where profiling indicates it dominates total planning CPU time and repeatedly recomputes expression types.

This PR adds a deterministic, CASE-heavy LEFT JOIN benchmark to reliably reproduce the worst-case behavior and introduces lightweight debug-only timing + counters inside push_down_filter to make it easier to pinpoint expensive sub-sections (e.g. predicate simplification and join predicate inference) during profiling.

What changes are included in this PR?

  • Benchmark: add a deterministic CASE-heavy LEFT JOIN workload

    • Adds build_case_heavy_left_join_query and helpers to construct a CASE-nested predicate chain over a LEFT JOIN.

    • Adds a new benchmark logical_plan_optimize_case_heavy_left_join to stress planning/optimization time.

    • Adds an A/B benchmark group push_down_filter_case_heavy_left_join_ab that sweeps predicate counts and CASE depth, comparing:

      • default optimizer with push_down_filter enabled
      • optimizer with push_down_filter removed
  • Optimizer instrumentation (debug-only)

    • Adds a small with_debug_timing helper gated by log_enabled!(Debug) to record microsecond timings for specific sections.

    • Instruments and logs:

      • time spent in infer_join_predicates
      • time spent in simplify_predicates
      • counts of parent predicates, on_filters, inferred join predicates
      • before/after predicate counts for simplification
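The instrumentation approach above can be sketched roughly as follows. This is a hypothetical, dependency-free approximation: the PR gates timing on `log_enabled!(Debug)` from the `log` crate and emits via `log::debug!`, while this sketch approximates the gate with a `RUST_LOG` environment check and `eprintln!` so it stands alone.

```rust
use std::time::Instant;

// Hypothetical sketch of a debug-only timing helper. The real helper is
// gated by `log_enabled!(Debug)`; here we check RUST_LOG directly so the
// example has no external dependencies.
fn with_debug_timing<T>(section: &str, f: impl FnOnce() -> T) -> T {
    let debug_on = std::env::var("RUST_LOG")
        .map(|v| v.contains("debug"))
        .unwrap_or(false);
    if !debug_on {
        // Fast path: zero timing overhead when debug logging is disabled.
        return f();
    }
    let start = Instant::now();
    let result = f();
    // The real helper would use `log::debug!` instead of eprintln!.
    eprintln!(
        "push_down_filter: {section} took {} µs",
        start.elapsed().as_micros()
    );
    result
}

fn main() {
    // Wrap an expensive planner sub-section, e.g. predicate simplification.
    let simplified = with_debug_timing("simplify_predicates", || {
        vec!["a > 1", "b IS NOT NULL"] // placeholder for simplified predicates
    });
    println!("{}", simplified.len());
}
```

Because the closure result is returned unchanged, the helper can wrap existing sections without restructuring the surrounding code.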

Are these changes tested?

  • No new unit/integration tests were added because this PR is focused on benchmarking and debug-only instrumentation rather than changing optimizer semantics.

  • Coverage is provided by:

    • compiling/running the sql_planner_extended benchmark
    • validating both benchmark variants (with/without push_down_filter) produce optimized plans without errors
    • enabling RUST_LOG=debug to confirm timing sections and counters emit as expected

Are there any user-facing changes?

  • No user-facing behavior changes.
  • The optimizer logic is unchanged; only debug logging is added (emits only when RUST_LOG enables Debug for the relevant modules).
  • Benchmark suite additions only affect developers running benches.

LLM-generated code disclosure

This PR includes LLM-generated code and comments. All LLM-generated content has been manually reviewed and tested.

@github-actions github-actions bot added optimizer Optimizer rules core Core DataFusion crate labels Mar 3, 2026
@kosiew kosiew marked this pull request as ready for review March 5, 2026 09:05
query.push_str(" AND ");
}

let mut expr = format!("length(l.c{})", i % 20);
Contributor

I don't think we really want a benchmark for the case expression.
We want to optimize for the evaluation cost of the filter during pushdown, so perhaps it could be written without a large CASE expression (as is done currently), or with adaptive filter removal, etc.

Contributor

So the TPC-H/TPC-DS one is already a good one to optimize for.

@kosiew
Contributor Author
Mar 6, 2026

I don't think we really want a benchmark for the case expression.
We want to optimize for the evaluation cost of the filter during pushdown, so perhaps it could be written without a large CASE expression (as is done currently), or with adaptive filter removal, etc.

I believe your concern is that, by using a huge CASE expression, we might be tuning for an unrealistic "case expression" workload instead of the more common cost of pushing filters through joins.

The benchmark uses CASE because that form triggered a profiler hotspot in PushDownFilter — the nullability/type‑inference codepath for filters on non‑inner joins. I don’t believe real‑world queries typically look like this, so the presence of CASE is purely a convenient way to exercise that particular expensive planner path, not the target of optimization.

To make this clear and avoid overfitting, I’m going to treat the CASE variant as a narrowly scoped micro‑benchmark and add a companion LEFT JOIN query with a simple predicate instead of a CASE.
With both in place we can separate:

  1. the baseline cost of pushing a filter through a join, and
  2. the extra work incurred when a CASE expression forces nullability inference.

That way the benchmark remains useful for optimization while still reflecting more general planner behaviour.
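The two query shapes discussed above can be sketched as small predicate-builder helpers. These names and signatures are illustrative only, not the ones in the PR (the real benchmark uses build_case_heavy_left_join_query and its helpers):

```rust
// Hypothetical sketch: build a CASE-nested predicate that forces the
// planner to re-derive nullability/types at every nesting level.
fn case_nested_predicate(col: &str, depth: usize) -> String {
    let mut expr = format!("length(l.{col})");
    for _ in 0..depth {
        expr = format!("CASE WHEN {expr} > 0 THEN {expr} ELSE 0 END");
    }
    expr
}

// Companion predicate without CASE: measures the baseline cost of
// pushing a filter through the LEFT JOIN.
fn simple_predicate(col: &str) -> String {
    format!("length(l.{col}) > 0")
}

fn main() {
    println!("{}", case_nested_predicate("c0", 2));
    println!("{}", simple_predicate("c0"));
}
```

Benchmarking both shapes over the same LEFT JOIN separates the baseline pushdown cost from the extra nullability-inference work that CASE triggers.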

So the TPC-H/TPC-DS one is already a good one to optimize for.

Agreed. TPC-H/TPC-DS should remain the primary goal for optimization value and regression detection.

The intent here is to complement those suites with a deterministic micro-benchmark that isolates a known planner hotspot; macro benchmarks are still required to verify end-to-end relevance and prevent narrow wins.

Contributor

Hm I didn't realize the issue is about planning (not about evaluation cost / pushdown per se), sorry about that!

Perhaps this would make sense to move into the planning benchmarks instead?

Contributor

Oh wait - it already is 🙈

@kosiew kosiew force-pushed the push_down-20002 branch from 0500e0c to 330ea04 on March 6, 2026 08:06
@Dandandan
Contributor

@adriangb FYI

Can we reconsider creating a crazy large expression for the dynamic filters?

Right now the size of the dynamic expression is something like:

  • number_of_join_keys * number_of_partitions, which creates extremely large expressions on large-core machines.

Perhaps create an EvaluateByIdExpr PhysicalExpr or something that has a Vec<PhysicalExpr> and evaluates them by id, or disable dynamic filters for partitioned joins for the moment.
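The EvaluateByIdExpr idea can be sketched in simplified standalone form. This is not DataFusion's actual PhysicalExpr trait (which evaluates Arrow RecordBatches); the hypothetical Expr trait below stands in for it to show the dispatch-by-id shape:

```rust
// Stand-in for DataFusion's PhysicalExpr trait, simplified to rows of i64.
trait Expr {
    fn evaluate(&self, row: &[i64]) -> bool;
}

// A trivial comparison expression: row[col] > bound.
struct GtExpr {
    col: usize,
    bound: i64,
}
impl Expr for GtExpr {
    fn evaluate(&self, row: &[i64]) -> bool {
        row[self.col] > self.bound
    }
}

// Sketch of the suggested EvaluateByIdExpr: instead of one combined
// expression of size number_of_join_keys * number_of_partitions, keep
// the per-partition expressions in a Vec and dispatch by partition id.
struct EvaluateByIdExpr {
    exprs: Vec<Box<dyn Expr>>, // one expression per partition, indexed by id
}
impl EvaluateByIdExpr {
    fn evaluate(&self, partition_id: usize, row: &[i64]) -> bool {
        // O(1) dispatch by id instead of a partitions-wide CASE chain.
        self.exprs[partition_id].evaluate(row)
    }
}

fn main() {
    let filter = EvaluateByIdExpr {
        exprs: vec![
            Box::new(GtExpr { col: 0, bound: 10 }),
            Box::new(GtExpr { col: 0, bound: 20 }),
        ],
    };
    println!("{}", filter.evaluate(0, &[15])); // partition 0: 15 > 10
    println!("{}", filter.evaluate(1, &[15])); // partition 1: 15 > 20
}
```

The point of the design is that the expression tree size stays proportional to the number of join keys per partition, with the partition fan-out handled by indexing rather than by expression nesting.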

@adriangb
Contributor

adriangb commented Mar 6, 2026

@adriangb FYI

Can we reconsider creating a crazy large expression for the dynamic filters?

Right now the size of the dynamic expression is something like:

* `number_of_join_keys` * `number_of_partitions`, which creates _extremely large expressions_ on large-core machines.

Perhaps create an EvaluateByIdExpr PhysicalExpr or something that has a Vec<PhysicalExpr> and evaluates them by id, or disable dynamic filters for partitioned joins for the moment.

I'm open to suggestions. We should find a solution that keeps the performance wins for small number of join keys / CPUs without degrading for large combinations of those. My hope was that CASE is basically already an optimized version of what you are suggesting (afaik it has a hash map internally to do the routing), but I guess not.

@kosiew
Contributor Author

kosiew commented Mar 7, 2026

@Dandandan
Thanks for the review

@kosiew kosiew added this pull request to the merge queue Mar 7, 2026
Merged via the queue into apache:main with commit 0ac434d Mar 7, 2026
34 checks passed