Skip to content

Rewrite SUM(expr + scalar) --> SUM(expr) + scalar*COUNT(expr) #20665

Closed
alamb wants to merge 2 commits intoapache:mainfrom
alamb:alamb/clickbenchmaxxing
Closed

Rewrite SUM(expr + scalar) --> SUM(expr) + scalar*COUNT(expr) #20665
alamb wants to merge 2 commits intoapache:mainfrom
alamb:alamb/clickbenchmaxxing

Conversation

@alamb
Copy link
Contributor

@alamb alamb commented Mar 3, 2026

Draft until:

Which issue does this PR close?

Rationale for this change

I want DataFusion to be the fastest paruqet engine on ClickBench. One of the queries where DataFusion is significantly slower is Query 29 which has a very strange pattern of many aggregate functions that are offset by a constant:

SELECT SUM("ResolutionWidth"), SUM("ResolutionWidth" + 1), SUM("ResolutionWidth" + 2), SUM("ResolutionWidth" + 3), SUM("ResolutionWidth" + 4), SUM("ResolutionWidth" + 5), SUM("ResolutionWidth" + 6), SUM("ResolutionWidth" + 7), SUM("ResolutionWidth" + 8), SUM("ResolutionWidth" + 9), SUM("ResolutionWidth" + 10), SUM("ResolutionWidth" + 11), SUM("ResolutionWidth" + 12), SUM("ResolutionWidth" + 13), SUM("ResolutionWidth" + 14), SUM("ResolutionWidth" + 15), SUM("ResolutionWidth" + 16), SUM("ResolutionWidth" + 17), SUM("ResolutionWidth" + 18), SUM("ResolutionWidth" + 19), SUM("ResolutionWidth" + 20), SUM("ResolutionWidth" + 21), SUM("ResolutionWidth" + 22), SUM("ResolutionWidth" + 23), SUM("ResolutionWidth" + 24), SUM("ResolutionWidth" + 25), SUM("ResolutionWidth" + 26), SUM("ResolutionWidth" + 27), SUM("ResolutionWidth" + 28), SUM("ResolutionWidth" + 29), SUM("ResolutionWidth" + 30), SUM("ResolutionWidth" + 31), SUM("ResolutionWidth" + 32), SUM("ResolutionWidth" + 33), SUM("ResolutionWidth" + 34), SUM("ResolutionWidth" + 35), SUM("ResolutionWidth" + 36), SUM("ResolutionWidth" + 37), SUM("ResolutionWidth" + 38), SUM("ResolutionWidth" + 39), SUM("ResolutionWidth" + 40), SUM("ResolutionWidth" + 41), SUM("ResolutionWidth" + 42), SUM("ResolutionWidth" + 43), SUM("ResolutionWidth" + 44), SUM("ResolutionWidth" + 45), SUM("ResolutionWidth" + 46), SUM("ResolutionWidth" + 47), SUM("ResolutionWidth" + 48), SUM("ResolutionWidth" + 49), SUM("ResolutionWidth" + 50), SUM("ResolutionWidth" + 51), SUM("ResolutionWidth" + 52), SUM("ResolutionWidth" + 53), SUM("ResolutionWidth" + 54), SUM("ResolutionWidth" + 55), SUM("ResolutionWidth" + 56), SUM("ResolutionWidth" + 57), SUM("ResolutionWidth" + 58), SUM("ResolutionWidth" + 59), SUM("ResolutionWidth" + 60), SUM("ResolutionWidth" + 61), SUM("ResolutionWidth" + 62), SUM("ResolutionWidth" + 63), SUM("ResolutionWidth" + 64), SUM("ResolutionWidth" + 65), SUM("ResolutionWidth" + 66), SUM("ResolutionWidth" + 67), SUM("ResolutionWidth" + 68), SUM("ResolutionWidth" + 69), SUM("ResolutionWidth" + 70), SUM("ResolutionWidth" + 71), SUM("ResolutionWidth" + 72), SUM("ResolutionWidth" + 73), SUM("ResolutionWidth" + 74), SUM("ResolutionWidth" + 75), SUM("ResolutionWidth" + 76), SUM("ResolutionWidth" + 77), SUM("ResolutionWidth" + 78), SUM("ResolutionWidth" + 79), SUM("ResolutionWidth" + 80), SUM("ResolutionWidth" + 81), SUM("ResolutionWidth" + 82), SUM("ResolutionWidth" + 83), SUM("ResolutionWidth" + 84), SUM("ResolutionWidth" + 85), SUM("ResolutionWidth" + 86), SUM("ResolutionWidth" + 87), SUM("ResolutionWidth" + 88), SUM("ResolutionWidth" + 89) FROM hits;

This is not a pattern I have ever seen in a real query, but it seems like the engine currently at the top of the ClickBench leaderboard has a special case for this pattern. See

Thus I reluctantly conclude that we should have one too.

What changes are included in this PR?

  1. Add a rewrite SUM(expr + scalar) --> SUM(expr) + scalar*COUNT(expr)
  2. Tests for same

This is implemented as a AggregateUDF::simplify rule as discussed on #20180 (comment) and suggested by @UBarney

Note there are quite a few other ideas to potentially make this more general on #15524 but I am going with the simple thing of making it work for the usecase we have in hand (ClickBench)

Are these changes tested?

Yes, new tests are added

Are there any user-facing changes?

Faster performance

@alamb alamb added the performance Make DataFusion faster label Mar 3, 2026
@github-actions github-actions bot added logical-expr Logical plan and expressions sqllogictest SQL Logic Tests (.slt) functions Changes to functions implementation labels Mar 3, 2026
@AdamGS
Copy link
Contributor

AdamGS commented Mar 3, 2026

I was curious about overflow behavior which I think is right, but I have noticed one difference (at least when going through SQL):

statement ok
CREATE TABLE IF NOT EXISTS tbl (val INTEGER UNSIGNED);

statement ok
INSERT INTO tbl VALUES (4294967295);

statement ok
INSERT INTO tbl VALUES (4294967295);

query II
SELECT SUM(val + 1), SUM(val + 2) FROM tbl;
----
8589934592 8589934594

query TT
EXPLAIN SELECT SUM(val + 1), SUM(val + 2)  FROM tbl;
----
logical_plan
01)Aggregate: groupBy=[[]], aggr=[[sum(__common_expr_1 AS tbl.val + Int64(1)), sum(__common_expr_1 AS tbl.val + Int64(2))]]
02)--Projection: CAST(tbl.val AS Int64) AS __common_expr_1
03)----TableScan: tbl projection=[val]
physical_plan
01)AggregateExec: mode=Single, gby=[], aggr=[sum(tbl.val + Int64(1)), sum(tbl.val + Int64(2))]
02)--ProjectionExec: expr=[CAST(val@0 AS Int64) as __common_expr_1]
03)----DataSourceExec: partitions=1, partition_sizes=[2]

query RR
SELECT SUM(val) + 1 * COUNT(val), SUM(val) + 2 * COUNT(val)  FROM tbl;
----
8589934592 8589934594

query TT
EXPLAIN SELECT SUM(val) + 1 * COUNT(val), SUM(val) + 2 * COUNT(val)  FROM tbl;
----
logical_plan
01)Projection: __common_expr_1 + CAST(count(tbl.val) AS Decimal128(20, 0)) AS sum(tbl.val) + Int64(1) * count(tbl.val), __common_expr_1 AS sum(tbl.val) + CAST(Int64(2) * count(tbl.val) AS Decimal128(20, 0))
02)--Projection: CAST(sum(tbl.val) AS Decimal128(20, 0)) AS __common_expr_1, count(tbl.val)
03)----Aggregate: groupBy=[[]], aggr=[[sum(CAST(tbl.val AS UInt64)), count(tbl.val)]]
04)------TableScan: tbl projection=[val]
physical_plan
01)ProjectionExec: expr=[__common_expr_1@0 + CAST(count(tbl.val)@1 AS Decimal128(20, 0)) as sum(tbl.val) + Int64(1) * count(tbl.val), __common_expr_1@0 + CAST(2 * count(tbl.val)@1 AS Decimal128(20, 0)) as sum(tbl.val) + Int64(2) * count(tbl.val)]
02)--ProjectionExec: expr=[CAST(sum(tbl.val)@0 AS Decimal128(20, 0)) as __common_expr_1, count(tbl.val)@1 as count(tbl.val)]
03)----AggregateExec: mode=Single, gby=[], aggr=[sum(tbl.val), count(tbl.val)]
04)------DataSourceExec: partitions=1, partition_sizes=[2]

statement ok
DROP TABLE IF EXISTS tbl;

The "rewritten" form returns floats for some reason? not sure what that is about

@alamb
Copy link
Contributor Author

alamb commented Mar 3, 2026

The "rewritten" form retruns floats for some reason? not sure what that is about

Thank you -- I will investigate

github-merge-queue bot pushed a commit that referenced this pull request Mar 4, 2026
## Which issue does this PR close?

- Part of #18489 
- Related to #20180

## Rationale for this change

This looks like a monster PR but I think it will be quite easy to review
(it just adds some new `EXPLAIN` tests). If it would be helpful I can
break it into smaller pieces

I want to improve the plans for ClickBench Query 29

However, the plans for the ClickBench queries are not in our tests
anywhere (so when I make the improvements in
#20665 no explain plan tests
change)

So to start, let's start with adding the explain plans for all the
queries in clickbench.slt to so it is clear what our current plans look
like as well as to make it clear what the change of plans are

## What changes are included in this PR?

Add explain plans to some .slt tests

## Are these changes tested?

Only tests

## Are there any user-facing changes?
No, this only adds tests
@alamb alamb force-pushed the alamb/clickbenchmaxxing branch from 4ff7980 to b4725d1 Compare March 4, 2026 21:52
@alamb alamb force-pushed the alamb/clickbenchmaxxing branch from b4725d1 to 6f76ea8 Compare March 5, 2026 13:13
@alamb
Copy link
Contributor Author

alamb commented Mar 5, 2026

The "rewritten" form retruns floats for some reason? not sure what that is about

Thank you -- I will investigate

I added these tests in

@alamb alamb force-pushed the alamb/clickbenchmaxxing branch 2 times, most recently from 4fca8f4 to 4d28038 Compare March 5, 2026 15:08
@github-actions github-actions bot added the optimizer Optimizer rules label Mar 5, 2026
@alamb alamb force-pushed the alamb/clickbenchmaxxing branch from 4d28038 to c048aa0 Compare March 5, 2026 15:09
github-merge-queue bot pushed a commit that referenced this pull request Mar 5, 2026
…Impl::simplify` (#20712)

## Rationale for this change

While working on #20665 I was
somewhat confused about how the AggregateUDF simplify worked. So I added
some more clarifying comments

## What changes are included in this PR?

1. Improve documentation strings

## Are these changes tested?

Yes by CI 

## Are there any user-facing changes?
Better API docs

---------

Co-authored-by: Yongting You <2010youy01@gmail.com>
@alamb alamb force-pushed the alamb/clickbenchmaxxing branch from 8509263 to 6245420 Compare March 5, 2026 21:35
@alamb
Copy link
Contributor Author

alamb commented Mar 5, 2026

run benchmark clickbench_partitioned

@alamb
Copy link
Contributor Author

alamb commented Mar 5, 2026

run benchmarks

@alamb-ghbot
Copy link

🤖 ./gh_compare_branch.sh gh_compare_branch.sh Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing alamb/clickbenchmaxxing (6245420) to a95da70 diff using: tpch_mem clickbench_partitioned clickbench_extended
Results will be posted here when complete

@alamb-ghbot
Copy link

🤖: Benchmark completed

Details

Comparing HEAD and alamb_clickbenchmaxxing
--------------------
Benchmark clickbench_extended.json
--------------------
┏━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query    ┃        HEAD ┃ alamb_clickbenchmaxxing ┃        Change ┃
┡━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0 │  2551.45 ms │              2509.06 ms │     no change │
│ QQuery 1 │  1047.84 ms │               914.02 ms │ +1.15x faster │
│ QQuery 2 │  1992.74 ms │              1808.08 ms │ +1.10x faster │
│ QQuery 3 │  1134.64 ms │              1137.55 ms │     no change │
│ QQuery 4 │  2325.73 ms │              2424.73 ms │     no change │
│ QQuery 5 │ 26662.53 ms │             27196.55 ms │     no change │
│ QQuery 6 │  4018.96 ms │              4007.98 ms │     no change │
│ QQuery 7 │  2964.80 ms │              2967.11 ms │     no change │
└──────────┴─────────────┴─────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                      ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                      │ 42698.69ms │
│ Total Time (alamb_clickbenchmaxxing)   │ 42965.08ms │
│ Average Time (HEAD)                    │  5337.34ms │
│ Average Time (alamb_clickbenchmaxxing) │  5370.64ms │
│ Queries Faster                         │          2 │
│ Queries Slower                         │          0 │
│ Queries with No Change                 │          6 │
│ Queries with Failure                   │          0 │
└────────────────────────────────────────┴────────────┘
--------------------
Benchmark clickbench_partitioned.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃        HEAD ┃ alamb_clickbenchmaxxing ┃        Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0  │     2.57 ms │                 2.65 ms │     no change │
│ QQuery 1  │    52.35 ms │                51.44 ms │     no change │
│ QQuery 2  │   160.16 ms │               165.16 ms │     no change │
│ QQuery 3  │   170.59 ms │               166.83 ms │     no change │
│ QQuery 4  │  1072.89 ms │              1116.51 ms │     no change │
│ QQuery 5  │  1323.46 ms │              1301.68 ms │     no change │
│ QQuery 6  │     6.84 ms │                 7.40 ms │  1.08x slower │
│ QQuery 7  │    55.19 ms │                55.82 ms │     no change │
│ QQuery 8  │  1489.10 ms │              1562.86 ms │     no change │
│ QQuery 9  │  1879.40 ms │              1958.11 ms │     no change │
│ QQuery 10 │   364.91 ms │               347.56 ms │     no change │
│ QQuery 11 │   412.97 ms │               402.08 ms │     no change │
│ QQuery 12 │  1214.94 ms │              1238.54 ms │     no change │
│ QQuery 13 │  2034.48 ms │              2064.96 ms │     no change │
│ QQuery 14 │  1214.14 ms │              1243.86 ms │     no change │
│ QQuery 15 │  1225.20 ms │              1277.65 ms │     no change │
│ QQuery 16 │  2634.13 ms │              2736.70 ms │     no change │
│ QQuery 17 │  2636.18 ms │              2670.51 ms │     no change │
│ QQuery 18 │  5682.28 ms │              5295.16 ms │ +1.07x faster │
│ QQuery 19 │   128.52 ms │               127.50 ms │     no change │
│ QQuery 20 │  1999.28 ms │              1902.88 ms │     no change │
│ QQuery 21 │  2301.19 ms │              2148.92 ms │ +1.07x faster │
│ QQuery 22 │  3954.26 ms │              3722.66 ms │ +1.06x faster │
│ QQuery 23 │ 17289.74 ms │             12034.35 ms │ +1.44x faster │
│ QQuery 24 │   198.39 ms │               203.68 ms │     no change │
│ QQuery 25 │   472.64 ms │               441.71 ms │ +1.07x faster │
│ QQuery 26 │   234.73 ms │               193.57 ms │ +1.21x faster │
│ QQuery 27 │  2875.93 ms │              2667.40 ms │ +1.08x faster │
│ QQuery 28 │ 24823.17 ms │             21743.20 ms │ +1.14x faster │
│ QQuery 29 │  1058.94 ms │               137.62 ms │ +7.69x faster │
│ QQuery 30 │  1277.56 ms │              1208.60 ms │ +1.06x faster │
│ QQuery 31 │  1395.31 ms │              1285.94 ms │ +1.09x faster │
│ QQuery 32 │  4814.70 ms │              4112.25 ms │ +1.17x faster │
│ QQuery 33 │  5821.38 ms │              5201.24 ms │ +1.12x faster │
│ QQuery 34 │  6371.51 ms │              5906.20 ms │ +1.08x faster │
│ QQuery 35 │  1258.69 ms │              1202.38 ms │     no change │
│ QQuery 36 │   191.91 ms │               185.82 ms │     no change │
│ QQuery 37 │    71.91 ms │                71.72 ms │     no change │
│ QQuery 38 │   121.37 ms │               107.71 ms │ +1.13x faster │
│ QQuery 39 │   348.32 ms │               323.58 ms │ +1.08x faster │
│ QQuery 40 │    40.48 ms │                42.36 ms │     no change │
│ QQuery 41 │    36.65 ms │                35.35 ms │     no change │
│ QQuery 42 │    32.04 ms │                31.45 ms │     no change │
└───────────┴─────────────┴─────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┓
┃ Benchmark Summary                      ┃             ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━┩
│ Total Time (HEAD)                      │ 100750.37ms │
│ Total Time (alamb_clickbenchmaxxing)   │  88703.57ms │
│ Average Time (HEAD)                    │   2343.03ms │
│ Average Time (alamb_clickbenchmaxxing) │   2062.87ms │
│ Queries Faster                         │          16 │
│ Queries Slower                         │           1 │
│ Queries with No Change                 │          26 │
│ Queries with Failure                   │           0 │
└────────────────────────────────────────┴─────────────┘
--------------------
Benchmark tpch_mem_sf1.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┓
┃ Query     ┃      HEAD ┃ alamb_clickbenchmaxxing ┃       Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━┩
│ QQuery 1  │ 126.14 ms │               127.49 ms │    no change │
│ QQuery 2  │  32.61 ms │                32.37 ms │    no change │
│ QQuery 3  │  39.22 ms │                41.83 ms │ 1.07x slower │
│ QQuery 4  │  35.47 ms │                35.75 ms │    no change │
│ QQuery 5  │  91.09 ms │                94.51 ms │    no change │
│ QQuery 6  │  25.31 ms │                24.64 ms │    no change │
│ QQuery 7  │ 155.96 ms │               160.76 ms │    no change │
│ QQuery 8  │  39.03 ms │                40.35 ms │    no change │
│ QQuery 9  │ 108.85 ms │               108.91 ms │    no change │
│ QQuery 10 │  72.61 ms │                74.10 ms │    no change │
│ QQuery 11 │  18.49 ms │                19.00 ms │    no change │
│ QQuery 12 │  65.90 ms │                66.66 ms │    no change │
│ QQuery 13 │  55.14 ms │                55.46 ms │    no change │
│ QQuery 14 │  15.94 ms │                16.38 ms │    no change │
│ QQuery 15 │  33.42 ms │                33.65 ms │    no change │
│ QQuery 16 │  30.18 ms │                31.23 ms │    no change │
│ QQuery 17 │ 171.34 ms │               170.02 ms │    no change │
│ QQuery 18 │ 305.33 ms │               307.66 ms │    no change │
│ QQuery 19 │  53.29 ms │                51.56 ms │    no change │
│ QQuery 20 │  61.21 ms │                63.29 ms │    no change │
│ QQuery 21 │ 193.13 ms │               201.65 ms │    no change │
│ QQuery 22 │  23.54 ms │                24.53 ms │    no change │
└───────────┴───────────┴─────────────────────────┴──────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Benchmark Summary                      ┃           ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ Total Time (HEAD)                      │ 1753.19ms │
│ Total Time (alamb_clickbenchmaxxing)   │ 1781.81ms │
│ Average Time (HEAD)                    │   79.69ms │
│ Average Time (alamb_clickbenchmaxxing) │   80.99ms │
│ Queries Faster                         │         0 │
│ Queries Slower                         │         1 │
│ Queries with No Change                 │        21 │
│ Queries with Failure                   │         0 │
└────────────────────────────────────────┴───────────┘

@alamb-ghbot
Copy link

🤖 ./gh_compare_branch.sh gh_compare_branch.sh Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing alamb/clickbenchmaxxing (6245420) to a95da70 diff using: clickbench_partitioned
Results will be posted here when complete

@alamb-ghbot
Copy link

🤖: Benchmark completed

Details

Comparing HEAD and alamb_clickbenchmaxxing
--------------------
Benchmark clickbench_partitioned.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃        HEAD ┃ alamb_clickbenchmaxxing ┃        Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0  │     2.62 ms │                 2.66 ms │     no change │
│ QQuery 1  │    50.70 ms │                51.91 ms │     no change │
│ QQuery 2  │   158.81 ms │               167.45 ms │  1.05x slower │
│ QQuery 3  │   174.03 ms │               169.02 ms │     no change │
│ QQuery 4  │  1114.37 ms │              1297.11 ms │  1.16x slower │
│ QQuery 5  │  1367.18 ms │              1406.49 ms │     no change │
│ QQuery 6  │     7.17 ms │                 6.69 ms │ +1.07x faster │
│ QQuery 7  │    55.19 ms │                55.12 ms │     no change │
│ QQuery 8  │  1573.50 ms │              1590.98 ms │     no change │
│ QQuery 9  │  1934.30 ms │              1966.38 ms │     no change │
│ QQuery 10 │   366.17 ms │               346.05 ms │ +1.06x faster │
│ QQuery 11 │   411.88 ms │               404.57 ms │     no change │
│ QQuery 12 │  1267.16 ms │              1311.46 ms │     no change │
│ QQuery 13 │  2088.52 ms │              2088.10 ms │     no change │
│ QQuery 14 │  1265.39 ms │              1336.31 ms │  1.06x slower │
│ QQuery 15 │  1289.43 ms │              1354.26 ms │  1.05x slower │
│ QQuery 16 │  2697.05 ms │              2678.20 ms │     no change │
│ QQuery 17 │  2682.10 ms │              2698.26 ms │     no change │
│ QQuery 18 │  6110.92 ms │              5300.33 ms │ +1.15x faster │
│ QQuery 19 │   126.11 ms │               123.61 ms │     no change │
│ QQuery 20 │  1997.18 ms │              1856.45 ms │ +1.08x faster │
│ QQuery 21 │  2288.14 ms │              2147.13 ms │ +1.07x faster │
│ QQuery 22 │  3971.36 ms │              3682.39 ms │ +1.08x faster │
│ QQuery 23 │ 25151.55 ms │             11723.85 ms │ +2.15x faster │
│ QQuery 24 │   214.54 ms │               189.02 ms │ +1.13x faster │
│ QQuery 25 │   463.66 ms │               439.30 ms │ +1.06x faster │
│ QQuery 26 │   214.92 ms │               194.62 ms │ +1.10x faster │
│ QQuery 27 │  2779.86 ms │              2703.30 ms │     no change │
│ QQuery 28 │ 24444.04 ms │             22143.55 ms │ +1.10x faster │
│ QQuery 29 │  1051.88 ms │               135.16 ms │ +7.78x faster │
│ QQuery 30 │  1302.92 ms │              1310.67 ms │     no change │
│ QQuery 31 │  1426.05 ms │              1402.00 ms │     no change │
│ QQuery 32 │  4945.35 ms │              4659.18 ms │ +1.06x faster │
│ QQuery 33 │  6188.18 ms │              5748.51 ms │ +1.08x faster │
│ QQuery 34 │  6436.49 ms │              6198.28 ms │     no change │
│ QQuery 35 │  1272.89 ms │              1264.34 ms │     no change │
│ QQuery 36 │   186.79 ms │               187.53 ms │     no change │
│ QQuery 37 │    77.49 ms │                70.17 ms │ +1.10x faster │
│ QQuery 38 │   116.66 ms │               107.97 ms │ +1.08x faster │
│ QQuery 39 │   345.35 ms │               342.42 ms │     no change │
│ QQuery 40 │    41.40 ms │                38.36 ms │ +1.08x faster │
│ QQuery 41 │    35.70 ms │                37.15 ms │     no change │
│ QQuery 42 │    31.29 ms │                31.61 ms │     no change │
└───────────┴─────────────┴─────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┓
┃ Benchmark Summary                      ┃             ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━┩
│ Total Time (HEAD)                      │ 109726.31ms │
│ Total Time (alamb_clickbenchmaxxing)   │  90967.93ms │
│ Average Time (HEAD)                    │   2551.77ms │
│ Average Time (alamb_clickbenchmaxxing) │   2115.53ms │
│ Queries Faster                         │          17 │
│ Queries Slower                         │           4 │
│ Queries with No Change                 │          22 │
│ Queries with Failure                   │           0 │
└────────────────────────────────────────┴─────────────┘

@Dandandan
Copy link
Contributor

│ QQuery 29 │ 1051.88 ms │ 135.16 ms │ +7.78x faster │ 🚀 🚀 🚀

@Dandandan
Copy link
Contributor

Dandandan commented Mar 6, 2026

│ QQuery 23 │ 25151.55 ms │ 11723.85 ms │ +2.15x faster │

@alamb btw this was the query I was talking about...

  • Is 2-3 times slower 50% of the time in main
  • It is always the left/main branch which is slower.
  • Sometimes some other queries show similar issues for the main branch (making the overall comparison off)

When running locally, I don't have the problem...
I think because it's the most IO heavy one, it must be something related to IOPs, swapping...

@Dandandan
Copy link
Contributor

run benchmarks

@alamb-ghbot
Copy link

🤖 ./gh_compare_branch.sh gh_compare_branch.sh Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing alamb/clickbenchmaxxing (6245420) to a95da70 diff using: tpch_mem clickbench_partitioned clickbench_extended
Results will be posted here when complete

@alamb
Copy link
Contributor Author

alamb commented Mar 6, 2026

│ QQuery 23 │ 25151.55 ms │ 11723.85 ms │ +2.15x faster │

@alamb btw this was the query I was talking about...

  • Is 2-3 times slower 50% of the time in main
  • It is always the left/main branch which is slower.
  • Sometimes some other queries show similar issues for the main branch (making the overall comparison off)

When running locally, I don't have the problem... I think because it's the most IO heavy one, it must be something related to IOPs, swapping...

@Dandandan I think it is this one:

SELECT * FROM hits WHERE "URL" LIKE '%google%' ORDER BY "EventTime" LIMIT 10;

Last tiem @adriangb looked into that I think he concluded that a bunch of the variability is related to if the TopK finds the top values quickly enough to skip some of the files entirely but that depends on a race to read the "right" rows.

Q23 is also the one that gets stupidly faster (like 20x) when we turn on filter pushdown

@alamb
Copy link
Contributor Author

alamb commented Mar 6, 2026

run benchmark clickbench_partitioned

@alamb-ghbot
Copy link

🤖: Benchmark completed

Details

Comparing HEAD and alamb_clickbenchmaxxing
--------------------
Benchmark clickbench_extended.json
--------------------
┏━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query    ┃        HEAD ┃ alamb_clickbenchmaxxing ┃        Change ┃
┡━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0 │  2477.61 ms │              2358.11 ms │     no change │
│ QQuery 1 │   959.00 ms │               915.84 ms │     no change │
│ QQuery 2 │  1943.14 ms │              1825.77 ms │ +1.06x faster │
│ QQuery 3 │  1160.14 ms │              1098.17 ms │ +1.06x faster │
│ QQuery 4 │  2322.80 ms │              2333.17 ms │     no change │
│ QQuery 5 │ 27293.22 ms │             27123.72 ms │     no change │
│ QQuery 6 │  4202.21 ms │              3959.13 ms │ +1.06x faster │
│ QQuery 7 │  2726.74 ms │              2707.37 ms │     no change │
└──────────┴─────────────┴─────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                      ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                      │ 43084.85ms │
│ Total Time (alamb_clickbenchmaxxing)   │ 42321.27ms │
│ Average Time (HEAD)                    │  5385.61ms │
│ Average Time (alamb_clickbenchmaxxing) │  5290.16ms │
│ Queries Faster                         │          3 │
│ Queries Slower                         │          0 │
│ Queries with No Change                 │          5 │
│ Queries with Failure                   │          0 │
└────────────────────────────────────────┴────────────┘
--------------------
Benchmark clickbench_partitioned.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃        HEAD ┃ alamb_clickbenchmaxxing ┃        Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0  │     2.62 ms │                 2.68 ms │     no change │
│ QQuery 1  │    50.21 ms │                49.91 ms │     no change │
│ QQuery 2  │   163.62 ms │               163.56 ms │     no change │
│ QQuery 3  │   172.73 ms │               171.17 ms │     no change │
│ QQuery 4  │  1061.07 ms │              1096.26 ms │     no change │
│ QQuery 5  │  1329.09 ms │              1279.15 ms │     no change │
│ QQuery 6  │     6.85 ms │                 7.17 ms │     no change │
│ QQuery 7  │    55.19 ms │                57.23 ms │     no change │
│ QQuery 8  │  1494.57 ms │              1504.77 ms │     no change │
│ QQuery 9  │  1909.15 ms │              1953.51 ms │     no change │
│ QQuery 10 │   359.16 ms │               344.18 ms │     no change │
│ QQuery 11 │   414.85 ms │               395.09 ms │     no change │
│ QQuery 12 │  1243.10 ms │              1205.65 ms │     no change │
│ QQuery 13 │  2013.90 ms │              2035.72 ms │     no change │
│ QQuery 14 │  1243.62 ms │              1228.97 ms │     no change │
│ QQuery 15 │  1235.24 ms │              1245.39 ms │     no change │
│ QQuery 16 │  2596.81 ms │              2577.09 ms │     no change │
│ QQuery 17 │  2581.98 ms │              2563.37 ms │     no change │
│ QQuery 18 │  5032.06 ms │              5010.14 ms │     no change │
│ QQuery 19 │   125.36 ms │               126.67 ms │     no change │
│ QQuery 20 │  2003.23 ms │              1859.12 ms │ +1.08x faster │
│ QQuery 21 │  2221.32 ms │              2096.68 ms │ +1.06x faster │
│ QQuery 22 │  3923.86 ms │              3687.45 ms │ +1.06x faster │
│ QQuery 23 │ 12187.77 ms │             11823.24 ms │     no change │
│ QQuery 24 │   206.11 ms │               187.82 ms │ +1.10x faster │
│ QQuery 25 │   444.49 ms │               437.75 ms │     no change │
│ QQuery 26 │   198.12 ms │               194.89 ms │     no change │
│ QQuery 27 │  2774.44 ms │              2656.03 ms │     no change │
│ QQuery 28 │ 23929.01 ms │             22312.11 ms │ +1.07x faster │
│ QQuery 29 │  1039.30 ms │               135.56 ms │ +7.67x faster │
│ QQuery 30 │  1262.00 ms │              1251.70 ms │     no change │
│ QQuery 31 │  1379.73 ms │              1354.98 ms │     no change │
│ QQuery 32 │  4497.42 ms │              4473.07 ms │     no change │
│ QQuery 33 │  5714.97 ms │              5651.70 ms │     no change │
│ QQuery 34 │  6456.95 ms │              6134.10 ms │ +1.05x faster │
│ QQuery 35 │  1189.81 ms │              1193.48 ms │     no change │
│ QQuery 36 │   192.42 ms │               183.67 ms │     no change │
│ QQuery 37 │    72.35 ms │                71.46 ms │     no change │
│ QQuery 38 │   112.83 ms │               109.74 ms │     no change │
│ QQuery 39 │   333.55 ms │               343.30 ms │     no change │
│ QQuery 40 │    39.65 ms │                41.54 ms │     no change │
│ QQuery 41 │    35.17 ms │                35.20 ms │     no change │
│ QQuery 42 │    32.01 ms │                31.40 ms │     no change │
└───────────┴─────────────┴─────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                      ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                      │ 93337.70ms │
│ Total Time (alamb_clickbenchmaxxing)   │ 89283.66ms │
│ Average Time (HEAD)                    │  2170.64ms │
│ Average Time (alamb_clickbenchmaxxing) │  2076.36ms │
│ Queries Faster                         │          7 │
│ Queries Slower                         │          0 │
│ Queries with No Change                 │         36 │
│ Queries with Failure                   │          0 │
└────────────────────────────────────────┴────────────┘
--------------------
Benchmark tpch_mem_sf1.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┓
┃ Query     ┃      HEAD ┃ alamb_clickbenchmaxxing ┃       Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━┩
│ QQuery 1  │ 127.44 ms │               129.09 ms │    no change │
│ QQuery 2  │  32.38 ms │                32.43 ms │    no change │
│ QQuery 3  │  39.81 ms │                40.36 ms │    no change │
│ QQuery 4  │  34.14 ms │                36.28 ms │ 1.06x slower │
│ QQuery 5  │  91.13 ms │                94.36 ms │    no change │
│ QQuery 6  │  24.96 ms │                24.59 ms │    no change │
│ QQuery 7  │ 153.22 ms │               162.04 ms │ 1.06x slower │
│ QQuery 8  │  41.10 ms │                40.89 ms │    no change │
│ QQuery 9  │ 106.05 ms │               112.43 ms │ 1.06x slower │
│ QQuery 10 │  72.32 ms │                72.80 ms │    no change │
│ QQuery 11 │  19.17 ms │                18.93 ms │    no change │
│ QQuery 12 │  68.72 ms │                66.92 ms │    no change │
│ QQuery 13 │  53.94 ms │                54.82 ms │    no change │
│ QQuery 14 │  15.75 ms │                15.98 ms │    no change │
│ QQuery 15 │  33.01 ms │                33.53 ms │    no change │
│ QQuery 16 │  29.85 ms │                30.16 ms │    no change │
│ QQuery 17 │ 158.25 ms │               168.64 ms │ 1.07x slower │
│ QQuery 18 │ 300.66 ms │               297.92 ms │    no change │
│ QQuery 19 │  53.63 ms │                52.82 ms │    no change │
│ QQuery 20 │  63.56 ms │                62.46 ms │    no change │
│ QQuery 21 │ 195.36 ms │               205.15 ms │ 1.05x slower │
│ QQuery 22 │  23.66 ms │                24.82 ms │    no change │
└───────────┴───────────┴─────────────────────────┴──────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Benchmark Summary                      ┃           ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ Total Time (HEAD)                      │ 1738.10ms │
│ Total Time (alamb_clickbenchmaxxing)   │ 1777.44ms │
│ Average Time (HEAD)                    │   79.00ms │
│ Average Time (alamb_clickbenchmaxxing) │   80.79ms │
│ Queries Faster                         │         0 │
│ Queries Slower                         │         5 │
│ Queries with No Change                 │        17 │
│ Queries with Failure                   │         0 │
└────────────────────────────────────────┴───────────┘

@alamb alamb force-pushed the alamb/clickbenchmaxxing branch from b031939 to 25350af Compare March 6, 2026 08:37
@alamb-ghbot
Copy link

🤖 ./gh_compare_branch.sh gh_compare_branch.sh Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing alamb/clickbenchmaxxing (25350af) to 678d1ad diff using: clickbench_partitioned
Results will be posted here when complete

@adriangb
Copy link
Contributor

adriangb commented Mar 6, 2026

Last tiem @adriangb looked into that I think he concluded that a bunch of the variability is related to if the TopK finds the top values quickly enough to skip some of the files entirely but that depends on a race to read the "right" rows.

My hope is that after #20362 we can get #20304 across the line which should nuke the issue of a race with randomly ordered files. In theory the filter pushdown should only be essential when there is no stats sorting between files (i.e. the values being ordered by are randomly distributed).

@alamb-ghbot
Copy link

🤖: Benchmark completed

Details

Comparing HEAD and alamb_clickbenchmaxxing
--------------------
Benchmark clickbench_partitioned.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃        HEAD ┃ alamb_clickbenchmaxxing ┃        Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0  │     2.54 ms │                 2.62 ms │     no change │
│ QQuery 1  │    50.14 ms │                51.43 ms │     no change │
│ QQuery 2  │   168.04 ms │               167.34 ms │     no change │
│ QQuery 3  │   163.78 ms │               167.29 ms │     no change │
│ QQuery 4  │  1067.73 ms │              1113.58 ms │     no change │
│ QQuery 5  │  1264.03 ms │              1344.50 ms │  1.06x slower │
│ QQuery 6  │     7.21 ms │                 6.96 ms │     no change │
│ QQuery 7  │    63.34 ms │                55.27 ms │ +1.15x faster │
│ QQuery 8  │  1487.70 ms │              1551.31 ms │     no change │
│ QQuery 9  │  1888.48 ms │              1875.31 ms │     no change │
│ QQuery 10 │   341.17 ms │               349.54 ms │     no change │
│ QQuery 11 │   388.38 ms │               394.92 ms │     no change │
│ QQuery 12 │  1194.21 ms │              1264.94 ms │  1.06x slower │
│ QQuery 13 │  2017.22 ms │              1983.95 ms │     no change │
│ QQuery 14 │  1224.31 ms │              1257.11 ms │     no change │
│ QQuery 15 │  1237.39 ms │              1324.31 ms │  1.07x slower │
│ QQuery 16 │  2642.31 ms │              2715.13 ms │     no change │
│ QQuery 17 │  2616.95 ms │              2707.49 ms │     no change │
│ QQuery 18 │  5492.45 ms │              5432.77 ms │     no change │
│ QQuery 19 │   130.62 ms │               129.56 ms │     no change │
│ QQuery 20 │  1910.25 ms │              1969.18 ms │     no change │
│ QQuery 21 │  2176.39 ms │              2212.04 ms │     no change │
│ QQuery 22 │  5545.26 ms │              3889.41 ms │ +1.43x faster │
│ QQuery 23 │ 32967.46 ms │             12245.87 ms │ +2.69x faster │
│ QQuery 24 │   208.06 ms │               203.45 ms │     no change │
│ QQuery 25 │   442.96 ms │               457.15 ms │     no change │
│ QQuery 26 │   190.14 ms │               206.12 ms │  1.08x slower │
│ QQuery 27 │  2782.68 ms │              2807.31 ms │     no change │
│ QQuery 28 │ 22973.34 ms │             25500.72 ms │  1.11x slower │
│ QQuery 29 │  1031.29 ms │               140.26 ms │ +7.35x faster │
│ QQuery 30 │  1270.05 ms │              1279.33 ms │     no change │
│ QQuery 31 │  1382.37 ms │              1412.74 ms │     no change │
│ QQuery 32 │  4863.40 ms │              4745.70 ms │     no change │
│ QQuery 33 │  5985.35 ms │              6089.61 ms │     no change │
│ QQuery 34 │  6529.29 ms │              6273.36 ms │     no change │
│ QQuery 35 │  1250.56 ms │              1244.65 ms │     no change │
│ QQuery 36 │   191.57 ms │               197.65 ms │     no change │
│ QQuery 37 │    71.54 ms │                72.82 ms │     no change │
│ QQuery 38 │   109.77 ms │               112.78 ms │     no change │
│ QQuery 39 │   337.49 ms │               355.67 ms │  1.05x slower │
│ QQuery 40 │    42.29 ms │                41.28 ms │     no change │
│ QQuery 41 │    33.11 ms │                35.94 ms │  1.09x slower │
│ QQuery 42 │    31.36 ms │                32.05 ms │     no change │
└───────────┴─────────────┴─────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┓
┃ Benchmark Summary                      ┃             ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━┩
│ Total Time (HEAD)                      │ 115773.95ms │
│ Total Time (alamb_clickbenchmaxxing)   │  95420.42ms │
│ Average Time (HEAD)                    │   2692.42ms │
│ Average Time (alamb_clickbenchmaxxing) │   2219.08ms │
│ Queries Faster                         │           4 │
│ Queries Slower                         │           7 │
│ Queries with No Change                 │          32 │
│ Queries with Failure                   │           0 │
└────────────────────────────────────────┴─────────────┘

@alamb
Copy link
Contributor Author

alamb commented Mar 6, 2026

While I have the tests passing and I think this PR is functioanlly good now, I really don't like the API as it makes things more complicated and slow for what is basically a one-off optimization.

I have an idea of how to restructure the APIs to be more isolated, which I think befits such a special optimization

@Dandandan
Copy link
Contributor

Last tiem @adriangb looked into that I think he concluded that a bunch of the variability is related to if the TopK finds the top values quickly enough to skip some of the files entirely but that depends on a race to read the "right" rows.

My hope is that after #20362 we can get #20304 across the line which should nuke the issue of a race with randomly ordered files. In theory the filter pushdown should only be essential when there is no stats sorting between files (i.e. the values being ordered by are randomly distributed).

I agree - I saw also some huge variability in speedups for #20481 - I think with some great (row group or otherwise) ordering, some queries could have 3x+ speedups

@alamb
Copy link
Contributor Author

alamb commented Mar 6, 2026

I have a better API in a new PR

Closing this one for the time being

@alamb alamb closed this Mar 6, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

functions Changes to functions implementation logical-expr Logical plan and expressions optimizer Optimizer rules performance Make DataFusion faster sqllogictest SQL Logic Tests (.slt)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Make Clickbench Q29 5x faster for datafusion by extracting SUM(..) clauses

5 participants