
Rewrite SUM(expr + scalar) --> SUM(expr) + scalar*COUNT(expr) #20749

Draft
alamb wants to merge 3 commits into apache:main from alamb:alamb/clickbenchmaxxing_2
@alamb (Contributor) commented Mar 6, 2026

Draft until:

Which issue does this PR close?

Rationale for this change

I want DataFusion to be the fastest Parquet engine on ClickBench. One of the queries where DataFusion is significantly slower is Query 29, which has a very strange pattern of many SUM aggregates whose arguments differ only by a constant offset:

SELECT SUM("ResolutionWidth"), SUM("ResolutionWidth" + 1), SUM("ResolutionWidth" + 2), SUM("ResolutionWidth" + 3), SUM("ResolutionWidth" + 4), SUM("ResolutionWidth" + 5), SUM("ResolutionWidth" + 6), SUM("ResolutionWidth" + 7), SUM("ResolutionWidth" + 8), SUM("ResolutionWidth" + 9), SUM("ResolutionWidth" + 10), SUM("ResolutionWidth" + 11), SUM("ResolutionWidth" + 12), SUM("ResolutionWidth" + 13), SUM("ResolutionWidth" + 14), SUM("ResolutionWidth" + 15), SUM("ResolutionWidth" + 16), SUM("ResolutionWidth" + 17), SUM("ResolutionWidth" + 18), SUM("ResolutionWidth" + 19), SUM("ResolutionWidth" + 20), SUM("ResolutionWidth" + 21), SUM("ResolutionWidth" + 22), SUM("ResolutionWidth" + 23), SUM("ResolutionWidth" + 24), SUM("ResolutionWidth" + 25), SUM("ResolutionWidth" + 26), SUM("ResolutionWidth" + 27), SUM("ResolutionWidth" + 28), SUM("ResolutionWidth" + 29), SUM("ResolutionWidth" + 30), SUM("ResolutionWidth" + 31), SUM("ResolutionWidth" + 32), SUM("ResolutionWidth" + 33), SUM("ResolutionWidth" + 34), SUM("ResolutionWidth" + 35), SUM("ResolutionWidth" + 36), SUM("ResolutionWidth" + 37), SUM("ResolutionWidth" + 38), SUM("ResolutionWidth" + 39), SUM("ResolutionWidth" + 40), SUM("ResolutionWidth" + 41), SUM("ResolutionWidth" + 42), SUM("ResolutionWidth" + 43), SUM("ResolutionWidth" + 44), SUM("ResolutionWidth" + 45), SUM("ResolutionWidth" + 46), SUM("ResolutionWidth" + 47), SUM("ResolutionWidth" + 48), SUM("ResolutionWidth" + 49), SUM("ResolutionWidth" + 50), SUM("ResolutionWidth" + 51), SUM("ResolutionWidth" + 52), SUM("ResolutionWidth" + 53), SUM("ResolutionWidth" + 54), SUM("ResolutionWidth" + 55), SUM("ResolutionWidth" + 56), SUM("ResolutionWidth" + 57), SUM("ResolutionWidth" + 58), SUM("ResolutionWidth" + 59), SUM("ResolutionWidth" + 60), SUM("ResolutionWidth" + 61), SUM("ResolutionWidth" + 62), SUM("ResolutionWidth" + 63), SUM("ResolutionWidth" + 64), SUM("ResolutionWidth" + 65), SUM("ResolutionWidth" + 66), SUM("ResolutionWidth" + 67), SUM("ResolutionWidth" + 68), 
SUM("ResolutionWidth" + 69), SUM("ResolutionWidth" + 70), SUM("ResolutionWidth" + 71), SUM("ResolutionWidth" + 72), SUM("ResolutionWidth" + 73), SUM("ResolutionWidth" + 74), SUM("ResolutionWidth" + 75), SUM("ResolutionWidth" + 76), SUM("ResolutionWidth" + 77), SUM("ResolutionWidth" + 78), SUM("ResolutionWidth" + 79), SUM("ResolutionWidth" + 80), SUM("ResolutionWidth" + 81), SUM("ResolutionWidth" + 82), SUM("ResolutionWidth" + 83), SUM("ResolutionWidth" + 84), SUM("ResolutionWidth" + 85), SUM("ResolutionWidth" + 86), SUM("ResolutionWidth" + 87), SUM("ResolutionWidth" + 88), SUM("ResolutionWidth" + 89) FROM hits;

This is not a pattern I have ever seen in a real query, but it seems like the engine currently at the top of the ClickBench leaderboard has a special case for this pattern. ClickHouse probably does too. See

Thus I reluctantly conclude that we should have one too.
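
To make the motivation concrete, here is a minimal Rust sketch (not code from this PR) of why the pattern is cheap to special-case: every SUM(col + k) in Query 29 can be derived from a single SUM(col) and a single COUNT(col). The function names and sample values are made up for illustration, and the sketch ignores NULLs and integer overflow.

```rust
/// Naive evaluation: one pass over the column per aggregate, as Q29 is written.
fn sums_naive(values: &[i64], offsets: &[i64]) -> Vec<i64> {
    offsets
        .iter()
        .map(|k| values.iter().map(|v| v + k).sum::<i64>())
        .collect()
}

/// Shared evaluation: compute SUM and COUNT once, then derive every
/// SUM(col + k) as SUM(col) + k * COUNT(col).
fn sums_shared(values: &[i64], offsets: &[i64]) -> Vec<i64> {
    let sum: i64 = values.iter().sum();
    let count = values.len() as i64;
    offsets.iter().map(|k| sum + k * count).collect()
}

fn main() {
    // Made-up stand-in for the "ResolutionWidth" column.
    let values: Vec<i64> = vec![1920, 1366, 1024, 1920, 800];
    // Q29 uses offsets 0 through 89, i.e. 90 aggregates.
    let offsets: Vec<i64> = (0..90).collect();
    assert_eq!(sums_naive(&values, &offsets), sums_shared(&values, &offsets));
    println!("all {} aggregates agree", offsets.len());
}
```

The naive form touches the column once per aggregate, while the shared form touches it once in total and then does one multiply-add per aggregate.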

What changes are included in this PR?

This is an alternative to my first attempt.

In particular, since this is such a ClickBench-specific rule, I wanted to:

  1. Minimize the downstream API / upgrade impact (aka not change existing APIs)

  2. Optimize performance for the case where this rewrite will not apply (most times)

  3. Add a rewrite SUM(expr + scalar) --> SUM(expr) + scalar*COUNT(expr)

  4. Tests for same
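
The rewrite in item 3 can be sketched as a pattern match over an expression tree. The `Expr` enum and `rewrite_sum_plus_scalar` below are hypothetical stand-ins written for this illustration; they are not DataFusion's `Expr` type or optimizer API, and the actual rule in this PR may be structured differently.

```rust
// Hypothetical toy expression type for illustration only; this is NOT
// DataFusion's `Expr` or any of its optimizer APIs.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Literal(i64),
    Add(Box<Expr>, Box<Expr>),
    Mul(Box<Expr>, Box<Expr>),
    Sum(Box<Expr>),
    Count(Box<Expr>),
}

/// Rewrite SUM(expr + scalar) into SUM(expr) + scalar * COUNT(expr).
/// Any expression that does not match that shape is returned unchanged.
fn rewrite_sum_plus_scalar(e: Expr) -> Expr {
    match e {
        Expr::Sum(arg) => match *arg {
            Expr::Add(lhs, rhs) => match *rhs {
                // Matched SUM(lhs + k): emit SUM(lhs) + k * COUNT(lhs).
                Expr::Literal(k) => Expr::Add(
                    Box::new(Expr::Sum(lhs.clone())),
                    Box::new(Expr::Mul(
                        Box::new(Expr::Literal(k)),
                        Box::new(Expr::Count(lhs)),
                    )),
                ),
                // Right-hand side is not a literal: rebuild the original.
                other => Expr::Sum(Box::new(Expr::Add(lhs, Box::new(other)))),
            },
            // Argument is not an addition: rebuild the original.
            other => Expr::Sum(Box::new(other)),
        },
        other => other,
    }
}

fn main() {
    let col = Expr::Column("ResolutionWidth".to_string());
    let before = Expr::Sum(Box::new(Expr::Add(
        Box::new(col.clone()),
        Box::new(Expr::Literal(1)),
    )));
    let after = rewrite_sum_plus_scalar(before);
    println!("{after:?}");
}
```

The rewrite uses COUNT(expr) rather than COUNT(*) because SQL's SUM skips NULL inputs, so the scalar should be counted only once per non-NULL value of expr.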

Note there are quite a few other ideas for making this more general on #15524, but I am going with the simple approach of making it work for the use case at hand (ClickBench).

Are these changes tested?

Yes, new tests are added

Are there any user-facing changes?

Faster performance

🚀

│ QQuery 29 │  1012.63 ms │                 139.02 ms │ +7.28x faster │
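
For reference, the speedup factor in that row appears to be the baseline time divided by the branch time. The sketch below assumes that definition (it matches the reported numbers) and uses a hypothetical helper name.

```rust
/// Speedup factor, assumed here to be baseline time / candidate time.
fn speedup(baseline_ms: f64, candidate_ms: f64) -> f64 {
    baseline_ms / candidate_ms
}

fn main() {
    // Timings for Query 29 taken from the benchmark row above.
    let factor = speedup(1012.63, 139.02);
    println!("{factor:.2}x faster");
}
```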

@github-actions bot added the logical-expr, optimizer, sqllogictest, and functions labels Mar 6, 2026
@alamb force-pushed the alamb/clickbenchmaxxing_2 branch 2 times, most recently from a9d1598 to c20bf1e, March 6, 2026 17:07
@alamb (Contributor, Author) commented Mar 6, 2026

run benchmark clickbench_partitioned

@alamb (Contributor, Author) commented Mar 6, 2026

run benchmark clickbench_partitioned

@alamb-ghbot

🤖 ./gh_compare_branch.sh gh_compare_branch.sh Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing alamb/clickbenchmaxxing_2 (567eab3) to 678d1ad diff using: clickbench_partitioned
Results will be posted here when complete

@alamb force-pushed the alamb/clickbenchmaxxing_2 branch from 567eab3 to 7491c4d, March 6, 2026 17:35
@alamb-ghbot

🤖: Benchmark completed

Comparing HEAD and alamb_clickbenchmaxxing_2
--------------------
Benchmark clickbench_partitioned.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃        HEAD ┃ alamb_clickbenchmaxxing_2 ┃        Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0  │     2.58 ms │                   2.60 ms │     no change │
│ QQuery 1  │    49.77 ms │                  51.48 ms │     no change │
│ QQuery 2  │   165.77 ms │                 172.08 ms │     no change │
│ QQuery 3  │   167.96 ms │                 173.29 ms │     no change │
│ QQuery 4  │  1089.67 ms │                1130.01 ms │     no change │
│ QQuery 5  │  1269.09 ms │                1382.95 ms │  1.09x slower │
│ QQuery 6  │     6.20 ms │                   6.72 ms │  1.08x slower │
│ QQuery 7  │    56.44 ms │                  55.75 ms │     no change │
│ QQuery 8  │  1537.36 ms │                1567.47 ms │     no change │
│ QQuery 9  │  1940.46 ms │                1989.69 ms │     no change │
│ QQuery 10 │   341.33 ms │                 359.79 ms │  1.05x slower │
│ QQuery 11 │   390.77 ms │                 423.57 ms │  1.08x slower │
│ QQuery 12 │  1230.89 ms │                1264.62 ms │     no change │
│ QQuery 13 │  2069.63 ms │                2092.79 ms │     no change │
│ QQuery 14 │  1282.77 ms │                1287.20 ms │     no change │
│ QQuery 15 │  1305.25 ms │                1291.93 ms │     no change │
│ QQuery 16 │  2707.23 ms │                2740.51 ms │     no change │
│ QQuery 17 │  2674.21 ms │                2715.77 ms │     no change │
│ QQuery 18 │  6469.70 ms │                5152.86 ms │ +1.26x faster │
│ QQuery 19 │   129.19 ms │                 127.01 ms │     no change │
│ QQuery 20 │  2026.85 ms │                1939.63 ms │     no change │
│ QQuery 21 │  2116.43 ms │                2310.85 ms │  1.09x slower │
│ QQuery 22 │  3686.05 ms │                3911.03 ms │  1.06x slower │
│ QQuery 23 │ 12041.76 ms │               11900.00 ms │     no change │
│ QQuery 24 │   192.00 ms │                 202.90 ms │  1.06x slower │
│ QQuery 25 │   431.98 ms │                 466.72 ms │  1.08x slower │
│ QQuery 26 │   193.34 ms │                 222.16 ms │  1.15x slower │
│ QQuery 27 │  2671.57 ms │                2807.22 ms │  1.05x slower │
│ QQuery 28 │ 23520.83 ms │               23643.79 ms │     no change │
│ QQuery 29 │  1022.85 ms │                 138.29 ms │ +7.40x faster │
│ QQuery 30 │  1266.42 ms │                1316.91 ms │     no change │
│ QQuery 31 │  1367.17 ms │                1396.91 ms │     no change │
│ QQuery 32 │  5018.90 ms │                5079.89 ms │     no change │
│ QQuery 33 │  5754.15 ms │                5877.53 ms │     no change │
│ QQuery 34 │  5741.05 ms │                6478.24 ms │  1.13x slower │
│ QQuery 35 │  1183.70 ms │                1250.69 ms │  1.06x slower │
│ QQuery 36 │   185.65 ms │                 192.05 ms │     no change │
│ QQuery 37 │    71.31 ms │                  74.05 ms │     no change │
│ QQuery 38 │   109.96 ms │                 113.68 ms │     no change │
│ QQuery 39 │   335.85 ms │                 344.13 ms │     no change │
│ QQuery 40 │    39.56 ms │                  39.34 ms │     no change │
│ QQuery 41 │    33.53 ms │                  34.98 ms │     no change │
│ QQuery 42 │    32.50 ms │                  30.96 ms │     no change │
└───────────┴─────────────┴───────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                        ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                        │ 93929.69ms │
│ Total Time (alamb_clickbenchmaxxing_2)   │ 93760.04ms │
│ Average Time (HEAD)                      │  2184.41ms │
│ Average Time (alamb_clickbenchmaxxing_2) │  2180.47ms │
│ Queries Faster                           │          2 │
│ Queries Slower                           │         12 │
│ Queries with No Change                   │         29 │
│ Queries with Failure                     │          0 │
└──────────────────────────────────────────┴────────────┘

@alamb-ghbot

🤖 ./gh_compare_branch.sh gh_compare_branch.sh Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing alamb/clickbenchmaxxing_2 (7491c4d) to 678d1ad diff using: clickbench_partitioned
Results will be posted here when complete

@alamb-ghbot

🤖: Benchmark completed

Comparing HEAD and alamb_clickbenchmaxxing_2
--------------------
Benchmark clickbench_partitioned.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃        HEAD ┃ alamb_clickbenchmaxxing_2 ┃        Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0  │     3.88 ms │                   2.62 ms │ +1.48x faster │
│ QQuery 1  │    53.70 ms │                  51.54 ms │     no change │
│ QQuery 2  │   166.46 ms │                 166.54 ms │     no change │
│ QQuery 3  │   164.55 ms │                 171.38 ms │     no change │
│ QQuery 4  │  1129.40 ms │                1171.09 ms │     no change │
│ QQuery 5  │  1307.70 ms │                1393.57 ms │  1.07x slower │
│ QQuery 6  │     6.53 ms │                   6.82 ms │     no change │
│ QQuery 7  │    54.64 ms │                  56.75 ms │     no change │
│ QQuery 8  │  1561.30 ms │                1612.65 ms │     no change │
│ QQuery 9  │  1975.50 ms │                1946.14 ms │     no change │
│ QQuery 10 │   343.22 ms │                 357.39 ms │     no change │
│ QQuery 11 │   388.76 ms │                 405.29 ms │     no change │
│ QQuery 12 │  1251.20 ms │                1286.63 ms │     no change │
│ QQuery 13 │  2065.39 ms │                2062.19 ms │     no change │
│ QQuery 14 │  1238.23 ms │                1325.71 ms │  1.07x slower │
│ QQuery 15 │  1302.27 ms │                1348.43 ms │     no change │
│ QQuery 16 │  2659.16 ms │                2704.48 ms │     no change │
│ QQuery 17 │  2648.03 ms │                2686.79 ms │     no change │
│ QQuery 18 │  5129.66 ms │                5212.17 ms │     no change │
│ QQuery 19 │   125.28 ms │                 126.28 ms │     no change │
│ QQuery 20 │  1800.68 ms │                1890.69 ms │     no change │
│ QQuery 21 │  2087.10 ms │                2094.83 ms │     no change │
│ QQuery 22 │  3683.49 ms │                3688.84 ms │     no change │
│ QQuery 23 │ 11806.65 ms │               11722.96 ms │     no change │
│ QQuery 24 │   200.88 ms │                 186.29 ms │ +1.08x faster │
│ QQuery 25 │   438.95 ms │                 429.40 ms │     no change │
│ QQuery 26 │   192.54 ms │                 198.15 ms │     no change │
│ QQuery 27 │  2676.75 ms │                2705.09 ms │     no change │
│ QQuery 28 │ 22055.53 ms │               24466.75 ms │  1.11x slower │
│ QQuery 29 │  1012.63 ms │                 139.02 ms │ +7.28x faster │
│ QQuery 30 │  1273.25 ms │                1300.03 ms │     no change │
│ QQuery 31 │  1325.74 ms │                1396.83 ms │  1.05x slower │
│ QQuery 32 │  4686.56 ms │                5073.49 ms │  1.08x slower │
│ QQuery 33 │  5733.62 ms │                5943.81 ms │     no change │
│ QQuery 34 │  5913.31 ms │                6211.14 ms │  1.05x slower │
│ QQuery 35 │  1231.33 ms │                1262.30 ms │     no change │
│ QQuery 36 │   181.83 ms │                 183.93 ms │     no change │
│ QQuery 37 │    73.81 ms │                  73.30 ms │     no change │
│ QQuery 38 │   110.81 ms │                 112.88 ms │     no change │
│ QQuery 39 │   337.69 ms │                 339.17 ms │     no change │
│ QQuery 40 │    42.42 ms │                  39.50 ms │ +1.07x faster │
│ QQuery 41 │    35.07 ms │                  36.70 ms │     no change │
│ QQuery 42 │    31.74 ms │                  33.63 ms │  1.06x slower │
└───────────┴─────────────┴───────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                        ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                        │ 90507.22ms │
│ Total Time (alamb_clickbenchmaxxing_2)   │ 93623.19ms │
│ Average Time (HEAD)                      │  2104.82ms │
│ Average Time (alamb_clickbenchmaxxing_2) │  2177.28ms │
│ Queries Faster                           │          4 │
│ Queries Slower                           │          7 │
│ Queries with No Change                   │         32 │
│ Queries with Failure                     │          0 │
└──────────────────────────────────────────┴────────────┘

Development

Successfully merging this pull request may close these issues.

Make Clickbench Q29 5x faster for datafusion by extracting SUM(..) clauses