
[bench] #20764

Closed
Dandandan wants to merge 1 commit into apache:main from
Dandandan:claude/refactor-repartitionexec-join-all-yjREg

Conversation

@Dandandan
Contributor

@Dandandan Dandandan commented Mar 6, 2026

Which issue does this PR close?

  • Closes #.

Rationale for this change

The current implementation of wait_for_task notifies output partitions sequentially in a for loop, awaiting each send() before starting the next. With many output partitions this can be slow, because one send that has to wait delays every send after it. Using futures::future::join_all() lets the sends make progress concurrently (interleaved within the same task, not on separate threads) rather than one at a time.

This change is intended to reduce the total time spent notifying all output partitions of errors or completion.
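
To make the sequential-versus-concurrent distinction concrete, here is a minimal, std-only sketch. It is not DataFusion code: SlowSend is a hypothetical stand-in for a send() that stays pending for a fixed number of polls, count_polls is a toy busy-polling executor, and send_concurrent is a hand-rolled simplification of what futures::future::join_all does. Poll counts are only a rough proxy for waiting time.

```rust
use std::future::{poll_fn, Future};
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical stand-in for a channel `send()` under backpressure:
/// returns `Pending` `remaining` times, then `Ready`.
struct SlowSend {
    remaining: u32,
}

impl Future for SlowSend {
    type Output = ();

    fn poll(mut self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<()> {
        if self.remaining == 0 {
            Poll::Ready(())
        } else {
            self.remaining -= 1;
            Poll::Pending
        }
    }
}

/// Waker that does nothing; the toy executor below simply polls in a loop.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Busy-polls `fut` to completion and returns how many polls that took.
fn count_polls<F: Future>(fut: F) -> usize {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    let mut polls = 0;
    loop {
        polls += 1;
        if fut.as_mut().poll(&mut cx).is_ready() {
            return polls;
        }
    }
}

/// Sequential model: each send is awaited before the next one starts.
async fn send_sequential(delays: Vec<u32>) {
    for remaining in delays {
        SlowSend { remaining }.await;
    }
}

/// Concurrent model (simplified join_all): every unfinished send is polled
/// on each wake-up, so the waits overlap instead of adding up.
async fn send_concurrent(delays: Vec<u32>) {
    let mut sends: Vec<Option<SlowSend>> = delays
        .into_iter()
        .map(|remaining| Some(SlowSend { remaining }))
        .collect();

    poll_fn(|cx| {
        let mut all_done = true;
        for slot in sends.iter_mut() {
            let finished = match slot {
                Some(send) => Pin::new(send).poll(cx).is_ready(),
                None => true,
            };
            if finished {
                // Never poll a completed future again.
                *slot = None;
            } else {
                all_done = false;
            }
        }
        if all_done { Poll::Ready(()) } else { Poll::Pending }
    })
    .await;
}
```

Under these assumptions, count_polls(send_sequential(vec![3, 3, 3, 3])) returns 13 and count_polls(send_concurrent(vec![3, 3, 3, 3])) returns 4. When every send is ready on its first poll, both variants finish in a single poll in this model, so the difference only appears when sends actually have to wait.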

What changes are included in this PR?

Are these changes tested?

The changes are covered by existing tests in the DataFusion test suite. The refactoring maintains the same functional behavior (all error messages and completion signals are still sent to all output partitions), only changing the execution model from sequential to concurrent.

Are there any user-facing changes?

No user-facing changes. This is an internal optimization to the physical execution layer that improves performance without changing the external API or behavior.

https://claude.ai/code/session_01GDTBavJzih6tSSBd9SRNmk

Replace sequential send loops in wait_for_task with
futures::future::join_all to send to all output partitions
concurrently instead of one at a time.

https://claude.ai/code/session_01GDTBavJzih6tSSBd9SRNmk
@github-actions github-actions bot added the physical-plan Changes to the physical-plan crate label Mar 6, 2026
@Dandandan
Contributor Author

run benchmarks

@alamb-ghbot

🤖 ./gh_compare_branch.sh gh_compare_branch.sh Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing claude/refactor-repartitionexec-join-all-yjREg (ce2d8c0) to d025869 diff using: tpch_mem clickbench_partitioned clickbench_extended
Results will be posted here when complete

@alamb-ghbot

🤖: Benchmark completed

Details

Comparing HEAD and claude_refactor-repartitionexec-join-all-yjREg
--------------------
Benchmark clickbench_extended.json
--------------------
┏━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query    ┃        HEAD ┃ claude_refactor-repartitionexec-join-all-yjREg ┃        Change ┃
┡━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0 │  2464.74 ms │                                     2405.46 ms │     no change │
│ QQuery 1 │   870.74 ms │                                      991.87 ms │  1.14x slower │
│ QQuery 2 │  1801.42 ms │                                     1872.13 ms │     no change │
│ QQuery 3 │  1111.50 ms │                                     1151.79 ms │     no change │
│ QQuery 4 │  2419.04 ms │                                     2236.88 ms │ +1.08x faster │
│ QQuery 5 │ 28116.71 ms │                                    26641.09 ms │ +1.06x faster │
│ QQuery 6 │  4003.98 ms │                                     3909.21 ms │     no change │
│ QQuery 7 │  2689.47 ms │                                     2839.63 ms │  1.06x slower │
└──────────┴─────────────┴────────────────────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                                             ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                                             │ 43477.60ms │
│ Total Time (claude_refactor-repartitionexec-join-all-yjREg)   │ 42048.05ms │
│ Average Time (HEAD)                                           │  5434.70ms │
│ Average Time (claude_refactor-repartitionexec-join-all-yjREg) │  5256.01ms │
│ Queries Faster                                                │          2 │
│ Queries Slower                                                │          2 │
│ Queries with No Change                                        │          4 │
│ Queries with Failure                                          │          0 │
└───────────────────────────────────────────────────────────────┴────────────┘
--------------------
Benchmark clickbench_partitioned.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃        HEAD ┃ claude_refactor-repartitionexec-join-all-yjREg ┃        Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0  │     2.60 ms │                                        2.63 ms │     no change │
│ QQuery 1  │    50.62 ms │                                       51.20 ms │     no change │
│ QQuery 2  │   166.50 ms │                                      162.21 ms │     no change │
│ QQuery 3  │   172.80 ms │                                      171.30 ms │     no change │
│ QQuery 4  │  1100.58 ms │                                     1136.76 ms │     no change │
│ QQuery 5  │  1322.44 ms │                                     1344.36 ms │     no change │
│ QQuery 6  │     6.86 ms │                                        7.32 ms │  1.07x slower │
│ QQuery 7  │    56.96 ms │                                       55.41 ms │     no change │
│ QQuery 8  │  1500.90 ms │                                     1590.76 ms │  1.06x slower │
│ QQuery 9  │  1911.13 ms │                                     1935.15 ms │     no change │
│ QQuery 10 │   344.37 ms │                                      354.16 ms │     no change │
│ QQuery 11 │   398.73 ms │                                      396.48 ms │     no change │
│ QQuery 12 │  1233.38 ms │                                     1279.32 ms │     no change │
│ QQuery 13 │  2065.48 ms │                                     2092.35 ms │     no change │
│ QQuery 14 │  1266.99 ms │                                     1294.58 ms │     no change │
│ QQuery 15 │  1331.78 ms │                                     1353.69 ms │     no change │
│ QQuery 16 │  2669.35 ms │                                     2754.94 ms │     no change │
│ QQuery 17 │  2730.11 ms │                                     2730.33 ms │     no change │
│ QQuery 18 │  5941.24 ms │                                     5383.38 ms │ +1.10x faster │
│ QQuery 19 │   126.06 ms │                                      129.53 ms │     no change │
│ QQuery 20 │  1861.74 ms │                                     1911.35 ms │     no change │
│ QQuery 21 │  2140.54 ms │                                     2172.56 ms │     no change │
│ QQuery 22 │  3916.65 ms │                                     3888.18 ms │     no change │
│ QQuery 23 │ 34125.46 ms │                                    11951.03 ms │ +2.86x faster │
│ QQuery 24 │   190.59 ms │                                      209.75 ms │  1.10x slower │
│ QQuery 25 │   448.13 ms │                                      449.15 ms │     no change │
│ QQuery 26 │   211.96 ms │                                      210.03 ms │     no change │
│ QQuery 27 │  2740.39 ms │                                     2856.45 ms │     no change │
│ QQuery 28 │ 23545.48 ms │                                    26230.69 ms │  1.11x slower │
│ QQuery 29 │  1034.82 ms │                                     1059.78 ms │     no change │
│ QQuery 30 │  1269.40 ms │                                     1285.51 ms │     no change │
│ QQuery 31 │  1395.03 ms │                                     1430.58 ms │     no change │
│ QQuery 32 │  5071.02 ms │                                     4946.06 ms │     no change │
│ QQuery 33 │  5985.68 ms │                                     6072.74 ms │     no change │
│ QQuery 34 │  6618.94 ms │                                     6521.18 ms │     no change │
│ QQuery 35 │  2060.94 ms │                                     2017.64 ms │     no change │
│ QQuery 36 │   188.67 ms │                                      188.71 ms │     no change │
│ QQuery 37 │    69.22 ms │                                       75.66 ms │  1.09x slower │
│ QQuery 38 │   107.36 ms │                                      116.06 ms │  1.08x slower │
│ QQuery 39 │   347.99 ms │                                      349.40 ms │     no change │
│ QQuery 40 │    38.27 ms │                                       40.13 ms │     no change │
│ QQuery 41 │    36.41 ms │                                       34.45 ms │ +1.06x faster │
│ QQuery 42 │    31.80 ms │                                       31.87 ms │     no change │
└───────────┴─────────────┴────────────────────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┓
┃ Benchmark Summary                                             ┃             ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━┩
│ Total Time (HEAD)                                             │ 117835.35ms │
│ Total Time (claude_refactor-repartitionexec-join-all-yjREg)   │  98274.82ms │
│ Average Time (HEAD)                                           │   2740.36ms │
│ Average Time (claude_refactor-repartitionexec-join-all-yjREg) │   2285.46ms │
│ Queries Faster                                                │           3 │
│ Queries Slower                                                │           6 │
│ Queries with No Change                                        │          34 │
│ Queries with Failure                                          │           0 │
└───────────────────────────────────────────────────────────────┴─────────────┘
--------------------
Benchmark tpch_mem_sf1.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┓
┃ Query     ┃      HEAD ┃ claude_refactor-repartitionexec-join-all-yjREg ┃       Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━┩
│ QQuery 1  │ 130.19 ms │                                      129.35 ms │    no change │
│ QQuery 2  │  32.23 ms │                                       32.46 ms │    no change │
│ QQuery 3  │  41.61 ms │                                       41.25 ms │    no change │
│ QQuery 4  │  35.11 ms │                                       35.22 ms │    no change │
│ QQuery 5  │  92.34 ms │                                       92.90 ms │    no change │
│ QQuery 6  │  24.50 ms │                                       24.29 ms │    no change │
│ QQuery 7  │ 155.27 ms │                                      157.12 ms │    no change │
│ QQuery 8  │  40.63 ms │                                       41.41 ms │    no change │
│ QQuery 9  │ 115.82 ms │                                      113.96 ms │    no change │
│ QQuery 10 │  71.93 ms │                                       76.11 ms │ 1.06x slower │
│ QQuery 11 │  19.17 ms │                                       19.02 ms │    no change │
│ QQuery 12 │  66.56 ms │                                       67.89 ms │    no change │
│ QQuery 13 │  56.36 ms │                                       54.82 ms │    no change │
│ QQuery 14 │  15.89 ms │                                       15.54 ms │    no change │
│ QQuery 15 │  32.95 ms │                                       32.89 ms │    no change │
│ QQuery 16 │  31.60 ms │                                       30.57 ms │    no change │
│ QQuery 17 │ 168.66 ms │                                      172.29 ms │    no change │
│ QQuery 18 │ 301.52 ms │                                      294.74 ms │    no change │
│ QQuery 19 │  53.85 ms │                                       53.86 ms │    no change │
│ QQuery 20 │  60.96 ms │                                       61.87 ms │    no change │
│ QQuery 21 │ 202.06 ms │                                      201.42 ms │    no change │
│ QQuery 22 │  25.62 ms │                                       24.93 ms │    no change │
└───────────┴───────────┴────────────────────────────────────────────────┴──────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Benchmark Summary                                             ┃           ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ Total Time (HEAD)                                             │ 1774.85ms │
│ Total Time (claude_refactor-repartitionexec-join-all-yjREg)   │ 1773.91ms │
│ Average Time (HEAD)                                           │   80.67ms │
│ Average Time (claude_refactor-repartitionexec-join-all-yjREg) │   80.63ms │
│ Queries Faster                                                │         0 │
│ Queries Slower                                                │         1 │
│ Queries with No Change                                        │        21 │
│ Queries with Failure                                          │         0 │
└───────────────────────────────────────────────────────────────┴───────────┘

@Dandandan Dandandan closed this Mar 7, 2026
