Coalesce hash repartition batches using BatchCoalescer #20741
Dandandan wants to merge 6 commits into apache:main
Conversation
run benchmarks
In hash repartitioning, input batches are split into sub-batches per output partition. With many partitions, these sub-batches can be very small (e.g. 8192/16 = ~512 rows). Previously each small sub-batch was sent immediately through the channel. This change uses Arrow's BatchCoalescer per output partition to accumulate rows via push_batch_with_indices until target_batch_size is reached before sending. This reduces channel traffic and produces properly-sized output batches. For hash repartitioning, receiver-side coalescing is removed since the sender now produces properly-sized batches. Round-robin and preserve-order modes retain their existing coalescing behavior. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
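The accumulate-then-send behavior described above can be modeled with a small sketch. This is a hypothetical, simplified stand-in for the per-partition coalescer, not the actual arrow-rs `BatchCoalescer` API: the struct and method names are illustrative, and rows are plain `u64`s instead of Arrow arrays.

```rust
/// Hypothetical model of a per-output-partition coalescer: small sub-batches
/// are buffered and only emitted once `target_batch_size` rows accumulate.
struct PartitionCoalescer {
    target_batch_size: usize,
    buffer: Vec<u64>,
    completed: Vec<Vec<u64>>,
}

impl PartitionCoalescer {
    fn new(target_batch_size: usize) -> Self {
        Self { target_batch_size, buffer: Vec::new(), completed: Vec::new() }
    }

    /// Accumulate a small sub-batch; emit a full batch once the target is hit.
    fn push(&mut self, rows: &[u64]) {
        for &row in rows {
            self.buffer.push(row);
            if self.buffer.len() == self.target_batch_size {
                self.completed.push(std::mem::take(&mut self.buffer));
            }
        }
    }

    /// Flush any remaining buffered rows at end of input.
    fn finish(&mut self) {
        if !self.buffer.is_empty() {
            self.completed.push(std::mem::take(&mut self.buffer));
        }
    }
}

fn main() {
    // With 16 output partitions, an 8192-row input yields ~512-row sub-batches.
    let mut coalescer = PartitionCoalescer::new(8192);
    for _ in 0..32 {
        let sub_batch: Vec<u64> = (0..512).collect();
        coalescer.push(&sub_batch); // buffered instead of sent immediately
    }
    coalescer.finish();
    // 32 * 512 = 16384 rows -> two full 8192-row batches instead of 32 tiny ones.
    assert_eq!(coalescer.completed.len(), 2);
    assert!(coalescer.completed.iter().all(|b| b.len() == 8192));
    println!("emitted {} batches", coalescer.completed.len());
}
```

The effect is that the channel carries a couple of target-sized batches rather than dozens of tiny ones, which is the traffic reduction the change is after.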
15f10c8 to ef571e2
🤖: Benchmark completed
run benchmark tpch
run benchmark tpch_mem tpch tpcds
Looks slightly faster with the (now I think unnecessary) yielding removed.
…rtition Use LimitedBatchCoalescer on the sender side for all partitioning schemes, removing the separate receiver-side coalescing and the hash_partition_indices optimization. This simplifies the code significantly (-155 lines net). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Integrate LimitedBatchCoalescer into BatchPartitioner so hash repartitioning uses push_batch_with_indices directly, avoiding materializing intermediate sub-batches. Skip sender-side coalescing for preserve_order mode since the subsequent merge sort already does batching. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
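The index-based push described in this commit can be sketched as follows. This is a hypothetical model, not DataFusion's actual `BatchPartitioner` or the arrow-rs coalescer API: the trivial modulo hash, type names, and `push_batch_with_indices` body are all illustrative. The point it shows is that each partition's row indices are handed straight to that partition's coalescer, with no intermediate `take` into a sub-batch.

```rust
/// Hypothetical: group row indices of `batch` by (stand-in) hash partition.
fn partition_indices(batch: &[u64], num_partitions: usize) -> Vec<Vec<usize>> {
    let mut indices = vec![Vec::new(); num_partitions];
    for (i, &value) in batch.iter().enumerate() {
        // Stand-in for the real per-row hash.
        indices[(value as usize) % num_partitions].push(i);
    }
    indices
}

/// Hypothetical per-partition coalescer accepting rows by index.
struct Coalescer {
    target: usize,
    buffer: Vec<u64>,
    completed: Vec<Vec<u64>>,
}

impl Coalescer {
    fn new(target: usize) -> Self {
        Self { target, buffer: Vec::new(), completed: Vec::new() }
    }

    /// Copy only the selected rows out of `batch`; no intermediate sub-batch
    /// is ever materialized.
    fn push_batch_with_indices(&mut self, batch: &[u64], indices: &[usize]) {
        for &i in indices {
            self.buffer.push(batch[i]);
            if self.buffer.len() == self.target {
                self.completed.push(std::mem::take(&mut self.buffer));
            }
        }
    }
}

fn main() {
    let num_partitions = 4;
    let mut coalescers: Vec<Coalescer> =
        (0..num_partitions).map(|_| Coalescer::new(1024)).collect();

    let batch: Vec<u64> = (0..8192).collect();
    for (p, idx) in partition_indices(&batch, num_partitions).iter().enumerate() {
        coalescers[p].push_batch_with_indices(&batch, idx);
    }
    // Each partition received 8192 / 4 = 2048 rows => two full 1024-row batches.
    assert!(coalescers.iter().all(|c| c.completed.len() == 2));
}
```

Skipping this path for preserve_order mode matches the commit message: the downstream merge sort already re-batches, so coalescing there would be redundant work.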
run benchmark tpch tpch_mem
Summary
Coalesce batches during repartitioning itself, rather than when pulling from the repartition output. We can also avoid yielding after every partitioned batch, since the upstream plan will now receive properly sized batches to work with (rather than very small ones).