
Perf: Pre-size buffer allocations to avoid intermediate allocations #10262

Open
Rich-T-kid wants to merge 3 commits into apache:main from Rich-T-kid:rich-T-kid/re-use-buffers

Conversation

@Rich-T-kid

Contributor

Which issue does this PR close?

Rationale for this change

TL;DR: It's useful to pre-allocate a vector when you know how much data it will need to hold.

When IpcDataGenerator uses the IpcBodySink::Write variant, record batch buffer bytes are written directly into a Vec. If that Vec is undersized, it repeatedly reallocates and copies its bytes into a larger buffer, growing geometrically (from KB into MB) and paying two costs on each reallocation:

  1. an OS memory request and
  2. a full copy of existing bytes into the new buffer.

For large batches this cascade is expensive, and paying it afresh for every record batch chunk compounds the problem. Because FlightDataEncoder::split_batch_for_grpc_response splits record batches into roughly equal-sized chunks, the previous buffer's final capacity is a good estimate for the next call. Using it keeps a correctly sized Vec available across iterations and avoids repeated reallocation on the hot path.
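The reallocation cascade described above can be observed with a small standalone sketch. It is not code from this PR: `count_reallocations` is a hypothetical helper that treats every change in `Vec::capacity` as one reallocation, and the 1 KiB chunk size is an arbitrary choice for illustration.

```rust
/// Appends `chunks` copies of `chunk` to `buf` and returns how many times
/// the buffer's capacity changed, i.e. how many (re)allocations occurred.
/// Hypothetical helper for illustration; not part of arrow-ipc.
fn count_reallocations(mut buf: Vec<u8>, chunk: &[u8], chunks: usize) -> usize {
    let mut reallocations = 0;
    let mut last_capacity = buf.capacity();
    for _ in 0..chunks {
        buf.extend_from_slice(chunk);
        if buf.capacity() != last_capacity {
            reallocations += 1;
            last_capacity = buf.capacity();
        }
    }
    reallocations
}

fn main() {
    let chunk = [0u8; 1024];
    let chunks = 1024; // 1 MiB written in total

    // Starts empty, so the Vec has to grow several times along the way.
    let unsized_growth = count_reallocations(Vec::new(), &chunk, chunks);
    // Starts with enough room for everything, so it never has to grow.
    let presized_growth =
        count_reallocations(Vec::with_capacity(chunk.len() * chunks), &chunk, chunks);

    println!("starting empty: {unsized_growth} capacity changes");
    println!("pre-sized:      {presized_growth} capacity changes");
}
```

The exact count for the empty case depends on the standard library's growth strategy, which is an implementation detail; the pre-sized case performs no reallocation because the requested capacity already covers every write.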

Why not pre-allocate the buffers using an estimate based on the length that split_batch_for_grpc_response uses?

Using the final capacity rather than the uncompressed dictionary size is intentional. IPC encoding and compression both affect the number of bytes actually written, so the final capacity naturally adapts to whatever encoding and compression settings are in effect, rather than consistently over-provisioning.
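As a rough illustration of why an observed capacity can be a tighter hint than the input size, the sketch below uses a toy run-length encoder as a stand-in for compression. Everything here is an assumption made for the example: `toy_compress` is hypothetical and unrelated to the codecs arrow-ipc actually supports.

```rust
/// Toy stand-in for compression (illustration only): collapses runs of
/// identical bytes into (byte, run_length) pairs, with runs capped at 255.
fn toy_compress(input: &[u8], out: &mut Vec<u8>) {
    let mut i = 0;
    while i < input.len() {
        let byte = input[i];
        let mut run = 1usize;
        while i + run < input.len() && input[i + run] == byte && run < 255 {
            run += 1;
        }
        out.push(byte);
        out.push(run as u8);
        i += run;
    }
}

fn main() {
    // A highly compressible 64 KiB chunk.
    let chunk = vec![0u8; 64 * 1024];

    let mut out = Vec::new();
    toy_compress(&chunk, &mut out);

    // Hint derived from the input: ignores what the encoder actually wrote.
    let hint_from_input = chunk.len();
    // Hint derived from the output buffer: reflects the bytes really written.
    let hint_from_output = out.capacity();

    println!("input-based hint:  {hint_from_input} bytes");
    println!("output-based hint: {hint_from_output} bytes");
}
```

With uncompressed output the two hints would be close; the gap only opens up when the encoding changes the number of bytes written, which is the case the description is guarding against.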

What changes are included in this PR?

  • Move the scratch buffer out of ipc_write_context via mem::take (zero copy)
  • Write the IPC bytes into the buffer
  • Record the final capacity
  • Pre-allocate a fresh scratch buffer at that capacity for the next call
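The four steps above can be sketched roughly as follows. This is a minimal, self-contained approximation and not the actual diff: `WriteContext`, its `scratch` field, and `encode_chunk` are hypothetical names, and the "encoding" step simply copies the payload.

```rust
use std::io::Write;
use std::mem;

/// Hypothetical stand-in for the writer's per-stream state; the real
/// context in arrow-ipc has a different shape.
#[derive(Default)]
struct WriteContext {
    scratch: Vec<u8>,
}

/// Encodes one chunk and returns the owned bytes, leaving behind an empty
/// scratch buffer pre-sized to the capacity this call ended up needing.
fn encode_chunk(ctx: &mut WriteContext, payload: &[u8]) -> Vec<u8> {
    // 1. Move the buffer out. `mem::take` leaves an empty Vec in its place
    //    and copies no payload bytes.
    let mut buf = mem::take(&mut ctx.scratch);
    // 2. Write the bytes (a plain copy here, standing in for IPC encoding).
    buf.write_all(payload).expect("writing to a Vec does not fail");
    // 3. Record the capacity the buffer reached.
    let final_capacity = buf.capacity();
    // 4. Pre-allocate a fresh scratch buffer of that size for the next call.
    ctx.scratch = Vec::with_capacity(final_capacity);
    buf
}

fn main() {
    let mut ctx = WriteContext::default();
    let chunk = vec![7u8; 64 * 1024];

    // First call: the scratch buffer starts empty and has to grow.
    let first = encode_chunk(&mut ctx, &chunk);
    let hint = ctx.scratch.capacity();

    // Second call: a similarly sized chunk fits without any growth.
    let second = encode_chunk(&mut ctx, &chunk);

    println!("first chunk:  len {} / capacity {}", first.len(), first.capacity());
    println!("second chunk: len {} / capacity {} (hint was {hint})", second.len(), second.capacity());
}
```

Returning the filled buffer by value while leaving a pre-sized replacement behind means the caller owns the bytes without a copy, at the cost of one allocation per call rather than a growth cascade.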

Are these changes tested?

n/a

Are there any user-facing changes?

no

@github-actions github-actions bot added the arrow (Changes to the arrow crate) label on Jul 2, 2026
@Rich-T-kid

Contributor Author

pretty big descriptions for a (+3,-2) PR 😅

@Jefffrey Jefffrey left a comment

Contributor


> pretty big descriptions for a (+3,-2) PR 😅

you love to see it 🙂

Comment thread arrow-ipc/src/writer.rs Outdated
@Jefffrey

Jefffrey commented Jul 2, 2026

Contributor

run benchmark flight

@adriangbot


🤖 Arrow criterion benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4863304054-797-gbhwf 6.12.85+ #1 SMP Mon May 11 08:17:35 UTC 2026 aarch64 GNU/Linux

CPU Details (lscpu)
Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected

Comparing rich-T-kid/re-use-buffers (53f1c95) to 32bba5a (merge-base) diff
BENCH_NAME=flight
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental,object_store --bench flight
BENCH_FILTER=
Results will be posted here when complete


File an issue against this benchmark runner

@adriangbot


🤖 Arrow criterion benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

Details

group                                     main                                    rich-T-kid_re-use-buffers
-----                                     ----                                    -------------------------
decode/fixed/65536x1                      1.00     48.3±0.21µs    40.4 GB/sec     1.06     51.1±0.56µs    38.2 GB/sec
decode/fixed/65536x4                      1.01    264.0±4.32µs    29.6 GB/sec     1.00    260.9±2.75µs    29.9 GB/sec
decode/fixed/65536x8                      1.00   580.1±12.93µs    26.9 GB/sec     7.11      4.1±0.06ms     3.8 GB/sec
decode/fixed/8192x1                       1.01      7.8±0.05µs    31.2 GB/sec     1.00      7.7±0.08µs    31.6 GB/sec
decode/fixed/8192x4                       1.00     28.4±0.21µs    34.4 GB/sec     1.02     28.9±0.16µs    33.8 GB/sec
decode/fixed/8192x8                       1.08     66.5±1.18µs    29.4 GB/sec     1.00     61.8±1.16µs    31.6 GB/sec
decode/nested/65536x1                     1.00  685.2±167.09µs     7.1 GB/sec     1.01  690.9±165.54µs     7.1 GB/sec
decode/nested/65536x4                     1.01      3.1±0.68ms     6.3 GB/sec     1.00      3.1±0.68ms     6.4 GB/sec
decode/nested/65536x8                     2.21     14.7±1.35ms     2.7 GB/sec     1.00      6.6±1.33ms     5.9 GB/sec
decode/nested/8192x1                      1.00    82.9±20.64µs     7.4 GB/sec     1.02    84.5±20.74µs     7.2 GB/sec
decode/nested/8192x4                      1.00   352.6±82.85µs     6.9 GB/sec     1.01   354.6±84.64µs     6.9 GB/sec
decode/nested/8192x8                      1.00  721.6±166.32µs     6.8 GB/sec     1.01  730.3±167.18µs     6.7 GB/sec
decode/variable/65536x1                   1.01  1216.0±184.37µs     7.2 GB/sec    1.00  1204.3±185.93µs     7.3 GB/sec
decode/variable/65536x4                   1.01      5.7±0.63ms     6.2 GB/sec     1.00      5.6±0.72ms     6.3 GB/sec
decode/variable/65536x8                   1.38     15.7±1.45ms     4.5 GB/sec     1.00     11.4±1.38ms     6.2 GB/sec
decode/variable/8192x1                    1.00   134.0±22.30µs     8.2 GB/sec     1.01   135.6±20.98µs     8.1 GB/sec
decode/variable/8192x4                    1.00   586.5±91.85µs     7.5 GB/sec     1.00   587.5±87.81µs     7.5 GB/sec
decode/variable/8192x8                    1.00  1216.8±181.84µs     7.2 GB/sec    1.04  1263.3±164.65µs     7.0 GB/sec
decode_stream/dict/65536x1x4              1.00   182.1±34.61µs     5.4 GB/sec     1.07   194.9±27.46µs     5.0 GB/sec
decode_stream/dict/65536x4x4              1.01  773.6±118.81µs     5.1 GB/sec     1.00  767.2±134.83µs     5.1 GB/sec
decode_stream/dict/65536x8x4              1.00  1596.7±170.94µs     4.9 GB/sec    1.04  1652.8±282.59µs     4.8 GB/sec
decode_stream/dict/8192x1x4               1.00     26.0±0.34µs     4.9 GB/sec     1.01     26.2±0.40µs     4.9 GB/sec
decode_stream/dict/8192x4x4               1.00    101.2±2.36µs     5.0 GB/sec     1.03    104.1±9.79µs     4.9 GB/sec
decode_stream/dict/8192x8x4               1.00    205.5±2.23µs     5.0 GB/sec     1.02    209.8±6.14µs     4.9 GB/sec
decode_stream/fixed/65536x1x4             1.07     52.2±0.52µs    37.4 GB/sec     1.00     48.8±0.16µs    40.0 GB/sec
decode_stream/fixed/65536x4x4             1.01    269.1±2.33µs    29.0 GB/sec     1.00    267.4±3.07µs    29.2 GB/sec
decode_stream/fixed/65536x8x4             1.00  590.0±119.26µs    26.5 GB/sec     1.01   594.1±98.28µs    26.3 GB/sec
decode_stream/fixed/8192x1x4              1.00      7.8±0.03µs    31.5 GB/sec     1.00      7.7±0.06µs    31.7 GB/sec
decode_stream/fixed/8192x4x4              1.00     28.4±0.35µs    34.4 GB/sec     1.04     29.5±0.10µs    33.2 GB/sec
decode_stream/fixed/8192x8x4              1.00     67.0±0.25µs    29.2 GB/sec     1.00     66.8±1.11µs    29.3 GB/sec
decode_stream/nested/65536x1x4            1.00  683.4±165.09µs     7.1 GB/sec     1.01  692.7±167.62µs     7.0 GB/sec
decode_stream/nested/65536x4x4            1.08      3.3±0.68ms     6.0 GB/sec     1.00      3.0±0.68ms     6.4 GB/sec
decode_stream/nested/65536x8x4            1.00      6.5±1.36ms     6.1 GB/sec     1.01      6.5±1.35ms     6.0 GB/sec
decode_stream/nested/8192x1x4             1.00    83.9±20.63µs     7.3 GB/sec     1.01    84.6±20.73µs     7.2 GB/sec
decode_stream/nested/8192x4x4             1.00   350.4±82.81µs     7.0 GB/sec     1.01   352.9±84.00µs     6.9 GB/sec
decode_stream/nested/8192x8x4             1.00  720.6±166.19µs     6.8 GB/sec     1.01  730.7±168.83µs     6.7 GB/sec
decode_stream/variable/65536x1x4          1.00  1217.2±183.86µs     7.2 GB/sec    1.02  1247.4±171.40µs     7.0 GB/sec
decode_stream/variable/65536x4x4          1.01      5.8±0.59ms     6.1 GB/sec     1.00      5.7±0.57ms     6.1 GB/sec
decode_stream/variable/65536x8x4          1.00     11.6±1.38ms     6.1 GB/sec     1.58     18.3±1.35ms     3.8 GB/sec
decode_stream/variable/8192x1x4           1.03   140.3±19.42µs     7.8 GB/sec     1.00   136.1±21.22µs     8.1 GB/sec
decode_stream/variable/8192x4x4           1.04   606.0±78.94µs     7.3 GB/sec     1.00   583.1±89.02µs     7.5 GB/sec
decode_stream/variable/8192x8x4           1.01  1232.0±178.50µs     7.1 GB/sec    1.00  1216.0±181.84µs     7.2 GB/sec
do_put_dictionary/dict/hydrate/65536x1    1.00    376.9±6.55µs   667.0 MB/sec     1.01    379.4±5.60µs   662.7 MB/sec
do_put_dictionary/dict/hydrate/65536x4    1.00  1402.1±19.91µs   717.3 MB/sec     1.03  1447.9±73.81µs   694.5 MB/sec
do_put_dictionary/dict/hydrate/65536x8    1.00      3.4±0.27ms   593.4 MB/sec     1.09      3.7±0.34ms   542.4 MB/sec
do_put_dictionary/dict/hydrate/8192x1     1.01     91.5±1.19µs   356.8 MB/sec     1.00     90.5±1.28µs   360.7 MB/sec
do_put_dictionary/dict/hydrate/8192x4     1.00    205.1±2.89µs   636.9 MB/sec     1.03    210.5±3.32µs   620.6 MB/sec
do_put_dictionary/dict/hydrate/8192x8     1.00    371.3±5.57µs   703.7 MB/sec     1.00    371.8±7.04µs   702.8 MB/sec
do_put_dictionary/dict/resend/65536x1     1.00    108.5±1.60µs     2.3 GB/sec     1.00    108.0±2.74µs     2.3 GB/sec
do_put_dictionary/dict/resend/65536x4     1.00    292.5±3.50µs     3.4 GB/sec     1.00    292.0±3.33µs     3.4 GB/sec
do_put_dictionary/dict/resend/65536x8     1.02    521.1±8.03µs     3.8 GB/sec     1.00    510.0±5.90µs     3.9 GB/sec
do_put_dictionary/dict/resend/8192x1      1.03     61.0±1.06µs   535.6 MB/sec     1.00     59.4±0.78µs   549.4 MB/sec
do_put_dictionary/dict/resend/8192x4      1.01     83.1±0.94µs  1571.9 MB/sec     1.00     82.3±1.14µs  1586.8 MB/sec
do_put_dictionary/dict/resend/8192x8      1.00    114.6±1.67µs     2.2 GB/sec     1.00    114.8±1.80µs     2.2 GB/sec
encode/fixed/65536x1                      1.00     10.1±0.04µs    48.4 GB/sec     1.03     10.4±0.01µs    46.9 GB/sec
encode/fixed/65536x4                      1.00     51.4±0.27µs    38.0 GB/sec     9.64    495.7±1.02µs     3.9 GB/sec
encode/fixed/65536x8                      1.00   1063.8±2.72µs     3.7 GB/sec     1.04   1101.3±3.94µs     3.5 GB/sec
encode/fixed/8192x1                       1.00      3.2±0.01µs    18.9 GB/sec     1.03      3.4±0.01µs    18.2 GB/sec
encode/fixed/8192x4                       1.00      9.0±0.02µs    27.2 GB/sec     1.22     11.0±0.02µs    22.3 GB/sec
encode/fixed/8192x8                       1.00     18.0±0.03µs    27.2 GB/sec     1.24     22.3±0.05µs    22.0 GB/sec
encode/nested/65536x1                     1.00     28.7±0.41µs    42.6 GB/sec     1.02     29.1±0.26µs    41.9 GB/sec
encode/nested/65536x4                     1.00   1415.5±5.89µs     3.5 GB/sec     1.02   1446.3±4.81µs     3.4 GB/sec
encode/nested/65536x8                     1.05      3.2±0.04ms     3.1 GB/sec     1.00      3.0±0.04ms     3.2 GB/sec
encode/nested/8192x1                      1.00      5.8±0.01µs    26.5 GB/sec     1.14      6.6±0.01µs    23.3 GB/sec
encode/nested/8192x4                      1.00     21.2±0.05µs    28.9 GB/sec     1.03     21.9±0.04µs    27.9 GB/sec
encode/nested/8192x8                      1.01     48.4±0.12µs    25.3 GB/sec     1.00     48.0±0.08µs    25.4 GB/sec
encode/variable/65536x1                   1.08     64.7±0.27µs    33.9 GB/sec     1.00     59.9±0.37µs    36.7 GB/sec
encode/variable/65536x4                   1.04      2.5±0.03ms     3.6 GB/sec     1.00      2.4±0.02ms     3.7 GB/sec
encode/variable/65536x8                   1.09      5.7±0.05ms     3.1 GB/sec     1.00      5.2±0.07ms     3.4 GB/sec
encode/variable/8192x1                    1.00      6.9±0.01µs    39.6 GB/sec     1.42      9.9±0.01µs    27.9 GB/sec
encode/variable/8192x4                    1.02     26.8±0.06µs    41.0 GB/sec     1.00     26.3±0.05µs    41.7 GB/sec
encode/variable/8192x8                    1.10     86.9±0.26µs    25.3 GB/sec     1.00     78.8±0.21µs    27.9 GB/sec
roundtrip/fixed/65536x1                   1.01    315.1±4.21µs  1587.2 MB/sec     1.00    311.5±3.39µs  1605.6 MB/sec
roundtrip/fixed/65536x4                   1.01  1221.8±16.47µs  1637.2 MB/sec     1.00  1209.4±20.20µs  1654.0 MB/sec
roundtrip/fixed/65536x8                   1.00      2.2±0.02ms  1779.6 MB/sec     1.02      2.3±0.05ms  1744.4 MB/sec
roundtrip/fixed/8192x1                    1.03     92.1±1.29µs   679.6 MB/sec     1.00     89.9±1.10µs   696.6 MB/sec
roundtrip/fixed/8192x4                    1.00    201.0±2.08µs  1245.7 MB/sec     1.00    201.0±2.16µs  1245.8 MB/sec
roundtrip/fixed/8192x8                    1.00    345.3±4.77µs  1450.2 MB/sec     1.01    347.8±4.13µs  1439.8 MB/sec
roundtrip/nested/65536x1                  1.01   884.3±44.76µs  1413.8 MB/sec     1.00   879.8±43.93µs  1421.0 MB/sec
roundtrip/nested/65536x4                  1.00      4.3±0.14ms  1151.3 MB/sec     1.00      4.4±0.12ms  1149.1 MB/sec
roundtrip/nested/65536x8                  1.04      9.1±0.36ms  1098.6 MB/sec     1.00      8.7±0.30ms  1146.8 MB/sec
roundtrip/nested/8192x1                   1.03    161.8±6.33µs   966.7 MB/sec     1.00    157.1±5.09µs   995.9 MB/sec
roundtrip/nested/8192x4                   1.01   479.2±21.25µs  1306.0 MB/sec     1.00   472.6±21.80µs  1324.3 MB/sec
roundtrip/nested/8192x8                   1.02   945.4±40.91µs  1323.9 MB/sec     1.00   926.2±43.62µs  1351.4 MB/sec
roundtrip/variable/65536x1                1.04  1364.7±84.52µs  1648.8 MB/sec     1.00  1308.1±61.99µs  1720.2 MB/sec
roundtrip/variable/65536x4                1.07      8.4±0.28ms  1065.7 MB/sec     1.00      7.9±0.36ms  1143.7 MB/sec
roundtrip/variable/65536x8                1.08     14.9±0.42ms  1206.1 MB/sec     1.00     13.9±0.47ms  1298.9 MB/sec
roundtrip/variable/8192x1                 1.01    210.1±5.65µs  1339.3 MB/sec     1.00    208.6±5.46µs  1349.3 MB/sec
roundtrip/variable/8192x4                 1.00   703.5±22.70µs  1600.2 MB/sec     1.02   715.1±24.40µs  1574.3 MB/sec
roundtrip/variable/8192x8                 1.01  1259.8±24.13µs  1787.1 MB/sec     1.00  1244.6±24.20µs  1808.9 MB/sec

Resource Usage

base (merge-base)

Metric         Value
------         -----
Wall time      915.2s
Peak memory    171.1 MiB
Avg memory     63.7 MiB
CPU user       920.5s
CPU sys        136.5s
Peak spill     0 B

branch

Metric         Value
------         -----
Wall time      940.2s
Peak memory    165.1 MiB
Avg memory     66.0 MiB
CPU user       930.7s
CPU sys        150.3s
Peak spill     0 B


@alamb

alamb commented Jul 2, 2026

Contributor

Updating the branch to pick up the "use single threaded executor" change in the benchmarks

@alamb

alamb commented Jul 2, 2026

Contributor

run benchmark flight

@alamb

alamb commented Jul 2, 2026

Contributor

run benchmark flight

@alamb

alamb commented Jul 2, 2026

Contributor

Running twice to make sure results are reproducible

@adriangbot


🤖 Arrow criterion benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4864901424-800-64jkw 6.12.85+ #1 SMP Mon May 11 08:17:35 UTC 2026 aarch64 GNU/Linux


Comparing rich-T-kid/re-use-buffers (7197cd2) to 8c7df18 (merge-base) diff
BENCH_NAME=flight
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental,object_store --bench flight
BENCH_FILTER=
Results will be posted here when complete



@adriangbot


🤖 Arrow criterion benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4864902071-801-7hbdx 6.12.85+ #1 SMP Mon May 11 08:17:35 UTC 2026 aarch64 GNU/Linux


Comparing rich-T-kid/re-use-buffers (7197cd2) to 8c7df18 (merge-base) diff
BENCH_NAME=flight
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental,object_store --bench flight
BENCH_FILTER=
Results will be posted here when complete



@adriangbot


🤖 Arrow criterion benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

Details

group                                     main                                    rich-T-kid_re-use-buffers
-----                                     ----                                    -------------------------
decode/fixed/65536x4                      1.02    261.6±1.35µs    29.9 GB/sec     1.00    256.2±4.78µs    30.5 GB/sec
decode/fixed/65536x8                      1.03    565.7±7.02µs    27.6 GB/sec     1.00    548.1±7.10µs    28.5 GB/sec
decode/fixed/8192x4                       1.03     28.8±0.35µs    34.0 GB/sec     1.00     27.8±0.33µs    35.1 GB/sec
decode/fixed/8192x8                       1.03     63.4±0.51µs    30.8 GB/sec     1.00     61.7±1.27µs    31.7 GB/sec
decode/nested/65536x4                     1.00      2.9±0.68ms     6.7 GB/sec     1.00      2.9±0.67ms     6.7 GB/sec
decode/nested/65536x8                     1.02      6.1±1.33ms     6.4 GB/sec     1.00      6.0±1.32ms     6.5 GB/sec
decode/nested/8192x4                      1.02   352.2±83.32µs     6.9 GB/sec     1.00   346.6±82.42µs     7.1 GB/sec
decode/nested/8192x8                      1.02  721.5±166.94µs     6.8 GB/sec     1.00  709.6±164.32µs     6.9 GB/sec
decode/variable/65536x4                   1.06      5.3±0.74ms     6.6 GB/sec     1.00      5.0±0.75ms     7.0 GB/sec
decode/variable/65536x8                   1.03     11.3±1.39ms     6.2 GB/sec     1.00     10.9±1.40ms     6.4 GB/sec
decode/variable/8192x4                    1.00   586.4±90.57µs     7.5 GB/sec     1.00   586.4±79.42µs     7.5 GB/sec
decode/variable/8192x8                    1.01  1194.4±179.68µs     7.4 GB/sec    1.00  1186.6±180.64µs     7.4 GB/sec
decode_stream/dict/65536x4x4              1.01  765.7±121.90µs     5.1 GB/sec     1.00  757.6±120.14µs     5.2 GB/sec
decode_stream/dict/65536x8x4              1.00  1600.2±247.50µs     4.9 GB/sec    1.01  1620.7±277.02µs     4.8 GB/sec
decode_stream/dict/8192x4x4               1.00    100.0±1.64µs     5.1 GB/sec     1.04    104.3±7.26µs     4.9 GB/sec
decode_stream/dict/8192x8x4               1.01    205.9±2.49µs     5.0 GB/sec     1.00    203.7±1.50µs     5.0 GB/sec
decode_stream/fixed/65536x4x4             1.04    264.1±1.12µs    29.6 GB/sec     1.00    253.2±1.55µs    30.9 GB/sec
decode_stream/fixed/65536x8x4             1.04    564.8±6.13µs    27.7 GB/sec     1.00    543.8±3.83µs    28.7 GB/sec
decode_stream/fixed/8192x4x4              1.01     28.4±0.34µs    34.5 GB/sec     1.00     28.2±0.09µs    34.7 GB/sec
decode_stream/fixed/8192x8x4              1.00     61.2±0.36µs    31.9 GB/sec     1.05     64.1±1.70µs    30.5 GB/sec
decode_stream/nested/65536x4x4            1.05      3.0±0.67ms     6.6 GB/sec     1.00      2.8±0.66ms     6.9 GB/sec
decode_stream/nested/65536x8x4            1.02      6.1±1.32ms     6.4 GB/sec     1.00      6.0±1.34ms     6.5 GB/sec
decode_stream/nested/8192x4x4             1.01   352.1±82.81µs     6.9 GB/sec     1.00   347.8±82.70µs     7.0 GB/sec
decode_stream/nested/8192x8x4             1.02  721.8±166.67µs     6.8 GB/sec     1.00  708.4±164.78µs     6.9 GB/sec
decode_stream/variable/65536x4x4          2.09     10.6±0.74ms     3.3 GB/sec     1.00      5.1±0.72ms     6.9 GB/sec
decode_stream/variable/65536x8x4          1.69     19.2±1.44ms     3.7 GB/sec     1.00     11.3±1.97ms     6.2 GB/sec
decode_stream/variable/8192x4x4           1.00   588.9±89.12µs     7.5 GB/sec     1.04   611.6±73.96µs     7.2 GB/sec
decode_stream/variable/8192x8x4           1.01  1212.4±176.58µs     7.3 GB/sec    1.00  1199.2±167.28µs     7.3 GB/sec
do_put_dictionary/dict/hydrate/65536x4    1.00   1211.4±3.86µs   830.1 MB/sec     1.03   1245.2±3.88µs   807.6 MB/sec
do_put_dictionary/dict/hydrate/65536x8    1.00      2.5±0.01ms   819.5 MB/sec     1.02      2.5±0.01ms   800.8 MB/sec
do_put_dictionary/dict/hydrate/8192x4     1.00    170.0±0.72µs   768.5 MB/sec     1.04    177.2±0.76µs   737.2 MB/sec
do_put_dictionary/dict/hydrate/8192x8     1.00    330.5±0.90µs   790.7 MB/sec     1.02    337.9±0.96µs   773.2 MB/sec
do_put_dictionary/dict/resend/65536x4     1.00    237.6±0.60µs     4.1 GB/sec     1.04    246.0±0.89µs     4.0 GB/sec
do_put_dictionary/dict/resend/65536x8     1.00    443.0±1.38µs     4.4 GB/sec     1.03    454.8±1.04µs     4.3 GB/sec
do_put_dictionary/dict/resend/8192x4      1.00     41.1±0.18µs     3.1 GB/sec     1.00     41.3±0.23µs     3.1 GB/sec
do_put_dictionary/dict/resend/8192x8      1.01     69.8±0.39µs     3.7 GB/sec     1.00     69.4±0.72µs     3.7 GB/sec
encode/fixed/65536x4                      1.00     50.1±0.26µs    39.0 GB/sec     10.10   506.2±1.42µs     3.9 GB/sec
encode/fixed/65536x8                      1.01   1062.1±2.37µs     3.7 GB/sec     1.00   1056.1±2.30µs     3.7 GB/sec
encode/fixed/8192x4                       1.00      8.3±0.01µs    29.4 GB/sec     1.01      8.4±0.03µs    29.0 GB/sec
encode/fixed/8192x8                       1.06     17.1±0.05µs    28.6 GB/sec     1.00     16.1±0.03µs    30.4 GB/sec
encode/nested/65536x4                     1.00   1411.3±1.73µs     3.5 GB/sec     1.00   1408.3±2.64µs     3.5 GB/sec
encode/nested/65536x8                     1.01      2.9±0.01ms     3.4 GB/sec     1.00      2.9±0.01ms     3.4 GB/sec
encode/nested/8192x4                      1.04     21.8±0.05µs    28.0 GB/sec     1.00     21.0±0.04µs    29.0 GB/sec
encode/nested/8192x8                      1.00     46.4±0.13µs    26.3 GB/sec     1.03     47.7±0.07µs    25.6 GB/sec
encode/variable/65536x4                   1.05      2.4±0.01ms     3.7 GB/sec     1.00      2.2±0.00ms     3.9 GB/sec
encode/variable/65536x8                   1.10      5.1±0.06ms     3.4 GB/sec     1.00      4.6±0.02ms     3.8 GB/sec
encode/variable/8192x4                    1.00     25.9±0.07µs    42.5 GB/sec     1.20     31.1±0.07µs    35.4 GB/sec
encode/variable/8192x8                    1.07     82.3±0.24µs    26.7 GB/sec     1.00     77.2±0.15µs    28.5 GB/sec
roundtrip/fixed/65536x4                   1.02  1192.4±26.22µs  1677.7 MB/sec     1.00  1171.3±11.53µs  1707.9 MB/sec
roundtrip/fixed/65536x8                   1.00      2.2±0.02ms  1850.3 MB/sec     1.02      2.2±0.05ms  1808.6 MB/sec
roundtrip/fixed/8192x4                    1.00    194.6±2.80µs  1286.6 MB/sec     1.01    197.0±2.40µs  1270.8 MB/sec
roundtrip/fixed/8192x8                    1.00    332.6±4.64µs  1505.7 MB/sec     1.03    341.2±3.15µs  1467.6 MB/sec
roundtrip/nested/65536x4                  1.00      4.0±0.13ms  1236.1 MB/sec     1.01      4.1±0.11ms  1226.7 MB/sec
roundtrip/nested/65536x8                  1.00      8.3±0.30ms  1203.4 MB/sec     1.04      8.6±0.36ms  1157.7 MB/sec
roundtrip/nested/8192x4                   1.00   458.4±21.60µs  1365.1 MB/sec     1.02   468.8±21.38µs  1334.8 MB/sec
roundtrip/nested/8192x8                   1.00   910.0±45.30µs  1375.4 MB/sec     1.01   921.3±42.46µs  1358.5 MB/sec
roundtrip/variable/65536x4                1.00      7.4±0.34ms  1219.1 MB/sec     1.06      7.8±0.23ms  1148.9 MB/sec
roundtrip/variable/65536x8                1.03     14.4±0.43ms  1251.4 MB/sec     1.00     14.0±0.36ms  1288.2 MB/sec
roundtrip/variable/8192x4                 1.00   668.6±22.53µs  1683.7 MB/sec     1.03   688.1±23.93µs  1635.9 MB/sec
roundtrip/variable/8192x8                 1.00  1219.2±28.36µs  1846.7 MB/sec     1.01  1231.4±25.10µs  1828.4 MB/sec
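
For anyone reading the table above: the leading number in each column appears to be a relative factor, that is, each side's mean time divided by the faster of the two, so the faster side always shows 1.00. The sketch below is not the benchmark runner's code. It is a minimal illustration under that assumption, and `relative_factors` is a hypothetical helper name. It reproduces the factors for the `encode/fixed/65536x4` row (50.1µs on main, 506.2µs on the branch).

```rust
/// Relative factors for one benchmark row: each side's mean time divided by
/// the faster of the two, so the faster side is always 1.00.
/// (Hypothetical helper; assumes the table uses this convention.)
fn relative_factors(main_us: f64, branch_us: f64) -> (f64, f64) {
    let fastest = main_us.min(branch_us);
    (main_us / fastest, branch_us / fastest)
}

fn main() {
    // Mean times in microseconds, copied from the encode/fixed/65536x4 row.
    let (main_factor, branch_factor) = relative_factors(50.1, 506.2);
    println!("main {main_factor:.2}  branch {branch_factor:.2}");
}
```

Because the printed times are already rounded, recomputing a factor from the displayed values can differ from the table in the last digit. The ± spread next to each time should also be weighed before treating a 1.0x difference as a real change.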

Resource Usage

base (merge-base)

Metric Value
Wall time 640.1s
Peak memory 183.1 MiB
Avg memory 72.2 MiB
CPU user 602.2s
CPU sys 91.7s
Peak spill 0 B

branch

Metric Value
Wall time 615.1s
Peak memory 233.6 MiB
Avg memory 115.0 MiB
CPU user 576.8s
CPU sys 91.0s
Peak spill 0 B

File an issue against this benchmark runner

@adriangbot

🤖 Arrow criterion benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

CPU Details (lscpu)
Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected
Details

group                                     main                                    rich-T-kid_re-use-buffers
-----                                     ----                                    -------------------------
decode/fixed/65536x4                      1.00    256.4±5.47µs    30.5 GB/sec     1.00    255.6±2.65µs    30.6 GB/sec
decode/fixed/65536x8                      1.03  566.1±153.49µs    27.6 GB/sec     1.00   547.5±10.75µs    28.5 GB/sec
decode/fixed/8192x4                       1.00     27.8±0.39µs    35.2 GB/sec     1.00     27.7±0.26µs    35.3 GB/sec
decode/fixed/8192x8                       1.00     61.9±0.49µs    31.6 GB/sec     1.00     61.9±1.41µs    31.6 GB/sec
decode/nested/65536x4                     1.11      3.2±0.68ms     6.2 GB/sec     1.00      2.9±0.67ms     6.9 GB/sec
decode/nested/65536x8                     1.00     14.5±1.32ms     2.7 GB/sec     1.17     16.9±1.44ms     2.3 GB/sec
decode/nested/8192x4                      1.01   351.0±82.99µs     7.0 GB/sec     1.00   347.6±82.36µs     7.0 GB/sec
decode/nested/8192x8                      1.02  719.4±167.02µs     6.8 GB/sec     1.00  708.3±165.14µs     6.9 GB/sec
decode/variable/65536x4                   1.16      6.0±0.77ms     5.9 GB/sec     1.00      5.2±0.75ms     6.8 GB/sec
decode/variable/65536x8                   1.00     18.5±1.42ms     3.8 GB/sec     1.14     21.1±1.49ms     3.3 GB/sec
decode/variable/8192x4                    1.00   580.1±90.00µs     7.6 GB/sec     1.00   578.1±91.39µs     7.6 GB/sec
decode/variable/8192x8                    1.04  1233.8±150.93µs     7.1 GB/sec    1.00  1182.1±178.77µs     7.4 GB/sec
decode_stream/dict/65536x4x4              1.01  764.8±114.63µs     5.1 GB/sec     1.00  754.2±114.08µs     5.2 GB/sec
decode_stream/dict/65536x8x4              1.07  1654.0±301.61µs     4.7 GB/sec    1.00  1547.9±194.79µs     5.1 GB/sec
decode_stream/dict/8192x4x4               1.00    101.2±2.80µs     5.0 GB/sec     1.01    102.6±5.56µs     5.0 GB/sec
decode_stream/dict/8192x8x4               1.01   211.4±12.22µs     4.8 GB/sec     1.00    209.5±7.47µs     4.9 GB/sec
decode_stream/fixed/65536x4x4             1.01    257.3±2.05µs    30.4 GB/sec     1.00    254.8±1.74µs    30.7 GB/sec
decode_stream/fixed/65536x8x4             1.00    549.0±6.22µs    28.5 GB/sec     1.02  558.0±127.04µs    28.0 GB/sec
decode_stream/fixed/8192x4x4              1.00     28.4±0.10µs    34.4 GB/sec     1.06     30.3±0.08µs    32.3 GB/sec
decode_stream/fixed/8192x8x4              1.04     65.6±0.29µs    29.8 GB/sec     1.00     63.0±0.49µs    31.0 GB/sec
decode_stream/nested/65536x4x4            1.07      3.1±0.66ms     6.4 GB/sec     1.00      2.8±0.67ms     6.9 GB/sec
decode_stream/nested/65536x8x4            2.35     14.8±1.45ms     2.6 GB/sec     1.00      6.3±1.35ms     6.2 GB/sec
decode_stream/nested/8192x4x4             1.01   350.6±83.19µs     7.0 GB/sec     1.00   348.2±82.59µs     7.0 GB/sec
decode_stream/nested/8192x8x4             1.02  719.2±165.85µs     6.8 GB/sec     1.00  708.3±165.20µs     6.9 GB/sec
decode_stream/variable/65536x4x4          1.01      5.2±0.74ms     6.8 GB/sec     1.00      5.1±0.78ms     6.9 GB/sec
decode_stream/variable/65536x8x4          1.00     18.6±1.47ms     3.8 GB/sec     1.15     21.4±1.44ms     3.3 GB/sec
decode_stream/variable/8192x4x4           1.01   581.6±90.72µs     7.6 GB/sec     1.00   577.7±91.18µs     7.6 GB/sec
decode_stream/variable/8192x8x4           1.00  1190.8±178.30µs     7.4 GB/sec    1.00  1190.3±174.10µs     7.4 GB/sec
do_put_dictionary/dict/hydrate/65536x4    1.01   1229.7±7.69µs   817.8 MB/sec     1.00  1216.1±40.23µs   826.9 MB/sec
do_put_dictionary/dict/hydrate/65536x8    1.02      2.6±0.04ms   787.2 MB/sec     1.00      2.5±0.06ms   799.7 MB/sec
do_put_dictionary/dict/hydrate/8192x4     1.01    172.5±0.55µs   757.2 MB/sec     1.00    170.3±0.91µs   767.1 MB/sec
do_put_dictionary/dict/hydrate/8192x8     1.01    334.0±0.90µs   782.3 MB/sec     1.00    331.3±1.57µs   788.8 MB/sec
do_put_dictionary/dict/resend/65536x4     1.02    241.5±0.47µs     4.1 GB/sec     1.00    237.9±0.53µs     4.1 GB/sec
do_put_dictionary/dict/resend/65536x8     1.01    456.5±2.06µs     4.3 GB/sec     1.00    450.7±2.93µs     4.4 GB/sec
do_put_dictionary/dict/resend/8192x4      1.02     41.4±0.19µs     3.1 GB/sec     1.00     40.7±0.26µs     3.1 GB/sec
do_put_dictionary/dict/resend/8192x8      1.02     70.2±0.45µs     3.6 GB/sec     1.00     68.7±0.47µs     3.7 GB/sec
encode/fixed/65536x4                      1.00     49.9±0.13µs    39.1 GB/sec     10.23   510.9±1.77µs     3.8 GB/sec
encode/fixed/65536x8                      1.01   1070.6±2.02µs     3.6 GB/sec     1.00   1064.3±3.28µs     3.7 GB/sec
encode/fixed/8192x4                       1.00      8.3±0.03µs    29.4 GB/sec     1.01      8.4±0.02µs    29.2 GB/sec
encode/fixed/8192x8                       1.00     15.9±0.06µs    30.8 GB/sec     1.08     17.1±0.03µs    28.6 GB/sec
encode/nested/65536x4                     1.02   1455.1±6.46µs     3.4 GB/sec     1.00   1424.7±3.92µs     3.4 GB/sec
encode/nested/65536x8                     1.00      3.0±0.03ms     3.3 GB/sec     1.00      2.9±0.03ms     3.3 GB/sec
encode/nested/8192x4                      1.00     20.9±0.03µs    29.3 GB/sec     1.05     21.8±0.03µs    28.0 GB/sec
encode/nested/8192x8                      1.00     47.2±0.11µs    25.9 GB/sec     1.02     48.2±0.11µs    25.4 GB/sec
encode/variable/65536x4                   1.04      2.4±0.02ms     3.7 GB/sec     1.00      2.3±0.01ms     3.9 GB/sec
encode/variable/65536x8                   1.09      5.1±0.12ms     3.4 GB/sec     1.00      4.7±0.07ms     3.7 GB/sec
encode/variable/8192x4                    1.02     32.3±0.08µs    34.1 GB/sec     1.00     31.7±0.09µs    34.7 GB/sec
encode/variable/8192x8                    1.01     77.9±0.12µs    28.2 GB/sec     1.00     76.9±0.15µs    28.6 GB/sec
roundtrip/fixed/65536x4                   1.02  1217.5±22.57µs  1643.0 MB/sec     1.00  1191.0±18.32µs  1679.6 MB/sec
roundtrip/fixed/65536x8                   1.00      2.2±0.02ms  1820.5 MB/sec     1.01      2.2±0.04ms  1806.1 MB/sec
roundtrip/fixed/8192x4                    1.03    202.5±1.75µs  1236.4 MB/sec     1.00    197.4±1.78µs  1268.4 MB/sec
roundtrip/fixed/8192x8                    1.01    344.4±3.43µs  1453.8 MB/sec     1.00    339.4±4.31µs  1475.3 MB/sec
roundtrip/nested/65536x4                  1.03      4.2±0.12ms  1178.4 MB/sec     1.00      4.1±0.11ms  1213.2 MB/sec
roundtrip/nested/65536x8                  1.03      8.8±0.32ms  1133.0 MB/sec     1.00      8.5±0.31ms  1171.2 MB/sec
roundtrip/nested/8192x4                   1.00   473.8±21.64µs  1320.8 MB/sec     1.00   475.2±20.79µs  1317.0 MB/sec
roundtrip/nested/8192x8                   1.00   926.9±39.10µs  1350.3 MB/sec     1.00   922.4±44.02µs  1356.8 MB/sec
roundtrip/variable/65536x4                1.03      8.0±0.34ms  1120.5 MB/sec     1.00      7.8±0.30ms  1150.4 MB/sec
roundtrip/variable/65536x8                1.00     14.7±0.48ms  1226.9 MB/sec     1.06     15.6±0.40ms  1157.5 MB/sec
roundtrip/variable/8192x4                 1.01   704.6±23.45µs  1597.6 MB/sec     1.00   697.4±21.91µs  1614.1 MB/sec
roundtrip/variable/8192x8                 1.02  1244.3±26.76µs  1809.3 MB/sec     1.00  1220.8±36.04µs  1844.2 MB/sec

Resource Usage

base (merge-base)

Metric Value
Wall time 635.1s
Peak memory 165.0 MiB
Avg memory 77.2 MiB
CPU user 584.0s
CPU sys 101.2s
Peak spill 0 B

branch

Metric Value
Wall time 625.1s
Peak memory 176.2 MiB
Avg memory 56.6 MiB
CPU user 567.6s
CPU sys 107.0s
Peak spill 0 B

File an issue against this benchmark runner

github-actions bot added the arrow-flight label (Changes to the arrow-flight crate) on Jul 2, 2026
Rich-T-kid force-pushed the rich-T-kid/re-use-buffers branch from 81783c4 to a44192f on July 2, 2026 14:46
@Jefffrey

Jefffrey commented Jul 2, 2026

Contributor

run benchmark flight

@adriangbot

🤖 Arrow criterion benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4867110706-811-clr2r 6.12.85+ #1 SMP Mon May 11 08:17:35 UTC 2026 aarch64 GNU/Linux

Comparing rich-T-kid/re-use-buffers (a44192f) to 8c7df18 (merge-base) diff
BENCH_NAME=flight
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental,object_store --bench flight
BENCH_FILTER=
Results will be posted here when complete


File an issue against this benchmark runner

@adriangbot

🤖 Arrow criterion benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

Details

group                                     main                                    rich-T-kid_re-use-buffers
-----                                     ----                                    -------------------------
decode/fixed/65536x4                      1.00    256.0±5.01µs    30.5 GB/sec     1.02    262.2±4.39µs    29.8 GB/sec
decode/fixed/65536x8                      1.00  564.0±136.80µs    27.7 GB/sec     1.01   571.1±13.58µs    27.4 GB/sec
decode/fixed/8192x4                       1.00     27.5±0.23µs    35.6 GB/sec     1.03     28.3±0.15µs    34.6 GB/sec
decode/fixed/8192x8                       1.00     63.4±0.44µs    30.8 GB/sec     1.02     64.8±0.48µs    30.2 GB/sec
decode/nested/65536x4                     1.00      2.9±0.67ms     6.8 GB/sec     1.16      3.3±0.66ms     5.9 GB/sec
decode/nested/65536x8                     1.00     16.9±1.41ms     2.3 GB/sec     1.04     17.5±1.36ms     2.2 GB/sec
decode/nested/8192x4                      1.00   347.1±82.97µs     7.0 GB/sec     1.01   351.3±82.37µs     7.0 GB/sec
decode/nested/8192x8                      1.00  712.5±167.11µs     6.9 GB/sec     1.01  720.5±161.04µs     6.8 GB/sec
decode/variable/65536x4                   1.11      5.7±0.50ms     6.2 GB/sec     1.00      5.1±0.76ms     6.9 GB/sec
decode/variable/65536x8                   1.87     20.8±1.49ms     3.4 GB/sec     1.00     11.1±1.39ms     6.3 GB/sec
decode/variable/8192x4                    1.00   581.4±86.87µs     7.6 GB/sec     1.01   584.8±89.69µs     7.5 GB/sec
decode/variable/8192x8                    1.00  1182.2±181.32µs     7.4 GB/sec    1.01  1190.8±181.76µs     7.4 GB/sec
decode_stream/dict/65536x4x4              1.04  808.4±156.48µs     4.9 GB/sec     1.00  780.8±127.64µs     5.0 GB/sec
decode_stream/dict/65536x8x4              1.00  1546.2±204.55µs     5.1 GB/sec    1.07  1653.1±284.72µs     4.8 GB/sec
decode_stream/dict/8192x4x4               1.00    101.0±5.60µs     5.1 GB/sec     1.01    101.7±1.81µs     5.0 GB/sec
decode_stream/dict/8192x8x4               1.00    206.2±6.28µs     5.0 GB/sec     1.01    207.9±3.38µs     4.9 GB/sec
decode_stream/fixed/65536x4x4             1.00    254.9±2.27µs    30.7 GB/sec     1.04    264.6±2.69µs    29.5 GB/sec
decode_stream/fixed/65536x8x4             1.00    548.3±6.94µs    28.5 GB/sec     1.07   585.2±72.02µs    26.7 GB/sec
decode_stream/fixed/8192x4x4              1.00     28.2±0.33µs    34.6 GB/sec     1.01     28.5±0.14µs    34.3 GB/sec
decode_stream/fixed/8192x8x4              1.00     62.0±0.76µs    31.5 GB/sec     1.08     66.9±0.60µs    29.2 GB/sec
decode_stream/nested/65536x4x4            1.00      2.9±0.69ms     6.7 GB/sec     1.00      2.9±0.66ms     6.7 GB/sec
decode_stream/nested/65536x8x4            2.06     12.9±1.39ms     3.0 GB/sec     1.00      6.3±1.34ms     6.2 GB/sec
decode_stream/nested/8192x4x4             1.00   351.0±83.85µs     7.0 GB/sec     1.00   351.2±82.66µs     7.0 GB/sec
decode_stream/nested/8192x8x4             1.00  716.3±166.49µs     6.8 GB/sec     1.00  715.0±164.39µs     6.8 GB/sec
decode_stream/variable/65536x4x4          1.00      5.2±0.78ms     6.8 GB/sec     1.05      5.5±0.79ms     6.4 GB/sec
decode_stream/variable/65536x8x4          1.00     12.0±2.87ms     5.9 GB/sec     1.79     21.5±1.40ms     3.3 GB/sec
decode_stream/variable/8192x4x4           1.00   577.9±89.61µs     7.6 GB/sec     1.01   583.5±90.30µs     7.5 GB/sec
decode_stream/variable/8192x8x4           1.00  1190.0±184.86µs     7.4 GB/sec    1.03  1227.6±174.79µs     7.2 GB/sec
do_put_dictionary/dict/hydrate/65536x4    1.00  1226.4±10.90µs   820.0 MB/sec     1.01  1236.5±24.72µs   813.3 MB/sec
do_put_dictionary/dict/hydrate/65536x8    1.00      2.5±0.03ms   795.9 MB/sec     1.02      2.6±0.02ms   778.9 MB/sec
do_put_dictionary/dict/hydrate/8192x4     1.01    171.6±0.68µs   761.3 MB/sec     1.00    169.3±0.89µs   771.5 MB/sec
do_put_dictionary/dict/hydrate/8192x8     1.01    332.1±1.05µs   786.8 MB/sec     1.00    329.3±1.65µs   793.4 MB/sec
do_put_dictionary/dict/resend/65536x4     1.01    239.6±0.66µs     4.1 GB/sec     1.00    238.0±0.56µs     4.1 GB/sec
do_put_dictionary/dict/resend/65536x8     1.00    449.1±1.97µs     4.4 GB/sec     1.00    447.4±1.22µs     4.4 GB/sec
do_put_dictionary/dict/resend/8192x4      1.00     40.5±0.33µs     3.2 GB/sec     1.00     40.4±0.15µs     3.2 GB/sec
do_put_dictionary/dict/resend/8192x8      1.00     68.0±0.34µs     3.8 GB/sec     1.00     67.7±0.20µs     3.8 GB/sec
encode/fixed/65536x4                      1.01     49.7±0.19µs    39.3 GB/sec     1.00     49.3±0.36µs    39.6 GB/sec
encode/fixed/65536x8                      1.00   1068.3±2.87µs     3.7 GB/sec     1.04   1111.4±1.86µs     3.5 GB/sec
encode/fixed/8192x4                       1.02      8.4±0.05µs    29.3 GB/sec     1.00      8.2±0.02µs    30.0 GB/sec
encode/fixed/8192x8                       1.00     15.8±0.04µs    30.9 GB/sec     1.04     16.4±0.03µs    29.8 GB/sec
encode/nested/65536x4                     1.00  1484.8±27.36µs     3.3 GB/sec     1.00   1479.6±4.03µs     3.3 GB/sec
encode/nested/65536x8                     1.00      3.0±0.07ms     3.3 GB/sec     1.03      3.1±0.02ms     3.2 GB/sec
encode/nested/8192x4                      1.00     20.4±0.03µs    30.0 GB/sec     1.01     20.7±0.05µs    29.5 GB/sec
encode/nested/8192x8                      1.01     46.6±0.14µs    26.2 GB/sec     1.00     46.4±0.13µs    26.4 GB/sec
encode/variable/65536x4                   1.01      2.4±0.02ms     3.6 GB/sec     1.00      2.4±0.01ms     3.7 GB/sec
encode/variable/65536x8                   1.05      5.3±0.12ms     3.3 GB/sec     1.00      5.1±0.05ms     3.4 GB/sec
encode/variable/8192x4                    1.00     25.8±0.15µs    42.6 GB/sec     1.01     25.9±0.07µs    42.4 GB/sec
encode/variable/8192x8                    1.00     83.0±0.17µs    26.5 GB/sec     1.01     83.8±0.22µs    26.2 GB/sec
roundtrip/fixed/65536x4                   1.00  1211.7±24.37µs  1650.8 MB/sec     1.02  1236.9±19.78µs  1617.3 MB/sec
roundtrip/fixed/65536x8                   1.00      2.2±0.04ms  1822.1 MB/sec     1.03      2.3±0.02ms  1773.0 MB/sec
roundtrip/fixed/8192x4                    1.00    199.9±2.43µs  1252.3 MB/sec     1.00    200.4±3.44µs  1249.5 MB/sec
roundtrip/fixed/8192x8                    1.02    350.4±6.32µs  1429.1 MB/sec     1.00    343.5±4.71µs  1457.8 MB/sec
roundtrip/nested/65536x4                  1.03      4.3±0.11ms  1150.6 MB/sec     1.00      4.2±0.17ms  1181.1 MB/sec
roundtrip/nested/65536x8                  1.01      8.9±0.32ms  1126.8 MB/sec     1.00      8.8±0.38ms  1134.4 MB/sec
roundtrip/nested/8192x4                   1.01   471.6±20.75µs  1327.1 MB/sec     1.00   466.0±22.16µs  1342.8 MB/sec
roundtrip/nested/8192x8                   1.01   938.2±41.36µs  1334.1 MB/sec     1.00   931.2±47.16µs  1344.1 MB/sec
roundtrip/variable/65536x4                1.00      7.9±0.30ms  1143.6 MB/sec     1.02      8.0±0.51ms  1123.5 MB/sec
roundtrip/variable/65536x8                1.02     15.2±0.49ms  1185.9 MB/sec     1.00     14.9±0.42ms  1205.3 MB/sec
roundtrip/variable/8192x4                 1.02   710.7±29.82µs  1584.0 MB/sec     1.00   696.1±35.43µs  1617.1 MB/sec
roundtrip/variable/8192x8                 1.01  1253.4±21.69µs  1796.2 MB/sec     1.00  1238.0±28.34µs  1818.5 MB/sec

Resource Usage

base (merge-base)

Metric Value
Wall time 615.1s
Peak memory 175.4 MiB
Avg memory 75.1 MiB
CPU user 573.6s
CPU sys 95.7s
Peak spill 0 B

branch

Metric Value
Wall time 620.1s
Peak memory 166.9 MiB
Avg memory 63.9 MiB
CPU user 581.9s
CPU sys 92.0s
Peak spill 0 B

File an issue against this benchmark runner

@Jefffrey

Jefffrey commented Jul 2, 2026

Contributor

run benchmark flight
env:
BENCH_FILTER: encode

@adriangbot

🤖 Arrow criterion benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4867384401-813-fzfjx 6.12.85+ #1 SMP Mon May 11 08:17:35 UTC 2026 aarch64 GNU/Linux

Comparing rich-T-kid/re-use-buffers (a44192f) to 8c7df18 (merge-base) diff
BENCH_NAME=flight
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental,object_store --bench flight
BENCH_FILTER=encode
Results will be posted here when complete


File an issue against this benchmark runner

@adriangbot

🤖 Arrow criterion benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)


group                      main                                   rich-T-kid_re-use-buffers
-----                      ----                                   -------------------------
encode/fixed/65536x4       1.00     49.1±0.20µs    39.8 GB/sec    1.03     50.7±0.20µs    38.5 GB/sec
encode/fixed/65536x8       1.00   1114.3±2.93µs     3.5 GB/sec    1.04   1159.3±2.10µs     3.4 GB/sec
encode/fixed/8192x4        1.00      8.3±0.02µs    29.5 GB/sec    1.11      9.2±0.02µs    26.7 GB/sec
encode/fixed/8192x8        1.00     16.2±0.02µs    30.1 GB/sec    1.06     17.3±0.05µs    28.3 GB/sec
encode/nested/65536x4      1.00   1490.3±4.22µs     3.3 GB/sec    1.05   1562.3±3.98µs     3.1 GB/sec
encode/nested/65536x8      1.00      3.1±0.03ms     3.1 GB/sec    1.03      3.2±0.04ms     3.0 GB/sec
encode/nested/8192x4       1.00     19.9±0.07µs    30.7 GB/sec    1.05     20.8±0.05µs    29.3 GB/sec
encode/nested/8192x8       1.01     46.1±0.10µs    26.5 GB/sec    1.00     45.8±0.11µs    26.7 GB/sec
encode/variable/65536x4    1.06      2.6±0.02ms     3.4 GB/sec    1.00      2.5±0.01ms     3.5 GB/sec
encode/variable/65536x8    1.02      5.4±0.04ms     3.2 GB/sec    1.00      5.3±0.09ms     3.3 GB/sec
encode/variable/8192x4     1.00     25.5±0.05µs    43.1 GB/sec    1.01     25.8±0.05µs    42.6 GB/sec
encode/variable/8192x8     1.03     81.8±0.14µs    26.9 GB/sec    1.00     79.1±0.20µs    27.8 GB/sec
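
For reading the table above: a minimal sketch of how the leading ratio in each column can be reproduced, assuming the usual critcmp-style convention that each side's mean time is divided by the faster of the two (so the faster side shows 1.00). The function names are hypothetical and not part of the benchmark runner.

```rust
/// Returns (base_ratio, branch_ratio): each mean time divided by the faster one.
/// Assumes both inputs are positive and expressed in the same unit.
fn relative_ratios(base: f64, branch: f64) -> (f64, f64) {
    let fastest = base.min(branch);
    (base / fastest, branch / fastest)
}

/// Rounds to two decimal places, matching the precision shown in the table.
fn round2(x: f64) -> f64 {
    (x * 100.0).round() / 100.0
}

/// Ratios for the encode/fixed/65536x4 row (49.1 µs on main, 50.7 µs on the branch).
fn fixed_65536x4_ratios() -> (f64, f64) {
    let (base, branch) = relative_ratios(49.1, 50.7);
    (round2(base), round2(branch))
}
```

Rows whose times are printed with only one decimal (for example 2.6 ms vs 2.5 ms) will not reproduce exactly from the displayed values, since the ratio is presumably computed from the unrounded measurements.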

Resource Usage

base (merge-base)

Metric         Value
------         -----
Wall time      125.0s
Peak memory    41.4 MiB
Avg memory     14.1 MiB
CPU user       75.7s
CPU sys        42.1s
Peak spill     0 B

branch

Metric         Value
------         -----
Wall time      125.0s
Peak memory    43.4 MiB
Avg memory     14.7 MiB
CPU user       77.4s
CPU sys        43.7s
Peak spill     0 B


@Rich-T-kid

Rich-T-kid commented Jul 2, 2026

Contributor Author

With GRPC_TARGET_MAX_FLIGHT_SIZE_BYTES set to 2 MB, this is how each benchmark is chunked and whether the optimization fires:

┌──────────────────┬────────────┬────────┬────────────┬─────────────────────┐
│    Benchmark     │ Total size │ Chunks │ Chunk size │ Optimization fires? │
├──────────────────┼────────────┼────────┼────────────┼─────────────────────┤
│ fixed/65536x4    │ 2.0MB      │ 1      │ 2048KB     │ No                  │
├──────────────────┼────────────┼────────┼────────────┼─────────────────────┤
│ fixed/65536x8    │ 4.0MB      │ 2      │ 2048KB     │ Yes (chunk 2)       │
├──────────────────┼────────────┼────────┼────────────┼─────────────────────┤
│ fixed/8192x4     │ 0.25MB     │ 1      │ 256KB      │ No                  │
├──────────────────┼────────────┼────────┼────────────┼─────────────────────┤
│ fixed/8192x8     │ 0.5MB      │ 1      │ 512KB      │ No                  │
├──────────────────┼────────────┼────────┼────────────┼─────────────────────┤
│ nested/65536x4   │ 5.0MB      │ 3      │ ~1707KB    │ Yes (chunks 2-3)    │
├──────────────────┼────────────┼────────┼────────────┼─────────────────────┤
│ nested/65536x8   │ 10.0MB     │ 6      │ ~1707KB    │ Yes (chunks 2-6)    │
├──────────────────┼────────────┼────────┼────────────┼─────────────────────┤
│ nested/8192x4    │ 0.625MB    │ 1      │ 640KB      │ No                  │
├──────────────────┼────────────┼────────┼────────────┼─────────────────────┤
│ nested/8192x8    │ 1.25MB     │ 1      │ 1280KB     │ No                  │
├──────────────────┼────────────┼────────┼────────────┼─────────────────────┤
│ variable/65536x4 │ ~8.1MB     │ 5      │ ~1655KB    │ Yes (chunks 2-5)    │
├──────────────────┼────────────┼────────┼────────────┼─────────────────────┤
│ variable/65536x8 │ ~16.2MB    │ 9      │ ~1839KB    │ Yes (chunks 2-9)    │
├──────────────────┼────────────┼────────┼────────────┼─────────────────────┤
│ variable/8192x4  │ ~0.98MB    │ 1      │ ~1004KB    │ No                  │
├──────────────────┼────────────┼────────┼────────────┼─────────────────────┤
│ variable/8192x8  │ ~1.96MB    │ 1      │ ~2007KB    │ No                  │
└──────────────────┴────────────┴────────┴────────────┴─────────────────────┘

🤔 The benchmarks aren't as strong as I'd expect, but the amount of data being worked with isn't huge. The differences seem to be mostly noise: memcpy'ing the buffers into the vectors likely overshadows any realloc cost, though in theory we are still saving copies along the way.
@Jefffrey any thoughts?
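
To make the realloc-versus-memcpy point concrete, here is a minimal sketch of the capacity-hint idea, not the PR's actual code. `ScratchContext`, `encode_chunk`, and `demo` are hypothetical names, and the sketch assumes the encoded `Vec` is handed off to the caller after each chunk, so only the recorded capacity carries over. It counts how often the `Vec` grows while a chunk is written in small pieces.

```rust
use std::mem;

/// Hypothetical stand-in for the write context: a scratch buffer plus the
/// capacity observed at the end of the previous chunk.
#[derive(Default)]
struct ScratchContext {
    scratch: Vec<u8>,
    capacity_hint: usize,
}

impl ScratchContext {
    /// Appends `payload` in `piece`-sized writes (mimicking many small buffer
    /// writes) and returns the bytes plus how many times the Vec grew.
    fn encode_chunk(&mut self, payload: &[u8], piece: usize) -> (Vec<u8>, usize) {
        // Move the buffer out without copying; an empty Vec is left behind.
        let mut buf = mem::take(&mut self.scratch);
        buf.clear();
        // Pre-size from the previous chunk's final capacity (0 on the first call).
        buf.reserve(self.capacity_hint);

        let mut growths = 0;
        for part in payload.chunks(piece.max(1)) {
            let before = buf.capacity();
            buf.extend_from_slice(part);
            if buf.capacity() != before {
                growths += 1; // a reallocation (and copy of existing bytes) happened
            }
        }
        // Remember how large the buffer ended up, for the next chunk.
        self.capacity_hint = buf.capacity();
        (buf, growths)
    }
}

/// Encodes two equal-sized 64 KiB chunks and reports the growth count for each.
fn demo() -> (usize, usize) {
    let mut ctx = ScratchContext::default();
    let chunk = vec![0u8; 64 * 1024];
    let (_, first) = ctx.encode_chunk(&chunk, 1024);
    let (_, second) = ctx.encode_chunk(&chunk, 1024);
    (first, second)
}
```

In this sketch the first chunk pays for every growth step, while the second chunk of the same size never grows. The payload bytes are still copied once per chunk either way, which is consistent with the memcpy cost dominating in the numbers above.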


Labels

arrow Changes to the arrow crate arrow-flight Changes to the arrow-flight crate performance

Development

Successfully merging this pull request may close these issues.

Optimize arrow-flight

4 participants