fileio: coalesce --sparse writes instead of 1-KiB dribbles (fixes #773) by dr-who · Pull Request #1019 · RsyncProject/rsync

dr-who · 2026-06-30T21:44:00Z

Problem (#773)

write_file()'s --sparse path slices each span into SPARSE_WRITE_SIZE (1024-byte) pieces, and write_sparse() issues one write() syscall per slice, bypassing the 256 KB buffering the normal path uses. Copying a large non-sparse file with --sparse therefore costs roughly one write() per kilobyte — about a million write() calls for a 1 GiB file.

The reporter measured a 1 GiB random file:

rsync --sparse → 1.36 MB/s (would take ~12 min)
rsync (no --sparse) → 391 MB/s (~2 s)

i.e. ~280× slower purely from syscall overhead. The 1024-byte chunk is also smaller than a filesystem block, so it can't even create finer holes than a plain copy.

Reproduced here on a 200 MB file: 201,438 write() syscalls (≈ size/1024).
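
The size/1024 estimate can be checked with a little arithmetic. The sketch below assumes an idealized hole-free file in which every SPARSE_WRITE_SIZE slice results in exactly one write(); `expected_writes` is a hypothetical helper name, not something from rsync, and measured counts will differ slightly from this figure.

```python
SPARSE_WRITE_SIZE = 1024  # bytes per slice, as stated in the description above


def expected_writes(size_bytes: int, slice_size: int = SPARSE_WRITE_SIZE) -> int:
    """Idealized write() count for a hole-free file: one call per slice.

    Uses ceiling division so a trailing partial slice still counts as a call.
    """
    return -(-size_bytes // slice_size)


GIB = 1024 ** 3
MIB = 1024 ** 2

print(expected_writes(1 * GIB))   # 1048576, i.e. "about a million"
print(expected_writes(16 * MIB))  # 16384
```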

Fix

Rewrite write_sparse() to scan the whole span itself: it finds interior runs of zeros at least SPARSE_WRITE_SIZE long — the same hole granularity rsync has always used — and emits each intervening non-zero region (which may include shorter zero runs not worth a hole) with a single write(). A small flush_sparse_hole() helper defers/flushes holes between segments; do_punch_hole() advances the file offset just like the lseek() path, so the position stays correct.

Hole granularity is unchanged, so sparseness is identical — only the syscall pattern changes.
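
The scan described above can be modelled in a few lines of Python. This is only an illustrative sketch, not the C code in this PR: `plan_span` is a hypothetical name, it returns a list of operations instead of calling write(), lseek() or do_punch_hole(), it treats every sufficiently long zero run as a hole whether or not it is interior, and it does not model deferring a hole across spans the way flush_sparse_hole() is described as doing.

```python
SPARSE_WRITE_SIZE = 1024  # minimum zero run worth a hole, per the description


def plan_span(buf: bytes, min_hole: int = SPARSE_WRITE_SIZE):
    """Split buf into ("data", start, length) and ("hole", start, length) ops.

    Zero runs shorter than min_hole are left inside the surrounding data
    segment, so each data op corresponds to a single write().
    """
    ops = []
    n = len(buf)
    seg_start = 0  # start of the data segment that has not been emitted yet
    i = 0
    while i < n:
        if buf[i] != 0:
            i += 1
            continue
        # Measure the zero run that starts at i.
        j = i
        while j < n and buf[j] == 0:
            j += 1
        if j - i >= min_hole:
            # Long enough for a hole: emit the pending data first, then the hole.
            if i > seg_start:
                ops.append(("data", seg_start, i - seg_start))
            ops.append(("hole", i, j - i))
            seg_start = j
        # Short runs are skipped over and stay part of the pending data segment.
        i = j
    if seg_start < n:
        ops.append(("data", seg_start, n - seg_start))
    return ops
```

In this model a hole-free buffer of any length yields exactly one data op, which is the coalescing effect the fix is after, while a 100-byte zero run sandwiched between data is simply written out with its neighbours.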

Improvement

Measured on a 100 MiB random (hole-free) file:

| | `write()` syscalls |
|---|---|
| before | 100,730 |
| after | 6,125 (~16×) |

The count now tracks the data's natural chunking (≤ CHUNK_SIZE tokens) instead of its size in kilobytes. On slow/high-latency storage (the USB/RAID in the report) this is the difference between the 1.36 MB/s and ~full-speed cases.

Verified byte-identical and equally sparse output across these cases: hole-free, large-hole, small (8 KB) interior-hole, all-zeros, --inplace (the use_seek path), and --preallocate (the punch_hole path). In the small interior-hole case the output stays at 516 K allocated, whereas a naive bump of the constant would balloon it to 772 K. The full testsuite passes under valgrind with no errors.
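
One way to check "byte-identical and equally sparse" for a pair of output files is to compare contents and allocated size. The sketch below is an assumption about how such a check could look, not the procedure actually used for this PR; `compare_outputs` is a hypothetical name, and it assumes a platform where `st_blocks` is reported in 512-byte units (as on Linux).

```python
import filecmp
import os


def compare_outputs(path_a: str, path_b: str):
    """Return (identical, allocated_a, allocated_b) for two files.

    identical  -- True if the files have the same bytes (full content compare)
    allocated_ -- bytes actually allocated on disk, assuming 512-byte st_blocks
    """
    identical = filecmp.cmp(path_a, path_b, shallow=False)
    allocated_a = os.stat(path_a).st_blocks * 512
    allocated_b = os.stat(path_b).st_blocks * 512
    return identical, allocated_a, allocated_b
```

Two outputs are "equally sparse" in this sense when the content compares equal and the two allocated sizes match; the exact numbers depend on the filesystem.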

Test

testsuite/sparse-write-count_test.py copies a 16 MiB hole-free file under strace and asserts the write() count stays far below the old size/1024 behaviour. It fails on stock master (16,919 writes) and passes with this change (a few hundred); it skips cleanly where strace is unavailable (non-Linux / no ptrace).
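
For illustration, counting write() calls in strace output can be as simple as the sketch below. This is not the code in testsuite/sparse-write-count_test.py; `count_writes` is a hypothetical helper, and it assumes the usual strace line formats, where a followed child's lines are prefixed either with `[pid N]` or with a bare PID.

```python
import re

# Matches lines such as `write(3, "..."..., 1024) = 1024`, optionally prefixed
# by "[pid 1234] " or "1234 ". It deliberately does not match writev( or
# pwrite64(, nor "<... write resumed>" continuation lines, so a call that
# strace splits across two lines is counted once.
_WRITE_RE = re.compile(r"^(?:\[pid\s+\d+\]\s+|\d+\s+)?write\(")


def count_writes(strace_text: str) -> int:
    """Count write() syscalls in a captured strace log."""
    return sum(1 for line in strace_text.splitlines() if _WRITE_RE.match(line))
```

A test built on this would compare the count against some bound well below size/1024; the bound the real test uses is not stated here.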

…ncProject#773)

write_file()'s sparse path sliced each span into SPARSE_WRITE_SIZE (1024-byte) pieces and write_sparse() issued one write() syscall per slice. Copying a large *non-sparse* file with --sparse therefore cost roughly one write() per kilobyte -- about a million write() calls for a 1 GiB file -- which on real storage ran far slower than the same copy without --sparse (the bug report measured 1.36 MB/s vs 391 MB/s, ~280x). The 1024-byte chunk is also smaller than a filesystem block, so it cannot even create finer holes than a plain copy could.

Rewrite write_sparse() to scan the whole span itself: it looks for interior runs of zeros that are at least SPARSE_WRITE_SIZE long -- the same hole granularity rsync has always used -- and emits each intervening non-zero region (which may include shorter zero runs not worth a hole) with a single write(). do_punch_hole() advances the file offset just like the lseek() path, so flushing a deferred hole between segments keeps the position correct. The hole granularity is unchanged, so sparseness is identical; only the syscall pattern changes.

Measured on a 100 MiB random (hole-free) file: write() syscalls drop from 100,730 to 6,125 (~16x), now tracking the data's natural chunking rather than its size in kilobytes. Verified byte-identical and equally sparse output for hole-free, large-hole, small-interior-hole, all-zero, --inplace, and --preallocate cases.

testsuite/sparse-write-count_test.py copies a 16 MiB hole-free file under strace and asserts the write() count stays far below the old size/1024 behaviour (it skips where strace is unavailable).

dr-who wants to merge 1 commit into RsyncProject:master from dr-who:fix-773-sparse-write-coalesce
