fileio: coalesce --sparse writes instead of 1-KiB dribbles (fixes #773) #1019
Open
dr-who wants to merge 1 commit into
…ncProject#773) write_file()'s sparse path sliced each span into SPARSE_WRITE_SIZE (1024-byte) pieces and write_sparse() issued one write() syscall per slice. Copying a large *non-sparse* file with --sparse therefore cost roughly one write() per kilobyte (about a million write() calls for a 1 GiB file), which on real storage ran far slower than the same copy without --sparse (the bug report measured 1.36 MB/s vs 391 MB/s, ~280x). The 1024-byte chunk is also smaller than a filesystem block, so it cannot even create finer holes than a plain copy could.

Rewrite write_sparse() to scan the whole span itself: it looks for interior runs of zeros that are at least SPARSE_WRITE_SIZE long (the same hole granularity rsync has always used) and emits each intervening non-zero region (which may include shorter zero runs not worth a hole) with a single write(). do_punch_hole() advances the file offset just like the lseek() path, so flushing a deferred hole between segments keeps the position correct. The hole granularity is unchanged, so sparseness is identical; only the syscall pattern changes.

Measured on a 100 MiB random (hole-free) file: write() syscalls drop from 100,730 to 6,125 (~16x), now tracking the data's natural chunking rather than its size in kilobytes. Verified byte-identical and equally sparse output for hole-free, large-hole, small-interior-hole, all-zero, --inplace, and --preallocate cases.

testsuite/sparse-write-count_test.py copies a 16 MiB hole-free file under strace and asserts the write() count stays far below the old size/1024 behaviour (it skips where strace is unavailable).
Problem (#773)
`write_file()`'s `--sparse` path slices each span into `SPARSE_WRITE_SIZE` (1024-byte) pieces, and `write_sparse()` issues one `write()` syscall per slice, bypassing the 256 KB buffering the normal path uses. Copying a large non-sparse file with `--sparse` therefore costs roughly one `write()` per kilobyte: about a million `write()` calls for a 1 GiB file.

The reporter measured a 1 GiB random file:

- `rsync --sparse` → 1.36 MB/s (would take ~12 min)
- `rsync` (no `--sparse`) → 391 MB/s (~2 s)

i.e. ~280× slower purely from syscall overhead. The 1024-byte chunk is also smaller than a filesystem block, so the fine slicing does not even buy finer-grained holes than a plain copy could get.
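For intuition, the old cost can be modelled in a few lines of C. This is a simplified sketch, not rsync's source: `old_style_write_calls()` is a hypothetical name, and the model assumes every 1024-byte slice containing any non-zero byte costs exactly one `write()`, while an all-zero slice only moves the file offset.

```c
#include <assert.h>
#include <stddef.h>

#define SPARSE_WRITE_SIZE 1024 /* slice size of the old --sparse path */

/* Simplified model of the OLD behaviour (not rsync's actual code):
 * the span is cut into SPARSE_WRITE_SIZE slices; an all-zero slice only
 * moves the file offset (a hole), any other slice costs one write(). */
static size_t old_style_write_calls(const unsigned char *buf, size_t len)
{
    size_t calls = 0;

    assert(buf != NULL || len == 0);
    for (size_t off = 0; off < len; off += SPARSE_WRITE_SIZE) {
        size_t n = len - off < SPARSE_WRITE_SIZE ? len - off : SPARSE_WRITE_SIZE;

        for (size_t i = 0; i < n; i++) {
            if (buf[off + i] != 0) {
                calls++; /* slice holds data: one write() */
                break;
            }
        }
    }
    return calls;
}
```

For hole-free data the model gives ceil(len / 1024) calls: 1,024 for 1 MiB and 1,048,576 for 1 GiB, which is the "about a million" above.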
Reproduced here on a 200 MB file: 201,438 `write()` syscalls (≈ size/1024).

Fix
Rewrite `write_sparse()` to scan the whole span itself: it finds interior runs of zeros at least `SPARSE_WRITE_SIZE` long, the same hole granularity rsync has always used, and emits each intervening non-zero region (which may include shorter zero runs not worth a hole) with a single `write()`. A small `flush_sparse_hole()` helper defers and flushes holes between segments; `do_punch_hole()` advances the file offset just like the `lseek()` path, so the position stays correct.

Hole granularity is unchanged, so sparseness is identical; only the syscall pattern changes.
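The scan can be sketched as a pure planning step that turns a span into segments, each one a deferred hole followed by a single `write()`. This is an illustration under assumptions, not the PR's code: `plan_sparse_span()` and `sparse_segment` are hypothetical names, the actual seeking or hole punching and writing are left to the caller, and zero runs shorter than `SPARSE_WRITE_SIZE` at either edge of the span are simply written as data.

```c
#include <assert.h>
#include <stddef.h>

/* Minimum zero run worth a hole (rsync's 1024-byte granularity). */
#define SPARSE_WRITE_SIZE 1024

/* One planned write(): skip `hole_before` zero bytes (seek or punch),
 * then write `length` bytes starting at `offset` within the span. */
typedef struct {
    size_t hole_before;
    size_t offset;
    size_t length;
} sparse_segment;

/* Hypothetical planner illustrating the coalescing scan.
 * Zero runs of at least SPARSE_WRITE_SIZE bytes become holes; everything
 * between them, shorter zero runs included, becomes ONE segment.
 * *pending_hole carries deferred zeros in from earlier spans and any
 * trailing hole out to the next span.  `out` must have room for
 * len / SPARSE_WRITE_SIZE + 1 entries.  Returns the segment count. */
static size_t plan_sparse_span(const unsigned char *buf, size_t len,
                               size_t *pending_hole,
                               sparse_segment *out, size_t max)
{
    size_t n = 0, pos = 0, data_start = 0;

    assert(pending_hole != NULL && (out != NULL || max == 0));
    while (pos < len) {
        if (buf[pos] != 0) {
            pos++;
            continue;
        }
        /* Measure the zero run starting at pos. */
        size_t run_end = pos;
        while (run_end < len && buf[run_end] == 0)
            run_end++;
        if (run_end - pos >= SPARSE_WRITE_SIZE) {
            /* Flush the data gathered so far as a single segment... */
            if (pos > data_start) {
                assert(n < max);
                out[n++] = (sparse_segment){ *pending_hole, data_start, pos - data_start };
                *pending_hole = 0;
            }
            /* ...and defer the long zero run as a hole. */
            *pending_hole += run_end - pos;
            data_start = run_end;
        }
        pos = run_end;
    }
    if (len > data_start) {
        assert(n < max);
        out[n++] = (sparse_segment){ *pending_hole, data_start, len - data_start };
        *pending_hole = 0;
    }
    return n;
}
```

A caller would, for each segment, flush `hole_before` (an `lseek()` with `SEEK_CUR`, or a hole punch) and then issue one `write()` of `length` bytes from `buf + offset`. The size bound on `out` holds because every segment except the last is followed by at least `SPARSE_WRITE_SIZE` zeros. Since `*pending_hole` survives between calls, a hole left at the end of one span merges with a long zero run at the start of the next.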
Improvement
Measured on a 100 MiB random (hole-free) file:
| | Before | After |
|---|---|---|
| `write()` syscalls | 100,730 | 6,125 (~16× fewer) |

The count now tracks the data's natural chunking (≤ `CHUNK_SIZE` tokens) instead of its size in kilobytes. On slow or high-latency storage (the USB/RAID setup in the report), this is the difference between the 1.36 MB/s and ~full-speed cases.

Verified byte-identical and equally sparse output across: hole-free, large-hole, small (8 KB) interior-hole (stays at 516 K allocated, whereas naively bumping the slice-size constant would balloon it to 772 K), all-zeros, `--inplace` (the `use_seek` path), and `--preallocate` (the `punch_hole` path). The full testsuite passes under valgrind with no errors.

Test
`testsuite/sparse-write-count_test.py` copies a 16 MiB hole-free file under `strace` and asserts that the `write()` count stays far below the old size/1024 behaviour. It fails on stock master (16,919 writes) and passes with this change (a few hundred); it skips cleanly where `strace` is unavailable (non-Linux, or no ptrace).
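The PR does not show how the test counts writes, so here is a rough C illustration of the idea. The helpers are hypothetical: they count `write(` lines in an strace log, tolerating the PID prefixes strace adds when following child processes, and apply an assumed "far below" threshold of one tenth of size/1024. The prefix handling, the absence of timestamp options such as `-t`, and the factor of ten are all assumptions, not necessarily what `sparse-write-count_test.py` does.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Skip an optional "[pid N]" or "N " prefix (assumed strace -f formats). */
static const char *skip_pid_prefix(const char *p)
{
    if (strncmp(p, "[pid", 4) == 0) {
        size_t k = strcspn(p, "]\n");
        if (p[k] == ']')
            p += k + 1;
    }
    while (isdigit((unsigned char)*p) || *p == ' ')
        p++;
    return p;
}

/* Count log lines that record a write() call.  Matching "write(" exactly
 * leaves out pwrite64(, writev( and "<... write resumed>" lines. */
static size_t count_write_calls(const char *log)
{
    size_t count = 0;
    const char *line = log;

    assert(log != NULL);
    while (*line != '\0') {
        if (strncmp(skip_pid_prefix(line), "write(", 6) == 0)
            count++;
        const char *nl = strchr(line, '\n');
        if (nl == NULL)
            break;
        line = nl + 1;
    }
    return count;
}

/* The old path issued about one write() per KiB; treat anything under a
 * tenth of that (an assumed factor) as "far below". */
static bool far_below_old_behaviour(size_t writes, size_t file_size)
{
    return writes * 10 < file_size / 1024;
}
```

With the figures quoted above for a 16 MiB file, stock master's 16,919 writes fail this check, and a count of a few hundred passes it.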