io: use poll() instead of select() to avoid an FD_SETSIZE hang (fixes #231) #1018
Open
dr-who wants to merge 1 commit into
…syncProject#231)

rsync's I/O loops (safe_read, safe_write, and the main perform_io multiplexer) waited for readiness with select() and fd_set bitmaps. An fd_set can only represent descriptors below FD_SETSIZE (1024 with glibc).

When rsync is started with many descriptors already open -- e.g. inherited from a parent process that leaked fds, a high "ulimit -n", or a busy daemon -- its own socket and pipe fds get allocated at or above 1024. FD_SET() and FD_ISSET() then index past the end of the fixed-size fd_set, which is undefined behavior: select() reports the fd as ready, but FD_ISSET() reads the out-of-bounds bit as 0, so the read or write never happens and rsync spins at 100% CPU forever with no progress. This is the long-standing "rsync hangs at 100% CPU on large systems" report, and it matches the MemorySanitizer use-of-uninitialized-value seen in perform_io.

Convert the three loops to poll(), which identifies descriptors by value in a small array and has no FD_SETSIZE ceiling, so a high-numbered fd works fine. rsync only ever waits on a handful of fds (at most three in perform_io: in_fd, out_fd, and the files-from forward fd), so poll() is as fast as -- or faster than -- select() here; the select()-vs-poll() cost gap only appears when watching thousands of descriptors, which rsync never does. The remaining select(0, ...) call is a pure timed sleep with no fds and is unaffected.

The conversion is behavior-preserving: the same max_fd bookkeeping decides when there is nothing to wait on, the per-fd readiness checks map to the matching pollfd revents, and the timeout is the same (now expressed in milliseconds).

testsuite/highfd-hang_test.py reproduces the hang deterministically by opening enough inheritable dummy fds to push rsync's descriptors past FD_SETSIZE before an ordinary transfer; it hangs (caught by a timeout) on the select() code and passes instantly with poll().
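The FD_SETSIZE ceiling described in the commit message can be pictured with a small model. This is only an illustrative sketch in Python (rsync itself is C); the 64-bit word size is an assumption matching glibc on x86-64 Linux, and `fd_set_slot` and `fits_in_fd_set` are hypothetical helper names, not part of rsync or glibc.

```python
# Illustrative model only: it mirrors how a glibc fd_set bitmap is indexed.
FD_SETSIZE = 1024                            # glibc's compile-time limit
BITS_PER_WORD = 64                           # assumption: 64-bit long (x86-64 Linux)
FD_SET_WORDS = FD_SETSIZE // BITS_PER_WORD   # 16 words in the whole bitmap


def fd_set_slot(fd: int) -> tuple[int, int]:
    """Return the (word index, bit index) that FD_SET(fd, ...) would touch."""
    return fd // BITS_PER_WORD, fd % BITS_PER_WORD


def fits_in_fd_set(fd: int) -> bool:
    """True if the descriptor can be represented without leaving the bitmap."""
    return 0 <= fd < FD_SETSIZE


if __name__ == "__main__":
    for fd in (3, 1023, 1024, 1105):
        word, bit = fd_set_slot(fd)
        status = "ok" if fits_in_fd_set(fd) else "outside the bitmap"
        print(f"fd {fd}: word {word}, bit {bit}: {status}")
```

Under these assumptions the bitmap has valid word indices 0 through 15, so a descriptor such as 1105 maps to word 17, which lies past the end of the structure.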
Problem (#231)
rsync's I/O loops (`safe_read`, `safe_write`, and the main `perform_io` multiplexer) waited for readiness with `select()` and `fd_set` bitmaps. An `fd_set` can only represent descriptors below `FD_SETSIZE` (1024 with glibc).

When rsync is started with many descriptors already open (inherited from a parent that leaked fds, a high `ulimit -n`, or a busy daemon), its own socket/pipe fds get allocated at or above 1024. `FD_SET()`/`FD_ISSET()` then index past the end of the fixed-size `fd_set`, which is undefined behavior: `select()` reports the fd ready, but `FD_ISSET()` reads the out-of-bounds bit as 0, so the read/write never happens and rsync spins at 100% CPU forever with no progress.

This is the long-standing "rsync hangs at 100% CPU" report, and it explains the MemorySanitizer use-of-uninitialized-value in `perform_io` in the issue thread (the OOB `FD_ISSET` read) and Wayne's own hypothesis there ("maybe iobuf.in is greater than the default `FD_SETSIZE`"). The `-vvv` correlation was a red herring: the message-backlog deadlock is separately mitigated by the dynamic `iobuf.msg` growth.

Deterministic reproduction
Pre-open ~1100 inheritable fds so rsync's descriptors land above `FD_SETSIZE`, then run any transfer. `strace` shows `pselect6(1106, [1105], …) = 1` returning "ready" instantly in a tight loop, with one process pegged at 100% CPU in state `R` and 0 files transferred.

Fix
Convert the three loops to `poll()`, which identifies descriptors by value in a small array and has no `FD_SETSIZE` ceiling, so a high-numbered fd works fine.

rsync only ever waits on a handful of fds (at most three in `perform_io`: `in_fd`, `out_fd`, the files-from forward fd). The "poll is slower than select" effect only appears when watching thousands of descriptors; at N≤3 `poll()` is as fast as or faster than `select()` (it skips the `FD_ZERO` bitmap clears). `epoll`/`kqueue` would be more code, unportable, and actually slower for such a tiny, changing fd set.

The conversion is behavior-preserving: the same `max_fd` bookkeeping decides when there's nothing to wait on; each `FD_ISSET` maps to the matching `pollfd.revents`; the timeout is unchanged (now in milliseconds). The lone remaining `select(0, …)` is a pure timed sleep with no fds and is left as-is.

Verified: with the fix, transfers complete correctly with 1100 and 5000 pre-opened fds (and the data matches); a normal low-fd transfer is unchanged.
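As a rough illustration of the select()-to-poll() mapping described above, here is a minimal Python sketch, not the C code in this PR. `wait_ready` is a hypothetical helper; treating `POLLHUP`/`POLLERR` as "ready" so the caller's next read or write surfaces the condition is an assumption made here to mimic select() semantics, and `select.poll` is assumed to be available (it is on Linux, not on Windows).

```python
import os
import select


def wait_ready(read_fds, write_fds, timeout_s):
    """Wait with poll(); return (readable, writable) sets of descriptors."""
    read_fds, write_fds = set(read_fds), set(write_fds)

    # Descriptors are registered by value, so there is no bitmap size limit.
    interest = {}
    for fd in read_fds:
        interest[fd] = interest.get(fd, 0) | select.POLLIN
    for fd in write_fds:
        interest[fd] = interest.get(fd, 0) | select.POLLOUT

    poller = select.poll()
    for fd, mask in interest.items():
        poller.register(fd, mask)

    readable, writable = set(), set()
    # poll() takes its timeout in milliseconds, where select() took seconds.
    for fd, revents in poller.poll(int(timeout_s * 1000)):
        # Assumption: hang-up and error count as "ready", as select() would report.
        if fd in read_fds and revents & (select.POLLIN | select.POLLHUP | select.POLLERR):
            readable.add(fd)
        if fd in write_fds and revents & (select.POLLOUT | select.POLLERR):
            writable.add(fd)
    return readable, writable
```

For example, with `r, w = os.pipe()`, calling `wait_ready([r], [w], 0.1)` before any data is written should report only `w` as writable; after `os.write(w, b"x")` the read end `r` is reported as readable too.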
Test
`testsuite/highfd-hang_test.py` opens enough inheritable dummy fds to push rsync's descriptors past `FD_SETSIZE`, then runs an ordinary transfer with `close_fds=False`. The hang is an infinite spin, so the outcome is binary (instant pass or never finish) rather than a timing race. It hangs (caught by a timeout) on the `select()` code and passes instantly with `poll()`, and also verifies the transferred files are correct. It skips cleanly if `RLIMIT_NOFILE` can't be raised above `FD_SETSIZE`.

Full suite: 107 passed, 6 skipped, 0 failed.
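The setup step described above could look roughly like the following sketch. It is not the contents of `testsuite/highfd-hang_test.py`; `raise_nofile_limit` and `open_dummy_fds` are hypothetical helper names, and using `os.devnull` for the dummy descriptors is an assumption.

```python
import os
import resource

FD_SETSIZE = 1024  # glibc value quoted in this PR


def raise_nofile_limit(needed):
    """Try to make the soft RLIMIT_NOFILE at least `needed`; return True on success."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY or soft >= needed:
        return True  # already high enough
    if hard != resource.RLIM_INFINITY and hard < needed:
        return False  # the hard limit forbids it; a test would skip here
    try:
        resource.setrlimit(resource.RLIMIT_NOFILE, (needed, hard))
    except (ValueError, OSError):
        return False
    return True


def open_dummy_fds(count):
    """Open `count` inheritable descriptors so a child's own fds start higher."""
    fds = []
    try:
        for _ in range(count):
            fd = os.open(os.devnull, os.O_RDONLY)
            os.set_inheritable(fd, True)  # Python fds are non-inheritable by default
            fds.append(fd)
    except OSError:
        for fd in fds:
            os.close(fd)
        raise
    return fds
```

A caller would first check `raise_nofile_limit(FD_SETSIZE + 200)` (the margin is arbitrary), then hold the result of `open_dummy_fds(1100)` open while launching the transfer through `subprocess.run(..., close_fds=False, timeout=...)`, treating `subprocess.TimeoutExpired` as the hang.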