fix cancel/accept race in NetAccept that causes EBADF abort#12925
fix cancel/accept race in NetAccept that causes EBADF abort#12925JakeChampion wants to merge 1 commit intoapache:masterfrom
Conversation
27c9a8a to
b259ce1
Compare
`NetAcceptAction::cancel()` closes the server socket before setting the cancelled flag In the window between close and flag set, `net_accept()` can get `EBADF` from `accept4()`, see `cancelled=false`, and dispatch `EVENT_ERROR` to handlers that don't expect it now we check the atomic server pointer in all three accept paths before dispatching `EVENT_ERROR`
b259ce1 to
ad25b64
Compare
There was a problem hiding this comment.
Pull request overview
This PR fixes a race condition between NetAcceptAction::cancel() and the net_accept() accept paths that could cause an unhandled EBADF abort. The cancel operation atomically closes the server socket (via server.exchange(nullptr, ...)) before setting the cancelled flag, leaving a window where an accept call can fail with EBADF, see cancelled == false, and spuriously dispatch EVENT_ERROR to a continuation that does not expect it.
Changes:
- Adds an atomic
std::atomic<Server *> servercheck (action_->server.load(std::memory_order_acquire) != nullptr) as the primary guard before dispatchingEVENT_ERRORin all three accept code paths (net_accept,do_blocking_accept,acceptFastEvent). - The
NetAcceptAction::cancel()implementation (inP_NetAccept.h) usesserver.exchange(nullptr, std::memory_order_acq_rel)so the null pointer state serves as a reliable cancellation indicator beforecancelledis set.
| goto Ldone; | ||
| } | ||
| if (na->server.sock.is_ok() && !na->action_->cancelled) { | ||
| if (na->action_->server.load(std::memory_order_acquire) != nullptr && na->server.sock.is_ok() && !na->action_->cancelled) { |
There was a problem hiding this comment.
In net_accept, the updated condition at line 115 adds na->action_->server.load(std::memory_order_acquire) != nullptr as the primary atomic guard, but retains the pre-existing na->server.sock.is_ok() check. This is now redundant: when action_->cancel() runs, it first atomically exchanges server to nullptr, then closes the socket. So if the atomic load returns nullptr, short-circuit evaluation means sock.is_ok() is never reached. If the atomic load returns non-null, the socket is guaranteed to still be open (cancel hasn't run yet).
The other two fixed paths (do_blocking_accept and acceptFastEvent) correctly use only the atomic server pointer check, without an additional sock.is_ok() check, which is the right and consistent approach.
The na->server.sock.is_ok() sub-expression also performs a non-atomic read of the socket's file descriptor, which could be written concurrently by another thread calling cancel(). While in practice this doesn't create a real correctness problem (the atomic check is authoritative), it is an unnecessary TOCTOU discrepancy between the three accept paths.
| if (na->action_->server.load(std::memory_order_acquire) != nullptr && na->server.sock.is_ok() && !na->action_->cancelled) { | |
| if (na->action_->server.load(std::memory_order_acquire) != nullptr && !na->action_->cancelled) { |
NetAcceptAction::cancel()closes the server socket before setting the cancelled flag In the window between close and flag set,net_accept()can getEBADFfromaccept4(), seecancelled=false, and dispatchEVENT_ERRORto handlers that don't expect itnow we check the atomic server pointer in all three accept paths before dispatching
EVENT_ERROR