mod_http2: RST_STREAM incomplete HTTP/2 response to avoid hang#676
Draft
mmontalbo wants to merge 1 commit into
Draft
mod_http2: RST_STREAM incomplete HTTP/2 response to avoid hang#676mmontalbo wants to merge 1 commit into
mmontalbo wants to merge 1 commit into
Conversation
When a request handler emits a final response (status, headers, perhaps some body) and then stops before sending EOS, e.g. a CGI that hits the server Timeout mid-body, the HTTP/2 stream was left open: no DATA, no END_STREAM, no RST_STREAM. A client without a stall timeout then waits indefinitely. The same handler over HTTP/1.1 closes the connection and the client fails fast. mod_http2 already intends to reset such a response: stream_data_cb() RST_STREAMs when it pulls APR_EOF from a beam that never delivered an EOS. But reaching that pull depends on c1 re-dispatching stream_data_cb after the output beam goes terminal, and s_c2_done() does not force it: h2_beam_is_complete() reports a plainly closed beam as complete (it keys on beam->closed, not on whether an EOS was seen), so the reset rides on a c1/c2 wakeup that is lost under load. On threaded MPMs (event, worker) this leaves the stream parked. Make the terminal state observable independent of timing. When c2_process() finishes with a final response started but no EOS seen, abort the output beam instead of closing it. An aborted beam returns APR_ECONNABORTED from h2_beam_receive() on the next pull unconditionally (it is checked before the transfer loop), so the existing reset fires regardless of how the wakeup is scheduled. Header-only responses (204/304, and HEAD) are complete without a body EOS, so mark them complete (response_eos_seen) where the response is finalized in h2_c2_filter.c; they must not be aborted, or a body-less 304 from mod_cache revalidation would be RST_STREAM'd again, re-opening PR 69580. Adds two regression tests to test_105_timeout.py: test_h2_105_20 (a CGI that goes silent past Timeout, as concurrent streams, must be reset) and test_h2_105_21 (a mod_cache revalidation 304 must close cleanly). Each fails if its corresponding change is reverted; the unchanged complete-response path is covered by the suite's existing tests.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
On 2.4.x, when an HTTP/2 request handler emits a final response (status,
headers, maybe a partial body) and then stops without sending EOS, e.g. a
CGI that hits the server
Timeoutmid-body, the stream is left open with noexplicit END_STREAM / RST_STREAM. A client without its own stall timeout waits
indefinitely. The same handler over HTTP/1.1 closes the connection, so the
client fails fast.
Bugzilla: https://bz.apache.org/bugzilla/show_bug.cgi?id=70131
Root cause
mod_http2 already intends to reset such a response:
stream_data_cb()(
h2_stream.c) sends RST_STREAM when it pullsAPR_EOFfrom output that neverdelivered EOS. But reaching that pull depends on c1 re-dispatching
stream_data_cbafter the output beam goes terminal, ands_c2_done()does notforce it:
h2_beam_is_complete()treats a plainly closed beam as complete (itkeys on
beam->closed, not on whether an EOS was seen), so the reset rides on ac1/c2 wakeup that is racy under concurrency. On trunk the worker appears to use the
generic
ap_process_connection()and the truncated response reachess_c2_done()incomplete (so it resets) whereas 2.4.x's bespoke worker closes the beam first, so it
looks complete and never resets.
Fix
Make the terminal state observable independent of timing. When
c2_process()finishes with a final response started but no EOS seen, abort the output
beam instead of closing it. An aborted beam returns
APR_ECONNABORTEDfromh2_beam_receive()on the next pull unconditionally (it is checked before thetransfer loop), so the existing reset fires regardless of how the wakeup is
scheduled.
Header-only responses (204/304, and HEAD) are complete without a body EOS, so
they are marked complete (
response_eos_seen) where the response is finalizedin
h2_c2_filter.c; they must not be aborted, or a body-less 304 from mod_cacherevalidation would be RST_STREAM'd again, re-opening
PR 69580.
Tests
Adds two regression tests to
test_105_timeout.py:test_h2_105_20: a CGI that goes silent pastTimeout, run as concurrentstreams, must be reset.
test_h2_105_21: a mod_cache revalidation 304 must close cleanly (guards thePR 69580 regression).