Summary
TestNetAcceptedIsMonotonic in test/observability/observability_connection_metrics_test.h fails intermittently on GitHub Actions shared runners with delta=4.000000 expected>=5.
Failing test
[FAIL] ObsConnMetrics: net.connections.accepted monotonic over N opens
Error: delta=4.000000 expected>=5
Root cause
The test opens 5 TCP connections in a loop, closing each one immediately and sleeping 20ms between connections (lines 484-496), then polls for up to 2 seconds for the accepted counter to reach 5.
On slow or contended CI runners, one of the connect+close cycles completes before the accept dispatcher processes the connection: the TCP RST arrives before (or nearly simultaneously with) the accept callback that increments reactor.net.connections.accepted, so the counter only reaches 4.
The 20ms inter-connection sleep and 2-second poll deadline are insufficient on GitHub Actions shared runners under load.
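The polling step can be pictured with a minimal C++ sketch. This is not the test's actual code: `PollUntilAtLeast` and the `read_counter` callback are hypothetical names standing in for however the test reads reactor.net.connections.accepted. It illustrates why a longer deadline alone cannot rescue a run in which one accept was never counted.

```cpp
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical helper (not from the test suite): polls a counter until it
// reaches `expected` or the deadline passes. `read_counter` is an assumed
// stand-in for reading reactor.net.connections.accepted.
bool PollUntilAtLeast(const std::function<double()>& read_counter,
                      double expected,
                      std::chrono::milliseconds deadline,
                      std::chrono::milliseconds interval =
                          std::chrono::milliseconds(10)) {
  const auto stop = std::chrono::steady_clock::now() + deadline;
  do {
    if (read_counter() >= expected) return true;
    std::this_thread::sleep_for(interval);
  } while (std::chrono::steady_clock::now() < stop);
  // One last read so a bump that lands right at the deadline still counts.
  return read_counter() >= expected;
}
```

If the counter is stuck at 4 because an accept was never recorded, this helper returns false whether the deadline is 2 seconds or 5.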
Observed environment
ubuntu-latest shared runner
Suggested fix options
1. Increase inter-connection sleep to 50-100ms to give the accept dispatcher more time to process each connection before the next one arrives.
2. Send an HTTP request before closing: instead of a bare connect+close, send a minimal GET / HTTP/1.1\r\n\r\n and read the response. This ensures the connection was fully accepted and processed before closing, making the counter bump deterministic.
3. Increase poll timeout from 2s to 5s, though this only helps if the counter eventually reaches 5 (it may not if the accept was genuinely lost due to the immediate close).
Option 2 is the most robust since it makes the accept+count sequence deterministic rather than relying on timing.
Related
Same class of CI timing sensitivity as issue #43 (ObsConnMetrics protocol gauge race).