Flaky CI: ObsConnMetrics net.connections.accepted monotonic test loses 1 of 5 connections on slow runners #45

Description

@mwfj

Summary

TestNetAcceptedIsMonotonic in test/observability/observability_connection_metrics_test.h fails intermittently on GitHub Actions shared runners with delta=4.000000 expected>=5.

Failing test

[FAIL] ObsConnMetrics: net.connections.accepted monotonic over N opens
       Error: delta=4.000000 expected>=5

Root cause

The test opens 5 TCP connections in a tight loop, closing each one immediately and sleeping 20ms between connections (lines 484-496). It then polls for up to 2 seconds for the accepted counter to reach 5.
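The poll step can be sketched as a deadline loop. This is a minimal sketch that assumes the project is C++ (the test lives in a .h file) and uses a hypothetical helper named wait_until. The issue does not show the test's actual polling code. Option 3 below amounts to passing a larger timeout to a loop like this.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical helper (not taken from the test file): calls `pred` every
// `interval` until it returns true or `timeout` elapses.
inline bool wait_until(const std::function<bool()>& pred,
                       std::chrono::milliseconds timeout,
                       std::chrono::milliseconds interval = std::chrono::milliseconds(10)) {
    assert(interval.count() > 0);
    const auto deadline = std::chrono::steady_clock::now() + timeout;
    while (std::chrono::steady_clock::now() < deadline) {
        if (pred()) return true;
        std::this_thread::sleep_for(interval);
    }
    // One final check so a value that lands right at the deadline still counts.
    return pred();
}
```

In the failing test, the predicate would read reactor.net.connections.accepted, subtract a baseline taken before the five opens, and compare the delta against 5 with a 2-second timeout. If an accept is never counted, no timeout can make the predicate true. That is why a longer deadline alone cannot fix this flake.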

On slow or contended CI runners, one of the connect+close cycles completes before the accept dispatcher processes the connection. The TCP RST arrives before (or nearly simultaneously with) the accept callback that increments reactor.net.connections.accepted, so the counter only reaches 4.

The 20ms inter-connection sleep and 2-second poll deadline are insufficient on GitHub Actions shared runners under load.

Observed environment

Suggested fix options

  1. Increase the inter-connection sleep to 50-100ms. This gives the accept dispatcher more time to process each connection before the next one arrives.

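  2. Send an HTTP request before closing. Instead of a bare connect+close, send a minimal GET / HTTP/1.1\r\n\r\n request and read the response. A response can only come from a connection the server has fully accepted and processed, so the counter bump becomes deterministic. HTTP/1.1 requires a Host header, so a strict server may answer this bare request with 400, but any response still confirms the accept.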

  3. Increase the poll timeout from 2s to 5s. This only helps if the counter eventually reaches 5. It will not help if the accept was genuinely lost because of the immediate close.
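Option 2 might look roughly like the following. This is a minimal sketch that assumes POSIX sockets, a server listening on 127.0.0.1, and a hypothetical helper named connect_request_close. The test's real socket helpers are not shown in this issue. Unlike the bare request in option 2, the sketch adds a Host header and Connection: close. It also bounds the blocking read with SO_RCVTIMEO, so a stalled server fails the check instead of hanging the test.

```cpp
#include <arpa/inet.h>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper (not taken from the test file): connect to 127.0.0.1:port,
// send a minimal HTTP/1.1 request, and wait for the first response bytes
// before closing. Only a server that has already accepted the connection can
// answer, so a true return means the accept path has run.
inline bool connect_request_close(uint16_t port, int recv_timeout_ms = 2000) {
    assert(port != 0);
    int fd = ::socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) return false;

    // Bound the blocking recv() so a stalled server fails the check instead of hanging it.
    timeval tv{};
    tv.tv_sec = recv_timeout_ms / 1000;
    tv.tv_usec = (recv_timeout_ms % 1000) * 1000;
    ::setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));

    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    if (::connect(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) != 0) {
        ::close(fd);
        return false;
    }

    // Host is mandatory in HTTP/1.1. Connection: close asks the server not to keep the socket open.
    static const char kRequest[] =
        "GET / HTTP/1.1\r\nHost: 127.0.0.1\r\nConnection: close\r\n\r\n";
    const size_t len = sizeof(kRequest) - 1;
    size_t sent = 0;
    while (sent < len) {
        ssize_t n = ::send(fd, kRequest + sent, len - sent, 0);
        if (n <= 0) {
            ::close(fd);
            return false;
        }
        sent += static_cast<size_t>(n);
    }

    // Any response bytes (even a 4xx status line) prove the connection was accepted.
    char buf[256];
    ssize_t got = ::recv(fd, buf, sizeof(buf), 0);
    ::close(fd);
    return got > 0;
}
```

In the loop at lines 484-496, each bare connect+close would become a call to connect_request_close(port). If the accept callback bumps the counter before the request is served, a true return should already imply the increment, so the 20ms sleep is no longer what keeps the test correct.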

Option 2 is the most robust since it makes the accept+count sequence deterministic rather than relying on timing.

Related

Same class of CI timing sensitivity as issue #43 (ObsConnMetrics protocol gauge race).

Metadata

Assignees

Labels

No labels

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests