fix(browser): resolve flaky TestManagedBackendErrorPropagation (CDP send race)#116
Conversation
wsCDP.send selected between the response channel and the connection-closed signal. When a server writes a result/error frame and immediately drops the socket, the read loop buffers the frame into the request's channel *and* closes c.closed, leaving both select cases ready — so Go picked between them at random and send would intermittently report "cdp connection closed during <method>" instead of returning the actual response. This surfaced as a flaky TestManagedBackendErrorPropagation on CI. Recover the buffered response in the c.closed branch before reporting the close (the read loop always delivers or closes every pending channel before signaling c.closed, so a non-blocking receive is decisive). Harden the test to loop the boom-then-close scenario so the race fails reliably instead of ~half the time; verified it reproduces the CI failure pre-fix (GOMAXPROCS=2, count=3000) and passes post-fix. Generated with Jack AI bot
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (2)
📝 WalkthroughWalkthroughThe wsCDP.send method's close-handling path is modified to first check for an already-available pending response before falling back to the closed-connection error. The corresponding test is updated to write an error frame and immediately close the connection, looping 50 times to deterministically verify error propagation. ChangesCDP send close/error race fix
Estimated code review effort: 3 (Moderate) | ~20 minutes Sequence Diagram(s)sequenceDiagram
participant Caller
participant wsCDPsend as wsCDP.send
participant PendingChannel as ch
participant ClosedChannel as c.closed
Caller->>wsCDPsend: send(command)
ClosedChannel->>wsCDPsend: closed signal ready
wsCDPsend->>PendingChannel: non-blocking read
alt response available
PendingChannel-->>wsCDPsend: response frame or msg.Error
wsCDPsend-->>Caller: return response/error
else no response
wsCDPsend-->>Caller: return connection closed error
end
Related issues: None specified. Related PRs: None specified. Suggested labels: bug, tests Suggested reviewers: None specified. 🐰 A race between "closed" and "boom", 🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Problem
TestManagedBackendErrorPropagationininternal/browserfails intermittently on CI:(Seen blocking an unrelated PR — the failure has nothing to do with that PR's diff.)
Root cause
wsCDP.send(cdp.go) waits on aselectover three cases: the per-request response channelch,ctx.Done(), andc.closed. When a server writes a result/error frame and immediately drops the socket, the read loop:ch(ch <- msg), thenclose(c.closed).That leaves both
<-chand<-c.closedready at once. Go'sselectpicks a ready case at random, so ~50% of the timesendreturns"cdp connection closed during <method>"and discards the response that had already arrived. This is both the test flake and a latent correctness bug (a real CDP response racing a socket close could be dropped).CI runs
go test ./...without-race, so it only manifested as the occasional assertion failure, not a race warning.Fix
In the
c.closedbranch, do a non-blocking receive onchfirst and return the buffered response if one is present. The read loop always delivers-or-closes every pending channel before signalingc.closed, so this recovery is decisive — no new races, no busy-waiting.Verification
GOMAXPROCS=2 go test ./internal/browser -run TestManagedBackendErrorPropagation -count=3000→ multipleexpected boom error, got cdp connection closedfailures.-race.Note (out of scope)
Running the full
internal/browserpackage under-racealso surfaces a separate, pre-existing data race inTestBridgeKeepAlivePing(it mutates the package globalskeepAlivePing/keepAliveWaitthat akeepAlive()goroutine reads at bridge.go:178). It exists onmainindependent of this change and CI doesn't run-race, so it's left for a follow-up.Generated with Jack AI bot
Summary by CodeRabbit