fix: keep StreamingDataSource alive on non-Eof stream errors#168
Merged
kinyoklion merged 4 commits intoMay 11, 2026
Merged
Conversation
The fetch loop matched any error other than `Eof` (and recoverable HTTP responses) with a catch-all `_ => break`, dropping the spawned task and leaving the data source silently dysfunctional. Because no one was polling the eventsource-client's stream anymore, its reconnect logic never ran. The `None` branch also broke without notifying the caller, so callers waiting on initialization had no signal that the stream had closed permanently. Replace the catch-all `break` with `continue` so the eventsource-client gets to reconnect on the next poll, and call `init_complete(false)` in the `None` branch so the caller can observe permanent stream closure. Adds `streaming_source_recovers_from_non_eof_stream_error` test -- sends invalid UTF-8 to trigger `Error::InvalidLine` from the parser and asserts the mock receives at least two requests. Pairs with launchdarkly/rust-eventsource-client#134, which fixes the matching state-transition gap on the eventsource-client side. Together they restore the reconnection contract: the eventsource-client owns reconnection, and the SDK keeps polling and trusts it. Closes #116.
The 401 path (and any other status that `is_http_error_recoverable` rejects) should stop the streaming fetch loop and signal init failure to the caller. The catch-all-error change in this PR sits directly next to that branch, so add a behavioral test exercising the 401 path end-to-end through the fetch loop -- existing coverage was only of the predicate function.
Three follow-ups from review of #168: * Routes `Error::Eof` to `debug!` so healthy stream rotations don't spam `warn!` lines on every reconnect. * Rewords the catch-all comment and log: some non-Eof errors converge via the eventsource-client's `StreamClosed` transition (observed as `None` on the next poll), not in-stream reconnect, so "will retry" was misleading. * Strengthens `streaming_source_recovers_from_non_eof_stream_error` with an `Arc<Mutex<Option<bool>>>` capture asserting `init_complete` is *not* called during the retry window. Permanent-failure init is already covered by `streaming_source_stops_on_unrecoverable_http_status`.
keelerm84
approved these changes
May 9, 2026
keelerm84
approved these changes
May 11, 2026
Member
keelerm84
left a comment
There was a problem hiding this comment.
Did you try removing any of the current test harness suppression entries to see if they are resolved with these changes? There are several associated with reconnect handling you might have fixed.
"Polling" is a customer-facing product term for the PollingDataSource mode, so using it in the streaming log line is confusing -- readers can't tell whether the SDK silently fell back to polling. Reword to "continuing".
kinyoklion
pushed a commit
that referenced
this pull request
May 11, 2026
🤖 I have created a release *beep* *boop* --- ## [3.0.3](3.0.2...3.0.3) (2026-05-11) ### Bug Fixes * keep StreamingDataSource alive on non-Eof stream errors ([#168](#168)) ([b91a9ca](b91a9ca)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please). <!-- CURSOR_SUMMARY --> --- > [!NOTE] > **Low Risk** > Low risk: this is a release metadata bump (manifest/Cargo version) plus changelog entry, with no functional code changes in the diff. > > **Overview** > Bumps `launchdarkly-server-sdk` version from `3.0.2` to `3.0.3` (including `.release-please-manifest.json` and `Cargo.toml`). > > Updates `CHANGELOG.md` with the `3.0.3` release notes, calling out a bug fix to keep `StreamingDataSource` alive on non-EOF stream errors. > > <sup>Reviewed by [Cursor Bugbot](https://cursor.com/bugbot) for commit 221fa3d. Bugbot is set up for automated code reviews on this repo. Configure [here](https://www.cursor.com/dashboard/bugbot).</sup> <!-- /CURSOR_SUMMARY --> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
The
StreamingDataSourcefetch loop matched any error other thanEof(and recoverable HTTP responses) with a catch-all_ => break, dropping the spawned task and leaving the data source silently dysfunctional. Because nothing was polling the eventsource-client's stream anymore, its reconnect logic never ran. TheNonebranch also broke without notifying the caller.This PR:
breakwithcontinue, so the eventsource-client gets to reconnect on the next poll. A warning log keeps transient errors visible.init_complete(false)in theNonebranch before breaking, so callers waiting on initialization get a signal when the stream is permanently closed.Context
Surfaced from launchdarkly/rust-server-sdk#116. That report has two contributing bugs; this PR addresses the rust-server-sdk side. The matching rust-eventsource-client side is in launchdarkly/rust-eventsource-client#134 (SDK-2345). Together they restore the reconnection contract: the eventsource-client owns reconnection, and the SDK keeps polling and trusts it.
Tracked in SDK-2346.
Test plan
streaming_source_recovers_from_non_eof_stream_error(launchdarkly-server-sdk/src/data_source.rs) — sends invalid UTF-8 that triggersError::InvalidLinefrom the eventsource-client parser and asserts the mock receives at least two requests, proving the SDK kept polling and the underlying client reconnected.cargo test --package launchdarkly-server-sdk --lib— 300 tests pass; no regressions.cargo fmt --checkclean.Closes #116.
Note
Medium Risk
Touches core streaming update loop behavior; a mistake could cause retry storms or incorrect initialization signaling, though scope is limited and covered by new tests.
Overview
Fixes
StreamingDataSourceto stay alive on non-EOF SSE errors by continuing the fetch loop (allowing the underlying eventsource client to reconnect) and downgrading the failure to a warning.When the event stream is permanently exhausted (
None), it now signals initialization failure viainit_complete(false)before exiting. Adds tests to verify reconnect attempts on parser errors and stopping/signaling failure on unrecoverable HTTP statuses (e.g.401).Reviewed by Cursor Bugbot for commit f1f3cc4. Bugbot is set up for automated code reviews on this repo. Configure here.