Skip to content

fix: keep StreamingDataSource alive on non-Eof stream errors#168

Merged
kinyoklion merged 4 commits into
mainfrom
rl/sdk-2346/streaming-source-recovers-from-stream-errors
May 11, 2026
Merged

fix: keep StreamingDataSource alive on non-Eof stream errors#168
kinyoklion merged 4 commits into
mainfrom
rl/sdk-2346/streaming-source-recovers-from-stream-errors

Conversation

@kinyoklion
Copy link
Copy Markdown
Member

@kinyoklion kinyoklion commented May 8, 2026

Summary

The StreamingDataSource fetch loop matched any error other than Eof (and recoverable HTTP responses) with a catch-all _ => break, dropping the spawned task and leaving the data source silently dysfunctional. Because nothing was polling the eventsource-client's stream anymore, its reconnect logic never ran. The None branch also broke without notifying the caller.

This PR:

  • Replaces the catch-all break with continue, so the eventsource-client gets to reconnect on the next poll. A warning log keeps transient errors visible.
  • Calls init_complete(false) in the None branch before breaking, so callers waiting on initialization get a signal when the stream is permanently closed.

Context

Surfaced from launchdarkly/rust-server-sdk#116. That report has two contributing bugs; this PR addresses the rust-server-sdk side. The matching rust-eventsource-client side is in launchdarkly/rust-eventsource-client#134 (SDK-2345). Together they restore the reconnection contract: the eventsource-client owns reconnection, and the SDK keeps polling and trusts it.

Tracked in SDK-2346.

Test plan

  • New test streaming_source_recovers_from_non_eof_stream_error (launchdarkly-server-sdk/src/data_source.rs) — sends invalid UTF-8 that triggers Error::InvalidLine from the eventsource-client parser and asserts the mock receives at least two requests, proving the SDK kept polling and the underlying client reconnected.
  • cargo test --package launchdarkly-server-sdk --lib — 300 tests pass; no regressions.
  • cargo fmt --check clean.

Closes #116.


Note

Medium Risk
Touches core streaming update loop behavior; a mistake could cause retry storms or incorrect initialization signaling, though scope is limited and covered by new tests.

Overview
Fixes StreamingDataSource to stay alive on non-EOF SSE errors by continuing the fetch loop (allowing the underlying eventsource client to reconnect) and downgrading the failure to a warning.

When the event stream is permanently exhausted (None), it now signals initialization failure via init_complete(false) before exiting. Adds tests to verify reconnect attempts on parser errors and stopping/signaling failure on unrecoverable HTTP statuses (e.g. 401).

Reviewed by Cursor Bugbot for commit f1f3cc4. Bugbot is set up for automated code reviews on this repo. Configure here.

The fetch loop matched any error other than `Eof` (and recoverable
HTTP responses) with a catch-all `_ => break`, dropping the spawned
task and leaving the data source silently dysfunctional. Because no
one was polling the eventsource-client's stream anymore, its reconnect
logic never ran. The `None` branch also broke without notifying the
caller, so callers waiting on initialization had no signal that the
stream had closed permanently.

Replace the catch-all `break` with `continue` so the eventsource-client
gets to reconnect on the next poll, and call `init_complete(false)` in
the `None` branch so the caller can observe permanent stream closure.

Adds `streaming_source_recovers_from_non_eof_stream_error` test --
sends invalid UTF-8 to trigger `Error::InvalidLine` from the parser and
asserts the mock receives at least two requests.

Pairs with launchdarkly/rust-eventsource-client#134, which fixes the
matching state-transition gap on the eventsource-client side. Together
they restore the reconnection contract: the eventsource-client owns
reconnection, and the SDK keeps polling and trusts it.

Closes #116.
@kinyoklion kinyoklion changed the title fix: keep StreamingDataSource alive on non-Eof stream errors [SDK-2346] fix: keep StreamingDataSource alive on non-Eof stream errors May 8, 2026
kinyoklion added 2 commits May 8, 2026 14:45
The 401 path (and any other status that `is_http_error_recoverable`
rejects) should stop the streaming fetch loop and signal init failure
to the caller. The catch-all-error change in this PR sits directly
next to that branch, so add a behavioral test exercising the 401 path
end-to-end through the fetch loop -- existing coverage was only of
the predicate function.
Three follow-ups from review of #168:

* Routes `Error::Eof` to `debug!` so healthy stream rotations don't
  spam `warn!` lines on every reconnect.
* Rewords the catch-all comment and log: some non-Eof errors converge
  via the eventsource-client's `StreamClosed` transition (observed as
  `None` on the next poll), not in-stream reconnect, so "will retry"
  was misleading.
* Strengthens `streaming_source_recovers_from_non_eof_stream_error`
  with an `Arc<Mutex<Option<bool>>>` capture asserting `init_complete`
  is *not* called during the retry window. Permanent-failure init is
  already covered by `streaming_source_stops_on_unrecoverable_http_status`.
Comment thread launchdarkly-server-sdk/src/data_source.rs Outdated
Copy link
Copy Markdown
Member

@keelerm84 keelerm84 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you try removing any of the current test harness suppression entries to see if they are resolved with these changes? There are several associated with reconnect handling you might have fixed.

@kinyoklion kinyoklion marked this pull request as ready for review May 11, 2026 15:27
@kinyoklion kinyoklion requested a review from a team as a code owner May 11, 2026 15:27
"Polling" is a customer-facing product term for the PollingDataSource
mode, so using it in the streaming log line is confusing -- readers
can't tell whether the SDK silently fell back to polling. Reword to
"continuing".
@kinyoklion kinyoklion merged commit b91a9ca into main May 11, 2026
32 checks passed
@kinyoklion kinyoklion deleted the rl/sdk-2346/streaming-source-recovers-from-stream-errors branch May 11, 2026 17:08
kinyoklion pushed a commit that referenced this pull request May 11, 2026
🤖 I have created a release *beep* *boop*
---


##
[3.0.3](3.0.2...3.0.3)
(2026-05-11)


### Bug Fixes

* keep StreamingDataSource alive on non-Eof stream errors
([#168](#168))
([b91a9ca](b91a9ca))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> **Low Risk**
> Low risk: this is a release metadata bump (manifest/Cargo version)
plus changelog entry, with no functional code changes in the diff.
> 
> **Overview**
> Bumps `launchdarkly-server-sdk` version from `3.0.2` to `3.0.3`
(including `.release-please-manifest.json` and `Cargo.toml`).
> 
> Updates `CHANGELOG.md` with the `3.0.3` release notes, calling out a
bug fix to keep `StreamingDataSource` alive on non-EOF stream errors.
> 
> <sup>Reviewed by [Cursor Bugbot](https://cursor.com/bugbot) for commit
221fa3d. Bugbot is set up for automated
code reviews on this repo. Configure
[here](https://www.cursor.com/dashboard/bugbot).</sup>
<!-- /CURSOR_SUMMARY -->

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Data source can shut down with no way to detect it

2 participants