fix: Restarted the watch after stream reset #19055#19056
Conversation
There was a problem hiding this comment.
Pull request overview
Fixes issue #19055 by improving resiliency of the Kubernetes service-discovery watch so broker node inventory can recover after watch stream disruptions.
Changes:
- Updated
WatchResultto be usable with try-with-resources by extendingAutoCloseable. - Refactored
NodeRoleWatcher.keepWatchingto use try-with-resources and added explicit handling for HTTP/2 stream reset conditions.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
extensions-core/kubernetes-extensions/src/main/java/org/apache/druid/k8s/discovery/WatchResult.java |
Adjusts the watch iterator contract to support automatic resource management. |
extensions-core/kubernetes-extensions/src/main/java/org/apache/druid/k8s/discovery/K8sDruidNodeDiscoveryProvider.java |
Updates watch loop to auto-close resources and attempts to restart after stream resets/timeouts. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
@vivek807 Thanks for reporting the issue and submitting this PR for fix. can you address above comments? |
updated, please recheck. |
e79643a to
dc86628
Compare
dc86628 to
268b1ff
Compare
capistrant
left a comment
There was a problem hiding this comment.
Still think we need to push further in removing the okhttp internals from this PR. My initial thought is that we just need to handle all IOExceptions with a full re-list. We can keep the one off handling of the known ok IOException that a simple retry of the watch from same resource version will work for. But for all others force a re-list? My biggest fear is that this is an overreaction.
…-not-recovering-from-stream-reset # Conflicts: # extensions-core/kubernetes-extensions/src/main/java/org/apache/druid/k8s/discovery/K8sDruidNodeDiscoveryProvider.java # extensions-core/kubernetes-extensions/src/main/java/org/apache/druid/k8s/discovery/WatchResult.java
Remove hard dependency of okhttp3 internals
FrankChen021
left a comment
There was a problem hiding this comment.
Follow-up handled: the OkHttp internal StreamResetException dependency was removed, and the watch handling now treats non-interrupt IOExceptions as restartable channel resets while preserving timeout handling. No further inline reply needed from me.
This is an automated review by Codex GPT-5
This is an automated review by Codex GPT-5
|
@FrankChen021 @gianm @capistrant Please let me know if there’s anything I can do to help move this PR forward. |
Fixes #19055.
Description
Restarted the watch after stream reset
Fixed the bug #19055
This PR has: