fix: Handle Redis connection errors gracefully instead of panicking #111

Merged: sbernauer merged 4 commits into main from fix/redis-connection on Apr 20, 2026

@sbernauer
Member

Maybe fixes #109

When Redis goes down (e.g. during a master failover), a broken pipe error caused `get_queued_query_count` to panic on an `.unwrap()`. The panic poisoned the `RwLock` used in the metrics callbacks, which cascaded into further panics and left pods unresponsive until the liveness probe killed them.
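The failure mode described above can be reproduced with the standard library alone. The sketch below is not the trino-lb code: `fetch_count_from_redis` and `poison_by_unwrap` are hypothetical names, and it assumes the panic happened while a write guard on the lock was held, which is the condition under which `std::sync::RwLock` becomes poisoned.

```rust
use std::sync::{Arc, RwLock};
use std::thread;

/// Hypothetical stand-in for the Redis call; it always fails the way a
/// broken pipe would.
fn fetch_count_from_redis() -> Result<u64, String> {
    Err("broken pipe".to_string())
}

/// Spawns a thread that panics via `.unwrap()` while holding the write
/// guard, then reports whether the lock ended up poisoned.
fn poison_by_unwrap(lock: Arc<RwLock<u64>>) -> bool {
    let writer = Arc::clone(&lock);
    let handle = thread::spawn(move || {
        let mut guard = writer.write().unwrap();
        // The `.unwrap()` panics while `guard` is still alive, so the
        // guard is dropped during unwinding and the lock is poisoned.
        *guard = fetch_count_from_redis().unwrap();
    });
    // `join` returns `Err` because the thread panicked; ignore it here.
    let _ = handle.join();
    lock.is_poisoned()
}
```

Once the lock is poisoned, every later `lock.read().unwrap()` or `lock.write().unwrap()` panics as well, which matches the cascade described above.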

  • Replace `.unwrap()` with proper error propagation in `get_queued_query_count`
  • Handle poisoned locks, closed channels, and panicked threads gracefully in both metrics callbacks

sbernauer and others added 4 commits April 17, 2026 15:57
When Redis goes down (e.g. master failover), a broken pipe error would
cause `get_queued_query_count` to panic due to an `.unwrap()`, which
then poisoned the RwLock used in metrics callbacks, cascading into
further panics and leaving pods unresponsive until the liveness probe
killed them.

- Replace `.unwrap()` with proper error propagation in
  `get_queued_query_count`
- Handle poisoned locks, closed channels, and panicked threads
  gracefully in both metrics callbacks

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@sbernauer sbernauer self-assigned this Apr 17, 2026
@sbernauer sbernauer moved this to Development: In Progress in Stackable Engineering Apr 17, 2026
@sbernauer sbernauer moved this from Development: In Progress to Development: Waiting for Review in Stackable Engineering Apr 17, 2026
@NickLarsenNZ NickLarsenNZ self-requested a review April 17, 2026 14:57
Member

@NickLarsenNZ NickLarsenNZ left a comment

LGTM

Comment thread trino-lb-persistence/src/redis/mod.rs
Comment thread trino-lb/src/metrics.rs
@sbernauer sbernauer moved this from Development: Waiting for Review to Development: In Review in Stackable Engineering Apr 20, 2026
@sbernauer
Member Author

Kuttl tests passed 🚀

@sbernauer sbernauer added this pull request to the merge queue Apr 20, 2026
Merged via the queue into main with commit b1319ed Apr 20, 2026
9 checks passed
@sbernauer sbernauer deleted the fix/redis-connection branch April 20, 2026 07:06
@sbernauer sbernauer moved this from Development: In Review to Development: Done in Stackable Engineering Apr 20, 2026
@lfrancke lfrancke moved this from Development: Done to Done in Stackable Engineering May 6, 2026

Development

Successfully merging this pull request may close these issues.

trino-lb is stuck after redis master shuts down

3 participants