AsyncRedisManager listener can die permanently (and silently with default logger) after Redis closes pubsub clients for exceeding client-output-buffer-limit

### Environment

- python-socketio **5.16.1** (server, `async_mode='asgi'`), redis-py **7.4.0**
- Valkey 8 (alpine), used via `AsyncRedisManager` (`redis://…/1`)
- uvicorn with **8 workers** (one `AsyncServer` + `AsyncRedisManager` per worker), Engine.IO v4, websocket-only transport
- Deployment: Open-WebUI v0.9.6 in production (~150 users)

### Summary

After Redis force-closed all pub/sub subscriber connections for exceeding `client-output-buffer-limit pubsub` (a single ~33 MB published message hit the 32mb hard limit), the listeners on 7 of our 8 workers recovered via `_redis_listen_with_retries`, but **one worker's listener died permanently**. Its process stayed alive and kept accepting websocket connections, but it never re-subscribed to the channel, so every client attached to that worker silently stopped receiving events until the service was restarted. For ~24h, `PUBSUB NUMSUB socketio` reported 7 subscribers against 8 workers.

With the default `logger=False`, the death is **completely silent** — there is nothing in the logs to distinguish a healthy worker from a deaf one.

### What Redis logged (kill storm)

```
1:M 10 Jun 2026 20:51:58 # Client id=543906 … flags=P … cmd=subscribe …
  omem=33554456 tot-mem=33557200 … scheduled to be closed ASAP for overcoming of output buffer limits.
```

(3 rounds of kills within 10 minutes; each round disconnected every subscriber. The retry path logged `Cannot receive from redis... retrying in 1 secs` in bursts and recovered — except for one worker.)

### Code analysis (5.16.1)

1. `AsyncRedisManager._redis_listen_with_retries` only catches the Redis client's error class (`redis.exceptions.RedisError` / `ValkeyError`). Any other exception escaping `pubsub.subscribe()` / `pubsub.listen()` (e.g. redis-py's asyncio `PubSub` can raise plain `RuntimeError` for connection-state issues) propagates out of the generator.

2. In `AsyncPubSubManager._thread()`:
   - the outer `except Exception` logs to `self.server.logger` — invisible with the default `logger=False`;
   - if the `_listen()` generator ever **exits** instead of raising, the loop hits `self.server.logger.error('pubsub listen() exited unexpectedly')` followed by `break` — the background task ends **permanently**, with no recovery and (by default) no visible trace.

Because logging was disabled, we could not capture the exact exception that escaped. The observable outcome, however, was a permanently dead listener on one worker following the buffer-limit kill storm, while its sibling workers recovered.
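To make the two failure paths above concrete, here is a minimal, self-contained simulation. It does not use python-socketio or redis-py: `FakeRedisError`, `listen_with_retries` and `consume` are hypothetical stand-ins that only mirror the shape described in the analysis (a retry loop that catches one error class, and a consumer that stops when the generator raises or exits).

```python
import asyncio


class FakeRedisError(Exception):
    """Hypothetical stand-in for the Redis client's own error class."""


async def listen_with_retries(events):
    """Narrow retry: only FakeRedisError is retried, anything else escapes."""
    for event in events:
        try:
            if isinstance(event, Exception):
                raise event
            yield event
        except FakeRedisError:
            # Simulates "reconnect and keep listening".
            continue
    # Falling off the end makes the generator exit without raising.


async def consume(events):
    """Collect messages and report how the listener stopped."""
    received = []
    try:
        async for message in listen_with_retries(events):
            received.append(message)
        # Reached only when the generator exits cleanly (the `break` path).
        outcome = "generator exited"
    except Exception as exc:
        # Reached when a non-retried exception propagates out of the generator.
        outcome = f"escaped: {type(exc).__name__}"
    return received, outcome


if __name__ == "__main__":
    print(asyncio.run(consume(["a", FakeRedisError("closed"), "b"])))
    print(asyncio.run(consume(["a", RuntimeError("bad state"), "b"])))
```

In the first run the error is of the retried class, so the message after it is still delivered. In the second run the `RuntimeError` is not retried, so the message after it is never seen. Either way the consumer stops, which is the state our deaf worker appeared to be in.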

### Suggested improvements

Any of these would have avoided the silent permanent failure:

1. Broaden the retry in `_redis_listen_with_retries` to also retry on connection-layer exceptions that are not subclasses of `RedisError` (or simply `except Exception`), since the loop already reconnects from scratch.
2. In `_thread()`, restart `_listen()` instead of `break`-ing permanently when the generator exits.
3. Log listener death through a standard logger such as `logging.getLogger('socketio')`, regardless of the `logger=False` convenience flag, or expose a health indicator (e.g. a `listening` property / callback) so multi-worker deployments can monitor listener liveness. Today the only reliable external check we found is comparing `PUBSUB NUMSUB <channel>` against the worker count.
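The three suggestions could be combined roughly as in the sketch below. This is not python-socketio code and not a proposed patch: `ListenerSupervisor`, its `listening` and `restarts` attributes, and the logger name `socketio.listener` are all hypothetical, and the listener is assumed to be any callable that returns an async generator of messages.

```python
import asyncio
import logging

# Hypothetical logger name; uses the stdlib logging module directly so that
# failures are visible even when the server's own logger is disabled.
log = logging.getLogger("socketio.listener")


class ListenerSupervisor:
    """Keeps an async-generator listener alive (illustrative sketch only)."""

    def __init__(self, listen_factory, handler, retry_delay=1.0):
        self._listen_factory = listen_factory  # callable returning an async generator
        self._handler = handler                # coroutine function called per message
        self._retry_delay = retry_delay
        # Health flag for external monitoring. It is set on message receipt here;
        # a real implementation would set it once the subscribe succeeds.
        self.listening = False
        self.restarts = 0

    async def run(self):
        while True:
            try:
                async for message in self._listen_factory():
                    self.listening = True
                    await self._handler(message)
                # Suggestion 2: a clean generator exit is restarted, not fatal.
                log.error("listener generator exited; restarting")
            except asyncio.CancelledError:
                # Shutdown is the only way out of the loop.
                self.listening = False
                raise
            except Exception:
                # Suggestion 1: retry on any exception, not one error class.
                log.exception("listener failed; restarting")
            self.listening = False
            self.restarts += 1
            await asyncio.sleep(self._retry_delay)
```

A health endpoint or periodic task could then report `supervisor.listening` and `supervisor.restarts` per worker, which would have exposed our deaf worker without needing to query Redis.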

### Workaround we applied

We raised Valkey's `client-output-buffer-limit pubsub` hard limit from 32mb to 64mb to stop the kills at the source, and added an external cron alert that fires when `PUBSUB NUMSUB socketio` reports fewer subscribers than workers.
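For anyone who wants a similar alert, a check along these lines should work. It is a sketch rather than our exact cron script: the function names are hypothetical, the connection URL and worker count are passed in by the caller, and it assumes redis-py's `pubsub_numsub()` returns a list of `(channel, count)` pairs, with channel names as bytes unless `decode_responses` is enabled.

```python
def subscriber_count(numsub_reply, channel):
    """Return the subscriber count for `channel` from a PUBSUB NUMSUB reply."""
    for name, count in numsub_reply:
        if isinstance(name, bytes):
            name = name.decode()
        if name == channel:
            return int(count)
    return 0


def deaf_workers(numsub_reply, channel, expected_workers):
    """Number of workers that appear to have no live subscription."""
    return max(expected_workers - subscriber_count(numsub_reply, channel), 0)


def check(url, channel, expected_workers):
    """Query the server at `url` and return the number of missing listeners."""
    # Imported lazily so the pure helpers above have no third-party dependency.
    import redis

    client = redis.Redis.from_url(url)
    try:
        reply = client.pubsub_numsub(channel)
        return deaf_workers(reply, channel, expected_workers)
    finally:
        client.close()
```

A cron job can call `check(url, "socketio", 8)` and raise an alert on any non-zero result. Note that the count covers every subscriber on the channel, so other processes subscribed to the same channel would mask a dead worker.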

### Related

- #1411 (event loss while Redis temporarily unavailable)
- #1569 (reconnect path orphans previous Redis client)


AsyncRedisManager listener can die permanently (and silently with default logger) after Redis closes pubsub clients for exceeding client-output-buffer-limit #1581
