[Bug]: No Redis Connection Recovery Due to Cached IP Address #11086

@nkbud

What happened?

Problem

When a Redis pod restarts/moves in Kubernetes (getting a new IP), LibreChat continues attempting to connect to the old cached IP address indefinitely instead of re-resolving the DNS hostname.

Evidence

  • Redis pod moved from one node to another.
  • LibreChat attempted connection to 172.20.201.154:6379 for 51+ minutes straight
  • 167 failed connection attempts all targeting the same stale IP
  • Manual LibreChat restart resolved the issue immediately

Log examples:

14:00:08 → "connect ECONNREFUSED 172.20.201.154:6379"
14:51:30 → "connect ECONNREFUSED 172.20.201.154:6379"  // Still same IP
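To quantify how long one address kept being retried, the log lines can be summarized per target. This is a minimal sketch that assumes the `HH:MM:SS → "connect ECONNREFUSED ip:port"` shape shown above and that all lines fall on the same day; `summarizeRefused` is a hypothetical helper, not part of LibreChat.

```javascript
// Matches lines such as: 14:00:08 → "connect ECONNREFUSED 172.20.201.154:6379"
const LINE = /^(\d{2}):(\d{2}):(\d{2}).*ECONNREFUSED (\d+\.\d+\.\d+\.\d+):(\d+)/;

// Group ECONNREFUSED lines by ip:port and track first/last timestamp and count.
function summarizeRefused(lines) {
  const byTarget = new Map();
  for (const line of lines) {
    const m = LINE.exec(line);
    if (!m) continue; // skip lines that are not refused-connection errors
    const seconds = Number(m[1]) * 3600 + Number(m[2]) * 60 + Number(m[3]);
    const target = `${m[4]}:${m[5]}`;
    const entry = byTarget.get(target) ?? { first: seconds, last: seconds, count: 0 };
    entry.first = Math.min(entry.first, seconds);
    entry.last = Math.max(entry.last, seconds);
    entry.count += 1;
    byTarget.set(target, entry);
  }
  return byTarget;
}

const sample = [
  '14:00:08 → "connect ECONNREFUSED 172.20.201.154:6379"',
  '14:51:30 → "connect ECONNREFUSED 172.20.201.154:6379"',
];

for (const [target, { first, last, count }] of summarizeRefused(sample)) {
  console.log(`${target}: ${count} failures over ${last - first} s`);
}
// prints "172.20.201.154:6379: 2 failures over 3082 s" (51 min 22 s)
```

A single target with a long span and no other addresses in the summary is consistent with the client never switching to a new IP.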

Theory

The Redis client caches the resolved IP address on initial connection and never re-performs DNS lookups during reconnection attempts.
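One way to check this theory from inside the process is to compare the IP reported in the connection error with what the hostname resolves to at that moment. The sketch below uses only Node's built-in `dns` module; `isStaleAddress` and the injectable `lookup` parameter are assumptions made for illustration and are not part of LibreChat or any Redis client.

```javascript
const dns = require('node:dns').promises;

// Returns true when `failedIp` is no longer among the addresses that
// `hostname` currently resolves to. `lookup` is injectable so the check
// can be exercised without a real resolver.
async function isStaleAddress(
  hostname,
  failedIp,
  lookup = (host, opts) => dns.lookup(host, opts),
) {
  const records = await lookup(hostname, { all: true });
  return !records.some((record) => record.address === failedIp);
}

// Hypothetical usage inside an error handler (Node connection errors
// carry the target IP in `err.address`):
//   if (await isStaleAddress('redis', err.address)) { /* DNS has moved on */ }
```

If this returns true while the client keeps reporting ECONNREFUSED for the same address, the client is connecting to an IP that DNS no longer returns.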

Proposed Solution

Handle connection-loss errors on the Redis client by recreating the client, which forces the hostname to be resolved again:

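function connectRedis() {
  const client = createRedisClient({ url: REDIS_URL }); // Forces fresh DNS resolution
  client.on('error', async (err) => {
    // Ignore errors from clients that have already been replaced
    if (client !== redisClient) return;
    if (err.code === 'ECONNREFUSED' || err.code === 'EHOSTUNREACH') {
      // Trigger DNS re-lookup by recreating the client; the new client gets the same handler
      redisClient = connectRedis();
      await client.disconnect();
    }
  });
  return client;
}

let redisClient = connectRedis();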

The connection loss event should trigger a DNS lookup retry for the original hostname before attempting reconnection, rather than reusing the cached IP.
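The behavior described above can be sketched independently of any Redis library as a retry loop that resolves the hostname again before every attempt. This is an illustration under stated assumptions, not LibreChat's or a Redis client's actual reconnection code: `connectWithFreshLookup`, `connectTcp`, and the `retries`/`delayMs` options are hypothetical names, and only Node's built-in `dns` and `net` modules are used.

```javascript
const dns = require('node:dns').promises;
const net = require('node:net');

// Open a TCP connection to an already resolved IP address.
function connectTcp(address, port) {
  return new Promise((resolve, reject) => {
    const socket = net.connect({ host: address, port });
    socket.once('connect', () => resolve(socket));
    socket.once('error', reject);
  });
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Retry a connection, resolving the hostname again before every attempt so
// that a changed DNS record is picked up. `lookup` and `connect` are
// injectable so the loop can be exercised without a network.
async function connectWithFreshLookup(hostname, port, options = {}) {
  const {
    retries = 5,
    delayMs = 1000,
    lookup = (host) => dns.lookup(host),
    connect = connectTcp,
  } = options;

  let lastError;
  for (let attempt = 1; attempt <= retries; attempt += 1) {
    try {
      const { address } = await lookup(hostname); // fresh lookup on every attempt
      const connection = await connect(address, port);
      return { address, connection };
    } catch (err) {
      lastError = err; // remember the failure and try again after a pause
      if (attempt < retries) await sleep(delayMs);
    }
  }
  throw lastError;
}
```

A call such as `connectWithFreshLookup('redis', 6379)` (hostname assumed) would pick up the new pod IP on the first attempt after the DNS record changes, instead of retrying the address from the initial connection.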

Impact

  • Affects all session management (SAML, OpenID)

Reproduction

  1. Deploy LibreChat + Redis in K8s
  2. Delete Redis pod: kubectl delete pod redis-xxx
  3. LibreChat keeps retrying the old IP indefinitely
  4. Restart LibreChat → works immediately

Expected: LibreChat should automatically recover by re-resolving DNS during reconnection attempts.

Version Information

v0.8.1
