Defer redis connection until first use (avoid sockets stuck in CLOSE_WAIT)#3618

Open
jes wants to merge 1 commit into OpenSIPS:master from jes:defer-redis-connection

Conversation

@jes (Contributor) commented Apr 3, 2025

Summary
We were seeing OpenSIPS leave Redis client sockets in the CLOSE_WAIT state whenever Redis restarts. This PR prevents the sockets from lingering.

Details
When Redis restarts, the client sockets sit in the CLOSE_WAIT state until OpenSIPS next tries to use them and notices that it needs to reconnect. Some OpenSIPS processes open Redis sockets that they never use, because the connection is opened at startup regardless of whether it will be needed.
This is not a major issue, but it means we keep getting alerts for sockets stuck in CLOSE_WAIT.

Solution
This PR fixes the problem by deferring the connection to redis until it is actually used. That way the only sockets that go into CLOSE_WAIT are ones that have been actively used at least once, so we can expect them to get used again, which will clean up the CLOSE_WAIT sockets.

Compatibility
I don't imagine this breaks any scenarios.

Closing issues
N/A

@liviuchircu (Member) left a comment


Hey @jes! I would be on board with this with no hesitation, were it not for the impact on the startup phase. As of now, if you start OpenSIPS with Redis offline, it logs an error and fails to start. If we merge this, there is an added side effect: OpenSIPS would now successfully start with an offline Redis.

Is this subtle startup behavior change something we want? Is it for the better? And, also worth asking, are the current, "indefinitely" temporary CLOSE_WAIT sockets that harmful?

@NormB (Member) commented Feb 10, 2026

I've been running a fork that solves the same CLOSE_WAIT problem but addresses @liviuchircu's concern about the startup behavior change. Instead of unconditionally deferring the connection, add an opt-in module parameter:

/* in cachedb_redis.c, alongside shutdown_on_error */
int lazy_connect = 0;

/* in the param_export_t array */
{ "lazy_connect", INT_PARAM, &lazy_connect },

Then the change in redis_new_connection() becomes:

/* if doing failover Redises, only connect the 1st one for now! */
if (!cons && !lazy_connect && redis_connect(con) < 0) {
	LM_ERR("failed to connect to DB\n");
	if (shutdown_on_error)
		goto out_err;
}

This way:

  • Default (lazy_connect=0): Existing behavior is preserved — startup fails if Redis is unreachable and shutdown_on_error=1. No surprise for current users.

  • lazy_connect=1: Connections are deferred until first use, solving the CLOSE_WAIT problem for deployments where unused worker processes hold stale sockets.

  • No additional reconnect logic needed: The existing REDIS_INIT_NODES check in redis_run_command() already handles first-use connect for uninitialized connections: if (!(con->flags & REDIS_INIT_NODES) && redis_connect(con) < 0) {

    The only code change is a one-line guard (!lazy_connect &&) plus the parameter declaration — minimal diff, fully backward-compatible, and operators can opt in when CLOSE_WAIT is a concern.

