sockets_mgm: fix race in receive_fd causing infinite loop on reload #3820

NormB · 2026-02-11T19:00:08Z

Summary

Fixes #3789 — sockets_mgm reload causes all OpenSIPS worker processes to spin at 100% CPU in an infinite loop inside push_sock2list() → sock_listadd().

Root Cause

During sockets_reload, every process receives an IPC RPC to run rpc_socket_reload_proc(). Worker (non-dynamic) processes close their copy of each dynamic socket, then call receive_fd() on the shared sock_mgm_unix[0] socketpair to obtain a fresh fd from the mgm process.
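Descriptor passing over a Unix-domain socketpair is usually done with sendmsg()/recvmsg() and SCM_RIGHTS ancillary data. The sketch below is a generic illustration of that mechanism, not OpenSIPS's actual receive_fd(). The names send_fd_sketch(), recv_fd_sketch() and fd_passing_demo() are hypothetical, and the one-byte payload stands in for the metadata (such as the struct sock_mgm * pointer) that the real handshake carries. Nothing in such a message says which process it is meant for: any process holding the receiving end can read it.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control buffer sized and aligned for one int-sized SCM_RIGHTS entry. */
union fd_ctrl {
    struct cmsghdr hdr;
    char buf[CMSG_SPACE(sizeof(int))];
};

/* Send a one-byte payload with fd_to_send attached as SCM_RIGHTS data. */
static int send_fd_sketch(int sock, int fd_to_send)
{
    char byte = 'F';
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union fd_ctrl ctrl;
    struct msghdr msg;
    struct cmsghdr *cmsg;

    memset(&ctrl, 0, sizeof ctrl);
    memset(&msg, 0, sizeof msg);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof ctrl.buf;

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd_to_send, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive the payload byte; return the newly installed descriptor, or -1. */
static int recv_fd_sketch(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union fd_ctrl ctrl;
    struct msghdr msg;
    struct cmsghdr *cmsg;
    int fd;

    memset(&msg, 0, sizeof msg);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof ctrl.buf;

    if (recvmsg(sock, &msg, 0) != 1)
        return -1;
    cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS || cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
        return -1;
    memcpy(&fd, CMSG_DATA(cmsg), sizeof fd);
    return fd;
}

/* Pass a pipe's write end across a socketpair, then write through the copy. */
static int fd_passing_demo(void)
{
    int sv[2], p[2], received, ok;
    char buf[3] = {0};

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;
    if (pipe(p) < 0) { close(sv[0]); close(sv[1]); return -1; }

    received = send_fd_sketch(sv[1], p[1]) == 0 ? recv_fd_sketch(sv[0]) : -1;
    ok = received >= 0 && write(received, "ok", 2) == 2 &&
         read(p[0], buf, 2) == 2 && strcmp(buf, "ok") == 0;

    if (received >= 0)
        close(received);
    close(p[0]); close(p[1]); close(sv[0]); close(sv[1]);
    return ok ? 0 : -1;
}
```

Even within one process, the received descriptor is a new number that refers to the same open pipe. That is why data written through it shows up on p[0].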

Since sock_mgm_unix[0] is a single SOCK_STREAM socket shared across all worker processes, concurrent receive_fd() calls race. A stream socket has no message boundaries, and the responses are not addressed to any particular reader, so whichever worker reads first consumes the next response in the stream: worker A can consume the fd+metadata response intended for worker B. When this happens, worker B receives the wrong struct sock_mgm * pointer, one that references a socket already present in worker B's listener list. The sock_listadd() macro then corrupts the linked list into a circular loop (si->next == si), causing push_sock2list() to spin indefinitely.

GDB confirmed: si->next pointed to itself, and protos[1].listeners formed an infinite cycle through the same node.
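The self-loop can be reproduced with a simplified model of a tail-append that does not check whether the element is already linked. This is a hypothetical stand-in, not the real sock_listadd() definition. struct node, list_append() and bounded_length() are illustrative names. In this model, re-adding the current tail sets tail->next to the element itself, which matches the si->next == si state seen in GDB. Any later walk to the tail would then never terminate. Re-adding an earlier node corrupts the list in a different way.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a listener entry. */
struct node {
    int id;
    struct node *next;
    struct node *prev;
};

/* Hypothetical tail-append: walk to the tail, then link el after it.
 * It does NOT check whether el is already in the list. */
static void list_append(struct node **head, struct node *el)
{
    struct node *tail;

    if (*head == NULL) {
        el->next = el->prev = NULL;
        *head = el;
        return;
    }
    for (tail = *head; tail->next != NULL; tail = tail->next)
        ;
    el->next = NULL;
    el->prev = tail;
    tail->next = el;   /* if el == tail, this leaves el->next == el */
}

/* Count nodes, giving up after max_steps so a cycle cannot hang the caller. */
static int bounded_length(const struct node *head, int max_steps)
{
    int n = 0;
    const struct node *cur;

    for (cur = head; cur != NULL; cur = cur->next)
        if (++n > max_steps)
            return -1;   /* cycle suspected */
    return n;
}
```

In the test, appending `b` a second time while it is the tail produces `b.next == &b`. After that, the bounded walk reports a cycle instead of a length.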

Fix

Add a sock_mgm_reload_lock that serializes the entire send-IPC-to-mgm + receive-fd sequence across worker processes, ensuring only one worker at a time performs the fd-passing handshake on the shared socketpair.

Dynamic (mgm) processes are excluded from this lock because they create sockets directly via sock_mgm_add_listener() and never call receive_fd(). Including them would deadlock: the worker holding the lock blocks on receive_fd() waiting for the mgm process to handle rpc_sockets_send(), but the mgm process would be spinning on the same lock.
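Both the serialization and the exclusion of the mgm process can be shown with a self-contained sketch. It uses fork(), a SOCK_STREAM socketpair, and a process-shared pthread mutex in anonymous shared memory. The mutex stands in for sock_mgm_reload_lock, which the PR allocates in shared memory in mod_init(). Workers hold the mutex across their request and reply, so each reply is read by the worker that asked for it. The manager answers requests without ever touching the mutex, so a worker blocked in read() while holding the lock can still be served.

All function names here are hypothetical. The pthread mutex is an assumption standing in for OpenSIPS's own lock primitives, and the integer id stands in for the fd+metadata exchange.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <sys/mman.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Read (writing == 0) or write (writing != 0) exactly len bytes; -1 on error/EOF. */
static int io_full(int fd, void *buf, size_t len, int writing)
{
    char *p = buf;
    while (len > 0) {
        ssize_t n = writing ? write(fd, p, len) : read(fd, p, len);
        if (n <= 0)
            return -1;
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Manager: echo each request id back. It never takes the lock. */
static void manager_loop(int sock, int requests)
{
    for (int i = 0; i < requests; i++) {
        int id;
        if (io_full(sock, &id, sizeof id, 0) < 0 || io_full(sock, &id, sizeof id, 1) < 0)
            _exit(1);
    }
    _exit(0);
}

/* Worker: hold the lock across request + reply. Exit 0 means "got my own reply". */
static void worker(int sock, pthread_mutex_t *lock, int id)
{
    int reply = -1, ok;
    pthread_mutex_lock(lock);
    ok = io_full(sock, &id, sizeof id, 1) == 0 &&
         io_full(sock, &reply, sizeof reply, 0) == 0;
    pthread_mutex_unlock(lock);
    _exit(ok && reply == id ? 0 : 1);
}

/* Returns the number of processes that failed or got someone else's reply. */
static int serialized_handshake_demo(int nworkers)
{
    int sv[2], spawned = 0, failures = 0, status;
    pthread_mutexattr_t attr;
    pthread_mutex_t *lock;
    pid_t mgr;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;
    /* Shared-memory lock, initialised before any fork(). */
    lock = mmap(NULL, sizeof *lock, PROT_READ | PROT_WRITE,
                MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (lock == MAP_FAILED) { close(sv[0]); close(sv[1]); return -1; }
    pthread_mutexattr_init(&attr);
    pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
    pthread_mutex_init(lock, &attr);
    pthread_mutexattr_destroy(&attr);

    mgr = fork();
    if (mgr < 0) { close(sv[0]); close(sv[1]); munmap(lock, sizeof *lock); return -1; }
    if (mgr == 0) { close(sv[0]); manager_loop(sv[1], nworkers); }
    close(sv[1]);                       /* only the manager keeps this end */

    for (int i = 0; i < nworkers; i++) {
        pid_t pid = fork();
        if (pid == 0)
            worker(sv[0], lock, i + 1);  /* all workers share the same end */
        if (pid > 0)
            spawned++;
    }
    close(sv[0]);  /* if a worker never started, the manager sees EOF and exits */

    for (int i = 0; i < spawned + 1; i++)
        if (wait(&status) < 0 || !WIFEXITED(status) || WEXITSTATUS(status) != 0)
            failures++;
    failures += nworkers - spawned;

    pthread_mutex_destroy(lock);
    munmap(lock, sizeof *lock);
    return failures;
}
```

On older glibc versions this needs `-pthread` at compile and link time. If the lock/unlock pair in worker() is removed, a reply can reach the wrong worker, although whether a given run shows a mismatch depends on scheduling. Conversely, if manager_loop() also had to acquire the mutex before replying, the first worker would block in read() while holding it and the manager would wait forever. That is the deadlock described above.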

Changes

sockets_mgm.c: Add sock_mgm_reload_lock (shared memory lock), allocated and initialized in mod_init(). In rpc_socket_reload_proc(), non-dynamic processes acquire the lock before the reload sequence and release it after all receive_fd() calls complete.

Test plan

Start OpenSIPS with 2 dynamic UDP sockets from DB — 22 processes, 16 dynamic socket entries, all sleeping
1st sockets_reload (add 3rd socket) — MI returns OK, 24 socket entries, no CPU spinning
2nd sockets_reload (add 4th socket) — MI returns OK, 32 socket entries, no CPU spinning
All processes remain in S (sleeping) state throughout — verified with ps and top
Clean shutdown after testing

During sockets_reload, all processes receive an IPC RPC to run rpc_socket_reload_proc(). Non-dynamic (worker) processes close their copy of each dynamic socket and then call receive_fd() on the shared sock_mgm_unix[0] socketpair to get a fresh fd from the mgm process.

Because sock_mgm_unix[0] is shared across all workers and SOCK_STREAM delivers bytes in order (not per-message), concurrent receive_fd() calls race: worker A can consume the fd response intended for worker B. When this happens, worker B receives worker A's fd response, which references a socket already in worker B's listener list. The sock_listadd() macro then corrupts the linked list into a circular loop (si->next == si), causing push_sock2list() to spin at 100% CPU indefinitely.

Add a sock_mgm_reload_lock that serializes the entire send-IPC-to-mgm + receive-fd sequence for worker processes. Dynamic (mgm) processes are excluded from this lock because they create sockets directly via sock_mgm_add_listener() and never call receive_fd(); including them would deadlock since the worker holding the lock blocks on receive_fd() waiting for the mgm to process rpc_sockets_send().

Fixes: OpenSIPS#3789

NormB requested a review from razvancrainea February 11, 2026 19:01

NormB mentioned this pull request Feb 11, 2026

[BUG] sockets_mgm module causes high CPU usage and "freezes" OpenSIPS #3789

Open
