Bound shutdown after termination signal #6164

Open
M03ED wants to merge 1 commit into XTLS:main from M03ED:fix/bounded-shutdown-timeout

@M03ED
Contributor

M03ED commented May 19, 2026

This change prevents Xray from waiting indefinitely during graceful shutdown after receiving SIGTERM or interrupt.

Previously, shutdown depended on the deferred server.Close() call returning. If any feature Close() path blocked, the process could stay alive while listeners or active connections continued operating. This made stale Xray processes difficult to detect, especially when another process could bind the same port.

Now Xray runs server.Close() explicitly after receiving a termination signal and waits up to 10 seconds. If shutdown does not complete in time, the process exits with code 1 so supervisors can treat it as a failed shutdown and avoid leaving a functional stale process behind.

This issue was most visible on mobile clients. Some apps, such as Telegram with its proxy-style traffic, may continue using an already-established path even when the VPN/client UI appears disabled. If the old Xray process remains alive during a stuck shutdown, those connections can continue transferring traffic and updating usage counters, making the node look stopped from the client side while traffic is still flowing server-side.

@Fangliding
Member

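If you observe Close() blocking unexpectedly, you should find out directly what is blocking Close. Taking too long is certainly a design problem. Trying to run Close asynchronously is not a good approach.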

@M03ED
Contributor Author

M03ED commented May 19, 2026

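> If you observe Close() blocking unexpectedly, you should find out directly what is blocking Close. Taking too long is certainly a design problem. Trying to run Close asynchronously is not a good approach.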

We have a similar situation with #5844, but this can happen on both server and client. There is another reason most people didn't notice this problem: port reuse is allowed, so the client will start another process without knowing there is a zombie process. Since most people don't know about this feature or don't use it, it would be better to disable it by default. If it had been disabled, this issue might have been found much sooner.

Since we're closing the process anyway, I think this would be enough, and GC will handle the dirty work. Yes, there may be a better approach, but it needs deep inspection and a very good understanding of the project structure, and it takes much more time and testing.

@Fangliding
Member

https://github.com/XTLS/Xray-core/tree/close-trace
You can try running this branch. Once Close takes longer than 10 seconds, it will report the function that is blocking it.

@M03ED
Contributor Author

M03ED commented May 19, 2026

> https://github.com/XTLS/Xray-core/tree/close-trace You can try running this branch. Once Close takes longer than 10 seconds, it will report the function that is blocking it.

I handled this before on the server side in my project, where I kill the zombie process manually after a double check. At the time we found no exact pattern for this, and there is no guarantee it will happen again. Also, we're seeing it mostly on the client side (usually mobile).

@M03ED
Contributor Author

M03ED commented May 19, 2026

> https://github.com/XTLS/Xray-core/tree/close-trace You can try running this branch. Once Close takes longer than 10 seconds, it will report the function that is blocking it.

I tried this version on Windows but was unable to reproduce the issue.
At the same time I noticed something: the ws transport (without xhttp) is not working for me, while v26.3.27 is completely fine. Since there are no release notes, I couldn't check whether this is intentional or a bug.

@RPRX
Member

RPRX commented May 28, 2026

I'll take a look next month.


3 participants