
WireGuard outbound: Fix UDP FullCone NAT on Linux#5858

Merged
RPRX merged 10 commits into XTLS:main from LjhAUMEM:wg-linux-tun-fullcone
Apr 5, 2026

Conversation

@LjhAUMEM
Collaborator

  • fix wireguard outbound linux tun fullcone

@RPRX
Member

RPRX commented Mar 29, 2026

Let's wait for @Kapkap5454 to test it first. Also, are you interested in looking into implementing auto-route and auto-redirect for TUN? #5857 (comment)

@RPRX
Member

RPRX commented Mar 29, 2026

Actually it's mainly about auto-route on Windows; after all, even I can't be bothered to type that route command myself.

@LjhAUMEM
Collaborator Author

I haven't looked into it yet. I'll give it a try, but no guarantee I can pull it off.

@Fangliding
Member

Fangliding commented Mar 29, 2026

Write your own implementation inside the wg package; don't touch other parts of core.
Never mind, if the definitions in the sing package need to change, just change them.

@RPRX
Member

RPRX commented Mar 29, 2026

Never mind, if the definitions in the sing package need to change, just change them.

Here's the thing: if someone ever PRs a native SS2022 implementation, the sing dependency will definitely be removed, although by now probably nobody is interested in PRing SS2022 anymore.

@Kapkap5454

Tested with my configs from here #5848 (comment)

I built a new xray from the PR, downloaded it, replaced the binary, and ran systemctl restart xray on the server.
The client curls successfully, the connection exists. https://quic.nginx.org/quic.html on the client shows QUIC is OK in the Firefox browser.
Then I decided to clear the logs and restart xray on both client and server.

The client can't connect with warp on the server. This appears in the logs many times. With direct freedom on the server, the client connects fine. Wtf, is this some rate limiting from Cloudflare or what? I remember seeing this message during one of the old tests; then I tried connecting without tun, using socks as the inbound on the client, and it worked. So I am not sure it's some block from Cloudflare.

2026/03/29 09:16:22.513881 [Debug] peer(bmXO…fgyo) - Handshake did not complete after 5 seconds, retrying (try 2)

Anyhow, I rebooted both client and server. The client connected successfully. Curl works. Go to https://quic.nginx.org/quic.html. It loads. Start test. Requests sent (gray squares). Then it hangs, no green or black squares in the test. Wtf. Repeat. Same result. Reload page.

Firefox shows this: The connection has timed out. The server at quic.nginx.org is taking too long to respond. Reloaded many times, tried closing and opening a different tab with the same site - no result. Tried opening different websites in the same browser - all ok. Tried opening https://quic.nginx.org/quic.html from my regular PC with a different IP - all ok. Went back to the client VPS, opened a private browsing window in Firefox: the quic test page loads, green squares coming, all ok. Returned to the regular Firefox window, opened a new tab. Again - The connection has timed out. The server at quic.nginx.org is taking too long to respond.

Now, I do remember, during the very first tests of xray v26.3.23 on both server and client, from my Windows PC with v2rayN and on my Android with v2rayNG, I ran into the same problem: https://quic.nginx.org/quic.html hangs while loading in the browser after one or two tries. Then I opened a different browser and it loaded. This doesn't seem to happen with xray 26.2.6 on the server, I think...

~09:16 - Client can't connect to the internet when the server has outbound warp.
~09:30-09:33 - After reboot of both server and client: broken quic test, wtf. Only gray squares.
~09:36-09:37 - Quic page not loading at all, other pages load absolutely fine. Curl to the same address succeeds (regular curl, no HTTP/3 request).
~09:44 - Website quic test all ok, in private mode.
~09:45 - Website from regular browsing: connection timed out. Opened new tabs in the same window - same result. Weird.

I attach logs as files, tried to truncate, but still too long. Don't want to omit something possibly important.

client_log.txt
server_log.txt

@LjhAUMEM
Collaborator Author

LjhAUMEM commented Mar 29, 2026

The WireGuard TCP connection remains the same before and after this commit; the main difference is with UDP. If you cannot use a TCP connection in WireGuard, then the problem isn't with this commit.

In TUN mode, the browser caches the quic.nginx.org IP address obtained from DNS queries, typically prioritizing IPv6. This might mean your outbound IPv6 connection is unstable during the check.

Regarding warp, could you try reusing the configurations from ./wgcf register and ./wgcf generate in wgcf? It's recommended to use an IP address for the endpoint address in the peer field.
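For reference, here is a sketch of how the values from wgcf-profile.conf typically map onto an Xray WireGuard outbound. The keys and addresses below are placeholders (the endpoint IP is one of the warp endpoints that appears later in the logs), so treat this as illustrative rather than a drop-in config:

```json
{
  "protocol": "wireguard",
  "settings": {
    "secretKey": "<PrivateKey from wgcf-profile.conf>",
    "address": ["172.16.0.2", "2606:4700:110:1234:abcd:ef01:2345:6789"],
    "peers": [
      {
        "publicKey": "<PublicKey from wgcf-profile.conf>",
        "endpoint": "162.159.192.1:2408",
        "allowedIPs": ["0.0.0.0/0", "::/0"]
      }
    ]
  }
}
```

Using a literal IP for `endpoint`, as suggested above, avoids a DNS lookup during tunnel bring-up.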

@LjhAUMEM
Collaborator Author

Write your own implementation inside the wg package; don't touch other parts of core. Never mind, if the definitions in the sing package need to change, just change them.

Do you like the sing package that much...

@RPRX
Member

RPRX commented Mar 29, 2026

Not at all. These are all sing packages that 世界 (sekai) PR'd into other projects. I said long ago that unnecessarily depending on external libraries causes maintenance difficulties.

@LjhAUMEM
Collaborator Author

From the fullcone commit to this PR, the main changes are UDP-related. In theory it's enough to test that UDP works with both TUN implementations for the WireGuard outbound, and that UDP works with the gVisor TUN for the inbound. TCP connections are exactly the same as before, with no changes at all; I've already done some basic testing on my side and everything is normal.

Judging from his reply, he even mentioned TCP not working, which is honestly a bit off-topic.

@Kapkap5454

The WireGuard TCP connection remains the same before and after this commit; the main difference is with UDP. If you cannot use a TCP connection in WireGuard, then the problem isn't with this commit.

In TUN mode, the browser caches the quic.nginx.org IP address obtained from DNS queries, typically prioritizing IPv6. This might mean your outbound IPv6 connection is unstable during the check.

Regarding warp, could you try reusing the configurations from ./wgcf register and ./wgcf generate in wgcf? It's recommended to use an IP address for the endpoint address in the peer field.

Maybe you are right. I will run some more tests on the two VPS I made and will report.

But I tested on my personal machine with Windows and v2rayN against the active server, and it looks exactly the same as 26.2.6. Mosh is working, quic is working. All good.

@LjhAUMEM
Collaborator Author

@Kapkap5454 Please also test the server-side Wireguard outbound usage of gvisor.

@LjhAUMEM
Collaborator Author

@Kapkap5454

@Kapkap5454 https://github.com/XTLS/Xray-core/actions/runs/23704082970

https://github.com/XTLS/Xray-core/actions/runs/23704082970/artifacts/6164337109

ELI5 please....
These artifacts, they are no different from what I built myself from this PR, right? You didn't make any changes since then? I don't need to redownload them?

"Please also test the server-side Wireguard outbound usage of gvisor." You do mean, test like v2rayN on my Windows PC, which uses gvisor (according to its settings)? Correct? I did that; I wrote a report in a comment above.

@Kapkap5454

Just to mention, I never updated the client with this PR. Only the server. If that's important.

@LjhAUMEM
Collaborator Author

"Please also test the server-side Wireguard outbound usage of gvisor." You do mean, test like v2rayN on my Windows PC, which uses gvisor (according to its settings)? Correct? I did that; I wrote a report in a comment above.

Simply add "noKernelTun": true to your server-side Wireguard outbound configuration.
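For clarity, here is a minimal sketch of where that flag lives (all other settings elided; the placeholder comment-free fragment just shows the nesting). Setting it forces the gVisor TUN, as the "Using gVisor TUN. NoKernelTun is set to true." log line later in this thread confirms:

```json
{
  "protocol": "wireguard",
  "settings": {
    "noKernelTun": true
  }
}
```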

Just to mention, I never updated the client with this PR. Only the server. If that's important.

If your client does not use Wireguard outbound, then keeping the old version is fine.

ELI5 please....
These artifacts, they are no different from what I built myself from this PR, right? You didnt make any changes since then? I dont need to redownload them?

There's no difference; just make sure your compilation is error-free.

@Kapkap5454

The WireGuard TCP connection remains the same before and after this commit; the main difference is with UDP. If you cannot use a TCP connection in WireGuard, then the problem isn't with this commit.

In TUN mode, the browser caches the quic.nginx.org IP address obtained from DNS queries, typically prioritizing IPv6. This might mean your outbound IPv6 connection is unstable during the check.

Regarding warp, could you try reusing the configurations from ./wgcf register and ./wgcf generate in wgcf? It's recommended to use an IP address for the endpoint address in the peer field.

Probably you are right that there might be a problem with IPv6. I disabled the IPv6 address in the warp outbound, and I also changed "endpoint": "engage.cloudflareclient.com:2408" to a static IPv4 address. Since then I haven't had any problems connecting the client to the server using this PR. Weird...

"Please also test the server-side Wireguard outbound usage of gvisor."
It connects and works fine. Quic works.

2026/03/29 16:48:05.705696 [Info] switching dialer
2026/03/29 16:48:05.705700 [Warning] proxy/wireguard: Using gVisor TUN. NoKernelTun is set to true.
2026/03/29 16:48:05.705702 [Debug] UAPI: Updating private key
2026/03/29 16:48:05.705705 [Debug] peer(bmXO…fgyo) - UAPI: Created
2026/03/29 16:48:05.705707 [Debug] peer(bmXO…fgyo) - UAPI: Updating endpoint
2026/03/29 16:48:05.705710 [Debug] peer(bmXO…fgyo) - UAPI: Adding allowedip
2026/03/29 16:48:05.705712 [Debug] peer(bmXO…fgyo) - UAPI: Adding allowedip
2026/03/29 16:48:05.705715 [Debug] proxy/wireguard: bind closed
2026/03/29 16:48:05.705718 [Debug] proxy/wireguard: bind opened
2026/03/29 16:48:05.705720 [Debug] UDP bind has been updated
2026/03/29 16:48:05.705723 [Debug] peer(bmXO…fgyo) - Starting
2026/03/29 16:48:05.705725 [Debug] Interface state was Down, requested Up, now Up
2026/03/29 16:48:05.705729 [Info] [563895835] proxy/vless/inbound: firstLen = 94
2026/03/29 16:48:05.705732 [Info] [563895835] proxy/vless/inbound: received request for udp:9.9.9.9:53
2026/03/29 16:48:05.705736 [Info] [563895835] app/dispatcher: default route for udp:9.9.9.9:53
2026/03/29 16:48:05.705813 [Debug] Routine: encryption worker 1 - started
2026/03/29 16:48:05.705823 [Debug] Routine: decryption worker 1 - started
2026/03/29 16:48:05.705830 [Debug] Routine: handshake worker 1 - started
2026/03/29 16:48:05.705878 [Debug] Routine: TUN reader - started
2026/03/29 16:48:05.705885 [Debug] Routine: event worker - started
2026/03/29 16:48:05.705887 [Debug] Interface up requested
2026/03/29 16:48:05.705930 [Debug] Routine: receive incoming Open - started
2026/03/29 16:48:05.705951 [Debug] Routine: receive incoming Open - started
2026/03/29 16:48:05.705961 [Debug] peer(bmXO…fgyo) - Routine: sequential sender - started
2026/03/29 16:48:05.705970 [Debug] peer(bmXO…fgyo) - Routine: sequential receiver - started
2026/03/29 16:48:05.707488 [Debug] peer(bmXO…fgyo) - Sending handshake initiation
2026/03/29 16:48:05.707507 [Debug] [2724715013] transport/internet: dialing to udp:162.159.192.1:2408
2026/03/29 16:48:05.707512 [Info] [1856734370] proxy/vless/inbound: firstLen = 80
2026/03/29 16:48:05.707527 [Info] [1856734370] proxy/vless/inbound: received request for udp:3.125.64.247:443
2026/03/29 16:48:05.707531 [Info] [1856734370] app/dispatcher: default route for udp:3.125.64.247:443
2026/03/29 16:48:05.707534 [Info] [3999280301] proxy/vless/inbound: firstLen = 94

I do occasionally get this in the logs:
2026/03/29 16:48:14.047013 [Info] [3484121671] app/proxyman/outbound: app/proxyman/outbound: failed to process outbound traffic > proxy/wireguard: connection ends > stream error: stream ID 189; CANCEL
On both gvisor and not. But that's normal, right?

Seems like can be merged. Thank you so much!

@LjhAUMEM
Collaborator Author

Thanks for testing. I'm glad to see that NoKernelTun is working correctly. If Linux tun behaves any differently from NoKernelTun, feel free to provide feedback.

I'm also glad you understand that this PR is for fixing UDP. If you encounter other issues, please open a separate issue.

I do occasionally get this in the logs:
2026/03/29 16:48:14.047013 [Info] [3484121671] app/proxyman/outbound: app/proxyman/outbound: failed to process outbound traffic > proxy/wireguard: connection ends > stream error: stream ID 189; CANCEL
On both gvisor and not. But that's normal, right?

I encountered this when using xhttp too, but it didn't seem to affect the connection, and I didn't modify anything related to xhttp, so...

@Kapkap5454

Kapkap5454 commented Mar 30, 2026

Some new changes?

Yesterday I put the artifact from this PR on my main proxy server, the one I connect to with v2rayN + sing-box TUN from Windows.
And I just can't shake the idea that https://quic.nginx.org keeps hanging after some test runs...
And the feeling that it never happened on 26.2.6. Yesterday on 26.2.6, I pressed run test maybe 20-30 times, without the page ever hanging in any browser.

Then I switched and tried maybe 50 times in different browsers, changing from this PR to 26.2.6 and back, again and again... many, many times. With occasional hangs.

I can't get consistent results to say "do this and this and get this behaviour". I see nothing interesting in the logs except for the stream-canceled message, and nothing interesting in Wireshark.

From a very dumb user perspective, ignoring the mechanics and technical details: it's like it overfloods something with UDP/QUIC, probably switches to TCP for some time, then dies... I don't know how that's possible, sorry 😅.

The most consistent results I can get are with the MS Edge browser, which I have configured closest to default settings and without built-in DoH.

Today I woke up, opened MS Edge, ran 2-3 tests (all ok, with QUIC). Then it started running these tests with black squares only (meaning it could only achieve regular HTTP, not QUIC) - so, like, downgraded. It ran 2 times this way. Then it hung again. The browser shows: This page isn't working right now. quic.nginx.org didn't send any data. ERR_EMPTY_RESPONSE. Closing the browser doesn't help.

It continues working in other browsers after this. As I said, it does hang sometimes in other browsers as well, but in MS Edge I can reach this hung-up state the fastest.

I wish I could give you more details. Should I try this PR now, after some changes mentioning buffer were made?

I also think that on 26.2.6 it always loads these squares sequentially, from left to right: first requests are sent (gray squares), then they start appearing green from left to right, top to bottom.

On 26.3 and after, I notice green squares can start appearing in the middle of the section, before earlier ones have appeared.

@Kapkap5454

And just to remind you, something similar happened yesterday on the two test VPS I made, which used only raw xray, with xray tun on the client side...

@Kapkap5454

It definitely instantly unfroze in MS Edge after I switched to direct-freedom.

@LjhAUMEM
Collaborator Author

I think you should know that the modifications involved only relate to whether you are using Wireguard for outbound traffic. If you are using other outbound methods and encounter problems, you shouldn't be complaining here.

Now tell me, which outbound Wireguard configuration is causing your dissatisfaction: Linux tun or Gvisor tun?

@LjhAUMEM
Collaborator Author

According to 5845, have you tried other protocols?

@Kapkap5454

I think you should know that the modifications involved only relate to whether you are using Wireguard for outbound traffic. If you are using other outbound methods and encounter problems, you shouldn't be complaining here.

Now tell me, which outbound Wireguard configuration is causing your dissatisfaction: Linux tun or Gvisor tun?

Well, I do use the wireguard outbound for Cloudflare warp on the server, right? That's the use case for this PR? We already solved some major problems with it here; QUIC and Mosh started to work on the client after it.

Linux tun or Gvisor tun: I experienced something similar yesterday, when I used two fresh test VPS, with raw xray. It was Linux-tun.

Now, as far as I understand, I am using gvisor tun, because on the server I have one more wireguard inbound. It's dormant, I don't use it, but it's there. And the logs say kernel-tun is not supported for wireguard inbound, switching to gvisor tun. So it means it also affects the wireguard-warp outbound on the server, am I correct...?

So I experience this problem on both: linux tun and gvisor tun.

Right now I tested again. MS Edge froze on the quic test page. I changed to the freedom outbound on the server (without disabling the wireguard-warp outbound). It immediately unfroze the quic test page. I switched back to warp. The quic page continues to work. I pressed run test 5-8 times. All QUIC results. Tried incomplete tests (click run, then cancel, then run again). Working. Turned Xray off on the client. Turned it on.
Went to MS Edge + the quic test page: it's hung. Switched the server to freedom again. Quic in MS Edge unfroze.

If you think this is some different kind of issue, ok, I will stop writing here. Unfortunately I doubt this would qualify for a separate issue; rprx will close it. Not enough data =/

@Kapkap5454

According to 5845, have you tried other protocols?

You mean this?
VLESS+ENC: RTSP handshake failed ❌
VLESS+ENC+FLOW: RTSP handshake failed ❌
VLESS+ENC+MUX: Normal streaming ✅️
VMESS unprotected: Normal streaming ✅️

No, I haven't... Should I?
With flow, it would effectively kill QUIC, so no point. Mux is not applicable, as I use xhttp. Vmess - never used it.

Try VLESS + ENC? The problem seems to be connected with wireguard outbound, I don't understand how that might help. But if you insist, I can test.

@LjhAUMEM
Collaborator Author

Let's start over. I need you to provide your server-side configuration file, and to tell me whether curl works correctly when the problem occurs.

Server-side update: https://github.com/XTLS/Xray-core/actions/runs/23727523363/artifacts/6171013326
Clients can keep the old version.

@LjhAUMEM
Collaborator Author

Perhaps I should provide you with a build that only restores the old WireGuard version, for testing?

If it works immediately after switching to freedom, it doesn't necessarily mean it's a WireGuard issue, because you're using warp, i.e. another party's WireGuard service.

If possible, hosting your own WireGuard inbound service with a freedom outbound would be ideal.

@Kapkap5454

It is very hard to reproduce consistently. I will try to find some pattern before giving the full config and logs...

Right now my observations are:
26.2.6: I think I never encountered this.
26.3+: many reloads plus starting the quic test, cancelling, starting, etc. makes it appear. Not every time, though.
When it's frozen:
1) Switching the server to direct freedom unfreezes the quic page immediately.
2) Switching back to 26.2.6 does not immediately unfreeze it, but after some time it will unfreeze and all is ok.
3) Not doing anything will often unfreeze it after some long time.
4) When it's frozen in one browser, it continues working in other browsers.
5) Running curl sometimes works while the page is frozen in one browser, but sometimes it produces this result:
$ curl https://quic.nginx.org
curl: (35) TLS connect error: error:0A000126:SSL routines::unexpected eof while reading
while at the same time the page continues working in other browsers.
6) When it's frozen, the server logs only show the two attached screenshots.

@Kapkap5454

No, I totally can't reproduce it consistently. It has been working absolutely perfectly for the last hour; I've done hundreds of tests... I withdraw all these complaints. If it ever happens again I will open a new issue and provide a dump. Thank you for your time.

@LjhAUMEM
Collaborator Author

I understand your confusion. The only difference between the new and old versions of the WireGuard outbound is the use of ReadFrom and WriteTo, so it's almost impossible for them to cause problems.

I think the only explanation is that warp in your region is unstable. You can try modifying the addresses the wg TUN is created with via "address": [...]; the default is ["10.0.0.1", "fd59:7153:2388:b5fd:0000:0000:0000:0001"]. That might help. Other things to try include modifying the MTU and using an IP address in the peer endpoint.
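Putting those tweaks in one place, a sketch of the relevant outbound fields; the address list here is the stated default, while the MTU value and endpoint IP are illustrative placeholders, not recommendations:

```json
{
  "protocol": "wireguard",
  "settings": {
    "address": ["10.0.0.1", "fd59:7153:2388:b5fd:0000:0000:0000:0001"],
    "mtu": 1280,
    "peers": [
      {
        "endpoint": "162.159.192.1:2408"
      }
    ]
  }
}
```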

@LjhAUMEM
Collaborator Author

Additionally, if you suspect a problem with an inbound or outbound, please provide a reproducible configuration. Ideally, the inbound service should be self-hosted, as this makes troubleshooting easier.

@RPRX RPRX changed the title fix wireguard outbound linux tun fullcone WireGuard outbound: Fix UDP FullCone NAT on Linux Apr 5, 2026
@RPRX RPRX merged commit ba88aa1 into XTLS:main Apr 5, 2026
39 checks passed
Exclude0122 pushed a commit to Exclude0122/Xray-core that referenced this pull request Apr 19, 2026