Reverse: portal-worker should not be closed before making sure there is at least one other active worker #4869

patterniha · 2025-07-07T19:20:49Z

When a portal-worker starts draining, its heartbeat reader/writer is closed immediately. This causes a problem; let me explain why.

Suppose there is only one active portal-worker and it starts draining.

Also suppose the only sub-connection (session) of this worker is the heartbeat sub-connection, and it has no client sub-connections.

After the heartbeat sub-connection is closed, the worker's sub-connection (session) count drops to 0.

The worker checks every 16 seconds whether it still has any sessions (sub-connections); if it has none, it closes itself.

So there is a chance that the worker closes after the heartbeat sub-connection is closed but before the new worker arrives from the bridge side.

During that window, until a new worker is created and arrives from the bridge side, our connection to the bridge side is completely broken, so a new sub-connection received from the client side cannot be connected to the bridge side.

So we need to check that there is at least one other active worker before closing the heartbeat sub-connection.
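The check described above can be sketched in a few lines of Go. This is only an illustration of the rule, not the Xray-core code: the `worker` type, its fields, and the helper names are hypothetical.

```go
package main

import "fmt"

// worker is a hypothetical stand-in for a portal worker; it is not the Xray-core type.
type worker struct {
	name     string
	draining bool
	closed   bool
}

// hasOtherActive reports whether any worker besides self is neither draining nor closed.
func hasOtherActive(workers []*worker, self *worker) bool {
	for _, w := range workers {
		if w != self && !w.draining && !w.closed {
			return true
		}
	}
	return false
}

// canCloseHeartbeat encodes the rule from the description: a draining worker
// may drop its heartbeat sub-connection only once another active worker exists.
func canCloseHeartbeat(workers []*worker, self *worker) bool {
	return self.draining && hasOtherActive(workers, self)
}

func main() {
	old := &worker{name: "old", draining: true}
	pool := []*worker{old}
	fmt.Println(canCloseHeartbeat(pool, old)) // false: no replacement has arrived yet

	pool = append(pool, &worker{name: "new"})
	fmt.Println(canCloseHeartbeat(pool, old)) // true: a replacement worker is active
}
```

With only the draining worker in the pool the heartbeat stays open; once a second, non-draining worker is present it may be closed.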

///

Apart from the fact that our connection to the bridge side can be interrupted for a moment, there is another reason for this change:

During the 12-day Iran-Israel war, some servers in Iran became "Iran-Access", meaning they only accept connections from Iranian IPs. However, a connection established from a foreign IP before the server became "Iran-Access" was not dropped afterwards.

So Xray-core should not disconnect a connection (worker) before making sure there is at least one other active connection (worker).

///

Also, when the worker is not draining, the heartbeat period is increased to 10 seconds.


Fangliding · 2025-07-07T19:58:47Z

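If the goal is just to keep at least one live connection, the right approach is to move the timer in the monitor() function into ClientWorker, then reset that timer to 16 seconds in heartbeat() before sending the drain message, giving the client time to open a new connection, rather than also passing a parentPicker in.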

patterniha · 2025-07-07T20:09:07Z

> If you just want to have at least one active connection, the correct way should be to put the timer in the monitor() function into the ClientWorker and then reset the timer to 16 seconds before sending the drain message in heartbeat() to give the client time to open a new connection instead of throwing a parentPicker in.

@Fangliding

But this doesn't solve the "Iran-Access" problem (see the second reason).

Once a server becomes "Iran-Access", a new connection can no longer be established from the bridge server to the Iran server.

So after 16 seconds the only active worker is disconnected, and no new connection can be established.

~~If this change had been made sooner, my Iran server wouldn't have been disconnected from the foreign bridge server during the war.~~

Fangliding · 2025-07-07T20:28:46Z

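I don't think the core should prepare for special behavior tied to an Internet shutdown that will only happen once or twice. Holding a reference to the parent object is not good practice. Implementing it this way also bypasses the internal draining mechanism, which makes it hard to understand and ugly. Even then, such a TCP connection will not necessarily stay alive for long, and mux already has a hard limit of 65535: after 65535 requests have been sent, mux has no stream IDs left. Overall the usable scope is very small; you would be better off running a mux outside, or using something like grpc or xhttp3 as the underlying transport.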

patterniha · 2025-07-07T23:08:49Z

OK, I implemented your timer-reset idea instead.

patterniha · 2025-07-10T09:10:39Z

Xray-core/app/reverse/config.go

Line 11 in cb1afb3

randomLength := dice.Roll(64)

Another problem: if 0 is selected for randomLength here, the heartbeat packet is not sent at all.

~~And if the packet is a drain-notify packet, the bridge server doesn't notice the draining, and eventually, once the 65535 mux limit is reached, our connection is completely cut off.~~

So randomLength should be at least 1.

UPDATE:

It only affects the keepAlive heartbeat, not the drain-notify heartbeat.
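One possible way to guarantee a non-zero length is to shift the roll by one. The sketch below uses `math/rand` as a stand-in and assumes `dice.Roll(n)` returns a value in `[0, n)`; the `roll` and `paddingLength` helpers are hypothetical and are not necessarily how the fix was written.

```go
package main

import (
	"fmt"
	"math/rand"
)

// roll is a stand-in for dice.Roll, assumed here to return a value in [0, n).
func roll(n int) int {
	return rand.Intn(n)
}

// paddingLength returns a length in [1, 64], so it is never 0.
func paddingLength() int {
	return roll(64) + 1
}

func main() {
	lo, hi := 64, 1
	for i := 0; i < 100000; i++ {
		n := paddingLength()
		if n < lo {
			lo = n
		}
		if n > hi {
			hi = n
		}
	}
	fmt.Println(lo >= 1, hi <= 64) // prints "true true"
}
```

Shifting by one changes the upper bound from 63 to 64; `roll(63) + 1` would keep the original maximum if that matters.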

Fangliding · 2025-07-10T09:59:22Z

Then why don't you test whether the drain packet can actually be sent?

patterniha · 2025-07-10T10:03:04Z

I tested it; I always test after making a change.

Set randomLength to 0 and you will see that the heartbeat packet is not sent.

Fangliding · 2025-07-10T10:06:49Z

Have you tested whether this causes the drain packet not to be sent, so that it goes past 256?

patterniha · 2025-07-10T10:27:46Z

If msg.State = Control_DRAIN, the packet is sent, so this doesn't affect the drain packet.

But the heartbeat packet has two roles: drain-notify and keepAlive.

It still causes some keepAlive packets not to be sent. (If several consecutive keepAlives fail, the connection may be dropped.)

So randomLength still needs to be at least 1.

Fangliding · 2025-07-10T10:31:59Z

I didn't say it can't be done; I'm only saying this isn't the first or second time you've imagined a problem...


Reverse: portal-worker should not be closed until there is at least o…

8844058

…ne other worker

patterniha force-pushed the rev-fix branch from 6f3208b to 23633b2 Compare July 7, 2025 23:06

reset timer before close heartbeat

b7895a0

patterniha force-pushed the rev-fix branch from 23633b2 to b7895a0 Compare July 7, 2025 23:13

fix random length

9491611

patterniha mentioned this pull request Jul 10, 2025

vless+xtls-rprx-vision+reality偶发ERR_SSL_PROTOCOL_ERROR #4878

Open

4 tasks

Fangliding approved these changes Jul 11, 2025


patterniha mentioned this pull request Jul 19, 2025

DNS hosts: Support returning RCode #4681

Merged

RPRX merged commit b065595 into XTLS:main Jul 23, 2025
39 checks passed

patterniha deleted the rev-fix branch July 23, 2025 15:28

This was referenced Jul 25, 2025

xray 25.7.25 Homebrew/homebrew-core#231232

Merged

xray 25.7.26 Homebrew/homebrew-core#231286

Merged

maoxikun added a commit to maoxikun/Xray-core that referenced this pull request Aug 22, 2025

Revert "Reverse: portal-worker should not be closed before making sur…

2cf950c

…e there is at least one other active worker (XTLS#4869)" This reverts commit b065595.
