Skip to content

Conversation

lachlan-roberts
Copy link
Contributor

@lachlan-roberts lachlan-roberts commented Jun 27, 2025

After PR #13264 there is a new race condition that can prevent a WebSocketConnection from completing.

2025-06-30 16:19:10.123:WARN :oejut.QueuedThreadPool:qtp-LocalServer-827: Job failed
java.lang.IllegalStateException: FILLING_AND_PARSING
	at org.eclipse.jetty.websocket.core.WebSocketConnection.fillAndParse(WebSocketConnection.java:489)
	at org.eclipse.jetty.websocket.core.WebSocketConnection.run(WebSocketConnection.java:389)
	at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:981)
	at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.doRunJob(QueuedThreadPool.java:1211)
	at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1166)
	at java.base/java.lang.Thread.run(Thread.java:840)

This stacktrace can be seen in jetty-12.0.x HEAD on build job for JDK17 and JDK21 (doesn't happen in JDK24) -

Signed-off-by: Lachlan Roberts <lachlan.p.roberts@gmail.com>
Copy link
Contributor

@gregw gregw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lachlan-roberts This failure looks like more than a flake:

2025-06-29 20:48:43.421:WARN :oejuts.AdaptiveExecutionStrategy:WebSocket@2039328294-4500: Task run failed
java.lang.IllegalStateException: FILLING_AND_PARSING
	at org.eclipse.jetty.websocket.core.WebSocketConnection.fillAndParse(WebSocketConnection.java:480)
	at org.eclipse.jetty.websocket.core.WebSocketConnection.onFillable(WebSocketConnection.java:372)
	at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:322)
	at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:99)
	at org.eclipse.jetty.io.SelectableChannelEndPoint$1.run(SelectableChannelEndPoint.java:53)
	at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.runTask(AdaptiveExecutionStrategy.java:480)
	at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.consumeTask(AdaptiveExecutionStrategy.java:443)
	at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.tryProduce(AdaptiveExecutionStrategy.java:293)
	at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.run(AdaptiveExecutionStrategy.java:201)
	at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:311)
	at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:981)
	at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.doRunJob(QueuedThreadPool.java:1211)
	at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1166)
	at java.base/java.lang.Thread.run(Thread.java:1583)

…g state

Signed-off-by: Lachlan Roberts <lachlan.p.roberts@gmail.com>
@lachlan-roberts
Copy link
Contributor Author

@gregw I cannot see that failure in the CI results. But that would indicate an issue with the original PR #13264 not with this one.

It is because we are registering as fillInterested inside the FILLING_AND_PARSING state, so there is a race if onFillable() is called before we transition back to IDLE state.

So I have added another commit to this PR to fix this.

@joakime
Copy link
Contributor

joakime commented Jul 1, 2025

@lachlan-roberts this missed the 12.0.23 release.
It isn't a a blocker issue.
No tests on our side are failing, nor are the cometd / spring-framework / spring-boot tests of the existing 12.0.23 as well.
It's going in 12.0.24.

@lachlan-roberts
Copy link
Contributor Author

@joakime the issue is a race condition, so the tests just got lucky and didn't trigger it.

I really don't think we should release 12.0.23 without this fix.

@joakime
Copy link
Contributor

joakime commented Jul 1, 2025

@joakime the issue is a race condition, so the tests just got lucky and didn't trigger it.

I really don't think we should release 12.0.23 without this fix.

There is no test case for this supposed race condition.
There is no reproduction case for this supposed race condition.
This PR is insufficient to sneak into 12.0.23 release (which is already 6 days late)

@joakime joakime changed the title remove unnecessary locks around the WebSocketConnection networkBuffer Race Condition in WebSocketConnection - Job failed: java.lang.IllegalStateException: FILLING_AND_PARSING Jul 1, 2025
@joakime joakime added High Priority Bug For general bugs on Jetty side labels Jul 1, 2025
@joakime joakime moved this to 👀 In review in Jetty 12.0.23 - FROZEN Jul 1, 2025
@joakime
Copy link
Contributor

joakime commented Jul 1, 2025

I've been able to confirm that the mentioned stacktrace exists in the jetty-12.0.x HEAD builds right now.

2025-06-30 16:37:45.140:WARN :oejuts.AdaptiveExecutionStrategy:qtp83786308-661: Task run failed
java.lang.IllegalStateException: FILLING_AND_PARSING
	at org.eclipse.jetty.websocket.core.WebSocketConnection.fillAndParse(WebSocketConnection.java:489)
	at org.eclipse.jetty.websocket.core.WebSocketConnection.onFillable(WebSocketConnection.java:381)
	at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:322)
	at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:99)
	at org.eclipse.jetty.io.SelectableChannelEndPoint$1.run(SelectableChannelEndPoint.java:53)
	at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.runTask(AdaptiveExecutionStrategy.java:480)
	at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.consumeTask(AdaptiveExecutionStrategy.java:443)
	at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.tryProduce(AdaptiveExecutionStrategy.java:293)
	at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.run(AdaptiveExecutionStrategy.java:201)
	at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:311)
	at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:981)
	at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.doRunJob(QueuedThreadPool.java:1211)
	at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1166)
	at java.base/java.lang.Thread.run(Thread.java:1583)
2025-06-30 16:37:45.206:WARN :oejut.QueuedThreadPool:WebSocket@1004570941-2013: Job failed
java.lang.IllegalStateException: FILLING_AND_PARSING
	at org.eclipse.jetty.websocket.core.WebSocketConnection.fillAndParse(WebSocketConnection.java:489)
	at org.eclipse.jetty.websocket.core.WebSocketConnection.onFillable(WebSocketConnection.java:381)
	at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:322)
	at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:99)
	at org.eclipse.jetty.io.SelectableChannelEndPoint$1.run(SelectableChannelEndPoint.java:53)
	at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:981)
	at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.doRunJob(QueuedThreadPool.java:1211)
	at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1166)
	at java.base/java.lang.Thread.run(Thread.java:1583)

Signed-off-by: Lachlan Roberts <lachlan.p.roberts@gmail.com>
@joakime
Copy link
Contributor

joakime commented Jul 1, 2025

Stack is still present in JDK17 build.

https://jenkins.webtide.net/blue/organizations/jenkins/jetty.project/detail/PR-13299/4/pipeline/12

2025-07-01 16:03:39.292:WARN :oejut.QueuedThreadPool:WebSocket@1389194219-442: Job failed
java.lang.IllegalStateException: FILLING_AND_PARSING
	at org.eclipse.jetty.websocket.core.WebSocketConnection.fillAndParse(WebSocketConnection.java:477)
	at org.eclipse.jetty.websocket.core.WebSocketConnection.run(WebSocketConnection.java:380)
	at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:981)
	at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.doRunJob(QueuedThreadPool.java:1211)
	at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1166)
	at java.base/java.lang.Thread.run(Thread.java:840)

@joakime joakime mentioned this pull request Jul 1, 2025
50 tasks
Copy link
Contributor

@gregw gregw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. But see comment about an enhancement for another PR

Comment on lines +346 to +347
networkBuffer.release();
networkBuffer = newNetworkBuffer(getInputBufferSize());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lachlan-roberts not for this PR, but we should do the same as HTTP/1 and reuse the existing buffer, even if retained, if there is sufficient space in it to do something useful. See org.eclipse.jetty.server.internal.HttpConnection#parseAndFillForContent

Copy link
Contributor

@sbordet sbordet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Race condition still present when demand() is called from application code, and the parser thread has not hit the finally block in filllAndParse().

Copy link
Contributor

@gregw gregw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lachlan-roberts, Simone briefed me on the remaining race condition. I will look today. Ping me if you are online at all

gregw added 2 commits July 2, 2025 09:24
This resolves the race between one thread asking for more f&p and another exiting the state.

Will investigate an alternative using ICB now.
This resolves the race between one thread asking for more f&p and another exiting the state.

Will investigate an alternative using ICB now.
@gregw
Copy link
Contributor

gregw commented Jul 2, 2025

@lachlan-roberts @sbordet I tried converting the fillAndParse into an IteratingCallback. It was surprisingly mostly successful, and most of the tests pass. But then a few tests fail in strange ways (missing bytes from the middle of message etc), so it is not as simple as it looks. I think replacing executes with iterates is a big change in the threading model and we should not consider it for this release cycle. Instead I think the fix as it is now should go to the release and @lachlan-roberts can look at making it an ICB after.

Signed-off-by: Lachlan Roberts <lachlan.p.roberts@gmail.com>
Signed-off-by: Lachlan Roberts <lachlan.p.roberts@gmail.com>
@joakime
Copy link
Contributor

joakime commented Jul 2, 2025

If the build is green, and you are happy with these changes, do merge this.
If you cannot, say so (and why you cannot).

@joakime
Copy link
Contributor

joakime commented Jul 2, 2025

@gregw I think Lachlan will need and approval to merge

@lachlan-roberts lachlan-roberts requested review from gregw and sbordet July 2, 2025 03:18
@sbordet sbordet merged commit 5ac211c into jetty-12.0.x Jul 2, 2025
10 checks passed
@github-project-automation github-project-automation bot moved this from 👀 In review to ✅ Done in Jetty 12.0.23 - FROZEN Jul 2, 2025
@sbordet sbordet deleted the jetty-12.0.x-WebSocketConnection-NetworkBufferLocking branch July 2, 2025 08:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug For general bugs on Jetty side High Priority
Projects
No open projects
Status: ✅ Done
Development

Successfully merging this pull request may close these issues.

4 participants