Contributor

@rynewang rynewang commented Sep 17, 2024

On `ray job submit`, the CLI tails logs from the job agent. The agent reads log tails from an iterator and yields them to a WebSocket. However, the file-reading iterator is synchronous, so it blocks the agent's event loop; the agent stalls, which breaks downstream consumers such as KubeRay. This PR changes the log-reading function `file_tail_iterator` to an AsyncIterator and replaces its 1s `time.sleep` with `asyncio.sleep` to unblock the event loop.

The issue was introduced in #44658.

Fixes #47637.
Fixes ray-project/kuberay#2355
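
For readers unfamiliar with the failure mode, here is a minimal sketch of the pattern this PR describes. It is not the actual Ray implementation: `tail_lines`, `poll_interval_s`, `max_idle_polls`, and `demo` are hypothetical names made up for illustration. The point is that `await asyncio.sleep(...)` hands control back to the event loop while waiting for new log lines, whereas `time.sleep(...)` in the same spot would freeze every other coroutine in the process.

```python
import asyncio
import os
import tempfile
from typing import AsyncIterator, List, Optional


async def tail_lines(
    path: str,
    poll_interval_s: float = 1.0,
    max_idle_polls: Optional[int] = None,
) -> AsyncIterator[List[str]]:
    """Yield batches of lines appended to `path` (hypothetical helper).

    `max_idle_polls` exists only so the demo terminates; a real tailer
    would stop on some other condition, e.g. the job finishing.
    """
    idle_polls = 0
    with open(path, "r") as f:
        while True:
            lines = f.readlines()
            if lines:
                idle_polls = 0
                yield lines
                continue
            idle_polls += 1
            if max_idle_polls is not None and idle_polls >= max_idle_polls:
                return
            # Non-blocking wait: other tasks on the event loop keep running.
            # A time.sleep() here would stall the whole loop instead.
            await asyncio.sleep(poll_interval_s)


async def demo() -> dict:
    # Write a small log file to tail.
    with tempfile.NamedTemporaryFile("w", suffix=".log", delete=False) as tmp:
        tmp.write("line 1\nline 2\n")
        path = tmp.name

    ticks = 0

    async def heartbeat() -> None:
        # Stands in for other work on the loop, such as health checks.
        nonlocal ticks
        while True:
            ticks += 1
            await asyncio.sleep(0.01)

    hb = asyncio.create_task(heartbeat())
    collected: List[str] = []
    try:
        async for batch in tail_lines(path, poll_interval_s=0.05, max_idle_polls=3):
            collected.extend(batch)
    finally:
        hb.cancel()
        os.unlink(path)
    return {"lines": collected, "ticks": ticks}


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In this sketch the heartbeat keeps ticking while the tailer waits for new lines. Note that `readlines()` itself is still a synchronous call; the sketch only addresses the sleep between polls, which is the part the description above calls out.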

Signed-off-by: Ruiyang Wang <rywang014@gmail.com>
@rynewang rynewang changed the title [core][dashboard] Change file_tail_iterator to async [core][dashboard] Change file_tail_iterator to async. Sep 17, 2024
Member

@kevin85421 kevin85421 left a comment

The change makes sense to me, but I’m curious why Ray 2.9 works. I thought we’ve been using Iterator instead of AsyncIterator for years.

@rynewang rynewang added the go add ONLY when ready to merge, run all tests label Sep 17, 2024
@rynewang
Contributor Author

I haven't taken the time to `git blame` my way to the problematic PR yet...

@rynewang rynewang enabled auto-merge (squash) September 17, 2024 23:29
@github-actions github-actions bot disabled auto-merge September 18, 2024 01:13
@rynewang
Contributor Author

@kevin85421 I think it's #44658, from about 5 months ago.

@rynewang rynewang merged commit bc2b26e into ray-project:master Sep 18, 2024
4 of 5 checks passed
@rynewang rynewang deleted the async-file-iterator branch September 18, 2024 18:55
@shaowei-su

shaowei-su commented Sep 19, 2024

Hi @rynewang @kevin85421, will this fix be included in the Ray nightly release? Thanks!
https://s3-us-west-2.amazonaws.com/ray-wheels/latest/ray-3.0.0.dev0-cp310-cp310-manylinux2014_x86_64.whl

ujjawal-khare pushed a commit to ujjawal-khare-27/ray that referenced this pull request Oct 15, 2024
Signed-off-by: ujjawal-khare <ujjawal.khare@dream11.com>
Successfully merging this pull request may close these issues.

[core] Raylet health check on dashboard agent hangs if a job is created
[Bug] Exec probes are causing high load on Ray pods