@terrykong (Contributor) commented May 18, 2025

Closes #309

Signed-off-by: Terry Kong <terryk@nvidia.com>

@SahilJain314 (Contributor) left a comment

We're accumulating a lot of 'knobs' in ray.sub. Maybe it's time to add a doc for it? It wasn't obvious to me that environment variables like HF_HOME and WANDB_API_KEY get plumbed through ray.sub, and now we're adding GPUS_PER_NODE and CPUS_PER_WORKER too.

@terrykong (Contributor, Author) commented

I'll address UV_CACHE_DIR in a follow-up PR, #426.

terrykong added 2 commits May 20, 2025 23:06
Signed-off-by: Terry Kong <terryk@nvidia.com>
Signed-off-by: Terry Kong <terryk@nvidia.com>
SahilJain314 previously approved these changes May 21, 2025

@jgerh (Contributor) left a comment

Completed the Tech Pubs review of docs/cluster.md and provided some copyedits and suggested text revisions. Comments added inline with the "add a suggestion" tool as well as line-by-line for read-only text.

Co-authored-by: jgerh <163925524+jgerh@users.noreply.github.com>
Signed-off-by: Terry Kong <terrycurtiskong@gmail.com>
Signed-off-by: Terry Kong <terryk@nvidia.com>
@terrykong terrykong enabled auto-merge May 22, 2025 20:38
Signed-off-by: Terry Kong <terryk@nvidia.com>
@terrykong terrykong added this pull request to the merge queue May 22, 2025
@parthchadha parthchadha removed this pull request from the merge queue due to a manual request May 22, 2025
@SahilJain314 SahilJain314 added this pull request to the merge queue May 23, 2025
Merged via the queue into main with commit f9e45de May 23, 2025
13 of 14 checks passed
@SahilJain314 SahilJain314 deleted the tk/cpu-task branch May 23, 2025 04:43
YzjiaoNvd pushed a commit to YzjiaoNvd/NeMo-RL that referenced this pull request Jun 10, 2025
…A-NeMo#410)

Signed-off-by: Terry Kong <terryk@nvidia.com>
Signed-off-by: Terry Kong <terrycurtiskong@gmail.com>
Co-authored-by: jgerh <163925524+jgerh@users.noreply.github.com>
Labels: documentation (Improvements or additions to documentation)
Development

Successfully merging this pull request may close these issues.

init_ray's runtime_env (with full os.environ) causes Ray runtime_env_agent to fail
4 participants