feat: make torch index explicit to support grace-hopper/GH200/aarch64 #533

terrykong · 2025-06-21T07:10:16Z

Partially addresses #532

With this change, training loops such as SFT and DPO, which need only torch (and not vllm), should work on GH200/aarch64.

GRPO still needs more work: vllm does not publish aarch64 wheels, so it has to be compiled manually for that architecture.
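For anyone trying this on a GH200 node, a quick environment check can tell the two cases apart (torch-only loops versus loops that also need vllm). The following is a minimal sketch and not part of this PR; the function name `environment_report` and its keys are hypothetical. It relies only on `platform.machine()`, `importlib.util.find_spec`, and `torch.version.cuda`, which is `None` on a torch build without CUDA.

```python
import importlib.util
import platform


def environment_report() -> dict:
    """Collect CPU architecture, torch CUDA build info, and vllm presence.

    Hypothetical helper for illustration; it is not part of the repository.
    """
    report = {
        # e.g. "aarch64" on Grace Hopper, "x86_64" on most other hosts
        "machine": platform.machine(),
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "vllm_installed": importlib.util.find_spec("vllm") is not None,
        # CUDA version torch was built against; stays None if torch is
        # missing or was built without CUDA
        "torch_cuda_build": None,
    }
    if report["torch_installed"]:
        import torch

        report["torch_cuda_build"] = torch.version.cuda
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

If `torch_cuda_build` is `None` while torch is installed, the environment has most likely picked up a CPU-only wheel, which matches the symptom reported in #532.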

Performance and convergence do not appear to be affected by specifying this index manually. The index pulls the NVIDIA library wheels for CUDA 12.8 instead of 12.6, which appeared to be the default before the index was specified.
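To make the 12.6 versus 12.8 choice concrete: PyTorch hosts one wheel index per CUDA version under `https://download.pytorch.org/whl/`, with a suffix such as `cu126` or `cu128`. The sketch below only illustrates that naming scheme; the helper `torch_index_url` is hypothetical, and the exact index configuration used in this PR is not reproduced here.

```python
def torch_index_url(cuda_version: str) -> str:
    """Build the PyTorch wheel index URL for a CUDA version such as "12.8".

    Hypothetical helper for illustration only.
    """
    major, minor = cuda_version.split(".")[:2]
    return f"https://download.pytorch.org/whl/cu{major}{minor}"


if __name__ == "__main__":
    print(torch_index_url("12.6"))
    print(torch_index_url("12.8"))
```

Pinning the index explicitly means the resolver no longer falls back to whichever torch build the default package index serves for the platform.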

Signed-off-by: Terry Kong <terryk@nvidia.com>

pramodk · 2025-07-04T09:57:55Z

> GRPO still needs more work since vllm needs to be manually compiled for aarch64 since they do not publish aarch64 wheels

Hello @terrykong! To support GRPO on GH, is the vllm aarch64 wheel the only blocker here?

I just want to understand whether it's only wheel availability or whether there is something more. We want to test this internally (at NVIDIA), hence the question.

terrykong added 7 commits June 21, 2025 06:39

feat: add explicit torch index

0017670

Signed-off-by: Terry Kong <terryk@nvidia.com>

switch to 12.6 which seemed like it was default

fe7b979

Signed-off-by: Terry Kong <terryk@nvidia.com>

give 12.8 a try

8f40fc5

Signed-off-by: Terry Kong <terryk@nvidia.com>

go back to 126

b71d40d

Signed-off-by: Terry Kong <terryk@nvidia.com>

try splitting it out by arch

cdeaa4e

Signed-off-by: Terry Kong <terryk@nvidia.com>

try triton coming from torch index

60c828e

Signed-off-by: Terry Kong <terryk@nvidia.com>

try this

8f6b5ff

Signed-off-by: Terry Kong <terryk@nvidia.com>

terrykong mentioned this pull request Jun 21, 2025

Install on GH200 - Torch not compiled with CUDA enabled #532

Open

terrykong added 4 commits June 24, 2025 07:05

try just using index now

9344066

Signed-off-by: Terry Kong <terryk@nvidia.com>

fix

5f16e62

Signed-off-by: Terry Kong <terryk@nvidia.com>

try 128

9d40c57

Signed-off-by: Terry Kong <terryk@nvidia.com>

upgrade

4eed5d5

Signed-off-by: Terry Kong <terryk@nvidia.com>

terrykong marked this pull request as ready for review June 24, 2025 16:32

terrykong requested review from chtruong814, parthchadha and SahilJain314 June 24, 2025 16:32

parthchadha approved these changes Jun 24, 2025

terrykong added this pull request to the merge queue Jun 24, 2025

github-merge-queue bot removed this pull request from the merge queue due to failed status checks Jun 24, 2025

terrykong added this pull request to the merge queue Jun 24, 2025

Merged via the queue into main with commit 3c75fd0 Jun 25, 2025
19 of 20 checks passed

terrykong deleted the tk/explicit-torch-index branch June 25, 2025 05:11

therealnaveenkamal pushed a commit to therealnaveenkamal/RL that referenced this pull request Jul 7, 2025

feat: make torch index explicit to support grace-hopper/GH200/aarch64 (…

8ddf72e

…NVIDIA-NeMo#533) Signed-off-by: Terry Kong <terryk@nvidia.com>

YzjiaoNvd pushed a commit to YzjiaoNvd/NeMo-RL that referenced this pull request Jul 14, 2025

feat: make torch index explicit to support grace-hopper/GH200/aarch64 (…

eba9220

…NVIDIA-NeMo#533) Signed-off-by: Terry Kong <terryk@nvidia.com>

KiddoZhu pushed a commit that referenced this pull request Jul 28, 2025

feat: make torch index explicit to support grace-hopper/GH200/aarch64 (…

d5b573f

…#533) Signed-off-by: Terry Kong <terryk@nvidia.com>
