fix: enable expandable segments for hopper+ #594

parthchadha · 2025-07-02T22:33:02Z

What does this PR do?

Add a one line overview of what this PR aims to accomplish.

Issues

List issues that this PR closes (syntax):

Usage

You can potentially add a usage example below

# Add a code snippet demonstrating how to use this

Before your PR is "Ready for review"

Pre checks:

Make sure you read and followed Contributor guidelines
Did you write any new necessary tests?
Did you run the unit tests and functional tests locally? Visit our Testing Guide for how to run tests
Did you add or update any necessary documentation? Visit our Document Development Guide for how to write, build and test the docs.

Additional Information

...

Signed-off-by: Parth Chadha <pchadha@nvidia.com>

ashors1

Thank you for making this change! I think we actually want to detect compute capability when using Megatron: https://github.com/NVIDIA-NeMo/RL/blob/main/nemo_rl/models/policy/megatron_policy_worker.py#L647-L651. @SahilJain314, correct me if I'm wrong, but it's my understanding that we only hit issues with (A100 + expandable segments) when using Megatron.


parthchadha · 2025-07-02T23:57:21Z

> Thank you for making this change! I think we actually want to detect compute capability when using Megatron: https://github.com/NVIDIA-NeMo/RL/blob/main/nemo_rl/models/policy/megatron_policy_worker.py#L647-L651. @SahilJain314, correct me if I'm wrong, but it's my understanding that we only hit issues with (A100 + expandable segments) when using Megatron.

Made the change for both the DTensor and Megatron workers in the latest commit.
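For readers following the thread, here is a minimal sketch of the kind of compute-capability gate being discussed. It is not the code from this PR: the names `merged_alloc_conf` and `configure_expandable_segments` are hypothetical, and the threshold assumes Hopper means compute capability major version 9 (A100 reports 8.0). It uses PyTorch's `PYTORCH_CUDA_ALLOC_CONF` environment variable and `torch.cuda.get_device_capability()`.

```python
import os

HOPPER_MAJOR = 9  # assumption: H100 reports compute capability 9.0, A100 reports 8.0


def merged_alloc_conf(major, existing=""):
    """Return the PYTORCH_CUDA_ALLOC_CONF value to use for a GPU with this major version.

    Hypothetical helper: leaves the value untouched on pre-Hopper GPUs and
    when the user has already set expandable_segments explicitly.
    """
    if major < HOPPER_MAJOR or "expandable_segments" in existing:
        return existing
    option = "expandable_segments:True"
    return f"{existing},{option}" if existing else option


def configure_expandable_segments(env=os.environ):
    """Enable expandable segments only when the current GPU is Hopper or newer."""
    import torch  # imported lazily so merged_alloc_conf works without PyTorch

    if not torch.cuda.is_available():
        return
    major, _minor = torch.cuda.get_device_capability()
    conf = merged_alloc_conf(major, env.get("PYTORCH_CUDA_ALLOC_CONF", ""))
    if conf:
        env["PYTORCH_CUDA_ALLOC_CONF"] = conf
```

A worker would call `configure_expandable_segments()` during startup. Where and when the variable is actually applied in the DTensor and Megatron workers is not shown in this thread, so treat the call site as an open detail.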


fix: enable expandable segments for hopper+

c0ce2bc

Signed-off-by: Parth Chadha <pchadha@nvidia.com>

parthchadha requested review from SahilJain314 and ashors1 July 2, 2025 22:33

parthchadha added the CI:L0 Run doctests and unit tests label Jul 2, 2025

parthchadha temporarily deployed to nemo-ci July 2, 2025 22:34 — with GitHub Actions Inactive

ashors1 reviewed Jul 2, 2025

View reviewed changes

add runtime env for megatron worker

77cf050

Signed-off-by: Parth Chadha <pchadha@nvidia.com>

ashors1 approved these changes Jul 3, 2025

View reviewed changes

parthchadha added this pull request to the merge queue Jul 3, 2025

Merged via the queue into main with commit f4150bf Jul 3, 2025
13 of 14 checks passed

parthchadha deleted the pchadha/enabled-expandable-segments-hopper branch July 3, 2025 18:36

therealnaveenkamal pushed a commit to therealnaveenkamal/RL that referenced this pull request Jul 7, 2025

fix: enable expandable segments for hopper+ (NVIDIA-NeMo#594)

3b33987

Signed-off-by: Parth Chadha <pchadha@nvidia.com>

YzjiaoNvd pushed a commit to YzjiaoNvd/NeMo-RL that referenced this pull request Jul 14, 2025

fix: enable expandable segments for hopper+ (NVIDIA-NeMo#594)

ef205e7

Signed-off-by: Parth Chadha <pchadha@nvidia.com>

jialei777 pushed a commit to jialei777/nemo-rl that referenced this pull request Jul 23, 2025

fix: enable expandable segments for hopper+ (NVIDIA-NeMo#594)

9325915

Signed-off-by: Parth Chadha <pchadha@nvidia.com> Signed-off-by: Jialei Chen <jialeic@google.com>

KiddoZhu pushed a commit that referenced this pull request Jul 28, 2025

fix: enable expandable segments for hopper+ (#594)

56fc9c6

Signed-off-by: Parth Chadha <pchadha@nvidia.com>
