Skip to content

Conversation

parthchadha
Copy link
Contributor

What does this PR do ?

Add a one line overview of what this PR aims to accomplish.

Issues

List issues that this PR closes (syntax):

Usage

  • You can potentially add a usage example below
# Add a code snippet demonstrating how to use this

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you run the unit tests and functional tests locally? Visit our Testing Guide for how to run tests
  • Did you add or update any necessary documentation? Visit our Document Development Guide for how to write, build and test the docs.

Additional Information

  • ...

Signed-off-by: Parth Chadha <pchadha@nvidia.com>
@parthchadha parthchadha added the CI:L0 Run doctests and unit tests label Jul 2, 2025
Copy link
Contributor

@ashors1 ashors1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for making this change! I think we actually want to detect compute capability when using megatron: https://github.com/NVIDIA-NeMo/RL/blob/main/nemo_rl/models/policy/megatron_policy_worker.py#L647-L651. @SahilJain314 , correct me if I'm wrong, but it's my understanding that we only hit issues with (A100 + expandable segments) when using Megatron

Signed-off-by: Parth Chadha <pchadha@nvidia.com>
@parthchadha
Copy link
Contributor Author

parthchadha commented Jul 2, 2025

Thank you for making this change! I think we actually want to detect compute capability when using megatron: https://github.com/NVIDIA-NeMo/RL/blob/main/nemo_rl/models/policy/megatron_policy_worker.py#L647-L651. @SahilJain314 , correct me if I'm wrong, but it's my understanding that we only hit issues with (A100 + expandable segments) when using Megatron

Made the change for both dtensor and megatron worker in the latest commit.

@parthchadha parthchadha added this pull request to the merge queue Jul 3, 2025
Merged via the queue into main with commit f4150bf Jul 3, 2025
13 of 14 checks passed
@parthchadha parthchadha deleted the pchadha/enabled-expandable-segments-hopper branch July 3, 2025 18:36
therealnaveenkamal pushed a commit to therealnaveenkamal/RL that referenced this pull request Jul 7, 2025
Signed-off-by: Parth Chadha <pchadha@nvidia.com>
YzjiaoNvd pushed a commit to YzjiaoNvd/NeMo-RL that referenced this pull request Jul 14, 2025
Signed-off-by: Parth Chadha <pchadha@nvidia.com>
jialei777 pushed a commit to jialei777/nemo-rl that referenced this pull request Jul 23, 2025
Signed-off-by: Parth Chadha <pchadha@nvidia.com>
Signed-off-by: Jialei Chen <jialeic@google.com>
KiddoZhu pushed a commit that referenced this pull request Jul 28, 2025
Signed-off-by: Parth Chadha <pchadha@nvidia.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CI:L0 Run doctests and unit tests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants