SahilJain314
Contributor

Bring mcore training up to parity with dtensor, with no explicit refit buffer size.

Signed-off-by: Sahil Jain <sahilj@nvidia.com>
@github-actions github-actions bot added the "CI" label (Relating to CI) Jun 17, 2025
@SahilJain314 SahilJain314 changed the title from "fix: Remove explicit refit buffer sizing for megatron<-> vllm refit" to "fix: Remove explicit refit buffer sizing for megatron<-> vllm refit and added functional MCore grpo test" Jun 17, 2025
@SahilJain314 SahilJain314 changed the title from "fix: Remove explicit refit buffer sizing for megatron<-> vllm refit and added functional MCore grpo test" to "fix: Mcore: remove explicit refit buffer sizing and added functional grpo test" Jun 17, 2025
@SahilJain314 SahilJain314 requested a review from parthchadha June 17, 2025 23:25
Signed-off-by: Sahil Jain <sahilj@nvidia.com>
@github-actions github-actions bot added the "documentation" label (Improvements or additions to documentation) Jun 17, 2025
Signed-off-by: Sahil Jain <sahilj@nvidia.com>
@SahilJain314 SahilJain314 requested a review from parthchadha June 17, 2025 23:45
parthchadha previously approved these changes Jun 17, 2025
Base automatically changed from sahilj/mypy2 to main June 18, 2025 08:21
@terrykong terrykong dismissed parthchadha’s stale review June 18, 2025 08:21

The base branch was changed.

Signed-off-by: Sahil Jain <48468750+SahilJain314@users.noreply.github.com>
@SahilJain314 SahilJain314 added and removed the "CI:L1" label (Run doctests, unit tests, and functional tests) Jun 26, 2025
@SahilJain314 SahilJain314 requested a review from parthchadha June 26, 2025 22:46
@SahilJain314 SahilJain314 changed the title from "fix: Mcore: remove explicit refit buffer sizing and added functional grpo test" to "fix: Mcore: Added functional grpo test and typing fixes" Jul 11, 2025
@terrykong terrykong added this pull request to the merge queue Jul 11, 2025
Merged via the queue into main with commit c3860de Jul 11, 2025
15 of 16 checks passed
@terrykong terrykong deleted the sahilj/buf_size_free branch July 11, 2025 18:28
ZhiyuLi-Nvidia pushed a commit that referenced this pull request Jul 21, 2025
Signed-off-by: Sahil Jain <sahilj@nvidia.com>
Signed-off-by: Sahil Jain <48468750+SahilJain314@users.noreply.github.com>
Signed-off-by: Zhiyu Li <zhiyul@nvidia.com>
jialei777 pushed a commit to jialei777/nemo-rl that referenced this pull request Jul 23, 2025
Signed-off-by: Sahil Jain <sahilj@nvidia.com>
Signed-off-by: Sahil Jain <48468750+SahilJain314@users.noreply.github.com>
Signed-off-by: Jialei Chen <jialeic@google.com>
KiddoZhu pushed a commit that referenced this pull request Jul 28, 2025
Signed-off-by: Sahil Jain <sahilj@nvidia.com>
Signed-off-by: Sahil Jain <48468750+SahilJain314@users.noreply.github.com>
xxman-google pushed a commit to xxman-google/NeMo-RL that referenced this pull request Jul 30, 2025
Signed-off-by: Sahil Jain <sahilj@nvidia.com>
Signed-off-by: Sahil Jain <48468750+SahilJain314@users.noreply.github.com>
FannYYW pushed a commit to xxman-google/NeMo-RL that referenced this pull request Aug 5, 2025
Signed-off-by: Sahil Jain <sahilj@nvidia.com>
Signed-off-by: Sahil Jain <48468750+SahilJain314@users.noreply.github.com>
Labels: CI (Relating to CI), documentation (Improvements or additions to documentation)