fix: correct mcore dtype + assertion on activation_func #572

terrykong · 2025-06-27T16:19:30Z

This PR addresses a bug found while testing Nemotron-H, where params_dtype was set incorrectly. It also adds an assert that hints at what might have gone wrong if activation_func is empty.

The issue was that, during conversion, we used the default value of optimizer.params_dtype, which is float32, here: https://github.com/NVIDIA/NeMo/blob/bab66472d2f2eb05ab621dbad66ad6031e4ee19e/nemo/tron/converter/common.py#L217. However, params_dtype is supposed to be set from the bf16 and fp16 args, following how mcore handles it: https://github.com/NVIDIA/Megatron-LM/blob/1ab876ddc4c1893c76f26d775226a8d1dcdfb3d2/megatron/training/arguments.py#L676

This change just respects that logic.
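As a rough illustration of the logic described above (not the actual diff in this PR), the sketch below derives the parameter dtype from the bf16/fp16 flags instead of relying on a fixed float32 default, and shows the kind of assertion that could give a hint when activation_func is missing. The names `PrecisionFlags`, `resolve_params_dtype`, and `check_activation_func` are hypothetical, dtypes are represented as plain strings to keep the example dependency-free, and "empty" is assumed here to mean `None`.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class PrecisionFlags:
    """Hypothetical stand-in for the bf16/fp16 arguments."""

    bf16: bool = False
    fp16: bool = False


def resolve_params_dtype(flags: PrecisionFlags) -> str:
    """Pick the parameter dtype from the precision flags.

    Assumption: float32 is only the fallback when neither flag is set.
    """
    # The two half-precision modes are mutually exclusive.
    assert not (flags.bf16 and flags.fp16), "bf16 and fp16 cannot both be set"
    if flags.bf16:
        return "bfloat16"
    if flags.fp16:
        return "float16"
    return "float32"


def check_activation_func(activation_func: Optional[Callable]) -> None:
    """Fail early with a hint instead of a confusing error later on."""
    # Assumption: an "empty" activation_func shows up as None.
    assert activation_func is not None, (
        "activation_func is not set; the model config may not have been "
        "converted or loaded correctly"
    )


if __name__ == "__main__":
    print(resolve_params_dtype(PrecisionFlags(bf16=True)))
    print(resolve_params_dtype(PrecisionFlags()))
```

With this shape, a bf16 run resolves to a bf16 parameter dtype even when the optimizer-side default would have been float32, which is the mismatch the description attributes to the conversion path.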

Signed-off-by: Terry Kong <terryk@nvidia.com>

fix: correct mcore dtype + assertion on activation_func

fe2be4b

Signed-off-by: Terry Kong <terryk@nvidia.com>

terrykong requested review from yfw, hemildesai and SahilJain314 June 27, 2025 16:19

parthchadha approved these changes Jun 27, 2025

terrykong added this pull request to the merge queue Jun 27, 2025

Merged via the queue into main with commit 8771995 Jun 28, 2025
13 of 14 checks passed

terrykong deleted the tk/fix-mcore-types branch June 28, 2025 00:58

xxman-google pushed a commit to xxman-google/NeMo-RL that referenced this pull request Jun 28, 2025

fix: correct mcore dtype + assertion on activation_func (NVIDIA-NeMo#572)

f94e0cd

Signed-off-by: Terry Kong <terryk@nvidia.com>

xxman-google pushed a commit to xxman-google/NeMo-RL that referenced this pull request Jun 30, 2025

fix: correct mcore dtype + assertion on activation_func (NVIDIA-NeMo#572)

1b7292c

Signed-off-by: Terry Kong <terryk@nvidia.com>

xxman-google pushed a commit to xxman-google/NeMo-RL that referenced this pull request Jun 30, 2025

fix: correct mcore dtype + assertion on activation_func (NVIDIA-NeMo#572)

7f308aa

Signed-off-by: Terry Kong <terryk@nvidia.com>
Signed-off-by: Xuehan <xxman@google.com>

xxman-google pushed a commit to xxman-google/NeMo-RL that referenced this pull request Jun 30, 2025

fix: correct mcore dtype + assertion on activation_func (NVIDIA-NeMo#572)

d0dca5b

Signed-off-by: Terry Kong <terryk@nvidia.com>
Signed-off-by: Xuehan <xxman@google.com>

therealnaveenkamal pushed a commit to therealnaveenkamal/RL that referenced this pull request Jul 7, 2025

fix: correct mcore dtype + assertion on activation_func (NVIDIA-NeMo#572)

f0484d7

Signed-off-by: Terry Kong <terryk@nvidia.com>

YzjiaoNvd pushed a commit to YzjiaoNvd/NeMo-RL that referenced this pull request Jul 14, 2025

fix: correct mcore dtype + assertion on activation_func (NVIDIA-NeMo#572)

27bd144

Signed-off-by: Terry Kong <terryk@nvidia.com>

KiddoZhu pushed a commit that referenced this pull request Jul 28, 2025

fix: correct mcore dtype + assertion on activation_func (#572)

039ef48

Signed-off-by: Terry Kong <terryk@nvidia.com>

FannYYW pushed a commit to xxman-google/NeMo-RL that referenced this pull request Aug 5, 2025

fix: correct mcore dtype + assertion on activation_func (NVIDIA-NeMo#572)

6c62c1a

Signed-off-by: Terry Kong <terryk@nvidia.com>
