feat: chunked logprob calculation with deferred fp32 cast to help with OOM #856
Conversation
nemo_rl/tron/model.py (outdated)

    from nemo.collections.llm.t5.model.t5 import T5Config
    ...
    def get_model_from_config_no_float32(
was this function copied from somewhere? if so, what changes were made?
it's a copy of nemo/tron/model.py:
https://github.com/NVIDIA/NeMo/blob/8ddf4387344c6423763ec9ee0c9a755cbb5d8d35/nemo/tron/model.py

the main change is removing the Float16Module wrapper (which is what originally casts the model logits output to float32):
https://github.com/NVIDIA-NeMo/RL/pull/856/files/a020289609cfa0d7a695a175eed009fdb4695088#diff-37539801eab6c58172c5cf85be33a1f9eac04c096a8e23170550ddf3bff8e3b3R125-R128
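For intuition, here's a minimal sketch of the difference (illustrative only, not the NeMo/NeMo-RL code; the shapes, names, and chunking over the sequence dimension are assumptions):

```python
import torch
import torch.nn.functional as F


def logprobs_eager_fp32(logits_bf16: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
    # Float16Module-style behavior: the full [batch, seq, vocab] logits tensor is
    # materialized in float32 before computing logprobs.
    logps = F.log_softmax(logits_bf16.float(), dim=-1)
    return torch.gather(logps, dim=-1, index=targets.unsqueeze(-1)).squeeze(-1)


def logprobs_deferred_fp32(
    logits_bf16: torch.Tensor, targets: torch.Tensor, chunk_size: int = 1024
) -> torch.Tensor:
    # Deferred cast: only one [batch, chunk_size, vocab] slice is ever held in
    # float32 at a time, so peak memory stays bounded.
    out = []
    for start in range(0, logits_bf16.shape[1], chunk_size):
        chunk = logits_bf16[:, start : start + chunk_size].float()
        tgt = targets[:, start : start + chunk_size].unsqueeze(-1)
        out.append(torch.gather(F.log_softmax(chunk, dim=-1), dim=-1, index=tgt).squeeze(-1))
    return torch.cat(out, dim=1)
```

The eager path is what was driving the OOM, since the float32 copy of the logits doubles their footprint; the deferred path never holds more than one chunk in float32.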
If it's only a one-line change, I'd prefer the change be reflected in the submodule (you can branch where the submodule is at to update)
also, if you expect the model coming back to not be a FP16 but something else, could you add a test asserting the model type? We're currently migrating away from tron, so once that is done, this test would ensure we don't miss this typing fix you're adding
updated the NeMo submodule:
- branch: https://github.com/NVIDIA/NeMo/tree/pjin/nemorl-logprob
- commit: NVIDIA-NeMo/NeMo@0bf0dbc
> also, if you expect the model coming back to not be a FP16 but something else, could you add a test asserting the model type? We're currently migrating away from tron, so once that is done, this test would ensure we don't miss this typing fix you're adding
what I did is add a float32 dtype check to the existing megatron logprobs test, and run that test on more combinations of (logprob chunk size, deferred float32 logits):
https://github.com/NVIDIA-NeMo/RL/pull/856/files#diff-9556cb57e37308923c54e7a6df8982afafef5e36544f350af3324db43f74bdbeR703

one thing to note is that the policy model worker mainly exposes the model output through get_logprobs, and there is no other interface for getting at the underlying torch model logits. but I think just checking that the returned logprobs are float32 should be sufficient?
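Roughly the shape of that check (illustrative only: build_test_policy and make_test_batch are hypothetical stand-ins for the worker fixture and test data in the linked test, not real helpers):

```python
import pytest
import torch


@pytest.mark.parametrize("logprob_chunk_size", [None, 256])
@pytest.mark.parametrize("defer_fp32_logits", [False, True])
def test_logprobs_are_float32(logprob_chunk_size, defer_fp32_logits):
    # Hypothetical fixtures standing in for the Megatron policy worker setup.
    policy = build_test_policy(
        logprob_chunk_size=logprob_chunk_size,
        defer_fp32_logits=defer_fp32_logits,
    )
    logprobs = policy.get_logprobs(make_test_batch())
    # The worker only exposes logprobs (not the raw logits), so the returned
    # dtype is the observable guarantee that the deferred-cast path still
    # produces float32 results.
    assert logprobs.dtype == torch.float32
```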
    @@ -141,6 +143,123 @@ def backward(
            return grad_input, None, None, None, None, None, None

    class ChunkedDistributedLogprob(torch.autograd.Function):
can we add a unit test for this function so we make sure the non-chunked version equals the chunked one?
added a chunk_size parameter to DistributedLogprobTestActor.
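For reference, a self-contained sketch of the equivalence the test is checking (this is not ChunkedDistributedLogprob itself, which also handles the tensor-parallel reduction; the helper below is a plain single-process stand-in):

```python
import torch
import torch.nn.functional as F


def token_logprobs(logits: torch.Tensor, targets: torch.Tensor, chunk_size=None) -> torch.Tensor:
    # Gathered per-token log-probabilities, casting each chunk to float32.
    if chunk_size is None:
        chunk_size = logits.shape[1]  # non-chunked: one full-sequence "chunk"
    out = []
    for start in range(0, logits.shape[1], chunk_size):
        chunk = logits[:, start : start + chunk_size].float()
        tgt = targets[:, start : start + chunk_size].unsqueeze(-1)
        out.append(torch.gather(F.log_softmax(chunk, dim=-1), dim=-1, index=tgt).squeeze(-1))
    return torch.cat(out, dim=1)


torch.manual_seed(0)
logits = torch.randn(2, 16, 128, dtype=torch.bfloat16)
targets = torch.randint(0, 128, (2, 16))

full = token_logprobs(logits, targets)                   # non-chunked path
chunked = token_logprobs(logits, targets, chunk_size=4)  # chunked path
assert full.dtype == chunked.dtype == torch.float32
torch.testing.assert_close(full, chunked)                # values must agree
```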
    moe_router_bias_update_rate: 0.0  # by default, disable bias updates for grpo
    apply_rope_fusion: True
    activation_checkpointing: True
    defer_fp32_logits: True
what would be the reason to set this to False?
mostly for strict backward compat, but we could instead enable it by default (i.e. make it an opt-out config like no_defer_fp32_logits or similar).
wdyt?
I see. How about the following:
- this PR introduces it, default off
- follow-up PR where we run all our nightly tests to see if defaulting to true is ok; if so, remove the arg
wdyt? If the feature is broadly applicable, we should probably switch it to true so no one else runs into the same issue (assuming no accuracy penalty).
yup, (1) and then (2) SGTM!
There's a permission issue with that
closing in favor of #918