
Conversation

LeonMalteW
Contributor

What does this PR do?

Add the missing config entry.

Issues

List issues that this PR closes:
I found no existing issue for this, but when I tried to run GRPO on DeepScaler with the standard config, it could not run.

  • [yes] Make sure you read and followed Contributor guidelines
  • [no] Did you write any new necessary tests?
  • [no] Did you run the unit tests and functional tests locally? Visit our Testing Guide for how to run tests
  • [no] Did you add or update any necessary documentation? Visit our Document Development Guide for how to write, build and test the docs.

@terrykong
Contributor

Thanks for fixing! Could you please sign off your commit?

https://github.com/NVIDIA/NeMo-RL/blob/main/CONTRIBUTING.md#signing-your-work

To sign off retroactively, you can rebase and follow up with a force push:

git rebase HEAD~1 --signoff
git push --force-with-lease origin main  # note this is your fork's main since that's the source branch
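
Alternatively, if it's just the one commit, amending it with a signoff should work as well; git fills in the Signed-off-by trailer from your configured user.name and user.email (the name and address below are placeholders):

git commit --amend --signoff --no-edit   # appends "Signed-off-by: Your Name <you@example.com>"
git push --force-with-lease origin main  # again, your fork's main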

@SahilJain314
Contributor

@terrykong I thought we were originally deriving this config from the common grpo config; I guess that changed at some point. Should we update this one to enable dynamic batching like the common one?

@terrykong
Contributor

@SahilJain314 good point. We should go back to depending on the common config.

I think for this case @abukharin-nv didn't use dynamic batching in his recipes, so this PR is faithful to the original experiment. A follow-up PR can enable dynamic batching just to make sure convergence is still good.
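
For the follow-up, I'd expect the change to look roughly like the dynamic batching block in the common grpo config, something along these lines (the key names and values here are from memory, so double-check them against the common config rather than copying them verbatim):

policy:
  dynamic_batching:
    enabled: true              # assumed key layout, mirroring the common grpo config
    train_mb_tokens: 16384     # illustrative per-microbatch token budget
    logprob_mb_tokens: 32768   # illustrative value
    sequence_length_round: 64  # round sequence lengths for more even packing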

@terrykong
Contributor

terrykong commented May 28, 2025

Hi @LeonMalteW. If you're not able to resolve the DCO by this afternoon, I will create a new PR (and give you credit for the contribution) and merge it, since this bug should be fixed ASAP.

@LeonMalteW
Contributor Author

By the way, I'm still trying to reproduce the grpo-deepscaler run.

At first, I thought there was only one thing wrong with the configuration, but after the fix I could only run the first stage successfully.

When I ran the second stage with max_total_sequence_length set to 16384, it led to lots of VRAM problems.

The only run I could perform was by reducing the GRPO batch size to 2 and adjusting the other configurations accordingly.

This obviously leads to an increase in training time of 20x or more, which is why I'm asking whether there is a functioning configuration out there.

Maybe fixing the config was not the right approach, and something else is wrong.

This is my "working config":

# GRPO Algorithm Configuration
defaults: "grpo-deepscaler-1.5b-8K.yaml"

grpo:
  num_prompts_per_step: 2 # original 128

loss_fn:
  reference_policy_kl_penalty: 0.001
  ratio_clip_max: 0.28


policy:
  max_total_sequence_length: 16384

  train_global_batch_size: 16 # original 64
  generation_batch_size: 16 # original 32
  logprob_batch_size: 1 # original 4
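
For context, I'm launching it roughly like this (the entry-point script and --config flag are from memory, so treat them as an assumption and adjust to whatever the repo's GRPO example actually uses):

# hypothetical launch command; replace the config path with wherever you saved the override file above
uv run python examples/run_grpo_math.py --config path/to/grpo-deepscaler-16K-reduced.yaml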

@terrykong
Contributor

@LeonMalteW do you mind opening a new issue for your OOM and sharing your hardware details? It'll help us triage.

I'm going to close this PR in favor of #455 (I've given you credit for the contribution).
