add missing entry dynamic_batching and setting it to False #442
Conversation
Thanks for fixing! Could you please sign off your commit?
https://github.com/NVIDIA/NeMo-RL/blob/main/CONTRIBUTING.md#signing-your-work
Retroactively, you can rebase to do it and follow it up with a force push:
git rebase HEAD~1 --signoff
git push --force-with-lease origin main # note this is your fork's main since that's the source branch
@terrykong I thought we were originally deriving this config from the common GRPO config; I guess that changed at some point. Should we update this one to enable dynamic batching like the common one?
@SahilJain314 good point. We should go back to depending on the common config. For this case, @abukharin-nv didn't use dynamic batching in his recipes, so I think this PR is faithful to the original experiment. A follow-up PR can enable dynamic batching just to make sure convergence is still good.
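For that follow-up, a minimal sketch of what enabling it could look like in the recipe YAML, assuming a policy.dynamic_batching.enabled nesting like the common GRPO config (the key names here are assumptions; the common config in the repo is the authoritative reference):

policy:
  dynamic_batching:
    enabled: true  # assumed key name; the common config may also define per-microbatch token budgets alongside this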
Signed-off-by: lw86ruwo <Leon@wenderoth.de>
Hi @LeonMalteW. If you're not able to resolve the DCO by this afternoon, I will create a new PR (and give you credit for the contribution) and merge it, since this bug should be fixed ASAP.
By the way, I'm still trying to reproduce the grpo-deepscaler run. At first, I thought there was only one thing wrong with the configuration, but when I ran the second stage I hit OOM errors. The only run I could perform was by reducing the GRPO batch size to 2 and adjusting the other settings accordingly, which would obviously increase training time by 20x or more. Maybe fixing the config was not the right approach and something else is wrong. This is my "working config":

# GRPO Algorithm Configuration
defaults: "grpo-deepscaler-1.5b-8K.yaml"

grpo:
  num_prompts_per_step: 2 # original 128

loss_fn:
  reference_policy_kl_penalty: 0.001
  ratio_clip_max: 0.28

policy:
  max_total_sequence_length: 16384
  train_global_batch_size: 16 # original 64
  generation_batch_size: 16 # original 32
  logprob_batch_size: 1 # original 4
@LeonMalteW do you mind opening a new issue for your OOM and sharing your hardware setup? It'll help us triage. I'm going to close this PR in favor of #455 (I've given you credit for the contribution).
What does this PR do?
Add the missing dynamic_batching config entry and set it to False.
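For reference, a rough sketch of what the added entry looks like in the recipe config; the exact nesting under policy is an assumption here, and the PR diff (and #455) is authoritative. It is set to False so the recipe stays faithful to the original experiment:

policy:
  dynamic_batching:
    enabled: false  # assumed nesting; disabled to match the original DeepScaler recipe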
Issues
List issues that this PR closes:
I found no existing issue for this, but when I tried to run GRPO on DeepScaler with the standard config, it couldn't run.