feat: GRPO + SFT Dtensor support for multimodal training #712
Conversation
❌ Submodule Fast-Forward Check Failed
Check based on commit: cc2986f (PR #712 from …)
❌ Submodules that need attention:
Megatron-LM: ❌ PR branch is BEHIND main branch
NeMo: ❌ PR branch is BEHIND main branch
Please ensure all submodule commits are fast-forwards of the main branch before merging.

❌ Submodule Fast-Forward Check Failed
Check based on commit: 919a7ce (PR #712 from …)
❌ Submodules that need attention:
Megatron-LM: ❌ PR branch is BEHIND main branch
NeMo: ❌ PR branch is BEHIND main branch
Please ensure all submodule commits are fast-forwards of the main branch before merging.
Copying over the last message from @rohitrango from #655 re: remaining blockers:
I prefer handling this issue in a separate PR (and merging an initial support first) for
❌ Submodule Fast-Forward Check Failed
Check based on commit: 80d9ff5 (PR #712 from …)
❌ Submodules that need attention:
Megatron-LM: ❌ PR branch is BEHIND main branch
NeMo: ❌ PR branch is BEHIND main branch
Please ensure all submodule commits are fast-forwards of the main branch before merging.
@rohitrango My understanding is that the current logprob issue may come from input processing not matching inside vLLM versus outside of it. The excerpt you shared is related to sampling, so it remains to be seen whether this is a bug or expected behavior.
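To make that comparison concrete, here is a minimal sketch (not code from this PR) of how per-token logprobs reported by vLLM could be checked against logprobs recomputed with the Hugging Face model. The model object, token IDs, and the vLLM logprob tensor are placeholders supplied by the caller; for a VLM you would also need to pass the processed image inputs to the forward call.

```python
# Minimal sketch, not from this PR: recompute per-token logprobs with an HF model
# and compare them against logprobs reported by vLLM for the same token sequence.
# `model` is any already-loaded HF causal LM (a VLM would additionally need
# pixel_values / image inputs in the forward call).
import torch
import torch.nn.functional as F


def hf_token_logprobs(model, input_ids: torch.Tensor) -> torch.Tensor:
    """Log-probability of each token given its prefix; shape [seq_len - 1]."""
    with torch.no_grad():
        logits = model(input_ids.unsqueeze(0)).logits[0]      # [seq_len, vocab]
    logprobs = F.log_softmax(logits[:-1].float(), dim=-1)     # position t predicts token t+1
    return logprobs.gather(-1, input_ids[1:].unsqueeze(-1)).squeeze(-1)


def logprob_error(hf_lp: torch.Tensor, vllm_lp: torch.Tensor) -> float:
    """Mean absolute per-token difference over the generated tokens."""
    return (hf_lp - vllm_lp).abs().mean().item()
```

If this error stays small for text-only prompts but grows once images are included, that would point at an input-processing mismatch rather than sampling.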
As far as keeping up with changes from
re: keeping up with changes from main: this basically means I have to debug a working GRPO/SFT training loop every couple of days after merging from main. The multimodal test cases are expected to block any such breaking changes.
Signed-off-by: Yi-Fu Wu <yifu.wu@gmail.com>
Signed-off-by: rohitrango <rohit.rango@gmail.com>
Signed-off-by: Terry Kong <terryk@nvidia.com>
Signed-off-by: Charlie Truong <chtruong@nvidia.com>
Co-authored-by: Yi-Fu Wu <yifu.wu@gmail.com>
Co-authored-by: Terry Kong <terryk@nvidia.com>
Co-authored-by: Charlie Truong <chtruong@nvidia.com>
Signed-off-by: Julien Veron Vialard <jveronvialar@nvidia.com>

…#712)
Signed-off-by: Yi-Fu Wu <yifu.wu@gmail.com>
Signed-off-by: rohitrango <rohit.rango@gmail.com>
Signed-off-by: Terry Kong <terryk@nvidia.com>
Signed-off-by: Charlie Truong <chtruong@nvidia.com>
Co-authored-by: Yi-Fu Wu <yifu.wu@gmail.com>
Co-authored-by: Terry Kong <terryk@nvidia.com>
Co-authored-by: Charlie Truong <chtruong@nvidia.com>
Signed-off-by: Qidong Su <qidongs@nvidia.com>
What does this PR do?
Adds image / video VLM support for supervised finetuning and GRPO using the dtensor policy. Solves #85.

Tested models:

Tested datasets:
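For readers unfamiliar with the term, the "dtensor policy" is the policy backend built on PyTorch's DTensor sharding. The snippet below only illustrates what DTensor sharding looks like at the PyTorch level (assuming torch >= 2.5, where torch.distributed.tensor is a public module); it is not the policy implementation added by this PR.

```python
# Illustrative only: shard a weight matrix with PyTorch DTensor on a single-device mesh.
# Assumes torch >= 2.5; this is NOT the dtensor policy code added by this PR.
import os
import torch
import torch.distributed as dist
from torch.distributed.device_mesh import init_device_mesh
from torch.distributed.tensor import distribute_tensor, Shard

# Single-process setup so the example runs standalone.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group("gloo", rank=0, world_size=1)

mesh = init_device_mesh("cpu", (1,))                    # 1-D device mesh
weight = torch.randn(1024, 1024)
sharded = distribute_tensor(weight, mesh, [Shard(0)])   # shard rows across the mesh
print(type(sharded), sharded.to_local().shape)

dist.destroy_process_group()
```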
🔪 Sharp Edges
Although training runs converge, the logprob error between the vLLM and HF models is consistently higher than 1.05. Issue tracked in #793.
Edit: This only affects Gemma3; the logprob issue is fixed for Llava, SmolVLM, Qwen2-VL, and Qwen2.5-VL.
Usage
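The PR's actual launch scripts and config options are not reproduced here. As a rough, generic illustration of the kind of image-plus-text input the tested VLMs consume (using the Hugging Face processor/model API; the model name, image path, and prompt template below are examples only, so check the model card for the exact template):

```python
# Generic illustration only (not this PR's entry point): feed an image + prompt to a
# VLM via the Hugging Face processor/model API. Multimodal SFT/GRPO builds on the same
# kind of processed inputs (input_ids plus image tensors).
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "llava-hf/llava-1.5-7b-hf"        # example model; other tested VLMs work similarly
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

image = Image.open("example.jpg")            # placeholder image path
prompt = "USER: <image>\nDescribe the image. ASSISTANT:"  # llava-1.5 style prompt

inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=64)
print(processor.batch_decode(output_ids, skip_special_tokens=True)[0])
```

For the actual training entry points and configs, see the scripts and configuration files added in this PR.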
Before your PR is "Ready for review"
Pre checks: