Conversation

SalmanMohammadi
Contributor

What does this PR do?

Fixes # (issue)

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline, Pull Request section?
  • Was this discussed/approved via a GitHub issue? Please add a link to it if that's the case.
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

```python
elif self.vllm_mode == "colocate":
    # In colocate mode the vLLM engine runs in the same process as training,
    # so we can reach its underlying model and push updated weights directly.
    llm_model = self.llm.llm_engine.model_executor.driver_worker.model_runner.model
    llm_model.load_weights([(name, param)])
```

Contributor

Does this get duplicated and sent twice with what's now at L934-938?

@SalmanMohammadi
Contributor Author
Jul 16, 2025

I think you're right - this should be the only logic for syncing params, and I believe we'll be dropping support for FSDP1, right? @kashif

Collaborator

Yes, we are dropping support for FSDP1.

@qgallouedec
Member

This won't work for PEFT, but we can do that in the next patch release.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@qgallouedec changed the title from FSDP2+GRPO to 👐 FSDP2+GRPO on Jul 29, 2025
@qgallouedec merged commit 5522cc0 into huggingface:main on Jul 29, 2025
10 checks passed
marcandrelarochelle pushed a commit to marcandrelarochelle/trl that referenced this pull request Jul 29, 2025
Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
Co-authored-by: Shirin Yamani <75791599+shirinyamani@users.noreply.github.com>
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>
@fabianlim
Contributor

@qgallouedec one possible issue I can see with calling state_dict is that it won't scale for large models. Have we tried this with large models like 70B? cc: @toslali-ibm

@SalmanMohammadi
Contributor Author

> @qgallouedec one possible issue I can see with calling state_dict is that it won't scale for large models. Have we tried this with large models like 70B? cc: @toslali-ibm

In FSDP2, `state_dict()` doesn't gather the full state dict; the gather happens per parameter, and only when you convert that parameter's DTensor to a full local tensor.
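
A minimal sketch of the pattern being described (illustrative only; `model` and `llm_model` are stand-ins for the FSDP2-sharded policy and the colocated vLLM model from the snippet above, and the `DTensor` import path follows recent PyTorch releases):

```python
from torch.distributed.tensor import DTensor


def sync_weights_to_vllm(model, llm_model):
    """Push FSDP2-sharded weights into a colocated vLLM model, one param at a time."""
    # state_dict() on an FSDP2-sharded module returns sharded DTensors,
    # so no full gather of the whole model happens at this point.
    for name, param in model.state_dict().items():
        if isinstance(param, DTensor):
            # The all-gather happens here, one parameter at a time, so at most
            # one full parameter is materialized in local memory at once.
            param = param.full_tensor()
        llm_model.load_weights([(name, param)])
```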

@SalmanMohammadi deleted the grpo_fsdp2 branch on July 29, 2025 at 14:03
@fabianlim
Contributor

@SalmanMohammadi I see, thanks for clarifying. So you're saying the gather only happens when you do `param = param.full_tensor()`?

@Kirill-Kravtsov
Contributor

Doesn't the `prepare_fsdp` function require more changes, since FSDP2 has a completely different API compared to FSDP1?
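
For reference, a rough sketch of the API difference in question (illustrative only; import paths follow recent PyTorch releases, and both calls assume an initialized distributed process group):

```python
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP  # FSDP1
from torch.distributed.fsdp import fully_shard  # FSDP2

# FSDP1 wraps the module: the result is a FullyShardedDataParallel instance,
# so code that unwraps or introspects the model must handle the wrapper class.
fsdp1_model = FSDP(nn.Linear(16, 16))

# FSDP2 shards in place: the module keeps its own class, and its parameters
# become DTensors rather than FSDP1-style flat parameters.
fsdp2_model = nn.Linear(16, 16)
fully_shard(fsdp2_model)
```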
