Conversation

SalmanMohammadi
Contributor

What does this PR do?

Fixes # (issue)

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline, Pull Request section?
  • Was this discussed/approved via a GitHub issue? Please add a link to it if that's the case.
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

```python
elif self.vllm_mode == "colocate":
    # In colocate mode the vLLM engine runs in the same process as training,
    # so we can reach its underlying model and push updated weights directly.
    llm_model = self.llm.llm_engine.model_executor.driver_worker.model_runner.model
    llm_model.load_weights([(name, param)])
```

Contributor

Does this get duplicated and sent twice with what's now at L934-938?

@SalmanMohammadi
Contributor Author
Jul 16, 2025

I think you're right - this should be the only logic for syncing params, and I believe we'll be dropping support for FSDP1, right? @kashif

Collaborator

Yes, we are dropping support for FSDP1.

@qgallouedec
Member

This won't work for PEFT, but we can do that in the next patch release.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@qgallouedec changed the title from FSDP2+GRPO to 👐 FSDP2+GRPO on Jul 29, 2025
@qgallouedec merged commit 5522cc0 into huggingface:main on Jul 29, 2025
10 checks passed
marcandrelarochelle pushed a commit to marcandrelarochelle/trl that referenced this pull request Jul 29, 2025
Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
Co-authored-by: Shirin Yamani <75791599+shirinyamani@users.noreply.github.com>
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>
@fabianlim
Contributor

@qgallouedec one possible issue I can see with calling state_dict is that it won't scale for large models. Have we tried this with large models like 70B? cc: @toslali-ibm

@SalmanMohammadi
Contributor Author

> @qgallouedec one possible issue I can see with calling state_dict is that it won't scale for large models. Have we tried this with large models like 70B? cc: @toslali-ibm

In FSDP2, `state_dict()` doesn't gather the full state dict; the gather happens per parameter, and only when you convert that parameter's DTensor to a full local tensor.
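
A minimal sketch of the pattern being described (illustrative only; `model` and `llm_model` are stand-ins for the FSDP2-sharded policy and the colocated vLLM model from the snippet above, and the `DTensor` import path follows recent PyTorch releases):

```python
from torch.distributed.tensor import DTensor


def sync_weights_to_vllm(model, llm_model):
    """Push FSDP2-sharded weights into a colocated vLLM model, one param at a time."""
    # state_dict() on an FSDP2-sharded module returns sharded DTensors,
    # so no full gather of the whole model happens at this point.
    for name, param in model.state_dict().items():
        if isinstance(param, DTensor):
            # The all-gather happens here, one parameter at a time, so at most
            # one full parameter is materialized in local memory at once.
            param = param.full_tensor()
        llm_model.load_weights([(name, param)])
```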

@SalmanMohammadi deleted the grpo_fsdp2 branch on July 29, 2025 at 14:03
@fabianlim
Contributor

@SalmanMohammadi I see, thanks for clarifying. So you're saying the gather only happens when you do `param = param.full_tensor()`?

@Kirill-Kravtsov
Contributor

Doesn't the `prepare_fsdp` function require more changes, since FSDP2 has a completely different API compared to FSDP1?
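
For reference, a rough sketch of the API difference in question (illustrative only; import paths follow recent PyTorch releases, and both calls assume an initialized distributed process group):

```python
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP  # FSDP1
from torch.distributed.fsdp import fully_shard  # FSDP2

# FSDP1 wraps the module: the result is a FullyShardedDataParallel instance,
# so code that unwraps or introspects the model must handle the wrapper class.
fsdp1_model = FSDP(nn.Linear(16, 16))

# FSDP2 shards in place: the module keeps its own class, and its parameters
# become DTensors rather than FSDP1-style flat parameters.
fsdp2_model = nn.Linear(16, 16)
fully_shard(fsdp2_model)
```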
