@wizeng23 wizeng23 commented Jun 13, 2025

Description

  • Added a vllm_mode param to GRPOParams, which controls how training integrates with vLLM.
  • Deleted the deprecated vllm_device param (unused in our configs).
  • Enabled vLLM for both TRL GRPO jobs. Note that they don't use the newly added param yet, as it hasn't been published to PyPI.
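
For readers unfamiliar with the change, here is a minimal sketch of how an optional vllm_mode field could sit next to a use_vllm flag and be forwarded to the trainer only when set. The class name GRPOParamsSketch, the use_vllm field, the to_trainer_kwargs helper, and the allowed values are assumptions for illustration (the values mirror the "server" and "colocate" modes of TRL's GRPOConfig); this is not the code from this PR.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional

# Assumed set of modes, mirroring TRL's GRPOConfig ("server" or "colocate").
_ALLOWED_VLLM_MODES = ("server", "colocate")


@dataclass
class GRPOParamsSketch:
    """Hypothetical stand-in for GRPOParams; not the actual class in this PR."""

    use_vllm: bool = False
    # None means "do not pass the option, keep the trainer's default".
    vllm_mode: Optional[str] = None

    def __post_init__(self) -> None:
        # Reject unknown modes early instead of failing deep inside training.
        if self.vllm_mode is not None and self.vllm_mode not in _ALLOWED_VLLM_MODES:
            raise ValueError(
                f"vllm_mode must be one of {_ALLOWED_VLLM_MODES}, got {self.vllm_mode!r}"
            )

    def to_trainer_kwargs(self) -> Dict[str, Any]:
        """Build keyword arguments for the trainer config (hypothetical helper)."""
        kwargs: Dict[str, Any] = {"use_vllm": self.use_vllm}
        # Forward vllm_mode only when vLLM is enabled and a mode was chosen,
        # so configs that omit it keep working unchanged.
        if self.use_vllm and self.vllm_mode is not None:
            kwargs["vllm_mode"] = self.vllm_mode
        return kwargs
```

With this shape, a job config that enables vLLM but leaves vllm_mode unset only passes use_vllm, which matches how the two jobs are described as running without the new param.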

Tested that the jobs work. Training time for the tldr sample job dropped from 2 minutes to 30 seconds.

Related issues

Towards OPE-1107

Before submitting

  • This PR only changes documentation. (You can ignore the following checks in that case)
  • Did you read the Pull Request guidelines in the contributor guideline?
  • Did you link the issue(s) related to this PR in the section above?
  • Did you add / update tests where needed?

@wizeng23 wizeng23 requested review from oelachqar and taenin June 13, 2025 20:09
@wizeng23 wizeng23 merged commit ba3b248 into main Jun 13, 2025
6 checks passed
@wizeng23 wizeng23 deleted the wizeng/o1107-trl-vllm branch June 13, 2025 22:54
penfever pushed a commit that referenced this pull request Aug 27, 2025