Enable vLLM for trl GRPO jobs #1760

wizeng23 · 2025-06-13T20:09:23Z

Description

Added a vllm_mode param to GRPOParams, which controls how training integrates with vLLM.
Deleted the deprecated vllm_device param (unused in our configs).
Enabled vLLM for both trl GRPO jobs. Note that they don't use the newly added param yet, as it hasn't been released to PyPI.
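
A minimal sketch of how a param like this might be validated and forwarded to the trainer config. This is not the actual implementation in this PR: `GRPOParamsSketch`, `to_trl_kwargs`, and the `use_vllm` field are hypothetical names used for illustration, and the accepted values `"server"` and `"colocate"` are assumed to mirror trl's own `vllm_mode` options.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional

# Assumption: these mirror the vllm_mode values accepted by trl.
_VALID_VLLM_MODES = ("server", "colocate")


@dataclass
class GRPOParamsSketch:
    """Hypothetical stand-in for GRPOParams, for illustration only."""

    use_vllm: bool = False
    # None means "leave the mode unset and let the trainer use its default".
    vllm_mode: Optional[str] = None

    def __post_init__(self) -> None:
        # Reject unknown modes early instead of failing inside the trainer.
        if self.vllm_mode is not None and self.vllm_mode not in _VALID_VLLM_MODES:
            raise ValueError(
                f"vllm_mode must be one of {_VALID_VLLM_MODES}, got {self.vllm_mode!r}"
            )

    def to_trl_kwargs(self) -> Dict[str, Any]:
        """Build keyword arguments to forward to the trainer config."""
        kwargs: Dict[str, Any] = {"use_vllm": self.use_vllm}
        # Only forward vllm_mode when set, so configs that omit it keep
        # working with trainer versions that do not know the param.
        if self.vllm_mode is not None:
            kwargs["vllm_mode"] = self.vllm_mode
        return kwargs
```

Leaving `vllm_mode` unset in this sketch forwards only `use_vllm`, which matches how the jobs in this PR enable vLLM without setting the new param.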

Tested that the jobs work. The tldr sample job's training time dropped from 2 minutes to 30 seconds (roughly 4x faster).

Towards OPE-1107

This PR only changes documentation. (You can ignore the following checks in that case)
Did you read the contributor guideline Pull Request guidelines?
Did you link the issue(s) related to this PR in the section above?
Did you add / update tests where needed?

wizeng23 added 3 commits June 11, 2025 16:35

merge main

d882cb4

save

e9fcd52

save

aa827a7

wizeng23 requested review from oelachqar and taenin June 13, 2025 20:09

wizeng23 added 3 commits June 13, 2025 13:09

merge main

8b2960a

save

b460433

merge main

41b83d6

oelachqar approved these changes Jun 13, 2025

taenin approved these changes Jun 13, 2025

wizeng23 merged commit ba3b248 into main Jun 13, 2025
6 checks passed

wizeng23 deleted the wizeng/o1107-trl-vllm branch June 13, 2025 22:54

penfever pushed a commit that referenced this pull request Aug 27, 2025

Enable vLLM for trl GRPO jobs (#1760)

6aab7be