NO GPU left behind: Unlocking Efficiency with Co-located vLLM in TRL #2882
Conversation
Awesome feature and really well written blog post! I left a few nits, but nothing that should block you if you need to go fast - just wait for @pcuenca to approve
@@ -6087,3 +6087,17 @@
  - nlp
  - tools
  - community

- local: vllm-colocate
  title: "NO GPU left behind: Unlocking Efficiency with Co-located vLLM in TRL"
Not sure if it was intentional, but I'd use this for "NO"
title: "NO GPU left behind: Unlocking Efficiency with Co-located vLLM in TRL" | |
title: "No GPU left behind: Unlocking Efficiency with Co-located vLLM in TRL" |
@@ -0,0 +1,384 @@
---
title: "NO GPU left behind: Unlocking Efficiency with Co-located vLLM in TRL"
title: "NO GPU left behind: Unlocking Efficiency with Co-located vLLM in TRL" | |
title: "No GPU left behind: Unlocking Efficiency with Co-located vLLM in TRL" |
|
||
## 🚀 Introduction
TRL supports training LLMs using GRPO, an online learning algorithm recently introduced in the *DeepSeekMath* paper. In GRPO, the model learns from its own outputs: it generates responses during training, receives feedback, and uses that feedback to improve itself over time.
Would be nice to have a link to the paper & perhaps share the diagram of the algorithm (either from the paper or @qgallouedec's example from the TRL docs)
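For context, a GRPO run in TRL looks roughly like the sketch below, loosely following the `GRPOTrainer` quickstart from the TRL docs; the model, dataset, and reward function here are placeholders rather than anything taken from the post:

```python
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

# Toy reward function: GRPO only needs a callable that scores completions.
def reward_num_unique_chars(completions, **kwargs):
    return [float(len(set(c))) for c in completions]

dataset = load_dataset("trl-lib/tldr", split="train")  # placeholder dataset

training_args = GRPOConfig(
    output_dir="Qwen2-0.5B-GRPO",
    use_vllm=True,  # generate completions with vLLM during training
)
trainer = GRPOTrainer(
    model="Qwen/Qwen2-0.5B-Instruct",  # placeholder model
    reward_funcs=reward_num_unique_chars,
    args=training_args,
    train_dataset=dataset,
)
trainer.train()
```

With `use_vllm=True`, the rollout generations are handled by vLLM; the co-located mode this post introduces lets that vLLM instance share the training GPUs instead of requiring dedicated ones.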
## 🛠️ Implementation Notes
Instead of launching vLLM as a server, the training now launches vLLM **in-process** using the external launcher, as shown below:
Maybe point to the bits of code in TRL where this comes from?
*(code block showing the in-process vLLM launch; truncated in this view)*
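Since the code block above is truncated in this view, here is a rough sketch of what an in-process launch with vLLM's external launcher backend looks like; the exact arguments used in TRL may differ, and the model name and memory fraction are purely illustrative:

```python
from vllm import LLM

# Each training process builds its own engine instead of talking to a server.
# "external_launcher" tells vLLM to reuse the processes already spawned by
# torchrun/accelerate rather than launching its own workers.
llm = LLM(
    model="Qwen/Qwen2.5-7B-Instruct",                   # illustrative model
    tensor_parallel_size=1,                             # TP size within the co-located group
    distributed_executor_backend="external_launcher",
    gpu_memory_utilization=0.3,                         # leave the rest of the GPU for training
)
```

Each rank can then call `llm.generate(...)` locally during rollouts instead of sending requests to a separate vLLM server.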
> Note: Depending on the model size and the overall GPU memory requirements for training, you may need to adjust the `vllm_gpu_memory_utilization` parameter in `GRPOConfig` to avoid underutilization or out-of-memory errors.
Is there a rough heuristic we can recommend for common model sizes like 7B / 14B / 32B / 70B?
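For reference, the knob being discussed is set on `GRPOConfig`; a minimal sketch, with a placeholder value meant to be tuned empirically rather than a recommendation:

```python
from trl import GRPOConfig

training_args = GRPOConfig(
    output_dir="grpo-colocate",
    use_vllm=True,
    # Fraction of each GPU handed to the co-located vLLM engine.
    # Lower it if training runs out of memory; raise it if generation
    # throughput is the bottleneck. 0.3 is just a starting guess.
    vllm_gpu_memory_utilization=0.3,
)
```

Any rule of thumb would have to weigh model weights, optimizer states, and KV cache together, which is probably why the post leaves it as a tunable parameter.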
Congratulations! You've made it this far! Once merged, the article will appear at https://huggingface.co/blog. Official articles require additional reviews. Alternatively, you can write a community article following the process here.
Preparing the Article
You're not quite done yet, though. Please make sure to follow this process (as documented here):
Add metadata (such as authors) to your md file. You can also specify `guest` or `org` for the authors.
Here is an example of a complete PR: #2382
Getting a Review
Please make sure to get a review from someone on your team or a co-author.
Once this is done and once all the steps above are completed, you should be able to merge.
There is no need for additional reviews if you and your co-authors are happy and meet all of the above.
Feel free to add @pcuenca as a reviewer if you want a final check. Keep in mind he'll be biased toward light reviews (e.g., check for proper metadata) rather than content reviews unless explicitly asked.