NO GPU left behind: Unlocking Efficiency with Co-located vLLM in TRL #2882


Merged
merged 8 commits into main from vllm-colocate-ibm
Jun 3, 2025

Conversation


@qgallouedec qgallouedec commented Jun 2, 2025

Congratulations! You've made it this far! Once merged, the article will appear at https://huggingface.co/blog. Official articles
require additional reviews. Alternatively, you can write a community article following the process here.

Preparing the Article

You're not quite done yet, though. Please make sure to follow this process (as documented here):

  • Add an entry to _blog.yml.
  • Add a thumbnail. There are no requirements here, but there is a template if it's helpful.
  • Check you use a short title and blog path.
  • Upload any additional assets (such as images) to the Documentation Images repo. This is to reduce bloat in the GitHub base repo when cloning and pulling. Try to have small images to avoid a slow or expensive user experience.
  • Add metadata (such as authors) to your md file. You can also specify guest or org for the authors.
  • Ensure the publication date is correct.
  • Preview the content. A quick way is to paste the markdown content in https://huggingface.co/new-blog. Do not click publish; this is just a way to do an early check.

Here is an example of a complete PR: #2382

Getting a Review

Please make sure to get a review from someone on your team or a co-author.
Once this is done and once all the steps above are completed, you should be able to merge.
There is no need for additional reviews if you and your co-authors are happy and meet all of the above.

Feel free to add @pcuenca as a reviewer if you want a final check. Keep in mind he'll be biased toward light reviews
(e.g., check for proper metadata) rather than content reviews unless explicitly asked.

@qgallouedec qgallouedec marked this pull request as ready for review June 2, 2025 23:25
@qgallouedec qgallouedec requested a review from pcuenca June 3, 2025 06:25

@lewtun lewtun left a comment

Awesome feature and really well written blog post! I left a few nits, but nothing that should block you if you need to go fast - just wait for @pcuenca to approve

@@ -6087,3 +6087,17 @@
- nlp
- tools
- community

- local: vllm-colocate
title: "NO GPU left behind: Unlocking Efficiency with Co-located vLLM in TRL"

Not sure if it was intentional, but I'd use this for "NO"

Suggested change
title: "NO GPU left behind: Unlocking Efficiency with Co-located vLLM in TRL"
title: "No GPU left behind: Unlocking Efficiency with Co-located vLLM in TRL"

@@ -0,0 +1,384 @@
---
title: "NO GPU left behind: Unlocking Efficiency with Co-located vLLM in TRL"

Suggested change
title: "NO GPU left behind: Unlocking Efficiency with Co-located vLLM in TRL"
title: "No GPU left behind: Unlocking Efficiency with Co-located vLLM in TRL"


## 🚀 Introduction

TRL supports training LLMs using GRPO, an online learning algorithm recently introduced in the *DeepSeekMath* paper. In GRPO, the model learns from its own outputs: it generates responses during training, receives feedback, and uses that feedback to improve itself over time.

Would be nice to have a link to the paper & perhaps share the diagram of the algorithm (either from the paper or @qgallouedec's example from the TRL docs)
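
To make the training loop described above concrete, here is a minimal GRPO sketch in the style of TRL's `GRPOTrainer` quickstart. The model name, dataset, and character-length reward function are illustrative placeholders, not the setup used in the blog post.

```python
# Minimal GRPO sketch (illustrative placeholders, not the blog post's setup).
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

# Prompts the model will generate completions for during training.
dataset = load_dataset("trl-lib/tldr", split="train")

# Toy reward: prefer completions close to 100 characters long.
def reward_len(completions, **kwargs):
    return [-abs(100 - len(completion)) for completion in completions]

training_args = GRPOConfig(output_dir="qwen-grpo", logging_steps=10)
trainer = GRPOTrainer(
    model="Qwen/Qwen2-0.5B-Instruct",
    reward_funcs=reward_len,
    args=training_args,
    train_dataset=dataset,
)
trainer.train()
```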


## 🛠️ Implementation Notes

Instead of launching vLLM as a server, the training now launches vLLM **in-process** using the external launcher, as shown below:
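
As a rough illustration (a sketch, not the exact code from TRL or the blog post), each training rank can create its own vLLM engine in-process via the `external_launcher` distributed executor backend instead of sending requests to a separate vLLM server; the model name and memory settings below are placeholders.

```python
# Illustrative sketch of in-process vLLM, assuming the script is launched with
# torchrun/accelerate so every training rank executes this code.
from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen/Qwen2-0.5B-Instruct",
    distributed_executor_backend="external_launcher",  # reuse the existing training launcher
    gpu_memory_utilization=0.3,  # leave headroom on the same GPU for training
)

# Generation now runs on the same GPUs as training, with no HTTP round-trip.
outputs = llm.generate(["The capital of France is"], SamplingParams(max_tokens=16))
print(outputs[0].outputs[0].text)
```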

Maybe point to the bits of code in TRL where this comes from?

> Note: Depending on the model size and the overall GPU memory requirements for training, you may need to adjust the `vllm_gpu_memory_utilization` parameter in `GRPOConfig` to avoid underutilization or out-of-memory errors.

Is there a rough heuristic we can recommend for common model sizes like 7B / 14B / 32B / 70B?
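
For reference, a hedged sketch of where this knob lives in the config; the `vllm_mode="colocate"` flag and the 0.3 value are assumptions for illustration based on TRL's co-located vLLM support, not a recommendation from the post.

```python
# Illustrative GRPOConfig sketch (values are placeholders, not recommendations).
from trl import GRPOConfig

training_args = GRPOConfig(
    output_dir="grpo-colocate",
    use_vllm=True,
    vllm_mode="colocate",             # run vLLM inside the training processes
    vllm_gpu_memory_utilization=0.3,  # raise/lower to balance generation vs. training memory
)
```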

@qgallouedec qgallouedec merged commit 9b825a9 into main Jun 3, 2025
1 check passed
@qgallouedec qgallouedec deleted the vllm-colocate-ibm branch June 3, 2025 14:04