NO GPU left behind: Unlocking Efficiency with Co-located vLLM in TRL #2882


Merged
merged 8 commits into main from vllm-colocate-ibm
Jun 3, 2025

Conversation


@qgallouedec qgallouedec commented Jun 2, 2025

Congratulations! You've made it this far! Once merged, the article will appear at https://huggingface.co/blog. Official articles
require additional reviews. Alternatively, you can write a community article following the process here.

Preparing the Article

You're not quite done yet, though. Please make sure to follow this process (as documented here):

  • Add an entry to _blog.yml.
  • Add a thumbnail. There are no requirements here, but there is a template if it's helpful.
  • Check you use a short title and blog path.
  • Upload any additional assets (such as images) to the Documentation Images repo. This is to reduce bloat in the GitHub base repo when cloning and pulling. Try to have small images to avoid a slow or expensive user experience.
  • Add metadata (such as authors) to your md file. You can also specify guest or org for the authors.
  • Ensure the publication date is correct.
  • Preview the content. A quick way is to paste the markdown content in https://huggingface.co/new-blog. Do not click publish; this is just a way to do an early check.

Here is an example of a complete PR: #2382

Getting a Review

Please make sure to get a review from someone on your team or a co-author.
Once this is done and once all the steps above are completed, you should be able to merge.
There is no need for additional reviews if you and your co-authors are happy and meet all of the above.

Feel free to add @pcuenca as a reviewer if you want a final check. Keep in mind he'll be biased toward light reviews
(e.g., check for proper metadata) rather than content reviews unless explicitly asked.

@qgallouedec qgallouedec marked this pull request as ready for review June 2, 2025 23:25
@qgallouedec qgallouedec requested a review from pcuenca June 3, 2025 06:25

@lewtun lewtun left a comment

Awesome feature and really well written blog post! I left a few nits, but nothing that should block you if you need to go fast - just wait for @pcuenca to approve

@@ -6087,3 +6087,17 @@
- nlp
- tools
- community

- local: vllm-colocate
title: "NO GPU left behind: Unlocking Efficiency with Co-located vLLM in TRL"

Not sure if it was intentional, but I'd use this for "NO"

Suggested change
title: "NO GPU left behind: Unlocking Efficiency with Co-located vLLM in TRL"
title: "No GPU left behind: Unlocking Efficiency with Co-located vLLM in TRL"

@@ -0,0 +1,384 @@
---
title: "NO GPU left behind: Unlocking Efficiency with Co-located vLLM in TRL"

Suggested change
title: "NO GPU left behind: Unlocking Efficiency with Co-located vLLM in TRL"
title: "No GPU left behind: Unlocking Efficiency with Co-located vLLM in TRL"


## 🚀 Introduction

TRL supports training LLMs using GRPO, an online learning algorithm recently introduced in the *DeepSeekMath* paper. In GRPO, the model learns from its own outputs: it generates responses during training, receives feedback, and uses that feedback to improve itself over time.

Would be nice to have a link to the paper & perhaps share the diagram of the algorithm (either from the paper or @qgallouedec's example from the TRL docs)
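
To make the training loop described above concrete, here is a minimal GRPO sketch in the style of TRL's `GRPOTrainer` quickstart. The model name, dataset, and character-length reward function are illustrative placeholders, not the setup used in the blog post.

```python
# Minimal GRPO sketch (illustrative placeholders, not the blog post's setup).
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

# Prompts the model will generate completions for during training.
dataset = load_dataset("trl-lib/tldr", split="train")

# Toy reward: prefer completions close to 100 characters long.
def reward_len(completions, **kwargs):
    return [-abs(100 - len(completion)) for completion in completions]

training_args = GRPOConfig(output_dir="qwen-grpo", logging_steps=10)
trainer = GRPOTrainer(
    model="Qwen/Qwen2-0.5B-Instruct",
    reward_funcs=reward_len,
    args=training_args,
    train_dataset=dataset,
)
trainer.train()
```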


## 🛠️ Implementation Notes

Instead of launching vLLM as a server, the training now launches vLLM **in-process** using the external launcher, as shown below:
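
As a rough illustration (a sketch, not the exact code from TRL or the blog post), each training rank can create its own vLLM engine in-process via the `external_launcher` distributed executor backend instead of sending requests to a separate vLLM server; the model name and memory settings below are placeholders.

```python
# Illustrative sketch of in-process vLLM, assuming the script is launched with
# torchrun/accelerate so every training rank executes this code.
from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen/Qwen2-0.5B-Instruct",
    distributed_executor_backend="external_launcher",  # reuse the existing training launcher
    gpu_memory_utilization=0.3,  # leave headroom on the same GPU for training
)

# Generation now runs on the same GPUs as training, with no HTTP round-trip.
outputs = llm.generate(["The capital of France is"], SamplingParams(max_tokens=16))
print(outputs[0].outputs[0].text)
```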

Maybe point to the bits of code in TRL where this comes from?

> Note: Depending on the model size and the overall GPU memory requirements for training, you may need to adjust the `vllm_gpu_memory_utilization` parameter in `GRPOConfig` to avoid underutilization or out-of-memory errors.

Is there a rough heuristic we can recommend for common model sizes like 7B / 14B / 32B / 70B?
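
For reference, a hedged sketch of where this knob lives in the config; the `vllm_mode="colocate"` flag and the 0.3 value are assumptions for illustration based on TRL's co-located vLLM support, not a recommendation from the post.

```python
# Illustrative GRPOConfig sketch (values are placeholders, not recommendations).
from trl import GRPOConfig

training_args = GRPOConfig(
    output_dir="grpo-colocate",
    use_vllm=True,
    vllm_mode="colocate",             # run vLLM inside the training processes
    vllm_gpu_memory_utilization=0.3,  # raise/lower to balance generation vs. training memory
)
```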

@qgallouedec qgallouedec merged commit 9b825a9 into main Jun 3, 2025
1 check passed
@qgallouedec qgallouedec deleted the vllm-colocate-ibm branch June 3, 2025 14:04