
Enable External Launcher Support for vLLM in TRL for Efficient GRPO Training #3064

@mtoslalibu

Description
Feature request

vLLM has introduced support for an external launcher, enabling vLLM processes to be co-located with other processes, such as training. By running multiple vLLM instances alongside the training process, we can speed up inference and reduce the time required for GRPO training. I propose adding an option in TRL to spawn one vLLM process per GPU using the external launcher.

Motivation

Efficient GRPO relies heavily on fast and scalable inference. Currently, inference and training processes are executed separately, introducing bottlenecks that slow down training. Ideally, multiple vLLM instances would run inside the training process, as is done by other frameworks such as OpenRLHF and VERL.

With vLLM's newly introduced external launcher (PR #12071), it is now possible to co-locate vLLM instances with training processes, allowing one vLLM instance to run per GPU. This reduces inference latency, leading to shorter training durations.

By integrating vLLM’s external launcher into TRL, we can enhance distributed inference efficiency and accelerate GRPO training, making large-scale reinforcement learning more practical and scalable.

Your contribution

Modify GRPO_trainer to initialize vLLM via the external launcher when a TRL flag (such as self.args.external_launcher) is set. We are considering a Ray-less version, in which case the changes can be quite minimal.
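A minimal sketch of how the flag could translate into vLLM engine arguments. The helper name `build_vllm_engine_kwargs` and the flag name are illustrative, not existing TRL API; the `distributed_executor_backend="external_launcher"` value is the backend introduced by vLLM PR #12071, where the outer launcher (e.g. torchrun or accelerate) owns process creation rather than vLLM spawning workers itself:

```python
def build_vllm_engine_kwargs(model_name: str, use_external_launcher: bool) -> dict:
    """Assemble kwargs for vllm.LLM(...) under the co-location scheme.

    Hypothetical helper for GRPO_trainer: each training rank would construct
    its own engine on its local GPU, so no internal executor is needed.
    """
    kwargs = {
        "model": model_name,
        # One engine per GPU: each rank serves its own shard, so the engine
        # itself is not tensor-parallel.
        "tensor_parallel_size": 1,
    }
    if use_external_launcher:
        # Let the external launcher (torchrun/accelerate) manage processes
        # instead of vLLM's own multiprocessing or Ray executors.
        kwargs["distributed_executor_backend"] = "external_launcher"
    return kwargs
```

The trainer would then call `vllm.LLM(**build_vllm_engine_kwargs(...))` on every rank, keeping the Ray-less path small: the only behavioral change is which executor backend the engine is constructed with.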
