
Enable External Launcher Support for vLLM in TRL for Efficient GRPO Training #3064

@mtoslalibu

Description
Feature request

vLLM has introduced support for an external launcher, enabling vLLM processes to be co-located with other processes, such as training. By running multiple vLLM instances alongside the training process, we can speed up inference and reduce the time required for GRPO training. I propose adding an option in TRL to spawn one vLLM process per GPU using the external launcher.

Motivation

Efficient GRPO relies heavily on fast and scalable inference. Currently, inference and training processes are executed separately, introducing bottlenecks that slow down training. Ideally, multiple vLLM instances would run inside the training process, as is done by other frameworks such as OpenRLHF and VERL.

With vLLM's newly introduced external launcher (PR #12071), it is now possible to co-locate vLLM instances with training processes, allowing one vLLM instance to run per GPU. This reduces inference latency, leading to shorter training durations.

By integrating vLLM’s external launcher into TRL, we can enhance distributed inference efficiency and accelerate GRPO training, making large-scale reinforcement learning more practical and scalable.

Your contribution

Modify GRPO_trainer to initialize vLLM via the external launcher when a TRL flag (such as self.args.external_launcher) is set. We are considering a Ray-less version, in which case the changes can be quite minimal.
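A minimal sketch of how the flag could translate into vLLM engine arguments. The helper name `build_vllm_engine_kwargs` and the flag name are illustrative, not existing TRL API; the `distributed_executor_backend="external_launcher"` value is the backend introduced by vLLM PR #12071, where the outer launcher (e.g. torchrun or accelerate) owns process creation rather than vLLM spawning workers itself:

```python
def build_vllm_engine_kwargs(model_name: str, use_external_launcher: bool) -> dict:
    """Assemble kwargs for vllm.LLM(...) under the co-location scheme.

    Hypothetical helper for GRPO_trainer: each training rank would construct
    its own engine on its local GPU, so no internal executor is needed.
    """
    kwargs = {
        "model": model_name,
        # One engine per GPU: each rank serves its own shard, so the engine
        # itself is not tensor-parallel.
        "tensor_parallel_size": 1,
    }
    if use_external_launcher:
        # Let the external launcher (torchrun/accelerate) manage processes
        # instead of vLLM's own multiprocessing or Ray executors.
        kwargs["distributed_executor_backend"] = "external_launcher"
    return kwargs
```

The trainer would then call `vllm.LLM(**build_vllm_engine_kwargs(...))` on every rank, keeping the Ray-less path small: the only behavioral change is which executor backend the engine is constructed with.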
