[GRPO] Faster generation at the 7B scale #3195

@edbeeching

Feature request

I have been running experiments in the open-r1 project on 7B Instruct and Reasoning models on code, although the same observations hold on mathematics datasets as well.

For reasoning models, generation is still a bottleneck. Here, green is the generation time from deepseek-ai/DeepSeek-R1-Distill-Qwen-7B and blue is from Qwen/Qwen2.5-7B-Instruct:

Image

We (@lewtun and @edbeeching) would like to know what can be done to improve the generation time, which is over 5 minutes in some cases.
There are a few things to explore:

  1. Benchmark the trl vllm-serve generation time on the 7B instruct and reasoning models detailed above. Expand the benchmark to include a mix of TP / PP options. Look at how the average generation time varies in a larger batch setting (see point 3). @shirinyamani I think this would be a great task for you; this dataset is a good source of prompts for the models: https://huggingface.co/datasets/open-r1/OpenR1-Math-cn_k12-86k
  2. I believe that in the 7B setting it is possible to host the model on a single device (H100). vLLM does not support DDP natively, but perhaps we could implement something. One idea: in the 2-node setting, there are 8 accelerate processes on the node running the optimization loop. We could spawn 8 independent vLLM instances on the second node and have each accelerate process send prompts to its own dedicated vLLM instance, identified by a unique port per process.
  3. The final point is how we send batches to the vLLM instance. If I have understood correctly, the prompts are not grouped based on the number of gradient accumulation steps. Could a "mega-batch" of prompts be sampled so that vLLM can further benefit from its scheduler and continuous batching? It could be that this is already the case; I am sure @qgallouedec can answer this.
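
To make points 2 and 3 a bit more concrete, here is a rough sketch in plain Python of what I have in mind. It is not TRL or vLLM code: the host name `node-2`, the base port `8000`, and the helper names `vllm_url_for_rank`, `build_mega_batch` and `split_per_step` are all hypothetical, and the actual request to the server is left out on purpose.

```python
from typing import List, Sequence

# Assumption: the 8 vLLM instances on the second node listen on
# consecutive ports starting at BASE_PORT (hypothetical value).
BASE_PORT = 8000


def vllm_url_for_rank(local_rank: int, host: str = "node-2",
                      base_port: int = BASE_PORT) -> str:
    """Point 2: map each training process to its own dedicated vLLM instance."""
    return f"http://{host}:{base_port + local_rank}"


def build_mega_batch(prompts: Sequence[str], per_step_batch_size: int,
                     grad_accum_steps: int) -> List[str]:
    """Point 3: take the prompts for all accumulation steps at once,
    so they could be sent to vLLM in a single request."""
    size = per_step_batch_size * grad_accum_steps
    return list(prompts[:size])


def split_per_step(items: Sequence[str], per_step_batch_size: int) -> List[List[str]]:
    """Split the results of one mega-batch back into per-step micro-batches."""
    return [
        list(items[i:i + per_step_batch_size])
        for i in range(0, len(items), per_step_batch_size)
    ]


if __name__ == "__main__":
    prompts = [f"prompt {i}" for i in range(10)]
    mega = build_mega_batch(prompts, per_step_batch_size=2, grad_accum_steps=4)
    print(vllm_url_for_rank(3))          # endpoint used by the process with local rank 3
    print(len(mega))                     # number of prompts sent in one request
    print(split_per_step(mega, 2))       # micro-batches consumed by the optimizer steps
```

With this layout, each process would generate once per optimizer step rather than once per accumulation step, and then iterate over the micro-batches locally. Whether this actually helps throughput is exactly what the benchmark in point 1 should tell us.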

Motivation

Your contribution
