Labels: ✨ enhancement (New feature or request) · 🏋 GRPO · 🚀 deepspeed
Description
Feature request
I have been running experiments in the open-r1 project with 7B Instruct and Reasoning models on code tasks, although the same observations hold on mathematics datasets as well.
For reasoning models, generation is still a bottleneck. In the plot here, green is the generation time for deepseek-ai/DeepSeek-R1-Distill-Qwen-7B and blue for Qwen/Qwen2.5-7B-Instruct.
We (@lewtun and @edbeeching) would like to know what can be done to improve the generation time, which is over 5 minutes in some cases.
There are a few things to explore:
- Benchmarking the `trl vllm-serve` generation time on the 7B instruct and reasoning models detailed above. Expand the benchmark to include a mix of TP/PP options, and look at how the average generation time varies in a larger batch setting (see the third point below). @shirinyamani I think this would be a great task for you; this dataset is a good source of prompts for the models: https://huggingface.co/datasets/open-r1/OpenR1-Math-cn_k12-86k. A benchmark sketch is given after this list.
- I believe in the 7B setting it is possible to host the model on a single device (H100). vLLM does not support DDP natively, but perhaps we could implement something. One idea: in the 2-node setting, there are 8 accelerate processes on the node running the optimization loop. We could spawn 8 independent vLLM instances on the second node and have each accelerate process send prompts to its own dedicated vLLM instance, addressed by a unique port per process (see the second sketch after this list).
- The final point is how we send batches to the vLLM instance. If I have understood correctly, the prompts are not grouped based on the number of gradient accumulation steps; a "mega-batch" of prompts could be sampled so that vLLM can further benefit from its scheduler and continuous batching (see the last sketch after this list). It may be that this is already the case; I am sure @qgallouedec can answer this.
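For the benchmarking task, here is a minimal sketch using vLLM's offline `LLM` API rather than `trl vllm-serve`, so the measurement excludes client/server overhead. The dataset column name `problem`, the prompt count, and the sampling parameters are assumptions, and `pipeline_parallel_size` requires a vLLM version that supports pipeline parallelism:

```python
# Benchmark sketch: compare generation time across TP/PP configurations.
# Assumptions: the dataset has a "problem" column, and the installed vLLM
# version accepts pipeline_parallel_size. In practice, run each
# configuration in a fresh process so GPU memory is fully released.
import time

from datasets import load_dataset
from vllm import LLM, SamplingParams

MODELS = [
    "Qwen/Qwen2.5-7B-Instruct",
    "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B",
]
PARALLEL_CONFIGS = [(1, 1), (2, 1), (1, 2)]  # (tensor_parallel, pipeline_parallel)

prompts = load_dataset("open-r1/OpenR1-Math-cn_k12-86k", split="train")["problem"][:512]
sampling_params = SamplingParams(temperature=0.7, max_tokens=4096)

for model in MODELS:
    for tp, pp in PARALLEL_CONFIGS:
        llm = LLM(model=model, tensor_parallel_size=tp, pipeline_parallel_size=pp)
        start = time.perf_counter()
        outputs = llm.generate(prompts, sampling_params)
        elapsed = time.perf_counter() - start
        n_tokens = sum(len(o.outputs[0].token_ids) for o in outputs)
        print(f"{model} TP={tp} PP={pp}: {elapsed:.1f}s, {n_tokens / elapsed:.0f} tok/s")
```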
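For the per-process idea, here is a rough sketch of the client side, assuming 8 vLLM servers have already been launched on the generation node on consecutive ports (e.g. with `vllm serve <model> --port <p>`). It uses vLLM's OpenAI-compatible `/v1/completions` endpoint purely for illustration; the actual `trl vllm-serve` protocol would differ, and the hostname is hypothetical:

```python
# Sketch: each accelerate process derives a unique port from its process
# index and talks only to its own dedicated vLLM server on the second node.
import requests
from accelerate import Accelerator

BASE_PORT = 8000
GENERATION_NODE = "node2"  # hypothetical hostname of the vLLM node

accelerator = Accelerator()
port = BASE_PORT + accelerator.process_index  # unique port per process

def generate(prompts: list[str], max_tokens: int = 4096) -> list[str]:
    """Send this process's prompts to its dedicated vLLM instance."""
    response = requests.post(
        f"http://{GENERATION_NODE}:{port}/v1/completions",
        json={
            "model": "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B",
            "prompt": prompts,  # /v1/completions accepts a list of prompts
            "max_tokens": max_tokens,
            "temperature": 0.7,
        },
        timeout=600,
    )
    response.raise_for_status()
    return [choice["text"] for choice in response.json()["choices"]]
```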
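Finally, to illustrate the mega-batch idea, a sketch with a hypothetical `vllm_client.generate` helper (not the actual GRPOTrainer code): flatten the prompts for all gradient-accumulation micro-batches into one generate call, so vLLM's scheduler and continuous batching see the full workload, then slice the completions back per micro-step:

```python
# Illustrative sketch only: group prompts across gradient-accumulation
# steps into a single generation request, then split the results back.
def generate_mega_batch(vllm_client, micro_batches: list[list[str]]) -> list[list[str]]:
    """micro_batches: one list of prompts per gradient-accumulation step."""
    flat_prompts = [p for batch in micro_batches for p in batch]
    # One request covering grad_accum_steps * per_device_batch prompts.
    flat_completions = vllm_client.generate(flat_prompts)
    # Slice completions back into per-step batches for the optimization loop.
    out, i = [], 0
    for batch in micro_batches:
        out.append(flat_completions[i : i + len(batch)])
        i += len(batch)
    return out
```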
Motivation
Your contribution