
GRPO: generation with vLLM takes the majority of step time #2971

@wusijie123

Description


My model is 13B, running on 8×A800 (80 GB). Key settings are:

num_processes: 7
bf16: true
use_vllm: true
vllm_device: auto
vllm_gpu_memory_utilization: 0.55
do_eval: false
gradient_accumulation_steps: 4
gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: false
num_generations: 7
num_train_epochs: 1
per_device_eval_batch_size: 1
per_device_train_batch_size: 1

During training it takes about 5 min per step, and generating completions accounts for 4 min of that. I want to increase vllm_gpu_memory_utilization, but I hit OOM when num_generations=7. How can I make generation faster?
I also tried running vLLM across 2 GPUs, but it failed with a socket timeout.
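A back-of-envelope check of the numbers above (all figures come from this issue; the batch accounting, i.e. that sequences per step = processes × per-device batch × accumulation steps, is an assumption about how the trainer groups completions):

```python
# Numbers taken from the config and timings reported in the issue.
num_processes = 7                  # training GPUs (1 of the 8 reserved for vLLM)
per_device_train_batch_size = 1
gradient_accumulation_steps = 4
num_generations = 7                # completions sampled per prompt

# Assumed accounting: sequences processed per optimizer step.
sequences_per_step = (num_processes * per_device_train_batch_size
                      * gradient_accumulation_steps)
prompts_per_step = sequences_per_step // num_generations

step_time_min = 5.0                # reported total time per step
gen_time_min = 4.0                 # reported generation time per step
gen_fraction = gen_time_min / step_time_min

print(sequences_per_step, prompts_per_step, f"{gen_fraction:.0%}")
# 28 sequences per step, 4 prompt groups, generation is 80% of step time
```

With generation at roughly 80% of each step on a single vLLM GPU, even a large training-side speedup would barely move the total step time, which is why the question focuses on the vLLM side.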
