
GRPO: generation with vLLM takes the majority of step time #2971

@wusijie123

Description


My model is 13B, running on 8×A800 (80 GB). Key settings are:

num_processes: 7
bf16: true
use_vllm: true
vllm_device: auto
vllm_gpu_memory_utilization: 0.55
do_eval: false
gradient_accumulation_steps: 4
gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: false
num_generations: 7
num_train_epochs: 1
per_device_eval_batch_size: 1
per_device_train_batch_size: 1

During training it takes about 5 min per step, and generating completions accounts for 4 min of that. I want to increase vllm_gpu_memory_utilization, but I hit OOM when num_generations=7. How can I make generation faster?
I also tried running vLLM across 2 GPUs, but it failed with a socket timeout.
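A back-of-envelope check of the numbers above (all figures come from this issue; the batch accounting, i.e. that sequences per step = processes × per-device batch × accumulation steps, is an assumption about how the trainer groups completions):

```python
# Numbers taken from the config and timings reported in the issue.
num_processes = 7                  # training GPUs (1 of the 8 reserved for vLLM)
per_device_train_batch_size = 1
gradient_accumulation_steps = 4
num_generations = 7                # completions sampled per prompt

# Assumed accounting: sequences processed per optimizer step.
sequences_per_step = (num_processes * per_device_train_batch_size
                      * gradient_accumulation_steps)
prompts_per_step = sequences_per_step // num_generations

step_time_min = 5.0                # reported total time per step
gen_time_min = 4.0                 # reported generation time per step
gen_fraction = gen_time_min / step_time_min

print(sequences_per_step, prompts_per_step, f"{gen_fraction:.0%}")
# 28 sequences per step, 4 prompt groups, generation is 80% of step time
```

With generation at roughly 80% of each step on a single vLLM GPU, even a large training-side speedup would barely move the total step time, which is why the question focuses on the vLLM side.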
