Labels: 🏋 GRPO (Related to GRPO)
Description
My model is 13B, training on 8×A800 (80 GB) GPUs. Key settings are:
num_processes: 7
bf16: true
use_vllm: true
vllm_device: auto
vllm_gpu_memory_utilization: 0.55
do_eval: false
gradient_accumulation_steps: 4
gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: false
num_generations: 7
num_train_epochs: 1
per_device_eval_batch_size: 1
per_device_train_batch_size: 1
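For reference, this is roughly how the same settings look in code (a minimal sketch assuming TRL's GRPOConfig API from the version I'm on; output_dir is a placeholder, and num_processes comes from the accelerate launch command rather than this config):

```python
# Minimal sketch of the equivalent GRPOConfig (assumed mapping from the
# YAML above; output_dir is a placeholder). num_processes: 7 is an
# `accelerate launch` setting, not a GRPOConfig field.
from trl import GRPOConfig

training_args = GRPOConfig(
    output_dir="grpo-13b",                # placeholder
    bf16=True,
    use_vllm=True,
    vllm_device="auto",                   # one GPU is reserved for generation
    vllm_gpu_memory_utilization=0.55,
    num_generations=7,
    num_train_epochs=1,
    do_eval=False,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=4,
    gradient_checkpointing=True,
    gradient_checkpointing_kwargs={"use_reentrant": False},
)
```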
During training, each step takes about 5 minutes, of which roughly 4 minutes is spent generating completions. I want to raise vllm_gpu_memory_utilization to speed up generation, but with num_generations=7 it runs out of memory (OOM). How can I make generation faster?

I also tried running vLLM across 2 GPUs, but it failed with a socket timeout.
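For context, here is why I can't easily change num_generations independently (a sanity-check sketch; the divisibility rule is my reading of the TRL constraint, with numbers taken from the config above):

```python
# Sanity check for the GRPO batch constraint as I understand it: the
# global train batch size (num_processes * per_device_train_batch_size)
# must be evenly divisible by num_generations.
num_processes = 7                     # 8 GPUs minus 1 reserved for vLLM
per_device_train_batch_size = 1
num_generations = 7

global_batch = num_processes * per_device_train_batch_size
assert global_batch % num_generations == 0, (
    f"global batch {global_batch} is not divisible by "
    f"num_generations={num_generations}"
)
print(f"{global_batch // num_generations} unique prompt(s) per generation step")
```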