Closed
Labels: training (Training related)
Description
Describe the bug
When the HFPolicy is supposed to be 'asleep' (while vLLM is doing generation), it still occupies about 11 GB per GPU for Llama 8B according to nvidia-smi. It should be close to 0.
Reproduce
8xH100 (1 Node)
After fixes from #32 (or after it's merged to main)
uv run examples/run_grpo_math.py --config examples/configs/grpo_math_8B.yaml
Info
I have vLLM's gpu_memory_utilization set to 0.6 (the default), so after a few steps (once things stabilize) I see 48 GB of vLLM usage and 11 GB of HF usage per GPU during vLLM generation. When HF is training, vLLM correctly offloads down to ~700 MB per GPU.
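For anyone reproducing this, the per-GPU numbers above can be confirmed by tallying `nvidia-smi --query-compute-apps=pid,used_memory --format=csv,noheader,nounits` per process. A minimal sketch of such a helper (the parsing function and the sample PIDs/values are hypothetical, not part of this repo):

```python
import csv
import io


def gpu_memory_by_process(smi_csv: str) -> dict[int, int]:
    """Parse the CSV output of
    `nvidia-smi --query-compute-apps=pid,used_memory --format=csv,noheader,nounits`
    into a {pid: total MiB used} mapping (summed across GPUs)."""
    usage: dict[int, int] = {}
    for row in csv.reader(io.StringIO(smi_csv)):
        if len(row) < 2:
            continue
        pid = int(row[0].strip())
        mib = int(row[1].strip())
        usage[pid] = usage.get(pid, 0) + mib
    return usage


# Hypothetical sample mirroring the shapes seen in this issue:
# one vLLM worker and one HF process on the same GPU.
sample = "1234, 49152\n5678, 11264\n"
print(gpu_memory_by_process(sample))  # {1234: 49152, 5678: 11264}
```

If the HF process's PID still shows ~11 GB during generation, the policy's weights were not actually released; a fully "asleep" policy should show near-zero resident memory for that PID.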