Improve memory offloading for FSDP models (v1 & v2) #33

@SahilJain314

Description

Describe the bug
When the HFPolicy is supposed to be 'asleep' (i.e., while vLLM is running generation), it still holds ~11GB of GPU memory for Llama 8B according to nvidia-smi. It should be close to 0.

Reproduce
8xH100 (1 Node)
After fixes from #32 (or after it's merged to main)
uv run examples/run_grpo_math.py --config examples/configs/grpo_math_8B.yaml

Info
I have vLLM's gpu_memory_utilization set to 0.6 (the default), so after a few steps (once memory usage stabilizes), I see 48GB of vLLM usage and 11GB of HF usage per GPU during vLLM generation. When HF is training, vLLM correctly offloads down to ~700MB per GPU.
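For context, the expected behavior can be sketched with plain PyTorch: moving a module's parameters to the CPU and then releasing the allocator's cached blocks is what should drive the training policy's footprint toward 0 while vLLM generates. This is a minimal illustration, not the actual HFPolicy/FSDP offload path; the `offload_to_cpu` helper and the toy `Linear` module are hypothetical stand-ins.

```python
import torch


def offload_to_cpu(model: torch.nn.Module) -> int:
    """Move all parameters and buffers to host memory, then free cached GPU blocks.

    Returns the number of bytes still allocated on the GPU afterwards
    (0 when no CUDA device is present).
    """
    model.to("cpu")
    if torch.cuda.is_available():
        # empty_cache() returns cached-but-unused blocks to the driver, which is
        # what makes the drop visible in nvidia-smi (memory_allocated alone would
        # already exclude the cache).
        torch.cuda.empty_cache()
        return torch.cuda.memory_allocated()
    return 0


# Toy stand-in for the policy model.
net = torch.nn.Linear(16, 16)
remaining = offload_to_cpu(net)
print(remaining)  # expected to be 0 once nothing is left resident on the GPU
print(all(p.device.type == "cpu" for p in net.parameters()))
```

Note that with FSDP the picture is more involved: each rank holds only a shard, and any lingering CUDA tensors (optimizer state, cached activations, allocator fragmentation) keep memory resident even after `model.to("cpu")`, which is consistent with the ~11GB observed here.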

Metadata

Labels: training (Training related)