Improve memory offloading for FSDP models (v1 & v2) #33

@SahilJain314

Description

Describe the bug
When the HFPolicy is supposed to be 'asleep' (i.e., while vLLM is running generation), it still holds ~11GB of GPU memory for Llama 8B according to nvidia-smi. It should be close to 0.

Reproduce
8xH100 (1 Node)
After fixes from #32 (or after it's merged to main)
uv run examples/run_grpo_math.py --config examples/configs/grpo_math_8B.yaml

Info
I have vLLM's gpu_memory_utilization set to 0.6 (the default), so after a few steps (once memory usage stabilizes), I see 48GB of vLLM usage and 11GB of HF usage per GPU during vLLM generation. When HF is training, vLLM correctly offloads down to ~700MB per GPU.
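For context, the expected behavior can be sketched with plain PyTorch: moving a module's parameters to the CPU and then releasing the allocator's cached blocks is what should drive the training policy's footprint toward 0 while vLLM generates. This is a minimal illustration, not the actual HFPolicy/FSDP offload path; the `offload_to_cpu` helper and the toy `Linear` module are hypothetical stand-ins.

```python
import torch


def offload_to_cpu(model: torch.nn.Module) -> int:
    """Move all parameters and buffers to host memory, then free cached GPU blocks.

    Returns the number of bytes still allocated on the GPU afterwards
    (0 when no CUDA device is present).
    """
    model.to("cpu")
    if torch.cuda.is_available():
        # empty_cache() returns cached-but-unused blocks to the driver, which is
        # what makes the drop visible in nvidia-smi (memory_allocated alone would
        # already exclude the cache).
        torch.cuda.empty_cache()
        return torch.cuda.memory_allocated()
    return 0


# Toy stand-in for the policy model.
net = torch.nn.Linear(16, 16)
remaining = offload_to_cpu(net)
print(remaining)  # expected to be 0 once nothing is left resident on the GPU
print(all(p.device.type == "cpu" for p in net.parameters()))
```

Note that with FSDP the picture is more involved: each rank holds only a shard, and any lingering CUDA tensors (optimizer state, cached activations, allocator fragmentation) keep memory resident even after `model.to("cpu")`, which is consistent with the ~11GB observed here.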

Metadata

Labels: training (Training related)