Reproduction
Question: The VRAM on the vLLM device suddenly increases when the training model updates its parameters.
Description:
I'm fine-tuning the R1-32b-int4 model with the TRL library on 2×A100 (40 GB). I've implemented QLoRA support for GRPO using code from unsloth-zoo: one card (cuda:0) trains, the other (cuda:1) generates the data with vLLM, and the two cards only exchange the LoRA parameters. This lets me train GRPO on the two A100s.
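For context, the setup looks roughly like the sketch below. The model path, dataset, and reward function are placeholders, and the QLoRA patching from unsloth-zoo is omitted; the important part is that generation is delegated to vLLM on cuda:1 while training stays on cuda:0:

```python
# Rough sketch of the setup (placeholders, not my exact script).
from datasets import load_dataset
from peft import LoraConfig
from trl import GRPOConfig, GRPOTrainer

def reward_len(completions, **kwargs):
    # Placeholder reward: prefer shorter completions.
    return [-float(len(c)) for c in completions]

training_args = GRPOConfig(
    output_dir="grpo-r1-32b-int4",
    per_device_train_batch_size=1,
    use_vllm=True,                    # generation runs in a colocated vLLM engine
    vllm_device="cuda:1",             # keep vLLM off the training GPU
    vllm_gpu_memory_utilization=0.5,
)

trainer = GRPOTrainer(
    model="path/to/R1-32b-int4",      # placeholder model path
    reward_funcs=reward_len,
    args=training_args,
    train_dataset=load_dataset("trl-lib/tldr", split="train"),  # placeholder dataset
    peft_config=LoraConfig(r=16, lora_alpha=32, target_modules="all-linear"),
)
trainer.train()
```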
When both models are loaded for training, each card's VRAM usage stays within its 40 GB limit; neither one shows an OOM.
When the model parameters are updated during training, I noticed that VRAM usage rises on both cards. That shouldn't happen: only cuda:0 should rise, while cuda:1 should stay unchanged.
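To make the numbers concrete, I log allocated memory per device around the update step with a small debugging helper like the one below (my own helper, not part of TRL):

```python
import torch

def log_vram(tag: str) -> None:
    # Print allocated/reserved memory for every visible GPU, so the jump
    # on cuda:1 during the parameter update shows up in the logs.
    for i in range(torch.cuda.device_count()):
        alloc = torch.cuda.memory_allocated(i) / 2**30
        reserved = torch.cuda.memory_reserved(i) / 2**30
        print(f"[{tag}] cuda:{i} allocated={alloc:.2f} GiB reserved={reserved:.2f} GiB")
```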
At first I suspected move_model_vllm, but a breakpoint showed that this function had not been called yet when the OOM occurred, so the increase in cuda:1's VRAM is not caused by pushing cuda:0's updated parameters to vLLM.
Further debugging showed that the model used inside self._get_per_token_logps is wrapped as a DataParallel model with device ids 0 and 1.
This makes the model use vLLM's device whenever it computes log-probs or updates parameters.
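This can be confirmed from a breakpoint right before the log-prob computation; the check below is just what I inspected in the debugger:

```python
import torch.nn as nn

# Inside the training step: the model handed to _get_per_token_logps
# is a DataParallel wrapper spanning both GPUs.
if isinstance(model, nn.DataParallel):
    print("wrapped:", type(model).__name__, "device_ids:", model.device_ids)
    # -> wrapped: DataParallel device_ids: [0, 1]
```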
Tracking this further, I found that in transformers' Trainer, if n_gpu is greater than 1 the model is automatically wrapped in nn.DataParallel, which includes all visible GPUs:
if self.args.n_gpu > 1 and not getattr(model, "is_loaded_in_8bit", False):
model = nn.DataParallel(model)
outputs:
Traceback (most recent call last):
File "example.py", line 42, in <module>
...
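As a temporary workaround I force the Trainer to believe there is only one training GPU, so the DataParallel wrapping is skipped. This pokes at a private TrainingArguments field, so it is only a sketch of what worked for me locally, not a proposed fix:

```python
from trl import GRPOConfig

# Hacky workaround: make the HF Trainer see a single training GPU so the
# `self.args.n_gpu > 1` branch that wraps the model in nn.DataParallel is skipped.
# `_n_gpu` is a private TrainingArguments field and may not survive version bumps.
training_args = GRPOConfig(
    output_dir="grpo-r1-32b-int4",
    use_vllm=True,
    vllm_device="cuda:1",
)
training_args._n_gpu = 1  # training stays on cuda:0; cuda:1 is left to vLLM

# then build GRPOTrainer with training_args exactly as in the sketch above
```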
System Info
Copy-paste the following information when reporting an issue:
- Platform: Linux-5.15.0-60-generic-x86_64-with-glibc2.31
- Python version: 3.10.14
- TRL version: 0.16.0.dev0+b55d9f0
- PyTorch version: 2.5.1
- CUDA device(s): NVIDIA A100-SXM4-40GB, NVIDIA A100-SXM4-40GB
- Transformers version: 4.48.3
- Accelerate version: 1.4.0
- Accelerate config: not found
- Datasets version: 3.0.1
- HF Hub version: 0.29.1
- bitsandbytes version: 0.45.3
- DeepSpeed version: not installed
- Diffusers version: 0.32.2
- Liger-Kernel version: not installed
- LLM-Blender version: not installed
- OpenAI version: 1.65.1
- PEFT version: 0.14.0
- vLLM version: 0.7.3
Checklist
- I have checked that my issue isn't already filed (see open issues)
- I have included my system information
- Any code provided is minimal, complete, and reproducible (more on MREs)
- Any code provided is properly formatted in code blocks, (no screenshot, more on code blocks)
- Any traceback provided is complete