RLOO Trainer Stopping After 1 Epoch #2401

@asparius

Description

@asparius

System Info

  • Platform: Linux-3.10.0-693.11.6.el7.x86_64-x86_64-with-glibc2.17
  • Python version: 3.9.5
  • PyTorch version: 2.4.0
  • CUDA device(s): not available
  • Transformers version: 4.46.2
  • Accelerate version: 1.1.1
  • Accelerate config: not found
  • Datasets version: 3.1.0
  • HF Hub version: 0.26.2
  • TRL version: 0.13.0.dev0
  • bitsandbytes version: not installed
  • DeepSpeed version: 0.15.4
  • Diffusers version: not installed
  • Liger-Kernel version: not installed
  • LLM-Blender version: not installed
  • OpenAI version: 1.54.4
  • PEFT version: not installed

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder
  • My own task or dataset (give details below)

Reproduction

While reproducing RLOO on a multi-GPU setup with the official script, training consistently halts midway, regardless of whether it is configured for 1,000 or 1 million episodes. For example, a wandb run ended at 1954 steps, whereas it should have run for 3908.
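The observed step count is exactly half the expected one, which is what you would see if the episode count were divided by the world size one extra time when the schedule is computed. A minimal sketch of that arithmetic (the function and variable names below are hypothetical, not TRL's actual internals):

```python
# Hypothetical sketch: steps expected when total_episodes is divided
# exactly once by the global batch size. Names are illustrative only.

def total_steps(total_episodes: int, per_device_batch_size: int, num_processes: int) -> int:
    """Number of optimizer steps if episodes are consumed global-batch at a time."""
    global_batch_size = per_device_batch_size * num_processes
    return total_episodes // global_batch_size

# The reported run stopped at 1954 steps instead of 3908 -- exactly half,
# consistent with an extra division by the process count somewhere in the
# step-schedule calculation on multi-GPU.
expected = 3908
observed = 1954
print(observed == expected // 2)  # True
```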

Expected behavior

Training should have run for 3908 steps; alternatively, the reported total step count may itself be miscalculated.

Checklist

  • I have checked that my issue isn't already filed (see open issues)
  • I have included my system information
  • Any code provided is minimal, complete, and reproducible (more on MREs)
  • Any code provided is properly formatted in code blocks, (no screenshot, more on code blocks)
  • Any traceback provided is complete
