PPOTrainer: num_mini_batches setting affects training progress bar in an unexpected way

### System Info

- Platform: Linux-5.15.0-122-generic-x86_64-with-glibc2.35
- Python version: 3.10.15
- PyTorch version: 2.5.1
- CUDA device(s): not available
- Transformers version: 4.46.3
- Accelerate version: 1.1.1
- Accelerate config: not found
- Datasets version: 3.1.0
- HF Hub version: 0.26.2
- TRL version: 0.13.0
- bitsandbytes version: not installed
- DeepSpeed version: 0.16.0
- Diffusers version: not installed
- Liger-Kernel version: not installed
- LLM-Blender version: not installed
- OpenAI version: 1.57.1
- PEFT version: 0.13.2

### Information

- [ ] The official example scripts
- [X] My own modified scripts

### Tasks

- [ ] An officially supported task in the `examples` folder
- [ ] My own task or dataset (give details below)

### Reproduction

I ran `test_basic_training()` from `tests/test_ppo_trainer.py`, changing only the `num_mini_batches` value in `PPOConfig`. The train dataset has 17 examples and the default number of epochs is 3.
* `num_mini_batches=1` - the progress bar ends at 13/13, with ~3 epochs logged,
* `num_mini_batches=2` - the progress bar ends at 7/14, with ~3 epochs logged,
* `num_mini_batches=4` - the progress bar ends at 4/16, with ~3 epochs logged.

### Expected behavior

The progress bar should reach its total by the end of training, regardless of `num_mini_batches`.

Note that in the current implementation, steps are not actual parameter updates: `max_steps = args.num_total_batches * args.num_mini_batches`, but the `global_step` update does not depend on `num_mini_batches`. Fixing this may require changing the convention for what a `step` is (I discussed this for RLOO in #2515). A step could be made equivalent to an episode, which seems consistent with the PPOTrainer documentation (https://huggingface.co/docs/trl/main/en/ppo_trainer#explanation-of-the-logged-metrics):
> episode: The current global step or episode count in the training process.
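
The numbers in the reproduction can be checked with a few lines of arithmetic. This is only a sketch, not the trainer's code: the `max_steps` formula is the one quoted above, while the batch-size derivation, `per_device_train_batch_size=4`, `gradient_accumulation_steps=1`, and a single process are my assumptions about the test setup. The helper `progress_bar_end` is hypothetical.

```python
import math


def progress_bar_end(dataset_len, num_train_epochs, per_device_train_batch_size,
                     num_mini_batches, gradient_accumulation_steps=1, world_size=1):
    """Return (final global_step, max_steps) as the progress bar would show them.

    Assumed derivation (not copied from the trainer): one rollout batch holds
    per_device_train_batch_size * gradient_accumulation_steps * num_mini_batches
    samples per process.
    """
    total_episodes = int(num_train_epochs * dataset_len)
    batch_size = (per_device_train_batch_size * gradient_accumulation_steps
                  * num_mini_batches * world_size)
    num_total_batches = math.ceil(total_episodes / batch_size)

    # Progress bar total, as quoted in the issue text.
    max_steps = num_total_batches * num_mini_batches
    # global_step grows by 1 per outer update, independent of num_mini_batches.
    global_step = num_total_batches
    return global_step, max_steps


for n in (1, 2, 4):
    done, total = progress_bar_end(17, 3, 4, n)
    print(f"num_mini_batches={n}: {done}/{total}")
```

Under these assumptions the sketch gives 13/13, 7/14, and 4/16, matching the observed values: the total is multiplied by `num_mini_batches`, while the counter is not.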



### Checklist

- [X] I have checked that my issue isn't already filed (see [open issues](https://github.com/huggingface/trl/issues?q=is%3Aissue))
- [X] I have included my system information
- [X] Any code provided is minimal, complete, and reproducible ([more on MREs](https://docs.github.com/en/get-started/writing-on-github/working-with-advanced-formatting/creating-and-highlighting-code-blocks))
- [X] Any code provided is properly formatted in code blocks, (no screenshot, [more on code blocks](https://docs.github.com/en/get-started/writing-on-github/working-with-advanced-formatting/creating-and-highlighting-code-blocks))
- [X] Any traceback provided is complete

PPOTrainer: num_mini_batches setting affects training progress bar in an unexpected way #2530
