
PPOTrainer: num_mini_batches setting affects training progress bar in an unexpected way #2530

@dawidm

Description


System Info

  • Platform: Linux-5.15.0-122-generic-x86_64-with-glibc2.35
  • Python version: 3.10.15
  • PyTorch version: 2.5.1
  • CUDA device(s): not available
  • Transformers version: 4.46.3
  • Accelerate version: 1.1.1
  • Accelerate config: not found
  • Datasets version: 3.1.0
  • HF Hub version: 0.26.2
  • TRL version: 0.13.0
  • bitsandbytes version: not installed
  • DeepSpeed version: 0.16.0
  • Diffusers version: not installed
  • Liger-Kernel version: not installed
  • LLM-Blender version: not installed
  • OpenAI version: 1.57.1
  • PEFT version: 0.13.2

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder
  • My own task or dataset (give details below)

Reproduction

I tested PPOTrainer with test_basic_training() in tests/test_ppo_trainer.py, changing only the num_mini_batches value in PPOConfig. The training dataset has 17 examples and the default number of epochs is 3.

  • num_mini_batches=1: training ends with the progress bar at 13/13, logging ~3 epochs.
  • num_mini_batches=2: training ends with the progress bar at 7/14, logging ~3 epochs.
  • num_mini_batches=4: training ends with the progress bar at 4/16, logging ~3 epochs.
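The values above can be reproduced from the step bookkeeping described under Expected behavior. This is an approximation written for this report, not TRL's code. The helper name progress_bar_end is hypothetical, and it assumes total_episodes defaults to num_train_epochs * len(train_dataset), per_device_train_batch_size=4, gradient_accumulation_steps=1 and a single process. These test settings are assumptions, chosen because they reproduce the observed numbers.

```python
import math


def progress_bar_end(dataset_len, num_train_epochs, per_device_train_batch_size,
                     num_mini_batches, gradient_accumulation_steps=1, world_size=1):
    """Hypothetical helper: return (final global_step, max_steps) under assumed bookkeeping."""
    # Assumed default: train for num_train_epochs passes over the dataset
    total_episodes = num_train_epochs * dataset_len
    # One rollout batch is split into num_mini_batches mini-batches
    local_batch_size = per_device_train_batch_size * gradient_accumulation_steps * num_mini_batches
    batch_size = local_batch_size * world_size
    num_total_batches = math.ceil(total_episodes / batch_size)
    # Progress bar total, as quoted in this issue
    max_steps = num_total_batches * num_mini_batches
    # global_step advances once per rollout batch, regardless of num_mini_batches
    final_global_step = num_total_batches
    return final_global_step, max_steps


for n in (1, 2, 4):
    step, total = progress_bar_end(17, 3, 4, n)
    print(f"num_mini_batches={n}: {step}/{total}")
# num_mini_batches=1: 13/13
# num_mini_batches=2: 7/14
# num_mini_batches=4: 4/16
```

Under these assumptions, the bar only reaches its total when num_mini_batches=1, because max_steps scales with num_mini_batches while global_step does not.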

Expected behavior

The progress bar should end at its full total (N/N) for every num_mini_batches value.

Note that in the current implementation, steps aren't actual parameter updates. max_steps = args.num_total_batches * args.num_mini_batches, but global_step is updated independently of num_mini_batches. Fixing this may require changing the convention of what a step is (I discussed this for RLOO in #2515). A step could be made equivalent to an episode, which seems consistent with the PPOTrainer documentation (https://huggingface.co/docs/trl/main/en/ppo_trainer#explanation-of-the-logged-metrics):

episode: The current global step or episode count in the training process.
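A step-as-episode convention could look like the hypothetical loop below. The counter advances by batch_size (episodes per rollout batch) and the bar total is also measured in episodes, so the bar ends at its total for any num_mini_batches. The function name episode_progress and the batch settings (per_device_train_batch_size=4, no gradient accumulation, one process) are illustrative assumptions, not TRL's implementation.

```python
import math


def episode_progress(dataset_len, num_train_epochs, per_device_train_batch_size, num_mini_batches):
    """Hypothetical: count progress in episodes instead of rollout batches."""
    total_episodes = num_train_epochs * dataset_len
    # Assumes a single process and gradient_accumulation_steps=1
    batch_size = per_device_train_batch_size * num_mini_batches
    num_total_batches = math.ceil(total_episodes / batch_size)
    # Bar total in episodes; may exceed total_episodes because the last batch is full
    bar_total = num_total_batches * batch_size
    episode = 0
    for _ in range(num_total_batches):
        # ... rollout generation and PPO mini-batch updates would happen here ...
        episode += batch_size
    return episode, bar_total


for n in (1, 2, 4):
    done, total = episode_progress(17, 3, 4, n)
    print(f"num_mini_batches={n}: {done}/{total}")
# num_mini_batches=1: 52/52
# num_mini_batches=2: 56/56
# num_mini_batches=4: 64/64
```

With this convention the bar total and the counter share a unit, so changing num_mini_batches only changes how many episodes each iteration adds, not whether the bar completes.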

Checklist

  • I have checked that my issue isn't already filed (see open issues)
  • I have included my system information
  • Any code provided is minimal, complete, and reproducible (more on MREs)
  • Any code provided is properly formatted in code blocks, (no screenshot, more on code blocks)
  • Any traceback provided is complete

Metadata


    Labels

    🏋 PPO (Related to PPO), 🐛 bug (Something isn't working)
