Description
System Info
- Platform: Linux-5.15.0-122-generic-x86_64-with-glibc2.35
- Python version: 3.10.15
- PyTorch version: 2.5.1
- CUDA device(s): not available
- Transformers version: 4.46.3
- Accelerate version: 1.1.1
- Accelerate config: not found
- Datasets version: 3.1.0
- HF Hub version: 0.26.2
- TRL version: 0.13.0
- bitsandbytes version: not installed
- DeepSpeed version: 0.16.0
- Diffusers version: not installed
- Liger-Kernel version: not installed
- LLM-Blender version: not installed
- OpenAI version: 1.57.1
- PEFT version: 0.13.2
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder
- My own task or dataset (give details below)
Reproduction
I tested PPOTrainer with tests/test_ppo_trainer.py/test_basic_training(), changing only the num_mini_batches value in PPOConfig. The train dataset length is 17 and the default number of epochs is 3.
- num_mini_batches=1: it ends on progress bar value 13/13, logging ~3 epochs
- num_mini_batches=2: it ends on progress bar value 7/14, logging ~3 epochs
- num_mini_batches=4: it ends on progress bar value 4/16, logging ~3 epochs
Expected behavior
Correct progress bar values.
Please note that steps aren't actual parameter updates in the current implementation: max_steps = args.num_total_batches * args.num_mini_batches, but the global_step update doesn't depend on num_mini_batches. A solution may require changing the convention of what a step is (I discussed it for RLOO in #2515). It could be made equivalent to episodes, which seems consistent with the PPOTrainer documentation (https://huggingface.co/docs/trl/main/en/ppo_trainer#explanation-of-the-logged-metrics):
episode: The current global step or episode count in the training process.
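The arithmetic behind the mismatch can be sketched as follows. This is a simplified model, not the trainer's actual code: the helper name progress_bar_end is hypothetical, and it assumes per_device_train_batch_size=4, gradient_accumulation_steps=1, a single process, and that num_total_batches is computed as ceil(total_episodes / batch_size) with total_episodes = num_train_epochs * dataset length. Under those assumptions it reproduces the three progress bar values reported above.

```python
import math


def progress_bar_end(dataset_len, num_epochs, per_device_batch_size,
                     num_mini_batches, gradient_accumulation_steps=1,
                     world_size=1):
    """Return (final global_step, max_steps) for a simplified PPO loop.

    Hypothetical helper: the batch-size and num_total_batches formulas are
    assumptions about the trainer, not copied from its source.
    """
    total_episodes = int(num_epochs * dataset_len)
    # Assumed: the rollout batch grows with the number of mini-batches.
    batch_size = (per_device_batch_size * gradient_accumulation_steps
                  * num_mini_batches * world_size)
    num_total_batches = math.ceil(total_episodes / batch_size)
    # From the issue: the progress bar total scales with num_mini_batches...
    max_steps = num_total_batches * num_mini_batches
    # ...while global_step is incremented once per batch.
    final_global_step = num_total_batches
    return final_global_step, max_steps


for n in (1, 2, 4):
    step, total = progress_bar_end(17, 3, 4, n)
    print(f"num_mini_batches={n}: {step}/{total}")
# num_mini_batches=1: 13/13
# num_mini_batches=2: 7/14
# num_mini_batches=4: 4/16
```

In this model the bar only reaches its total when num_mini_batches=1, since that is the only case where max_steps equals the number of global_step increments.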
Checklist
- I have checked that my issue isn't already filed (see open issues)
- I have included my system information
- Any code provided is minimal, complete, and reproducible (more on MREs)
- Any code provided is properly formatted in code blocks, (no screenshot, more on code blocks)
- Any traceback provided is complete