
Value Model in PPOTrainer: Unsupported Optional type #3292

@AMindToThink

Description

Reproduction

The value_model parameter of PPOTrainer is declared as value_model: Optional[nn.Module] = None, but if it is set to None or left at its default, errors occur.

I'm unsure which of the following possibilities was intended:

  1. Someone planned to later add support for skipping the value model and using the rewards directly instead of the advantage (see the sketch after this list). This might make sense in memory-constrained applications, at the expense of much worse performance (though really, just use RLOO).
  2. This is a simple typo, and the type should be changed from Optional[nn.Module] to nn.Module and the parameter should be required.
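
To make option 1 concrete: without a value model the baseline is zero, so the per-token advantage degenerates to the (discounted) return, i.e. REINFORCE-style updates. The snippet below is only my own illustration of that trade-off (it assumes per-token rewards of shape (batch, seq_len)); it is not trl code.

```python
import torch


def advantages_without_value_model(rewards: torch.Tensor, gamma: float = 1.0) -> torch.Tensor:
    """Hypothetical option-1 behaviour: use discounted returns directly as advantages.

    With no value baseline, advantages == returns, which is higher variance than GAE.
    """
    returns = torch.zeros_like(rewards)
    running = torch.zeros(rewards.shape[0], device=rewards.device)
    for t in reversed(range(rewards.shape[1])):
        running = rewards[:, t] + gamma * running
        returns[:, t] = running
    return returns
```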

I'm eager to make more (and more substantial) contributions to trl, so if either of these changes would be helpful, I'm happy to implement it!

The following is what happens if you run examples/scripts/ppo/tmp_ppo.py but set the value_model to None.

```
(interpretable-fine-tuning) cs29824@sting-vm-1:~/matthew/trl-SAC$ python -i examples/scripts/ppo/tmp_ppo.py --dataset_name trl-internal-testing/descriptiveness-sentiment-trl-style --dataset_train_split descriptiveness --learning_rate 3e-6 --output_dir models/minimal/ppo --per_device_train_batch_size 64 --gradient_accumulation_steps 1 --total_episodes 10000 --model_name_or_path EleutherAI/pythia-1b-deduped --missing_eos_penalty 1.0
[2025-04-15 04:10:55,588] [INFO] [real_accelerator.py:222:get_accelerator] Setting ds_accelerator to cuda (auto detect)
Some weights of GPTNeoXForSequenceClassification were not initialized from the model checkpoint at EleutherAI/pythia-160m and are newly initialized: ['score.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Traceback (most recent call last):
  File "/home/cs29824/matthew/trl-SAC/examples/scripts/ppo/tmp_ppo.py", line 153, in <module>
    trainer = PPOTrainer(
              ^^^^^^^^^^^
  File "/home/cs29824/matthew/interpretable-fine-tuning/.venv/lib/python3.11/site-packages/trl/trainer/ppo_trainer.py", line 222, in __init__
    self.model = PolicyAndValueWrapper(self.policy_model, self.value_model)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/cs29824/matthew/interpretable-fine-tuning/.venv/lib/python3.11/site-packages/trl/trainer/ppo_trainer.py", line 89, in __init__
    self.critic_backbone = getattr(value_model, value_model.base_model_prefix)
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'base_model_prefix'
```
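
The failure comes from PolicyAndValueWrapper reading value_model.base_model_prefix in its __init__ (ppo_trainer.py line 89 in the traceback above). If option 2 is the intent, a fail-fast check there (or in PPOTrainer.__init__) would at least replace the AttributeError with an actionable message. A rough sketch of such a guard, using the names from the traceback rather than the actual trl implementation:

```python
import torch.nn as nn


class PolicyAndValueWrapper(nn.Module):
    """Sketch of an __init__ guard only; the real wrapper also defines forward()."""

    def __init__(self, policy: nn.Module, value_model: nn.Module) -> None:
        super().__init__()
        if value_model is None:
            # Fail fast with a clear message instead of the AttributeError below.
            raise ValueError(
                "PPOTrainer requires a value model; got value_model=None. "
                "Pass e.g. an AutoModelForSequenceClassification with num_labels=1."
            )
        self.policy = policy
        self.value_model = value_model
        # This is the line that currently raises when value_model is None.
        self.critic_backbone = getattr(value_model, value_model.base_model_prefix)
```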

System Info

  • Platform: Linux-5.15.0-1073-kvm-x86_64-with-glibc2.35
  • Python version: 3.11.11
  • PyTorch version: 2.6.0
  • CUDA device(s): Quadro RTX 8000, Quadro RTX 8000
  • Transformers version: 4.49.0
  • Accelerate version: 1.4.0
  • Accelerate config:
    • compute_environment: LOCAL_MACHINE
    • distributed_type: NO
    • mixed_precision: fp16
    • use_cpu: False
    • debug: False
    • num_processes: 1
    • machine_rank: 0
    • num_machines: 1
    • gpu_ids: 0
    • rdzv_backend: static
    • same_network: True
    • main_training_function: main
    • enable_cpu_affinity: False
    • downcast_bf16: no
    • tpu_use_cluster: False
    • tpu_use_sudo: False
    • tpu_env: []
  • Datasets version: 2.21.0
  • HF Hub version: 0.29.1
  • TRL version: 0.15.1
  • bitsandbytes version: 0.45.3
  • DeepSpeed version: 0.16.4
  • Diffusers version: not installed
  • Liger-Kernel version: not installed
  • LLM-Blender version: not installed
  • OpenAI version: not installed
  • PEFT version: 0.14.0

Checklist

  • I have checked that my issue isn't already filed (see open issues)
  • I have included my system information
  • Any code provided is minimal, complete, and reproducible (more on MREs)
  • Any code provided is properly formatted in code blocks (no screenshots; more on code blocks)
  • Any traceback provided is complete

Labels: 🏋 PPO (Related to PPO), 🐛 bug (Something isn't working)
