Description
The `value_model` parameter of `PPOTrainer` is declared as `value_model: Optional[nn.Module] = None`, but if it is set to `None` or left at its default, an error occurs.
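For context, the crash does not require a full training run; here is a minimal sketch that reproduces it directly (assuming `trl==0.15.1`; `PolicyAndValueWrapper` and its argument order are taken from the traceback in the Reproduction section below):

```python
# Minimal sketch reproducing the crash without a full PPO setup.
# Assumes trl==0.15.1; PolicyAndValueWrapper and its (policy, value_model)
# argument order are taken from the traceback below.
from transformers import AutoModelForCausalLM
from trl.trainer.ppo_trainer import PolicyAndValueWrapper

policy = AutoModelForCausalLM.from_pretrained("EleutherAI/pythia-160m")

# __init__ calls getattr(value_model, value_model.base_model_prefix),
# which dereferences None:
PolicyAndValueWrapper(policy, None)
# AttributeError: 'NoneType' object has no attribute 'base_model_prefix'
```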
I'm unsure which of the following possibilities was intended:
- Someone planned to later add support for skipping the use of a value model and using the rewards directly instead of the advantage. This might make sense in memory-constrained applications, at the expense of much worse performance (though really, just use RLOO).
- This is a simple typo, and the type should be changed from `Optional[nn.Module]` to `nn.Module`, making the parameter required (see the sketch after this list).
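If the second option is the intended one, an explicit check would also turn the opaque `AttributeError` into an actionable message. Here is a minimal sketch of such a guard at the top of `PPOTrainer.__init__`, where the wrapper is constructed per the traceback below; the exact placement and error message are assumptions on my part:

```python
# Hypothetical guard near the top of PPOTrainer.__init__
# (trl/trainer/ppo_trainer.py). The attribute names below match the
# traceback; the check itself and its message are a sketch, not trl code.
if value_model is None:
    raise ValueError(
        "PPOTrainer requires a `value_model`, e.g. a sequence-classification "
        "model with a scalar head (num_labels=1); got None."
    )
self.model = PolicyAndValueWrapper(self.policy_model, self.value_model)
```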
I'm eager to make more, and more substantial, contributions to `trl`, so if either of these changes would be helpful, I'm happy to implement it!
Reproduction
The following is what happens if you run `examples/scripts/ppo/tmp_ppo.py` but set the `value_model` to `None`:
```
(interpretable-fine-tuning) cs29824@sting-vm-1:~/matthew/trl-SAC$ python -i examples/scripts/ppo/tmp_ppo.py --dataset_name trl-internal-testing/descriptiveness-sentiment-trl-style --dataset_train_split descriptiveness --learning_rate 3e-6 --output_dir models/minimal/ppo --per_device_train_batch_size 64 --gradient_accumulation_steps 1 --total_episodes 10000 --model_name_or_path EleutherAI/pythia-1b-deduped --missing_eos_penalty 1.0
[2025-04-15 04:10:55,588] [INFO] [real_accelerator.py:222:get_accelerator] Setting ds_accelerator to cuda (auto detect)
Some weights of GPTNeoXForSequenceClassification were not initialized from the model checkpoint at EleutherAI/pythia-160m and are newly initialized: ['score.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Traceback (most recent call last):
  File "/home/cs29824/matthew/trl-SAC/examples/scripts/ppo/tmp_ppo.py", line 153, in <module>
    trainer = PPOTrainer(
              ^^^^^^^^^^^
  File "/home/cs29824/matthew/interpretable-fine-tuning/.venv/lib/python3.11/site-packages/trl/trainer/ppo_trainer.py", line 222, in __init__
    self.model = PolicyAndValueWrapper(self.policy_model, self.value_model)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/cs29824/matthew/interpretable-fine-tuning/.venv/lib/python3.11/site-packages/trl/trainer/ppo_trainer.py", line 89, in __init__
    self.critic_backbone = getattr(value_model, value_model.base_model_prefix)
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'base_model_prefix'
```
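For reference, the log above shows the stock script's working configuration: the value model is a sequence-classification head on `EleutherAI/pythia-160m` with a freshly initialized `score.weight`. A sketch of that setup, inferred from the log (the exact call in the script may differ):

```python
# Working value-model setup, inferred from the log above
# ("GPTNeoXForSequenceClassification ... newly initialized: ['score.weight']").
from transformers import AutoModelForSequenceClassification

value_model = AutoModelForSequenceClassification.from_pretrained(
    "EleutherAI/pythia-160m",
    num_labels=1,  # single scalar value per sequence
)
```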
System Info
- Platform: Linux-5.15.0-1073-kvm-x86_64-with-glibc2.35
- Python version: 3.11.11
- PyTorch version: 2.6.0
- CUDA device(s): Quadro RTX 8000, Quadro RTX 8000
- Transformers version: 4.49.0
- Accelerate version: 1.4.0
- Accelerate config:
  - compute_environment: LOCAL_MACHINE
  - distributed_type: NO
  - mixed_precision: fp16
  - use_cpu: False
  - debug: False
  - num_processes: 1
  - machine_rank: 0
  - num_machines: 1
  - gpu_ids: 0
  - rdzv_backend: static
  - same_network: True
  - main_training_function: main
  - enable_cpu_affinity: False
  - downcast_bf16: no
  - tpu_use_cluster: False
  - tpu_use_sudo: False
  - tpu_env: []
- Datasets version: 2.21.0
- HF Hub version: 0.29.1
- TRL version: 0.15.1
- bitsandbytes version: 0.45.3
- DeepSpeed version: 0.16.4
- Diffusers version: not installed
- Liger-Kernel version: not installed
- LLM-Blender version: not installed
- OpenAI version: not installed
- PEFT version: 0.14.0
Checklist
- I have checked that my issue isn't already filed (see open issues)
- I have included my system information
- Any code provided is minimal, complete, and reproducible (more on MREs)
- Any code provided is properly formatted in code blocks (no screenshots; more on code blocks)
- Any traceback provided is complete