Description
The `value_model` parameter of `PPOTrainer` is declared as `value_model: Optional[nn.Module] = None`, but if it is set to `None` or left at its default, an error occurs.
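For context, the crash does not require a full training run; here is a minimal sketch that reproduces it directly (assuming `trl==0.15.1`; `PolicyAndValueWrapper` and its argument order are taken from the traceback in the Reproduction section below):

```python
# Minimal sketch reproducing the crash without a full PPO setup.
# Assumes trl==0.15.1; PolicyAndValueWrapper and its (policy, value_model)
# argument order are taken from the traceback below.
from transformers import AutoModelForCausalLM
from trl.trainer.ppo_trainer import PolicyAndValueWrapper

policy = AutoModelForCausalLM.from_pretrained("EleutherAI/pythia-160m")

# __init__ calls getattr(value_model, value_model.base_model_prefix),
# which dereferences None:
PolicyAndValueWrapper(policy, None)
# AttributeError: 'NoneType' object has no attribute 'base_model_prefix'
```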
I'm unsure which of the following possibilities was intended:
- Someone planned to later add support for skipping the use of a value model and using the rewards directly instead of the advantage. This might make sense in memory-constrained applications, at the expense of much worse performance (though really, just use RLOO).
- This is a simple typo, and the type should be changed from `Optional[nn.Module]` to `nn.Module`, making the parameter required (see the sketch after this list).
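If the second option is the intended one, an explicit check would also turn the opaque `AttributeError` into an actionable message. Here is a minimal sketch of such a guard at the top of `PPOTrainer.__init__`, where the wrapper is constructed per the traceback below; the exact placement and error message are assumptions on my part:

```python
# Hypothetical guard near the top of PPOTrainer.__init__
# (trl/trainer/ppo_trainer.py). The attribute names below match the
# traceback; the check itself and its message are a sketch, not trl code.
if value_model is None:
    raise ValueError(
        "PPOTrainer requires a `value_model`, e.g. a sequence-classification "
        "model with a scalar head (num_labels=1); got None."
    )
self.model = PolicyAndValueWrapper(self.policy_model, self.value_model)
```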
I'm eager to make more, and more substantial, contributions to `trl`, so if either of these changes would be helpful, I'm happy to implement it!
Reproduction
The following is what happens if you run `examples/scripts/ppo/tmp_ppo.py` but set the `value_model` to `None`:
```
(interpretable-fine-tuning) cs29824@sting-vm-1:~/matthew/trl-SAC$ python -i examples/scripts/ppo/tmp_ppo.py --dataset_name trl-internal-testing/descriptiveness-sentiment-trl-style --dataset_train_split descriptiveness --learning_rate 3e-6 --output_dir models/minimal/ppo --per_device_train_batch_size 64 --gradient_accumulation_steps 1 --total_episodes 10000 --model_name_or_path EleutherAI/pythia-1b-deduped --missing_eos_penalty 1.0
[2025-04-15 04:10:55,588] [INFO] [real_accelerator.py:222:get_accelerator] Setting ds_accelerator to cuda (auto detect)
Some weights of GPTNeoXForSequenceClassification were not initialized from the model checkpoint at EleutherAI/pythia-160m and are newly initialized: ['score.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Traceback (most recent call last):
  File "/home/cs29824/matthew/trl-SAC/examples/scripts/ppo/tmp_ppo.py", line 153, in <module>
    trainer = PPOTrainer(
              ^^^^^^^^^^^
  File "/home/cs29824/matthew/interpretable-fine-tuning/.venv/lib/python3.11/site-packages/trl/trainer/ppo_trainer.py", line 222, in __init__
    self.model = PolicyAndValueWrapper(self.policy_model, self.value_model)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/cs29824/matthew/interpretable-fine-tuning/.venv/lib/python3.11/site-packages/trl/trainer/ppo_trainer.py", line 89, in __init__
    self.critic_backbone = getattr(value_model, value_model.base_model_prefix)
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'base_model_prefix'
```
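For reference, the log above shows the stock script's working configuration: the value model is a sequence-classification head on `EleutherAI/pythia-160m` with a freshly initialized `score.weight`. A sketch of that setup, inferred from the log (the exact call in the script may differ):

```python
# Working value-model setup, inferred from the log above
# ("GPTNeoXForSequenceClassification ... newly initialized: ['score.weight']").
from transformers import AutoModelForSequenceClassification

value_model = AutoModelForSequenceClassification.from_pretrained(
    "EleutherAI/pythia-160m",
    num_labels=1,  # single scalar value per sequence
)
```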
System Info
- Platform: Linux-5.15.0-1073-kvm-x86_64-with-glibc2.35
- Python version: 3.11.11
- PyTorch version: 2.6.0
- CUDA device(s): Quadro RTX 8000, Quadro RTX 8000
- Transformers version: 4.49.0
- Accelerate version: 1.4.0
- Accelerate config:
  - compute_environment: LOCAL_MACHINE
  - distributed_type: NO
  - mixed_precision: fp16
  - use_cpu: False
  - debug: False
  - num_processes: 1
  - machine_rank: 0
  - num_machines: 1
  - gpu_ids: 0
  - rdzv_backend: static
  - same_network: True
  - main_training_function: main
  - enable_cpu_affinity: False
  - downcast_bf16: no
  - tpu_use_cluster: False
  - tpu_use_sudo: False
  - tpu_env: []
- Datasets version: 2.21.0
- HF Hub version: 0.29.1
- TRL version: 0.15.1
- bitsandbytes version: 0.45.3
- DeepSpeed version: 0.16.4
- Diffusers version: not installed
- Liger-Kernel version: not installed
- LLM-Blender version: not installed
- OpenAI version: not installed
- PEFT version: 0.14.0
Checklist
- I have checked that my issue isn't already filed (see open issues)
- I have included my system information
- Any code provided is minimal, complete, and reproducible (more on MREs)
- Any code provided is properly formatted in code blocks (no screenshots; more on code blocks)
- Any traceback provided is complete