Online DPO raises an error when using DeepSpeed to speed up training

### System Info

!pip install git+https://github.com/huggingface/trl.git
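Because TRL was installed from the `main` branch, the exact package versions are not pinned by the command above. A minimal sketch for collecting them, assuming the standard-library `importlib.metadata` is available (Python 3.8+); the package list and the helper name `collect_versions` are illustrative, not part of TRL:

```python
from importlib.metadata import PackageNotFoundError, version

# Packages that are plausibly relevant to this report (an assumption).
PACKAGES = ["trl", "transformers", "accelerate", "deepspeed", "peft", "torch"]


def collect_versions(packages):
    """Return a {package: version} mapping, marking missing packages."""
    report = {}
    for name in packages:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            report[name] = "not installed"
    return report


if __name__ == "__main__":
    for name, ver in collect_versions(PACKAGES).items():
        print(f"{name}: {ver}")
```

Pasting the printed lines into this section makes it easier to tell which TRL commit and DeepSpeed release the traceback corresponds to.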

### Information

- [X] The official example scripts
- [ ] My own modified scripts

### Tasks

- [X] An officially supported task in the `examples` folder
- [ ] My own task or dataset (give details below)

### Reproduction

!ACCELERATE_LOG_LEVEL=info accelerate launch --config_file multi_gpu.yaml \
    online_dpo.py \
    --model_name_or_path mistralai/Mistral-7B-v0.1 \
    --reward_model_path Ray2333/GRM-Llama3.2-3B-rewardmodel-ft \
    --dataset_name nvidia/HelpSteer2 \
    --learning_rate 5.0e-6 \
    --output_dir pythia-1b-tldr-online-dpo \
    --per_device_train_batch_size 16 \
    --gradient_accumulation_steps 8 \
    --warmup_ratio 0.1 \
    --missing_eos_penalty 1.0 \
    --use_peft

[2024-11-28 16:59:10,071] [INFO] [config.py:999:print] DeepSpeedEngine configuration:
Traceback (most recent call last):
  File "/home/ec2-user/SageMaker/Zhichao/UNA_online/UNA_peft/una_peft.py", line 356, in <module>
    trainer = OnlineDPOTrainer(
  File "/home/ec2-user/anaconda3/envs/pytorch_p310/lib/python3.10/site-packages/transformers/utils/deprecation.py", line 165, in wrapped_func
    return func(*args, **kwargs)
  File "/home/ec2-user/anaconda3/envs/pytorch_p310/lib/python3.10/site-packages/trl/trainer/online_dpo_trainer.py", line 286, in __init__
    self.ref_model = prepare_deepspeed(self.ref_model, args.per_device_train_batch_size, args.fp16, args.bf16)
  File "/home/ec2-user/anaconda3/envs/pytorch_p310/lib/python3.10/site-packages/trl/trainer/utils.py", line 1212, in prepare_deepspeed
    model, *_ = deepspeed.initialize(model=model, config=config_kwargs)
  File "/home/ec2-user/anaconda3/envs/pytorch_p310/lib/python3.10/site-packages/deepspeed/__init__.py", line 139, in initialize
    assert model is not None, "deepspeed.initialize requires a model"
AssertionError: deepspeed.initialize requires a model
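The assertion indicates that `self.ref_model` was `None` when it reached `prepare_deepspeed`. One plausible explanation, stated here as an assumption rather than a confirmed diagnosis, is that no separate reference model is created when `--use_peft` is set, so the DeepSpeed wrapping step receives `None`. The sketch below reproduces that failure mode without DeepSpeed installed and shows a possible guard; `fake_deepspeed_initialize` and `prepare_ref_model` are hypothetical stand-ins, not TRL or DeepSpeed functions:

```python
def fake_deepspeed_initialize(model):
    """Hypothetical stand-in that mirrors only the precondition seen in the traceback."""
    assert model is not None, "deepspeed.initialize requires a model"
    return model


def prepare_ref_model(ref_model, use_deepspeed):
    """Wrap the reference model for DeepSpeed only when one actually exists."""
    if ref_model is None:
        # Assumed PEFT case: there is no separate reference model to wrap.
        return None
    if use_deepspeed:
        return fake_deepspeed_initialize(ref_model)
    return ref_model


if __name__ == "__main__":
    # Unguarded call: fails the same way as the traceback.
    try:
        fake_deepspeed_initialize(None)
    except AssertionError as err:
        print(f"unguarded: AssertionError: {err}")

    # Guarded call: the missing reference model is skipped.
    print(f"guarded: {prepare_ref_model(None, use_deepspeed=True)}")
```

If that assumption holds, a similar `None` check around the `prepare_deepspeed(self.ref_model, ...)` call would let the trainer initialize; running without `--use_peft` may also avoid the error, at the cost of holding a full reference model in memory.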

### Expected behavior

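Training should start without errors: `OnlineDPOTrainer` should initialize under DeepSpeed with `--use_peft` rather than failing with `AssertionError: deepspeed.initialize requires a model`.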

### Checklist

- [X] I have checked that my issue isn't already filed (see [open issues](https://github.com/huggingface/trl/issues?q=is%3Aissue))
- [X] I have included my system information
- [X] Any code provided is minimal, complete, and reproducible ([more on MREs](https://docs.github.com/en/get-started/writing-on-github/working-with-advanced-formatting/creating-and-highlighting-code-blocks))
- [X] Any code provided is properly formatted in code blocks, (no screenshot, [more on code blocks](https://docs.github.com/en/get-started/writing-on-github/working-with-advanced-formatting/creating-and-highlighting-code-blocks))
- [X] Any traceback provided is complete

Online DPO Meets Error When Using Deepspeed for Speed Up. #2410

System Info

Information

Tasks

Reproduction

Expected behavior

Checklist

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Online DPO Meets Error When Using Deepspeed for Speed Up. #2410

Description

System Info

Information

Tasks

Reproduction

Expected behavior

Checklist

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions