
DeepSpeedZeRoOffload is incompatible with DeepSpeed>=0.16.4 #2962

@jamesbraza

Description


Reproduction

deepspeedai/DeepSpeed#6847, released in deepspeed==0.16.4, renamed DeepSpeedZeRoOffload._register_hooks_recursively, which trl calls here: https://github.com/huggingface/trl/blob/v0.15.2/trl/models/utils.py#L174. With deepspeed>=0.16.4 installed, that call raises an AttributeError:

9: [rank75]:   File "/home/james/code/molr1-two/.venv/lib/python3.12/site-packages/trl/models/utils.py", line 173, in add_hooks
9: [rank75]:     optimizer_offload._register_hooks_recursively(optimizer_offload.module)
9: [rank75]:     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
9: [rank75]: AttributeError: 'DeepSpeedZeRoOffload' object has no attribute '_register_hooks_recursively'
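
One possible workaround, sketched here rather than taken from trl or DeepSpeed: look the method up by name and call whichever one the installed DeepSpeed exposes. The helper name register_offload_hooks is hypothetical. The new method name _register_deepspeed_module is my assumption about what the rename in deepspeedai/DeepSpeed#6847 produced, so check it against your installed DeepSpeed before relying on it.

```python
def register_offload_hooks(
    optimizer_offload,
    method_names=(
        "_register_deepspeed_module",   # ASSUMED post-rename name (deepspeed>=0.16.4)
        "_register_hooks_recursively",  # name in the traceback (deepspeed<0.16.4)
    ),
):
    """Call the first hook-registration method the offload object exposes.

    Returns the name of the method that was called, so callers can log it.
    """
    for name in method_names:
        method = getattr(optimizer_offload, name, None)
        if callable(method):
            method(optimizer_offload.module)
            return name
    raise AttributeError(
        f"{type(optimizer_offload).__name__} exposes none of {method_names}"
    )


# Stand-in for DeepSpeedZeRoOffload so the sketch runs without DeepSpeed installed.
class _FakeOldOffload:
    module = "model"

    def __init__(self):
        self.calls = []

    def _register_hooks_recursively(self, module):
        self.calls.append(module)


fake = _FakeOldOffload()
called = register_offload_hooks(fake)
```

Feature detection like this avoids hard-coding a version boundary, at the cost of silently depending on private DeepSpeed method names. Pinning deepspeed<0.16.4 is the simpler short-term option.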

System Info

  • Platform: Linux-5.15.0-112-generic-x86_64-with-glibc2.35
  • Python version: 3.12.9
  • PyTorch version: 2.5.1+cu124
  • CUDA device(s): NVIDIA H100 80GB HBM3, NVIDIA H100 80GB HBM3, NVIDIA H100 80GB HBM3, NVIDIA H100 80GB HBM3, NVIDIA H100 80GB HBM3, NVIDIA H100 80GB HBM3, NVIDIA H100 80GB HBM3, NVIDIA H100 80GB HBM3
  • Transformers version: 4.50.0.dev0
  • Accelerate version: 1.3.0
  • Accelerate config: not found
  • Datasets version: 3.2.0
  • HF Hub version: 0.28.1
  • TRL version: 0.15.0.dev0
  • bitsandbytes version: not installed
  • DeepSpeed version: 0.15.4
  • Diffusers version: not installed
  • Liger-Kernel version: 0.5.2
  • LLM-Blender version: not installed
  • OpenAI version: 1.61.1
  • PEFT version: 0.14.0

Checklist

  • I have checked that my issue isn't already filed (see open issues)
  • I have included my system information
  • Any code provided is minimal, complete, and reproducible (more on MREs)
  • Any code provided is properly formatted in code blocks, (no screenshot, more on code blocks)
  • Any traceback provided is complete

Metadata

Labels: ⚡ PEFT (Related to PEFT), 🐛 bug (Something isn't working)
