0.19.0 breaks the chat format in GRPO trainer leading to severely mangled chat generations #3644

@mchugha2

Description

Reproduction

At line 1068 in grpo_trainer.py:

prompts_text = self.processing_class.batch_decode(
    prompt_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False
)
With `skip_special_tokens=True`, this strips the chat-format tokens (e.g. `<|im_start|>` / `<|im_end|>`) that Qwen instruct models, and many others, require. The result is severely mangled generations.

I assume it should be:

skip_special_tokens=False
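To make the failure mode concrete, here is a minimal, self-contained sketch (a toy stand-in for the tokenizer, not the actual TRL or Transformers code) showing how decoding with `skip_special_tokens=True` silently removes the chat-template markers from the re-decoded prompt:

```python
# Toy illustration of the bug: a fake "tokenizer" whose decode step can drop
# special tokens, mimicking batch_decode(..., skip_special_tokens=True).
# The token names below match Qwen's ChatML-style markers; the tokenizer
# logic itself is schematic.

SPECIAL_TOKENS = {"<|im_start|>", "<|im_end|>"}

def batch_decode(batch_tokens, skip_special_tokens):
    """Join token strings back into text, optionally dropping special tokens."""
    return [
        "".join(
            tok for tok in tokens
            if not (skip_special_tokens and tok in SPECIAL_TOKENS)
        )
        for tokens in batch_tokens
    ]

# A chat-formatted prompt as it would look after tokenization.
prompt_tokens = [
    ["<|im_start|>", "user\n", "Hello", "<|im_end|>", "\n",
     "<|im_start|>", "assistant\n"]
]

# skip_special_tokens=True: the chat markers vanish, so the text fed to
# generation no longer matches the format the instruct model was trained on.
print(batch_decode(prompt_tokens, skip_special_tokens=True))

# skip_special_tokens=False: the markers survive and the chat template is
# preserved end to end.
print(batch_decode(prompt_tokens, skip_special_tokens=False))
```

With the markers stripped, the model receives a bare `user\nHello\nassistant\n` string instead of a well-formed ChatML prompt, which is consistent with the mangled generations described above.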

System Info

  • Platform: Linux-5.15.0-139-generic-x86_64-with-glibc2.35
  • Python version: 3.10.12
  • TRL version: 0.19.0
  • PyTorch version: 2.7.0
  • accelerator(s): NVIDIA H100 80GB HBM3, NVIDIA H100 80GB HBM3, NVIDIA H100 80GB HBM3, NVIDIA H100 80GB HBM3, NVIDIA H100 80GB HBM3, NVIDIA H100 80GB HBM3, NVIDIA H100 80GB HBM3
  • Transformers version: 4.52.4
  • Accelerate version: 1.8.1
  • Accelerate config: not found
  • Datasets version: 3.6.0
  • HF Hub version: 0.33.0
  • bitsandbytes version: not installed
  • DeepSpeed version: 0.17.1
  • Diffusers version: not installed
  • Liger-Kernel version: 0.5.10
  • LLM-Blender version: not installed
  • OpenAI version: 1.91.0
  • PEFT version: not installed
  • vLLM version: 0.9.1

Checklist

  • I have checked that my issue isn't already filed (see open issues)
  • I have included my system information
  • Any code provided is minimal, complete, and reproducible (more on MREs)
  • Any code provided is properly formatted in code blocks, (no screenshot, more on code blocks)
  • Any traceback provided is complete

Labels

  • 🏋 GRPO (Related to GRPO)
  • 🐛 bug (Something isn't working)