0.19.0 breaks the chat format in GRPO trainer leading to severely mangled chat generations #3644

@mchugha2

Description

Reproduction

At line 1068 in grpo_trainer.py:

prompts_text = self.processing_class.batch_decode(
    prompt_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False
)
With `skip_special_tokens=True`, this strips the chat-format tokens (e.g. `<|im_start|>` / `<|im_end|>`) that Qwen instruct models, and many others, require. The result is severely mangled generations.

I assume it should be:

skip_special_tokens=False
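To make the failure mode concrete, here is a minimal, self-contained sketch (a toy stand-in for the tokenizer, not the actual TRL or Transformers code) showing how decoding with `skip_special_tokens=True` silently removes the chat-template markers from the re-decoded prompt:

```python
# Toy illustration of the bug: a fake "tokenizer" whose decode step can drop
# special tokens, mimicking batch_decode(..., skip_special_tokens=True).
# The token names below match Qwen's ChatML-style markers; the tokenizer
# logic itself is schematic.

SPECIAL_TOKENS = {"<|im_start|>", "<|im_end|>"}

def batch_decode(batch_tokens, skip_special_tokens):
    """Join token strings back into text, optionally dropping special tokens."""
    return [
        "".join(
            tok for tok in tokens
            if not (skip_special_tokens and tok in SPECIAL_TOKENS)
        )
        for tokens in batch_tokens
    ]

# A chat-formatted prompt as it would look after tokenization.
prompt_tokens = [
    ["<|im_start|>", "user\n", "Hello", "<|im_end|>", "\n",
     "<|im_start|>", "assistant\n"]
]

# skip_special_tokens=True: the chat markers vanish, so the text fed to
# generation no longer matches the format the instruct model was trained on.
print(batch_decode(prompt_tokens, skip_special_tokens=True))

# skip_special_tokens=False: the markers survive and the chat template is
# preserved end to end.
print(batch_decode(prompt_tokens, skip_special_tokens=False))
```

With the markers stripped, the model receives a bare `user\nHello\nassistant\n` string instead of a well-formed ChatML prompt, which is consistent with the mangled generations described above.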

System Info

  • Platform: Linux-5.15.0-139-generic-x86_64-with-glibc2.35
  • Python version: 3.10.12
  • TRL version: 0.19.0
  • PyTorch version: 2.7.0
  • accelerator(s): NVIDIA H100 80GB HBM3, NVIDIA H100 80GB HBM3, NVIDIA H100 80GB HBM3, NVIDIA H100 80GB HBM3, NVIDIA H100 80GB HBM3, NVIDIA H100 80GB HBM3, NVIDIA H100 80GB HBM3
  • Transformers version: 4.52.4
  • Accelerate version: 1.8.1
  • Accelerate config: not found
  • Datasets version: 3.6.0
  • HF Hub version: 0.33.0
  • bitsandbytes version: not installed
  • DeepSpeed version: 0.17.1
  • Diffusers version: not installed
  • Liger-Kernel version: 0.5.10
  • LLM-Blender version: not installed
  • OpenAI version: 1.91.0
  • PEFT version: not installed
  • vLLM version: 0.9.1

Checklist

  • I have checked that my issue isn't already filed (see open issues)
  • I have included my system information
  • Any code provided is minimal, complete, and reproducible (more on MREs)
  • Any code provided is properly formatted in code blocks, (no screenshot, more on code blocks)
  • Any traceback provided is complete

Labels

  • 🏋 GRPO (Related to GRPO)
  • 🐛 bug (Something isn't working)