Description
Reproduction
At line 1068 in `grpo_trainer.py`:

```python
prompts_text = self.processing_class.batch_decode(
    prompt_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False
)
```

With `skip_special_tokens=True`, the decoded prompts lose the chat-format tokens required by Qwen instruct models and many others, which leads to severely mangled generations. I assume it should be:

```python
skip_special_tokens=False
```
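To illustrate the effect without loading a real model, here is a minimal sketch with a toy stand-in for the tokenizer's `batch_decode` (the `<|im_start|>` / `<|im_end|>` strings mimic Qwen's ChatML markers; the helper name and token list are hypothetical, not TRL code):

```python
# Toy stand-in for batch_decode: in Qwen-style chat templates,
# special tokens such as <|im_start|> and <|im_end|> delimit turns.
SPECIAL_TOKENS = {"<|im_start|>", "<|im_end|>"}

def toy_batch_decode(batches, skip_special_tokens):
    decoded = []
    for tokens in batches:
        if skip_special_tokens:
            # Dropping the markers destroys the chat template structure.
            tokens = [t for t in tokens if t not in SPECIAL_TOKENS]
        decoded.append("".join(tokens))
    return decoded

prompt = ["<|im_start|>", "user\nHi", "<|im_end|>", "\n",
          "<|im_start|>", "assistant\n"]

# skip_special_tokens=True: the turn delimiters vanish, so the text
# can no longer be re-tokenized back into a valid chat prompt.
print(toy_batch_decode([prompt], skip_special_tokens=True))

# skip_special_tokens=False: the template round-trips intact.
print(toy_batch_decode([prompt], skip_special_tokens=False))
```

The real fix is simply passing `skip_special_tokens=False` to `batch_decode` so the decoded prompt text round-trips through the chat template unchanged.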
System Info
- Platform: Linux-5.15.0-139-generic-x86_64-with-glibc2.35
- Python version: 3.10.12
- TRL version: 0.19.0
- PyTorch version: 2.7.0
- accelerator(s): NVIDIA H100 80GB HBM3, NVIDIA H100 80GB HBM3, NVIDIA H100 80GB HBM3, NVIDIA H100 80GB HBM3, NVIDIA H100 80GB HBM3, NVIDIA H100 80GB HBM3, NVIDIA H100 80GB HBM3
- Transformers version: 4.52.4
- Accelerate version: 1.8.1
- Accelerate config: not found
- Datasets version: 3.6.0
- HF Hub version: 0.33.0
- bitsandbytes version: not installed
- DeepSpeed version: 0.17.1
- Diffusers version: not installed
- Liger-Kernel version: 0.5.10
- LLM-Blender version: not installed
- OpenAI version: 1.91.0
- PEFT version: not installed
- vLLM version: 0.9.1
Checklist
- I have checked that my issue isn't already filed (see open issues)
- I have included my system information
- Any code provided is minimal, complete, and reproducible (more on MREs)
- Any code provided is properly formatted in code blocks (no screenshots, more on code blocks)
- Any traceback provided is complete