## Paper

[Dr. GRPO Paper](https://github.com/sail-sg/understand-r1-zero)

## Motivation/Benefits

- Fixes optimization bias while maintaining reasoning performance
- Reduces average incorrect response length by 38% (Fig. 5 in paper)
- Backward-compatible with existing GRPO workflows