📇 GRPO: print completions to console and update docs #2951

nopepper · 2025-02-24T20:39:04Z

What does this PR do?

Update GRPOConfig to replace log_completions with log_completions_steps
Add print_prompt_completions_sample() utility function for rich console logging
Modify GRPOTrainer to additionally print 5 random prompt-completion pairs every log_completions_steps steps
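
The step-gated sampling described above can be sketched as follows. This is a minimal illustration, not TRL's actual code: the helper names `should_log_completions` and `sample_prompt_completions`, and the `seed` parameter, are hypothetical and appear nowhere in the PR. The sketch only assumes that a sample of 5 pairs is drawn every `log_completions_steps` steps.

```python
import random


def should_log_completions(step: int, log_completions_steps: int) -> bool:
    """Hypothetical helper: True on steps where a sample should be printed."""
    # A non-positive interval disables logging and avoids a modulo by zero.
    return log_completions_steps > 0 and step % log_completions_steps == 0


def sample_prompt_completions(prompts, completions, num_samples=5, seed=None):
    """Hypothetical helper: pick up to `num_samples` aligned (prompt, completion) pairs."""
    if len(prompts) != len(completions):
        raise ValueError("prompts and completions must have the same length")
    rng = random.Random(seed)
    # Never ask for more samples than there are pairs in the batch.
    k = min(num_samples, len(prompts))
    # Sample indices once so each prompt stays paired with its own completion.
    indices = rng.sample(range(len(prompts)), k)
    return [(prompts[i], completions[i]) for i in indices]
```

Sampling indices rather than sampling the two lists separately keeps each prompt aligned with the completion it produced.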

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you read the contributor guideline, Pull Request section?
Was this discussed/approved via a GitHub issue? Please add a link to it if that's the case.
Did you make sure to update the documentation with your changes? Here are the documentation guidelines.
Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.

- Update `GRPOConfig` to replace `log_completions` with `log_completions_steps`
- Add `print_prompt_completions_sample()` utility function for rich console logging
- Modify `GRPOTrainer` to additionally print 5 random prompt-completion pairs every log_completions_steps steps

trl/trainer/grpo_trainer.py

trl/trainer/utils.py

…ons_sample when rich is not available

trl/trainer/grpo_trainer.py

trl/trainer/utils.py

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>

trl/trainer/grpo_config.py

qgallouedec · 2025-02-24T21:46:34Z

LGTM. Once the latest nit recommendations are applied and CI is green, we're good to merge, thanks :)

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>

HuggingFaceDocBuilderDev · 2025-02-24T21:50:55Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

qgallouedec · 2025-02-24T22:12:28Z

qgallouedec · 2025-02-24T22:15:15Z

@bot /style

qgallouedec · 2025-02-24T22:44:11Z

The failing tests are due to a bug that liger-kernel introduced in its latest version. We can safely ignore them; I guess they'll do a patch release soon. See linkedin/Liger-Kernel#586

* ✨ Enhance GRPO logging with configurable completions sampling
  - Update `GRPOConfig` to replace `log_completions` with `log_completions_steps`
  - Add `print_prompt_completions_sample()` utility function for rich console logging
  - Modify `GRPOTrainer` to additionally print 5 random prompt-completion pairs every log_completions_steps steps
* GRPO trainer completions logging, move wandb checks together
* Add rich availability check and use fallback in print_prompt_completions_sample when rich is not available
* Update docstrings on print_prompt_completions_sample
  Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
* Revert back to simple log_completions bool
* GRPO log completions fully
* Remove print fallback from print_prompt_completions_sample
* Move accelerator main process check up for grpo log completions
* Explicit variable names in print_prompt_completions_sample
* Make GRPOConfig docstring match field description
* Update log_completions docs again
  Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
* Update GRPOConfig docs to match field
* improve readability when prompt or completions are multiline
* log reward
* prevent hanging, don't print without rich, print reward
* style

---------

Co-authored-by: Robert Veres <robert.veres@languagetool.org>
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>
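
The later commits in the message above mention a rich availability check, a main-process check, printing the reward, and not printing at all when rich is missing. The sketch below shows one way those guards could fit together. It is an illustration under assumptions, not the merged implementation: `print_sample_table`, its parameters, and the local `is_rich_available` stand-in are hypothetical names, and the column styles are arbitrary choices.

```python
import importlib.util


def is_rich_available() -> bool:
    """Stand-in availability check: True if the `rich` package can be imported."""
    return importlib.util.find_spec("rich") is not None


def print_sample_table(prompts, completions, rewards, step, is_main_process=True) -> bool:
    """Hypothetical helper: print a table of samples; return True if something was printed."""
    # Print from one process only, and skip silently when rich is not installed.
    if not is_main_process or not is_rich_available():
        return False

    # Import lazily so the module itself loads without rich.
    from rich.console import Console
    from rich.panel import Panel
    from rich.table import Table
    from rich.text import Text

    # show_lines=True draws a rule between rows, which keeps multiline
    # prompts and completions visually separated.
    table = Table(show_header=True, header_style="bold white", expand=True, show_lines=True)
    table.add_column("Prompt", style="bright_yellow")
    table.add_column("Completion", style="bright_green")
    table.add_column("Reward", style="bold cyan", justify="right")

    for prompt, completion, reward in zip(prompts, completions, rewards):
        table.add_row(Text(prompt), Text(completion), f"{reward:.2f}")

    panel = Panel(table, expand=False, title=f"Step {step}", border_style="bold white")
    Console().print(panel)
    return True
```

Returning a boolean instead of raising lets a training loop call the helper unconditionally on every process.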

Robert Veres added 2 commits February 24, 2025 21:35

GRPO trainer completions logging, move wandb checks together

e27bc4b