📇 GRPO: print completions to console and update docs #2951
- Update `GRPOConfig` to replace `log_completions` with `log_completions_steps`
- Add `print_prompt_completions_sample()` utility function for rich console logging
- Modify `GRPOTrainer` to additionally print 5 random prompt-completion pairs every log_completions_steps steps
…ons_sample when rich is not available
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
LGTM. Once the latest nit recommendations are applied and CI is green, we're good to merge, thanks :)
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
@bot /style
Failing tests are because liger-kernel introduced a bug in their latest version. We can safely ignore it, I guess they'll do a patch release soon. See linkedin/Liger-Kernel#586
* ✨ Enhance GRPO logging with configurable completions sampling
  - Update `GRPOConfig` to replace `log_completions` with `log_completions_steps`
  - Add `print_prompt_completions_sample()` utility function for rich console logging
  - Modify `GRPOTrainer` to additionally print 5 random prompt-completion pairs every log_completions_steps steps
* GRPO trainer completions logging, move wandb checks together
* Add rich availability check and use fallback in print_prompt_completions_sample when rich is not available
* Update docstrings on print_prompt_completions_sample
  Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
* Revert back to simple log_completions bool
* GRPO log completions fully
* Remove print fallback from print_prompt_completions_sample
* Move accelerator main process check up for grpo log completions
* Explicit variable names in print_prompt_completions_sample
* Make GRPOConfig docstring match field description
* Update log_completions docs again
  Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
* Update GRPOConfig docs to match field
* improve readibility when prompt or completions are multilines
* log reward
* prevent hanging, don't print without rich, print reward
* style

---------

Co-authored-by: Robert Veres <robert.veres@languagetool.org>
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>
What does this PR do?
- Update `GRPOConfig` to replace `log_completions` with `log_completions_steps`
- Add `print_prompt_completions_sample()` utility function for rich console logging
- Modify `GRPOTrainer` to additionally print 5 random prompt-completion pairs every `log_completions_steps` steps

Fixes #2948
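For readers who want a feel for what this kind of logging involves, here is a minimal sketch of sampling a few prompt-completion pairs and rendering them as a table with `rich`. It is not the code from this PR: `sample_pairs` and `print_pairs` are hypothetical names chosen for illustration, and the column layout, the `seed` parameter, and the choice to skip printing when `rich` is not installed are assumptions loosely based on the commit notes in this thread.

```python
import importlib.util
import random


def sample_pairs(prompts, completions, rewards, k=5, seed=None):
    """Pick up to k random (prompt, completion, reward) triples, kept aligned by index."""
    if not (len(prompts) == len(completions) == len(rewards)):
        raise ValueError("prompts, completions and rewards must have the same length")
    rng = random.Random(seed)
    # Sample indices rather than items so the three lists stay in sync.
    indices = rng.sample(range(len(prompts)), min(k, len(prompts)))
    return [(prompts[i], completions[i], rewards[i]) for i in indices]


def print_pairs(rows, step):
    """Render rows as a rich table. Prints nothing and returns False if rich is missing."""
    if importlib.util.find_spec("rich") is None:
        return False  # assumption: stay silent rather than fall back to plain print
    from rich.console import Console
    from rich.table import Table
    from rich.text import Text

    # show_lines=True draws a rule between rows, which helps with multi-line text.
    table = Table(title=f"Step {step}", show_lines=True)
    table.add_column("Prompt")
    table.add_column("Completion")
    table.add_column("Reward", justify="right")
    for prompt, completion, reward in rows:
        # Wrap in Text so square brackets in model output are not parsed as rich markup.
        table.add_row(Text(prompt), Text(completion), f"{reward:.2f}")
    Console().print(table)
    return True
```

In a distributed run you would typically call something like `print_pairs` only on the main process, which is the concern behind the "Move accelerator main process check up" commit above.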
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.