Skip to content

[GRPO] Fix: Processing ref logprobs in batches #3740

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Jul 20, 2025

Conversation

idanshen
Copy link
Contributor

Until commit d6a969f, ref_per_token_logps was calculated inside _compute_loss or compute_liger_loss, which gets input of size per_device_batch_size.
Since it moved to _generate_and_score_completions it now need to be passed a batch_size. Without it, memory consumption is not independent of the number of gradient accumulations and can spike when beta != 0

Until commit d6a969f ref_per_token_logps used to be calculated inside _compute_loss or compute_liger_loss which used to get input of size per_device_batch_size.
Since it moved to _generate_and_score_completions it now need to be passed a batch_size. Without it, memory consumption is not independent of the number of gradient accumulations.
@idanshen idanshen changed the title [GRPO] Processing ref logprobs in batches [GRPO] Fix: Processing ref logprobs in batches Jul 17, 2025
Copy link
Collaborator

@LeonEricsson LeonEricsson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Nice catch

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@LeonEricsson LeonEricsson merged commit 5787f3b into huggingface:main Jul 20, 2025
8 of 10 checks passed
marcandrelarochelle pushed a commit to marcandrelarochelle/trl that referenced this pull request Jul 29, 2025
Co-authored-by: LeonEricsson <70749762+LeonEricsson@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants