[GRPO] Fix: Processing ref logprobs in batches #3740

idanshen · 2025-07-16T22:18:37Z

Until commit d6a969f, ref_per_token_logps was calculated inside _compute_loss or compute_liger_loss, which receive inputs of size per_device_batch_size.
Since that computation moved to _generate_and_score_completions, whose input grows with the number of gradient accumulation steps, it now needs to be passed a batch_size. Without it, peak memory scales with the number of gradient accumulation steps and can spike when beta != 0.
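The fix amounts to chunking the reference-model forward pass. Below is a minimal pure-Python sketch of that pattern, not TRL's actual implementation. `per_token_logps_in_batches`, `ToyRefModel`, and the uniform toy log-probs are hypothetical stand-ins for a reference model's forward pass. The sketch also assumes, as the description implies, that the batch scored in _generate_and_score_completions holds per_device_batch_size rows for each gradient accumulation step.

```python
import math
from typing import Callable, List, Sequence

# Hypothetical signature: a "model" maps a batch of token-id sequences
# to one log-probability per token.
LogpFn = Callable[[Sequence[Sequence[int]]], List[List[float]]]


def per_token_logps_in_batches(
    logp_fn: LogpFn,
    input_ids: Sequence[Sequence[int]],
    batch_size: int,
) -> List[List[float]]:
    """Apply logp_fn to input_ids in slices of at most batch_size rows.

    Only one slice is handed to logp_fn per call, so whatever it allocates
    for a forward pass scales with batch_size rather than len(input_ids).
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    out: List[List[float]] = []
    for start in range(0, len(input_ids), batch_size):
        out.extend(logp_fn(input_ids[start : start + batch_size]))
    return out


class ToyRefModel:
    """Hypothetical stand-in for a reference model: uniform over the vocabulary."""

    def __init__(self, vocab_size: int) -> None:
        self.vocab_size = vocab_size
        self.max_rows_seen = 0  # largest number of rows passed in a single call

    def __call__(self, batch: Sequence[Sequence[int]]) -> List[List[float]]:
        self.max_rows_seen = max(self.max_rows_seen, len(batch))
        logp = -math.log(self.vocab_size)  # log(1 / vocab_size) for every token
        return [[logp for _ in seq] for seq in batch]


# Assumption: the scored batch holds per_device_batch_size rows for each
# gradient accumulation step.
per_device_batch_size = 4
gradient_accumulation_steps = 8
input_ids = [[1, 2, 3, 4, 5]] * (per_device_batch_size * gradient_accumulation_steps)

ref_model = ToyRefModel(vocab_size=32)
ref_logps = per_token_logps_in_batches(ref_model, input_ids, per_device_batch_size)
print(len(ref_logps), ref_model.max_rows_seen)  # prints: 32 4
```

Raising gradient_accumulation_steps in this sketch only lengthens the loop; max_rows_seen stays at 4, which is the property the fix restores.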


LeonEricsson

LGTM! Nice catch

HuggingFaceDocBuilderDev · 2025-07-20T10:00:21Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Co-authored-by: LeonEricsson <70749762+LeonEricsson@users.noreply.github.com>

idanshen changed the title ~~[GRPO] Processing ref logprobs in batches~~ [GRPO] Fix: Processing ref logprobs in batches Jul 17, 2025

Merge branch 'main' into main

d84f2f5

LeonEricsson approved these changes Jul 20, 2025


LeonEricsson merged commit 5787f3b into huggingface:main Jul 20, 2025
8 of 10 checks passed

marcandrelarochelle pushed a commit to marcandrelarochelle/trl that referenced this pull request Jul 29, 2025

[GRPO] Fix: Processing ref logprobs in batches (huggingface#3740)

81934b6

Co-authored-by: LeonEricsson <70749762+LeonEricsson@users.noreply.github.com>
