Suggestions for Decouple gradient accumulation from the number of minibatches generated #3396

qgallouedec · 2025-05-01T06:31:24Z

Some suggestions for PR #3388:

- Move `generation_batch_size` under the section "Parameters that control generation".
- Use `int | None` instead of `int | str`, because the CLI parser doesn't support multi-type (union) arguments; that's why the CI is failing.
- Rephrase some comments to be more precise (are we on the same page on these terms?):

| Term | Size |
|---|---|
| Local batch | `per_device_train_batch_size` |
| Local accumulated batch | `per_device_train_batch_size * gradient_accumulation_steps` |
| Global batch | `per_device_train_batch_size * world_size` |
| Effective batch | `per_device_train_batch_size * world_size * gradient_accumulation_steps` |
| Local generation batch | `per_device_train_batch_size * steps_per_generation` |
| Generation batch | `per_device_train_batch_size * world_size * steps_per_generation` |
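To make the terminology concrete, here is a small sketch that just evaluates the formulas from the table above for one set of example values. The function and class names are illustrative only and are not part of TRL.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BatchSizes:
    # Illustrative container; field names follow the terms in the table.
    local: int
    local_accumulated: int
    global_: int
    effective: int
    local_generation: int
    generation: int


def batch_sizes(
    per_device_train_batch_size: int,
    world_size: int,
    gradient_accumulation_steps: int,
    steps_per_generation: int,
) -> BatchSizes:
    """Evaluate each batch-size formula from the table (hypothetical helper)."""
    b = per_device_train_batch_size
    return BatchSizes(
        local=b,
        local_accumulated=b * gradient_accumulation_steps,
        global_=b * world_size,
        effective=b * world_size * gradient_accumulation_steps,
        local_generation=b * steps_per_generation,
        generation=b * world_size * steps_per_generation,
    )


sizes = batch_sizes(
    per_device_train_batch_size=4,
    world_size=8,
    gradient_accumulation_steps=2,
    steps_per_generation=4,
)
print(sizes.effective, sizes.generation)  # → 64 128
```

With these example values, one generation batch (128 samples) covers two effective batches (64 samples each), that is, two optimizer steps. In general, `generation / effective == steps_per_generation / gradient_accumulation_steps`, which is exactly what decoupling the two parameters allows to vary.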

- Since `num_generations` makes less sense to use with this PR, I removed it from the figure, which hopefully improves clarity.
- Support the case `self.args._steps_per_generation < self.args.gradient_accumulation_steps` (not sure why you'd want that, but it only requires changing `!=` into `<`).
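To illustrate the `int | None` suggestion, here is a minimal sketch using only the standard library `argparse`. The dataclass, its `resolve` method, the mutual-exclusion rule, and the default of `steps_per_generation = gradient_accumulation_steps` are all assumptions made for illustration; this is not TRL's `GRPOConfig` or its actual CLI parser. The point is that a single-type optional field maps cleanly to `type=int, default=None`, whereas an `int | str` field has no single conversion function.

```python
from __future__ import annotations

import argparse
from dataclasses import dataclass


@dataclass
class GenerationConfigSketch:
    # Hypothetical fields mirroring the names discussed in this PR.
    per_device_train_batch_size: int = 8
    gradient_accumulation_steps: int = 1
    # Single-type optional fields: the parser only needs int(...) or None.
    generation_batch_size: int | None = None
    steps_per_generation: int | None = None

    def resolve(self, world_size: int) -> int:
        """Return steps_per_generation, deriving it from generation_batch_size if set."""
        global_batch = self.per_device_train_batch_size * world_size
        if self.generation_batch_size is not None and self.steps_per_generation is not None:
            raise ValueError("Set generation_batch_size or steps_per_generation, not both.")
        if self.generation_batch_size is not None:
            if self.generation_batch_size % global_batch != 0:
                raise ValueError("generation_batch_size must be a multiple of the global batch size.")
            return self.generation_batch_size // global_batch
        if self.steps_per_generation is not None:
            return self.steps_per_generation
        # Assumed default: generate once per optimizer step.
        return self.gradient_accumulation_steps


def parse_args(argv: list[str]) -> GenerationConfigSketch:
    parser = argparse.ArgumentParser()
    parser.add_argument("--per_device_train_batch_size", type=int, default=8)
    parser.add_argument("--gradient_accumulation_steps", type=int, default=1)
    parser.add_argument("--generation_batch_size", type=int, default=None)
    parser.add_argument("--steps_per_generation", type=int, default=None)
    return GenerationConfigSketch(**vars(parser.parse_args(argv)))


config = parse_args([
    "--per_device_train_batch_size", "4",
    "--gradient_accumulation_steps", "2",
    "--generation_batch_size", "64",
])
print(config.resolve(world_size=4))  # → 4  (64 // (4 * 4))
```

Nothing in `resolve` forces `steps_per_generation >= gradient_accumulation_steps`, so the smaller-than case from the last bullet is accepted as well.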

HuggingFaceDocBuilderDev · 2025-05-01T06:41:54Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

edbeeching

Thanks for the suggestions, I will merge them

Commit `f1d7a98`: my suggestions

qgallouedec requested a review from edbeeching May 1, 2025 06:31

qgallouedec added 3 commits May 1, 2025 06:33:

- `8b4f4c2` fix term
- `c7099a8` consistency
- `3a7d31e` minor changes

qgallouedec changed the title ~~my suggestions~~ Suggestions for Decouple gradient accumulation from the number of minibatches generated May 1, 2025

edbeeching approved these changes May 2, 2025


edbeeching merged commit fefe183 into grpo-decouple-grad-acc May 2, 2025
10 checks passed

edbeeching deleted the grpo-decouple-grad-acc-suggestion branch May 2, 2025 06:57
