🎀 [SFT][Bugfix] sets average_tokens_across_devices to true in SFTConfig #3538

edbeeching · 2025-06-04T18:37:36Z

What does this PR do?

Sets average_tokens_across_devices to true in SFTConfig, PRMConfig and RewardModelConfig.

By default, in a multi-device setting, the training loss is not comparable across setups even when you use the same effective batch size (world_size * per_device_train_batch_size * gradient_accumulation_steps).
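To make the effective batch size formula concrete, here is a minimal sketch. The helper name `effective_batch_size` and the configuration values are hypothetical, chosen only so that every setup multiplies out to the same number; they are not the settings used for the runs in this PR.

```python
def effective_batch_size(world_size, per_device_train_batch_size, gradient_accumulation_steps):
    """Number of samples contributing to one optimizer step."""
    return world_size * per_device_train_batch_size * gradient_accumulation_steps


# Hypothetical setups: (world_size, per_device_train_batch_size, gradient_accumulation_steps).
# Fewer devices are compensated by more gradient accumulation steps.
configs = [(8, 4, 1), (4, 4, 2), (2, 4, 4), (1, 4, 8)]
sizes = [effective_batch_size(*config) for config in configs]
print(sizes)
```

All four setups see the same number of samples per optimizer step, which is why one would expect their loss curves to match.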

Example loss curves for 8,4,2,1 nodes (green, pink, grey, peach respectively).

This is because average_tokens_across_devices=False by default in transformers.TrainingArguments. Setting it to true achieves parity, and a lower training loss, across different multi-device / multi-node setups, as shown by the additional blue curve below. I tested 8 vs 64 GPUs (1 vs 8 nodes) and the training curves were nearly identical.
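A toy illustration of why the normalization matters, under the assumption that the mismatch comes from each device averaging over its own token count before the results are averaged across devices. This is a simplified pure-Python sketch with made-up token losses and hypothetical helper names, not the actual transformers Trainer code.

```python
def mean_of_device_means(per_device_losses):
    """Average token losses on each device, then average those means across devices."""
    device_means = [sum(losses) / len(losses) for losses in per_device_losses]
    return sum(device_means) / len(device_means)


def global_token_mean(per_device_losses):
    """Sum all token losses and divide by the total token count across devices."""
    total_loss = sum(sum(losses) for losses in per_device_losses)
    total_tokens = sum(len(losses) for losses in per_device_losses)
    return total_loss / total_tokens


# The same six (made-up) token losses, placed on one device or split unevenly over two.
token_losses = [4.0, 4.0, 1.0, 1.0, 1.0, 1.0]
one_device = [token_losses]
two_devices = [token_losses[:2], token_losses[2:]]

print(mean_of_device_means(one_device), mean_of_device_means(two_devices))
print(global_token_mean(one_device), global_token_mean(two_devices))
```

When devices hold different numbers of tokens, the mean of per-device means depends on how the tokens are split, while the global token mean does not.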

HuggingFaceDocBuilderDev · 2025-06-04T18:41:28Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

trl/trainer/prm_config.py

trl/trainer/reward_config.py

trl/trainer/sft_config.py

trl/trainer/prm_config.py

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>

qgallouedec

LGTM!

set average_tokens_across_devices to true in SFTConfig

90ccb83

edbeeching requested a review from qgallouedec June 4, 2025 18:37

edbeeching and others added 2 commits June 4, 2025 18:46

change the default in PRM config and RM config

27c3ce4

Merge branch 'main' into fix-average-tokens-across-devices

02cb2b6

edbeeching requested a review from shirinyamani June 4, 2025 18:55

qgallouedec reviewed Jun 4, 2025


shirinyamani reviewed Jun 4, 2025


trl/trainer/prm_config.py (outdated)

edbeeching and others added 7 commits June 4, 2025 21:32

Update trl/trainer/prm_config.py

beffdc6

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>

Update trl/trainer/reward_config.py

cf3c4fb

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>

Update trl/trainer/reward_config.py

ffe4bc0

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>

Update trl/trainer/sft_config.py

49b9c6b

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>

Update trl/trainer/prm_config.py

e987682

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>

Update trl/trainer/sft_config.py

ce1d6fc

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>

reorder

cc315d1

qgallouedec approved these changes Jun 4, 2025


qgallouedec changed the title ~~[SFT][Bugfix] sets average_tokens_across_devices to true in SFTConfig~~ 🎀 [SFT][Bugfix] sets average_tokens_across_devices to true in SFTConfig Jun 4, 2025

qgallouedec merged commit 0333108 into main Jun 4, 2025
11 checks passed

qgallouedec deleted the fix-average-tokens-across-devices branch June 4, 2025 21:20
