👎 [GRPO] Adds option to disable dropout #3234

edbeeching · 2025-04-04T11:15:17Z

What does this PR do?

Adds an option to disable dropout.

The RLOOTrainer disables dropout in policy, ref_model and reward model. This PR adds the option to disable dropout to the GRPOTrainer, which may improve training stability.

The paper "The N+ Implementation Details of RLHF with PPO: A Case Study on TL;DR Summarization" gives more insight into why this option may improve training stability:

We disable the dropout layers during training, similar to the settings in Ziegler et al. (2019); Huang
et al. (2024). This is important for PPO training, especially because with dropout activated, the log
probabilities of tokens will not be reproducible, making calculating the KL penalty unreliable while
also causing the ratios of the PPO to be not 1s during the first epoch, causing PPO optimization
problems. For consistency, we also disable dropout for SFT and RM training.
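
To make the quoted argument concrete, here is a minimal, standard-library-only sketch of the effect. It is a toy softmax policy, not TRL code: `log_prob`, `first_epoch_ratio`, the feature vector and the weights are all hypothetical and chosen only for illustration. With dropout active, two forward passes through identical weights draw different masks, so the first-epoch importance ratio `exp(new - old)` drifts away from 1. With dropout disabled, the ratio is exactly 1.

```python
import math
import random


def log_prob(features, weights, action, drop_p, rng):
    """Log-probability of `action` under a toy softmax policy.

    Inverted dropout with rate `drop_p` is applied to the input features,
    mimicking a dropout layer that is active during the forward pass.
    """
    kept = []
    for x in features:
        if drop_p > 0.0 and rng.random() < drop_p:
            kept.append(0.0)  # unit dropped
        else:
            kept.append(x / (1.0 - drop_p))  # unit kept and rescaled
    logits = [sum(w * x for w, x in zip(row, kept)) for row in weights]
    log_z = math.log(sum(math.exp(logit) for logit in logits))
    return logits[action] - log_z


# Hypothetical toy inputs: 4 features, 3 possible actions.
features = [0.5, -1.2, 0.8, 2.0]
weights = [
    [0.1, 0.4, -0.3, 0.2],
    [-0.5, 0.2, 0.7, 0.1],
    [0.3, -0.1, 0.2, -0.6],
]


def first_epoch_ratio(drop_p, seed=0):
    """Ratio pi_new / pi_old when the weights have not changed yet."""
    rng = random.Random(seed)
    old = log_prob(features, weights, 1, drop_p, rng)  # pass at generation time
    new = log_prob(features, weights, 1, drop_p, rng)  # pass at the first update
    return math.exp(new - old)


print("dropout off:", first_epoch_ratio(0.0))  # exactly 1.0
print("dropout on: ", first_epoch_ratio(0.5))  # generally not 1.0
```

The same mismatch affects the log-probability difference between the policy and the reference model, which is why the quoted passage calls the KL penalty unreliable when dropout is active.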

HuggingFaceDocBuilderDev · 2025-04-04T11:19:39Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

lewtun

Nice feature - LGTM.

lewtun · 2025-04-04T12:23:17Z

trl/trainer/grpo_trainer.py

@@ -359,6 +366,10 @@ def __init__(
                reward_funcs[i] = AutoModelForSequenceClassification.from_pretrained(
                    reward_func, num_labels=1, **model_init_kwargs
                )
+                if args.disable_dropout:
+                    if isinstance(reward_funcs[i], nn.Module):


I think AutoModelForSequenceClassification is loaded in eval mode by default, so technically we don't need this here (happy to keep it though if we want to be safe)

Indeed!

>>> from transformers import AutoModelForSequenceClassification
>>> model = AutoModelForSequenceClassification.from_pretrained("trl-lib/Qwen2-0.5B-Reward", num_labels=1)
>>> model.training
False

qgallouedec · 2025-04-04T13:59:50Z

trl/trainer/grpo_config.py

@@ -101,6 +101,9 @@ class GRPOConfig(TrainingArguments):
            speed, but may be numerically unstable for long training runs.
        num_iterations (`int`, *optional*, defaults to `1`):
            Number of iterations per batch (denoted as μ in the algorithm).
+        disable_dropout (`bool`, *optional*, defaults to `False`):


In the other trainers this is set to True, maybe we should do the same here?

qgallouedec · 2025-04-04T14:02:49Z

trl/trainer/grpo_trainer.py

+        if args.disable_dropout:
+            if isinstance(model, nn.Module):
+                disable_dropout_in_model(model)
+            if self.ref_model is not None and isinstance(self.ref_model, nn.Module):
+                disable_dropout_in_model(self.ref_model)
+


We only support PreTrainedModel, which is always an nn.Module (what else could it be?)

Suggested change

-        if args.disable_dropout:
-            if isinstance(model, nn.Module):
-                disable_dropout_in_model(model)
-            if self.ref_model is not None and isinstance(self.ref_model, nn.Module):
-                disable_dropout_in_model(self.ref_model)
+        if args.disable_dropout:
+            disable_dropout_in_model(model)
+            if self.ref_model is not None:
+                disable_dropout_in_model(self.ref_model)
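
For readers unfamiliar with the helper called in this hunk, the sketch below shows one way such a function can work, assuming it simply sets `p = 0` on every `nn.Dropout` submodule. `zero_dropout` and the small `policy` network are hypothetical stand-ins written for this illustration, not TRL's `disable_dropout_in_model` itself. It also shows why this differs from relying on eval mode: the policy stays in train mode, yet repeated forward passes agree.

```python
import torch
from torch import nn


def zero_dropout(model: nn.Module) -> None:
    """Set p=0 on every nn.Dropout submodule so dropout is a no-op even in train mode."""
    for module in model.modules():
        if isinstance(module, nn.Dropout):
            module.p = 0.0


torch.manual_seed(0)

# Hypothetical tiny network standing in for a policy model.
policy = nn.Sequential(nn.Linear(8, 16), nn.Dropout(p=0.5), nn.Linear(16, 4))
policy.train()  # training puts the policy in train mode, which would activate dropout

zero_dropout(policy)

x = torch.randn(2, 8)
with torch.no_grad():
    first, second = policy(x), policy(x)

# With p=0 the two passes match even though the module is in train mode.
same = torch.equal(first, second)
print(policy.training, same)
```

Because every `PreTrainedModel` is an `nn.Module`, a helper of this shape can be called on the policy and the reference model without an `isinstance` guard, which is the point of the suggested change.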

qgallouedec

LGTM. In the future we could default this to True (let's see if it improves stability).

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>

adds option to disable dropout

b63aafd

edbeeching requested review from qgallouedec and lewtun April 4, 2025 11:15

lewtun approved these changes Apr 4, 2025

qgallouedec reviewed Apr 4, 2025

qgallouedec and others added 3 commits April 8, 2025 08:18

Merge branch 'main' into grpo-disable-dropout

7375b13

Merge branch 'main' into grpo-disable-dropout

5efca5e

minor

eb06418

qgallouedec approved these changes Apr 9, 2025

modelS

402871d

qgallouedec changed the title ~~[GRPO] Adds option to disable dropout~~ 👎 [GRPO] Adds option to disable dropout Apr 9, 2025

qgallouedec merged commit 47b9515 into main Apr 9, 2025
8 of 10 checks passed

qgallouedec deleted the grpo-disable-dropout branch April 9, 2025 16:59
