🛠️ Initialize reward_kwargs to prevent UnboundLocalError in GRPOTrainer #3459
This change initializes the `reward_kwargs` variable as an empty dictionary to avoid a potential `UnboundLocalError` in the `GRPOTrainer` class, ensuring the variable is defined before it is accessed.
What does this PR do?
This PR addresses a potential `UnboundLocalError` that can occur in `trl.trainer.GRPOTrainer._generate_and_score_completions`.

The `reward_kwargs` dictionary is intended to store additional arguments (beyond prompt and completion) that are passed to custom, non-module-based reward functions. It is populated inside an `else` block while iterating over `self.reward_funcs`.

If `self.reward_funcs` is empty (e.g., when a downstream library or user passes an empty list to bypass TRL's internal reward calculation and supply pre-computed rewards/advantages), or if all provided reward functions are instances of `torch.nn.Module`, the `else` block that defines `reward_kwargs` is never entered.

Later in the same method, a warning mechanism checks whether all reward functions returned `None` for a sample. The warning message accesses `reward_kwargs` to provide context: `row_reward_kwargs = {key: value[nan_row_idx] for key, value in reward_kwargs.items()}`. If `reward_kwargs` was never initialized because of the conditions above, this access raises an `UnboundLocalError` and crashes the process.
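A minimal, self-contained sketch of the failure mode described above (a simplified stand-in for the real method; `compute_rewards`, its arguments, and the dummy module-branch reward are illustrative assumptions, not TRL's actual API):

```python
import torch

def compute_rewards(reward_funcs, prompts, completions, extra_columns):
    """Simplified stand-in for the reward loop in _generate_and_score_completions."""
    rewards = []
    for reward_func in reward_funcs:
        if isinstance(reward_func, torch.nn.Module):
            # Module-based reward models take this branch; reward_kwargs is not assigned here.
            rewards.append(torch.zeros(len(prompts)))
        else:
            # Only this branch assigns reward_kwargs.
            reward_kwargs = dict(extra_columns)
            rewards.append(reward_func(prompts=prompts, completions=completions, **reward_kwargs))

    # Later warning logic assumes reward_kwargs is bound on every code path:
    nan_row_idx = 0
    row_reward_kwargs = {key: value[nan_row_idx] for key, value in reward_kwargs.items()}
    return rewards, row_reward_kwargs

# With an empty reward_funcs list (or only nn.Module rewards), the else branch
# never runs and the dict comprehension raises UnboundLocalError.
try:
    compute_rewards([], prompts=["p"], completions=["c"], extra_columns={})
except UnboundLocalError as err:
    print(err)
```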
This PR fixes the error by initializing `reward_kwargs` to an empty dictionary (`{}`) before the loop that processes reward functions, so it is always defined. The warning logic can then execute safely even when no custom reward function populates `reward_kwargs`.
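For comparison, a sketch of the fix pattern applied to the same simplified helper, hoisting the initialization above the loop (again an illustrative assumption, not the exact TRL diff):

```python
import torch

def compute_rewards_fixed(reward_funcs, prompts, completions, extra_columns):
    rewards = []
    reward_kwargs = {}  # always bound, even if the else branch below never runs
    for reward_func in reward_funcs:
        if isinstance(reward_func, torch.nn.Module):
            rewards.append(torch.zeros(len(prompts)))
        else:
            reward_kwargs = dict(extra_columns)
            rewards.append(reward_func(prompts=prompts, completions=completions, **reward_kwargs))

    nan_row_idx = 0
    # Safe now: with no custom reward functions this is just an empty dict comprehension.
    row_reward_kwargs = {key: value[nan_row_idx] for key, value in reward_kwargs.items()}
    return rewards, row_reward_kwargs

# No longer raises; returns ([], {}) for an empty reward_funcs list.
print(compute_rewards_fixed([], prompts=["p"], completions=["c"], extra_columns={}))
```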