teilomillet
Contributor

This change initializes the reward_kwargs variable as an empty dictionary to avoid potential UnboundLocalError during its usage in the GRPOTrainer class. This ensures that the variable is defined before it is accessed.

What does this PR do?

This PR addresses a potential UnboundLocalError that can occur in trl.trainer.GRPOTrainer._generate_and_score_completions.

The reward_kwargs dictionary is intended to store additional arguments (beyond prompt and completion) that are passed to custom, non-module-based reward functions. This dictionary is populated within an else block when iterating through self.reward_funcs.

If self.reward_funcs is empty (e.g., when a downstream library or user provides an empty list, intending to bypass TRL's internal reward calculation and supply pre-computed rewards/advantages) or if all provided reward functions are instances of torch.nn.Module, the else block that defines reward_kwargs is never entered.

Later in the same method, there's a warning mechanism that checks if all reward functions returned None for a sample. This warning message attempts to access reward_kwargs to provide context:
row_reward_kwargs = {key: value[nan_row_idx] for key, value in reward_kwargs.items()}.
If reward_kwargs was not initialized due to the conditions mentioned above, this access results in an UnboundLocalError, crashing the process.
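
The failure mode can be reproduced outside TRL with a small stand-in. This is only a sketch of the control flow described above, not the actual `_generate_and_score_completions` source: `ModuleReward`, `score_buggy`, and `raises_unbound` are hypothetical names, and `ModuleReward` merely plays the role of a `torch.nn.Module`-based reward model.

```python
# Simplified stand-in for the control flow described above; NOT the TRL source.

class ModuleReward:
    """Hypothetical placeholder for a torch.nn.Module-based reward model."""

    def __call__(self, prompts, completions):
        return [None] * len(prompts)


def score_buggy(reward_funcs, inputs):
    prompts = [row["prompt"] for row in inputs]
    completions = [row["completion"] for row in inputs]

    for reward_func in reward_funcs:
        if isinstance(reward_func, ModuleReward):
            reward_func(prompts, completions)
        else:
            # reward_kwargs is only ever assigned inside this branch.
            keys = [k for k in inputs[0] if k not in ("prompt", "completion")]
            reward_kwargs = {k: [row[k] for row in inputs] for k in keys}
            reward_func(prompts=prompts, completions=completions, **reward_kwargs)

    # Warning path: builds per-row context from reward_kwargs.
    nan_row_idx = 0
    return {key: value[nan_row_idx] for key, value in reward_kwargs.items()}


inputs = [{"prompt": "p", "completion": "c", "answer": "a"}]


def raises_unbound(reward_funcs):
    """Return True if score_buggy hits UnboundLocalError for these functions."""
    try:
        score_buggy(reward_funcs, inputs)
    except UnboundLocalError:
        return True
    return False
```

In this sketch, both an empty list and a list containing only `ModuleReward` instances skip the `else` branch, so the final dictionary comprehension reads a local name that was never assigned.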

This PR fixes the error by ensuring reward_kwargs is always defined by initializing it to an empty dictionary ({}) before the loop that processes reward functions. This allows the warning logic to execute safely, even if no custom reward functions populate reward_kwargs.
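
A minimal sketch of the pattern this fix follows, again using a simplified stand-in rather than the real trainer code (`ModuleReward`, `score_fixed`, and `custom_reward` are hypothetical names used only for illustration):

```python
# Simplified stand-in illustrating the fix; NOT the TRL source.

class ModuleReward:
    """Hypothetical placeholder for a torch.nn.Module-based reward model."""

    def __call__(self, prompts, completions):
        return [None] * len(prompts)


def score_fixed(reward_funcs, inputs):
    prompts = [row["prompt"] for row in inputs]
    completions = [row["completion"] for row in inputs]

    # The fix: define reward_kwargs before the loop so it always exists.
    reward_kwargs = {}

    for reward_func in reward_funcs:
        if isinstance(reward_func, ModuleReward):
            reward_func(prompts, completions)
        else:
            keys = [k for k in inputs[0] if k not in ("prompt", "completion")]
            reward_kwargs = {k: [row[k] for row in inputs] for k in keys}
            reward_func(prompts=prompts, completions=completions, **reward_kwargs)

    # Warning path: safe even when no custom function populated reward_kwargs.
    nan_row_idx = 0
    return {key: value[nan_row_idx] for key, value in reward_kwargs.items()}


def custom_reward(prompts, completions, **kwargs):
    """Hypothetical custom reward function that returns no score."""
    return [None] * len(prompts)


inputs = [{"prompt": "p", "completion": "c", "answer": "a"}]

empty_context = score_fixed([], inputs)
module_context = score_fixed([ModuleReward()], inputs)
custom_context = score_fixed([custom_reward], inputs)
```

With the default in place, the empty and module-only cases yield an empty context dictionary instead of raising, while a custom reward function still contributes its extra columns to the warning context.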

teilomillet and others added 2 commits May 16, 2025 15:47
Member

@qgallouedec qgallouedec left a comment

Thanks!

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@qgallouedec qgallouedec changed the title from "Initialize reward_kwargs to prevent UnboundLocalError in GRPOTrainer" to "🛠️ Initialize reward_kwargs to prevent UnboundLocalError in GRPOTrainer" May 27, 2025
@qgallouedec qgallouedec merged commit d1174ad into huggingface:main May 27, 2025
10 checks passed
3 participants