🛠️ Initialize reward_kwargs to prevent UnboundLocalError in GRPOTrainer #3459
This change initializes the `reward_kwargs` variable as an empty dictionary to avoid a potential `UnboundLocalError` in the `GRPOTrainer` class, ensuring the variable is defined before it is accessed.
What does this PR do?
This PR addresses a potential `UnboundLocalError` that can occur in `trl.trainer.GRPOTrainer._generate_and_score_completions`.

The `reward_kwargs` dictionary is intended to store additional arguments (beyond prompt and completion) that are passed to custom, non-module-based reward functions. It is populated inside an `else` block while iterating over `self.reward_funcs`.

If `self.reward_funcs` is empty (e.g., when a downstream library or user passes an empty list to bypass TRL's internal reward calculation and supply pre-computed rewards/advantages), or if all provided reward functions are instances of `torch.nn.Module`, the `else` block that defines `reward_kwargs` is never entered.

Later in the same method, a warning mechanism checks whether all reward functions returned `None` for a sample. The warning message accesses `reward_kwargs` to provide context: `row_reward_kwargs = {key: value[nan_row_idx] for key, value in reward_kwargs.items()}`. If `reward_kwargs` was never initialized because of the conditions above, this access raises an `UnboundLocalError` and crashes the process.
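A minimal, self-contained sketch of the failure mode described above (a simplified stand-in for the real method; `compute_rewards`, its arguments, and the dummy module-branch reward are illustrative assumptions, not TRL's actual API):

```python
import torch

def compute_rewards(reward_funcs, prompts, completions, extra_columns):
    """Simplified stand-in for the reward loop in _generate_and_score_completions."""
    rewards = []
    for reward_func in reward_funcs:
        if isinstance(reward_func, torch.nn.Module):
            # Module-based reward models take this branch; reward_kwargs is not assigned here.
            rewards.append(torch.zeros(len(prompts)))
        else:
            # Only this branch assigns reward_kwargs.
            reward_kwargs = dict(extra_columns)
            rewards.append(reward_func(prompts=prompts, completions=completions, **reward_kwargs))

    # Later warning logic assumes reward_kwargs is bound on every code path:
    nan_row_idx = 0
    row_reward_kwargs = {key: value[nan_row_idx] for key, value in reward_kwargs.items()}
    return rewards, row_reward_kwargs

# With an empty reward_funcs list (or only nn.Module rewards), the else branch
# never runs and the dict comprehension raises UnboundLocalError.
try:
    compute_rewards([], prompts=["p"], completions=["c"], extra_columns={})
except UnboundLocalError as err:
    print(err)
```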
This PR fixes the error by initializing `reward_kwargs` to an empty dictionary (`{}`) before the loop that processes reward functions, so it is always defined. The warning logic can then execute safely even when no custom reward function populates `reward_kwargs`.
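For comparison, a sketch of the fix pattern applied to the same simplified helper, hoisting the initialization above the loop (again an illustrative assumption, not the exact TRL diff):

```python
import torch

def compute_rewards_fixed(reward_funcs, prompts, completions, extra_columns):
    rewards = []
    reward_kwargs = {}  # always bound, even if the else branch below never runs
    for reward_func in reward_funcs:
        if isinstance(reward_func, torch.nn.Module):
            rewards.append(torch.zeros(len(prompts)))
        else:
            reward_kwargs = dict(extra_columns)
            rewards.append(reward_func(prompts=prompts, completions=completions, **reward_kwargs))

    nan_row_idx = 0
    # Safe now: with no custom reward functions this is just an empty dict comprehension.
    row_reward_kwargs = {key: value[nan_row_idx] for key, value in reward_kwargs.items()}
    return rewards, row_reward_kwargs

# No longer raises; returns ([], {}) for an empty reward_funcs list.
print(compute_rewards_fixed([], prompts=["p"], completions=["c"], extra_columns={}))
```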