☕️ GRPO script reward_funcs error #3639

Merged
merged 2 commits into from
Jun 25, 2025

Conversation

tcapelle
Contributor

We should pass the full list of reward functions to the Trainer, not just the reward model (which is actually None most of the time).

@shirinyamani
Member

Hi @tcapelle, thanks for pointing this out.
I think the reason is that GRPO technically accepts both models and functions as rewards. In this specific example, reward_funcs starts as [] and we have a reward_model, which is why we pass the reward_model instead of the functions.

    # Get the reward models and functions
    reward_funcs = []
    if script_args.reward_model_name_or_path:
        reward_model = AutoModelForSequenceClassification.from_pretrained(
            script_args.reward_model_name_or_path, trust_remote_code=model_args.trust_remote_code, num_labels=1
        )
        reward_funcs.append(reward_model)

    if script_args.reward_funcs:
        for func_name in script_args.reward_funcs:
            if func_name in reward_funcs_registry:
                reward_funcs.append(reward_funcs_registry[func_name])
            elif "." in func_name:
                module_path, func_name = func_name.rsplit(".", 1)
                sys.path.insert(0, os.getcwd())
                module = importlib.import_module(module_path)
                reward_func = getattr(module, func_name)
                reward_funcs.append(reward_func)
            else:
                raise ValueError(
                    f"Could not load reward function '{func_name}'. Expected one of "
                    f"{list(reward_funcs_registry.keys())} or a valid import path."
                ) 
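
For illustration, the name resolution in the snippet above can be sketched as a standalone function. This is only a minimal sketch, not the script's actual code: `resolve_reward_funcs`, `length_reward`, and the toy registry are hypothetical names introduced here, and `math.sqrt` is used purely to exercise the dotted-import branch (it is not a reward function).

```python
import importlib
import os
import sys


def length_reward(completions, **kwargs):
    """Hypothetical reward: one score per completion, based on its length."""
    return [float(len(c)) for c in completions]


# Hypothetical stand-in for the script's reward_funcs_registry.
reward_funcs_registry = {"length_reward": length_reward}


def resolve_reward_funcs(names, registry):
    """Resolve each name via the registry first, then as a dotted import path."""
    resolved = []
    for name in names:
        if name in registry:
            resolved.append(registry[name])
        elif "." in name:
            module_path, attr = name.rsplit(".", 1)
            # Make modules in the current working directory importable.
            if os.getcwd() not in sys.path:
                sys.path.insert(0, os.getcwd())
            module = importlib.import_module(module_path)
            resolved.append(getattr(module, attr))
        else:
            raise ValueError(
                f"Could not load reward function '{name}'. Expected one of "
                f"{list(registry.keys())} or a valid import path."
            )
    return resolved


# One name from the registry, one resolved through an import path.
funcs = resolve_reward_funcs(["length_reward", "math.sqrt"], reward_funcs_registry)
```

Whatever ends up in the returned list is what would need to reach the Trainer, alongside any reward model appended earlier.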

@tcapelle
Contributor Author

As of right now, if you pass reward funcs they don't get passed to the Trainer.
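
A rough sketch of the difference, using a stub rather than the real Trainer. `StubTrainer` and `accuracy_reward` are hypothetical names, and the way the stub normalizes None to an empty list is an assumption made for this example; the real Trainer may behave differently.

```python
class StubTrainer:
    """Hypothetical stand-in for the Trainer; it only records what it receives."""

    def __init__(self, reward_funcs):
        # Assumption for this sketch: accept a single reward or a list,
        # and normalize the input to a list.
        if reward_funcs is None:
            reward_funcs = []
        elif not isinstance(reward_funcs, list):
            reward_funcs = [reward_funcs]
        self.reward_funcs = reward_funcs


def accuracy_reward(completions, **kwargs):
    """Hypothetical reward function selected by the user."""
    return [1.0 for _ in completions]


# No reward model was configured, so only the function ends up in the list.
reward_model = None
reward_funcs = [accuracy_reward]

before = StubTrainer(reward_funcs=reward_model)  # reported behavior: functions never arrive
after = StubTrainer(reward_funcs=reward_funcs)   # proposed: pass the whole list
```

In the stub, `before.reward_funcs` is empty while `after.reward_funcs` contains the user's function, which is the gap described here.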

@shirinyamani
Member

@tcapelle correct!

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@shirinyamani shirinyamani changed the title GRPO script error ☕️ GRPO script reward_funcs error Jun 25, 2025
@shirinyamani shirinyamani enabled auto-merge (squash) June 25, 2025 14:46
@shirinyamani shirinyamani self-requested a review June 25, 2025 14:46
@shirinyamani shirinyamani merged commit 0336e4b into huggingface:main Jun 25, 2025
9 of 10 checks passed
marcandrelarochelle pushed a commit to marcandrelarochelle/trl that referenced this pull request Jul 29, 2025
Co-authored-by: Shirin Yamani <75791599+shirinyamani@users.noreply.github.com>

3 participants