🎁 Reward submodule #3430

Merged
qgallouedec merged 11 commits into main from rewards_submodule on May 16, 2025

Conversation

qgallouedec (Member) commented May 9, 2025

This PR creates a sub-module dedicated to reward functions.

Please note that this PR is not intended to integrate all reward functions; this may be done in the future. Its purpose is simply to create a structure for future additions.

qgallouedec marked this pull request as ready for review May 9, 2025 17:58
qgallouedec changed the title from "Reward submodule" to "[DRAFT] Reward submodule" May 9, 2025
HuggingFaceDocBuilderDev commented

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

qgallouedec changed the title from "[DRAFT] Reward submodule" to "🎁 Reward submodule" May 9, 2025
lavaman131 commented

Hi @qgallouedec, thanks for setting up the boilerplate.

Are you intending the rewards to be imported as paths to the module, or registered as strings in a reward function registry? (That is the impression I got from skimming your changes.) As a user, I think it would be most intuitive to import reward callables from the rewards module and pass them as a List[Callable], like the current API. Please correct me if I'm misunderstanding. That would also give very simple extensibility for custom reward functions.

If you do intend the registry option, maybe something similar to the gym environment registration API would be interesting? For example, you could accept a string argument naming a function in the registry and expose a static method to query the existing functions and their descriptions. For custom reward functions, there could be a register_reward_fn that updates the registry; a rough sketch of what I mean is below.
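
(The names below, register_reward_fn, get_reward_fn, and list_reward_fns, are hypothetical and used purely for illustration; this is not an existing TRL API.)

```python
from typing import Callable

# Hypothetical registry sketch; illustrative only, not TRL's API.
_REWARD_REGISTRY: dict[str, Callable] = {}

def register_reward_fn(name: str, fn: Callable) -> None:
    """Register a reward callable under a string name."""
    _REWARD_REGISTRY[name] = fn

def get_reward_fn(name: str) -> Callable:
    """Look up a reward callable by its registered name."""
    return _REWARD_REGISTRY[name]

def list_reward_fns() -> dict[str, str]:
    """Return registered names mapped to their one-line descriptions."""
    return {name: (fn.__doc__ or "").strip() for name, fn in _REWARD_REGISTRY.items()}
```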

qgallouedec (Member, Author) commented May 11, 2025

> Are you intending the rewards to be imported as paths to the module, or registered as strings in a reward function registry?

No, this is just for the CLI script (i.e., when you run trl grpo --reward_funcs my_lib.my_reward), not for the trainer per se. The idea is not to support this in the trainer.
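
For context, turning a dotted path such as my_lib.my_reward into a callable on the CLI side could look roughly like this (a minimal sketch using importlib; not necessarily how this PR implements it):

```python
import importlib

def resolve_reward_func(path: str):
    """Resolve a dotted path like 'my_lib.my_reward' into the callable it names."""
    module_name, func_name = path.rsplit(".", 1)
    module = importlib.import_module(module_name)
    return getattr(module, func_name)
```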

qgallouedec (Member, Author) commented

> If you do intend the registry option, maybe something similar to the gym environment registration API would be interesting?

I'm strongly against this idea, to be honest. The Gym registry is a nightmare and not Pythonic at all.

lavaman131 commented May 11, 2025

> > If you do intend the registry option, maybe something similar to the gym environment registration API would be interesting?
>
> I'm strongly against this idea, to be honest. The Gym registry is a nightmare and not Pythonic at all.

Ah ok 😅, fair enough.

So, to clarify: outside of the CLI, will the structure remain the same, with the user passing their reward functions as a list of callables, except that they can now also import common reward functions from this module? (Just want to scope out how implementing reward functions for the module would work.)

qgallouedec (Member, Author) commented

My idea is that everything remains the same for the user. The only difference is that they would be able to do

from trl.rewards import think_format_reward

instead of having to implement it themselves or copy it from another library.
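
For example, a built-in reward would behave exactly like a hand-written one (a sketch; the conversational completion format and the expected return values here are my assumptions, not confirmed behavior):

```python
from trl.rewards import think_format_reward

# The built-in reward is an ordinary callable; here it is called directly on a
# conversational-style completion.
completions = [[{"role": "assistant", "content": "<think>\nSome reasoning.\n</think>\nThe answer."}]]
print(think_format_reward(completions))  # expected: [1.0] if the format check passes

# A hand-written reward keeps exactly the same shape...
def my_custom_reward(completions, **kwargs):
    """Toy custom reward: slightly favor shorter completions."""
    return [-len(c[0]["content"]) / 1000 for c in completions]

# ...so both can be passed to the trainer as the usual list of callables:
# GRPOTrainer(model=..., reward_funcs=[think_format_reward, my_custom_reward], ...)
```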

lavaman131 commented

Awesome, that makes sense and aligns with what I was thinking as well. Once this change is out, happy to work on adding to the submodule!

qgallouedec (Member, Author) commented

Perfect! Thanks!

If you want to start now, you can create a new branch from this one.

lewtun (Member) left a comment

LGTM, with a question on testing whether the think process can have a preamble (and should thus be invalid).
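
For reference, that case could be pinned down with a test along these lines (a sketch; the conversational completion format and the expected scores are assumptions, not confirmed behavior):

```python
from trl.rewards import think_format_reward

def test_think_format_reward_rejects_preamble():
    completions = [
        # Well-formed: the completion starts directly with the <think> block.
        [{"role": "assistant", "content": "<think>\nSome reasoning.\n</think>\nThe answer."}],
        # Preamble before <think>: the question is whether this should score 0.0.
        [{"role": "assistant", "content": "Sure! <think>\nSome reasoning.\n</think>\nThe answer."}],
    ]
    rewards = think_format_reward(completions)
    assert rewards[0] == 1.0
    assert rewards[1] == 0.0  # expected if a preamble invalidates the format
```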

qgallouedec merged commit 54d4f6b into main May 16, 2025
10 of 11 checks passed
qgallouedec deleted the rewards_submodule branch May 16, 2025 02:10
shirinyamani pushed a commit that referenced this pull request May 16, 2025