🎁 Reward submodule #3430
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Hi @qgallouedec, thanks for setting up the boilerplate. Are you intending the rewards to be imported as paths to the module, or as strings in a reward-function registry? (That is the impression I got from skimming your changes.) As a user, I would find it most intuitive to import reward callables from the reward module and include them as a list of callables. If you do intend the registry option, maybe something similar to the Gym environment-registration API would be interesting? For example, you could use a string argument for the names of functions in the registry, and have a static method to query the existing functions and their descriptions. And for custom reward functions, there could be a […]
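To make the registry proposal concrete, here is a minimal sketch of what a Gym-style reward registry could look like. All names (`register_reward`, `get_reward`, `list_rewards`, `length_reward`) are hypothetical and only illustrate the idea discussed above; nothing like this exists in TRL.

```python
# Hypothetical sketch of a Gym-style reward registry (not part of TRL).
_REWARD_REGISTRY = {}

def register_reward(name, fn, description=""):
    """Register a reward callable under a string name."""
    _REWARD_REGISTRY[name] = {"fn": fn, "description": description}

def get_reward(name):
    """Look up a reward callable by its registered name."""
    return _REWARD_REGISTRY[name]["fn"]

def list_rewards():
    """Query the registered reward names and their descriptions."""
    return {name: entry["description"] for name, entry in _REWARD_REGISTRY.items()}

def length_reward(completions, **kwargs):
    # Toy reward: longer completions score higher, capped at 1.0.
    return [min(len(c) / 100.0, 1.0) for c in completions]

register_reward("length", length_reward, "Rewards longer completions")
```

A user could then refer to `"length"` by name in a config file, and `list_rewards()` would serve as the query method mentioned above.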
No no, this is just for the script (i.e., when you run […]).
I'm strongly against this idea, tbh. The Gym registry is a nightmare and not Pythonic at all.
Ah ok 😅, fair enough. So, to clarify: will the structure remain the same outside of the CLI, where the user includes their reward functions as a list of callables? But now they can also import common reward functions from this module? (Just want to scope out how implementing reward functions for the module would work.)
My idea is that everything remains the same for the user. The only difference is that they would be able to do `from trl.rewards import think_format_reward` instead of having to implement it themselves or copy it from another library.
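As an illustration of how such an imported reward would slot in as a plain callable, here is a stand-in sketch of a think-format reward. The implementation below is an assumption for illustration only (the actual `trl.rewards.think_format_reward` may differ, e.g. in how completions are structured); it treats completions as plain strings and rewards those that open with a `<think>...</think>` block.

```python
import re

# Illustrative stand-in, NOT the actual TRL implementation: score 1.0 when
# the completion starts with a <think>...</think> block, else 0.0.
def think_format_reward(completions, **kwargs):
    pattern = re.compile(r"^<think>.*?</think>", re.DOTALL)
    return [1.0 if pattern.match(c) else 0.0 for c in completions]

# Used exactly like any user-written reward: passed as a callable.
scores = think_format_reward(["<think>plan</think>Answer", "Answer only"])
```

The point is that the user-facing contract is unchanged: a reward is just a callable, whether hand-written or imported from `trl.rewards`.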
Awesome, that makes sense and aligns with what I was thinking as well. Once this change is out, happy to work on adding to the submodule!
Perfect! Thanks! If you want to start now, you can create a new branch from this one.
LGTM, with a question on testing whether the think process can have a preamble (and should thus be invalid).
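The reviewer's question can be pinned down with a small test. This sketch assumes a string-based `think_format_reward` that anchors its match at the start of the completion (an assumption about the implementation, not a statement of TRL's actual behavior); under that reading, any preamble before `<think>` invalidates the format.

```python
import re

# Assumed behavior for the sake of the test: the format check is anchored
# at the start of the string, so a preamble before <think> scores 0.0.
def think_format_reward(completions, **kwargs):
    pattern = re.compile(r"^<think>.*?</think>", re.DOTALL)
    return [1.0 if pattern.match(c) else 0.0 for c in completions]

# Preamble before the <think> block -> invalid under this reading.
assert think_format_reward(["Sure! <think>reasoning</think>Answer"]) == [0.0]
# Completion opening directly with <think> -> valid.
assert think_format_reward(["<think>reasoning</think>Answer"]) == [1.0]
```

Whichever behavior the PR settles on, making the preamble case an explicit test keeps it from regressing silently.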
This PR creates a sub-module dedicated to reward functions.
Please note that this PR is not intended to integrate all reward functions; this may be done in the future. Its purpose is simply to create a structure for future additions.