🎁 Reward submodule #3430
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Hi @qgallouedec, thanks for setting up the boilerplate. Are you intending the rewards to be imported as paths to the module, or as strings in a reward-function registry? (That is the impression I got from skimming your changes.) As a user, I would find it most intuitive to import reward callables from the reward module and include them as a list of callables. If you do intend the registry option, maybe something similar to the Gym environment-registration API would be interesting? For example, you could use a string argument for the names of functions in the registry, and have a static method to query the existing functions and their descriptions. And for custom reward functions, there could be a […]
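To make the registry proposal concrete, here is a minimal sketch of what a Gym-style reward registry could look like. All names (`register_reward`, `get_reward`, `list_rewards`, `length_reward`) are hypothetical and only illustrate the idea discussed above; nothing like this exists in TRL.

```python
# Hypothetical sketch of a Gym-style reward registry (not part of TRL).
_REWARD_REGISTRY = {}

def register_reward(name, fn, description=""):
    """Register a reward callable under a string name."""
    _REWARD_REGISTRY[name] = {"fn": fn, "description": description}

def get_reward(name):
    """Look up a reward callable by its registered name."""
    return _REWARD_REGISTRY[name]["fn"]

def list_rewards():
    """Query the registered reward names and their descriptions."""
    return {name: entry["description"] for name, entry in _REWARD_REGISTRY.items()}

def length_reward(completions, **kwargs):
    # Toy reward: longer completions score higher, capped at 1.0.
    return [min(len(c) / 100.0, 1.0) for c in completions]

register_reward("length", length_reward, "Rewards longer completions")
```

A user could then refer to `"length"` by name in a config file, and `list_rewards()` would serve as the query method mentioned above.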
No no, this is just for the script (i.e., when you run […]).
I'm strongly against this idea, tbh. The Gym registry is a nightmare and not Pythonic at all.
Ah ok 😅, fair enough. So, to clarify: will the structure remain the same outside of the CLI, where the user includes their reward functions as a list of callables? But now they can also import common reward functions from this module? (Just want to scope out how implementing reward functions for the module would work.)
My idea is that everything remains the same for the user. The only difference is that they would be able to do `from trl.rewards import think_format_reward` instead of having to implement it themselves or copy it from another library.
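As an illustration of how such an imported reward would slot in as a plain callable, here is a stand-in sketch of a think-format reward. The implementation below is an assumption for illustration only (the actual `trl.rewards.think_format_reward` may differ, e.g. in how completions are structured); it treats completions as plain strings and rewards those that open with a `<think>...</think>` block.

```python
import re

# Illustrative stand-in, NOT the actual TRL implementation: score 1.0 when
# the completion starts with a <think>...</think> block, else 0.0.
def think_format_reward(completions, **kwargs):
    pattern = re.compile(r"^<think>.*?</think>", re.DOTALL)
    return [1.0 if pattern.match(c) else 0.0 for c in completions]

# Used exactly like any user-written reward: passed as a callable.
scores = think_format_reward(["<think>plan</think>Answer", "Answer only"])
```

The point is that the user-facing contract is unchanged: a reward is just a callable, whether hand-written or imported from `trl.rewards`.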
Awesome, that makes sense and aligns with what I was thinking as well. Once this change is out, happy to work on adding to the submodule!
Perfect! Thanks! If you want to start now, you can create a new branch from this one.
LGTM, with a question on testing whether the think process can have a preamble (and should thus be invalid).
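The reviewer's question can be pinned down with a small test. This sketch assumes a string-based `think_format_reward` that anchors its match at the start of the completion (an assumption about the implementation, not a statement of TRL's actual behavior); under that reading, any preamble before `<think>` invalidates the format.

```python
import re

# Assumed behavior for the sake of the test: the format check is anchored
# at the start of the string, so a preamble before <think> scores 0.0.
def think_format_reward(completions, **kwargs):
    pattern = re.compile(r"^<think>.*?</think>", re.DOTALL)
    return [1.0 if pattern.match(c) else 0.0 for c in completions]

# Preamble before the <think> block -> invalid under this reading.
assert think_format_reward(["Sure! <think>reasoning</think>Answer"]) == [0.0]
# Completion opening directly with <think> -> valid.
assert think_format_reward(["<think>reasoning</think>Answer"]) == [1.0]
```

Whichever behavior the PR settles on, making the preamble case an explicit test keeps it from regressing silently.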
This PR creates a sub-module dedicated to reward functions.
Please note that this PR is not intended to integrate all reward functions; this may be done in the future. Its purpose is simply to create a structure for future additions.