🎀 New defaults: preparing the new structure #3530
Conversation
Pull Request Overview
A preparatory PR to standardize and document overridden default training arguments across multiple trainer config classes.
- Unifies class docstrings to reference the full `TrainingArguments` docs and note potential default differences.
- Introduces or updates a "# Parameters whose default values are overridden from TrainingArguments" section in each config (a sketch of the pattern is shown after this list).
- Removes duplicate `learning_rate` definitions and centralizes metadata for overridden defaults.
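To make the convention concrete, here is a minimal sketch of a config class following this pattern. It is illustrative only: the class body is abbreviated, and the 2e-5 default comes from the learning-rate discussion later in this thread rather than from the diff itself.

```python
from dataclasses import dataclass, field

from transformers import TrainingArguments


@dataclass
class SFTConfig(TrainingArguments):
    """
    Configuration class for the SFT trainer.

    Only parameters specific to SFT training are listed here. For the full list of parameters,
    refer to the `TrainingArguments` documentation. Note that default values of some parameters
    may differ from those in `TrainingArguments`.
    """

    # Parameters whose default values are overridden from TrainingArguments
    learning_rate: float = field(
        default=2e-5,  # value taken from the discussion below; illustrative
        metadata={"help": "Initial learning rate for AdamW. Overrides the TrainingArguments default."},
    )

    # Trainer-specific parameters would follow here (omitted in this sketch).
```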
Reviewed Changes
Copilot reviewed 15 out of 15 changed files in this pull request and generated 1 comment.
| File | Description |
| --- | --- |
| trl/trainer/utils.py | Added class doc updates and override-defaults placeholder |
| trl/trainer/sft_config.py | Unified docstring, moved `learning_rate` override section |
| trl/trainer/rloo_config.py | Updated docstring to reference base classes and defaults note |
| trl/trainer/reward_config.py | Added doc intro and override-defaults placeholder |
| trl/trainer/prm_config.py | Unified docstring, migrated `learning_rate` override section |
| trl/trainer/ppo_config.py | Updated docstring to mention `OnPolicyConfig` and defaults note |
| trl/trainer/orpo_config.py | Unified docstring, updated override-defaults section |
| trl/trainer/online_dpo_config.py | Updated doc intro and override-defaults section |
| trl/trainer/kto_config.py | Unified docstring, added override-defaults placeholder |
| trl/trainer/iterative_sft_config.py | Updated docstring and added override-defaults placeholder |
| trl/trainer/grpo_config.py | Migrated `learning_rate` override section and cleaned duplicates |
| trl/trainer/gkd_config.py | Updated doc intro (missing override-defaults section) |
| trl/trainer/dpo_config.py | Introduced override-defaults section, removed duplicate field |
| trl/trainer/cpo_config.py | Unified docstring and override-defaults section |
| trl/trainer/bco_config.py | Added doc intro and override-defaults placeholder |
Comments suppressed due to low confidence (1)
trl/trainer/gkd_config.py:26
- This class omits the '# Parameters whose default values are overridden from TrainingArguments' section present in other configs. For consistency, either add the header (even if empty) or remove all such placeholders across classes.
class GKDConfig(SFTConfig):
trl/trainer/utils.py (Outdated)

@@ -1063,6 +1067,9 @@ class OnPolicyConfig(TrainingArguments):
            Whether to push the model to the Hub after training.
    """

    # Parameters whose default values are overridden from TrainingArguments
    # None yet
[nitpick] The placeholder comment '# None yet' is ambiguous. Consider either removing it when no defaults are overridden or replacing it with a clearer note like 'No default overrides currently'.
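Concretely, the replacement Copilot is suggesting would amount to swapping the placeholder line for a clearer note, e.g. (illustrative wording only, not code from this PR):

```python
    # Parameters whose default values are overridden from TrainingArguments
    # No default overrides currently
```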
Can you make a suggestion so that I can just commit?
I guess I'll have to take care of it myself 😢
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
I see this PR is setting the new learning rates, and for all the changed trainers they are almost identical at 1e-6, except for SFT, which is 2e-5. Curious to know why the SFT default differs?
On average, these values should work best. In any case, these are more standard values.
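As a rough usage sketch (assuming the new defaults land roughly as discussed above, about 1e-6 for most trainers and 2e-5 for SFT; the exact per-trainer values may differ), the overridden defaults are picked up automatically when a config is instantiated, and an explicit value still takes precedence:

```python
from trl import DPOConfig, SFTConfig

# Defaults follow each trainer's overridden values rather than TrainingArguments' 5e-5.
sft_args = SFTConfig(output_dir="sft-out")
dpo_args = DPOConfig(output_dir="dpo-out")
print(sft_args.learning_rate)  # expected: 2e-5, per the discussion above
print(dpo_args.learning_rate)  # expected: around 1e-6, per the discussion above

# An explicit value still overrides the trainer-specific default.
custom_args = SFTConfig(output_dir="sft-out", learning_rate=5e-5)
print(custom_args.learning_rate)  # 5e-5
```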
A preparatory PR to cleanly override the default values of the training arguments.