[BREAKING] config: set the default value of actor.entropy_coeff to 0 #1770
Checklist Before Starting
What does this PR do?
entropy_coeff must be set carefully during RL training. When the entropy loss is enabled, an inappropriate coefficient may cause training to collapse. More empirical experiments are available in the Skywork Open Reasoner 1 Technical Report (https://arxiv.org/pdf/2505.22312).
In this PR, the default value of entropy_coeff is set to 0. This is a breaking change that may affect your experiments, although the majority of verl example scripts already set it to 0 manually.
Most example scripts now simply pick up the default value of 0 for entropy_coeff. For the few documentation pages that provide reference model performance and commands, we modified the docs so that the reported experiment results are consistent with the config setup.
Usage Example
To enable the entropy loss, set a nonzero coefficient:
actor_rollout_ref.actor.entropy_coeff=0.001 # or other values
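For intuition on what this coefficient controls, here is a minimal sketch. It is not verl's actual implementation. It assumes the common convention in which the actor loss is the policy-gradient loss minus entropy_coeff times the mean token entropy. The function names `token_entropy` and `actor_loss` are hypothetical and exist only for this illustration.

```python
import math


def token_entropy(probs):
    """Shannon entropy (in nats) of one categorical distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def actor_loss(pg_loss, token_probs, entropy_coeff=0.0):
    """Hypothetical combined loss: policy-gradient loss minus an entropy bonus.

    Assumption: the entropy term is averaged over tokens and subtracted,
    so a positive coefficient rewards higher-entropy policies.
    """
    mean_entropy = sum(token_entropy(p) for p in token_probs) / len(token_probs)
    return pg_loss - entropy_coeff * mean_entropy


# Two toy next-token distributions (made-up numbers for illustration).
token_probs = [[0.5, 0.5], [0.9, 0.1]]

# With the new default of 0, the entropy term drops out entirely.
print(actor_loss(1.0, token_probs))

# With a nonzero coefficient, the loss is lowered by the entropy bonus.
print(actor_loss(1.0, token_probs, entropy_coeff=0.001))
```

Under this assumed convention, a coefficient of 0 leaves the policy-gradient loss unchanged, which is why scripts that already set the value to 0 manually should see no difference from the new default.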
Test
Additional Info.
Checklist Before Submitting
Add [BREAKING] to the PR title if it breaks any API.