Add Adaptive Entropy Control to GRPOTrainer #3320

@cuts2k

Feature request

While reading Skywork's blog post on their OR1 model, I found an interesting modification: they add entropy as an extra regularization term, but with a dynamic weight that is adjusted so that the entropy is steered toward a specific target value.
The relevant part of the blog is here:
https://capricious-hydrogen-41c.notion.site/Skywork-Open-Reasoner-Series-1d0bc9ae823a80459b46c149e4f51680?pvs=25#1d1bc9ae823a801592a0c3891ea5328f
While I haven't tested this myself, it looks like a promising enhancement to the GRPOTrainer and hopefully shouldn't be very hard to add.
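
A minimal sketch of the idea, to make the request concrete. It assumes a simple additive update rule: raise the entropy weight by a fixed step when the measured entropy is below the target, lower it otherwise, and clamp the result. This is one plausible reading of "dynamic weight", not necessarily the exact rule from the blog post. The class `AdaptiveEntropyCoef`, the function `regularized_loss`, and all hyperparameter names and defaults are hypothetical and are not part of TRL's API.

```python
from dataclasses import dataclass


@dataclass
class AdaptiveEntropyCoef:
    """Hypothetical helper: nudges an entropy-bonus weight toward a target entropy."""

    target_entropy: float    # desired mean token entropy (assumed hyperparameter)
    coef: float = 0.0        # current entropy-bonus weight
    step: float = 0.005      # how far the weight moves per update (assumed value)
    max_coef: float = 0.05   # upper bound so the bonus cannot dominate (assumed value)

    def update(self, measured_entropy: float) -> float:
        """Raise the weight when entropy is below target, lower it otherwise."""
        if measured_entropy < self.target_entropy:
            self.coef += self.step
        else:
            self.coef -= self.step
        # Keep the weight inside [0, max_coef].
        self.coef = min(max(self.coef, 0.0), self.max_coef)
        return self.coef


def regularized_loss(policy_loss: float, entropy: float, coef: float) -> float:
    """Subtract the weighted entropy, so minimizing the loss favors higher entropy."""
    return policy_loss - coef * entropy


if __name__ == "__main__":
    ctrl = AdaptiveEntropyCoef(target_entropy=0.2)
    # Entropy values here are made up purely to exercise the controller.
    for measured in (0.10, 0.12, 0.25, 0.30):
        coef = ctrl.update(measured)
        print(f"entropy={measured:.2f} coef={coef:.3f}")
```

In a real trainer, the measured entropy would presumably be the mean token-level entropy of the policy over the current batch, and the weight would be updated once per optimization step before the loss is computed. Where exactly that would hook into GRPOTrainer is left open here.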

Motivation

Just thought I'd point out an interesting enhancement from a post that is likely to be overlooked. I'm not in any way associated with Skywork and haven't tested this, but it makes sense in theory.

Your contribution

Unlikely I'll be able to help, sorry.
