Add Adaptive Entropy Control to GRPOTrainer

### Feature request

While reading Skywork's blog post on their OR1 model, I came across an interesting modification: they use entropy as an extra regularization term, but with a dynamic weight that is adjusted so the policy's entropy stays close to a specific target value.
The relevant part of the blog post is here:
https://capricious-hydrogen-41c.notion.site/Skywork-Open-Reasoner-Series-1d0bc9ae823a80459b46c149e4f51680?pvs=25#1d1bc9ae823a801592a0c3891ea5328f
I haven't tested this myself, but it looks like a promising enhancement to the GRPOTrainer and hopefully shouldn't be very hard to add.
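
To make the idea concrete, here is a rough, untested sketch of what I mean by a dynamic entropy weight. It is my own reading of the idea, not Skywork's exact update rule and not existing TRL API: the names `AdaptiveEntropyCoef`, `target_entropy`, `step`, `max_coef`, and `regularized_loss` are all hypothetical. The weight goes up by a fixed step when the measured entropy is below the target and down when it is above.

```python
import math
from dataclasses import dataclass


def token_entropy(probs):
    """Shannon entropy (in nats) of a single next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


@dataclass
class AdaptiveEntropyCoef:
    """Hypothetical controller that steers the entropy weight toward a target.

    Assumption: a fixed-step rule. The blog post may use a different schedule.
    """
    target_entropy: float
    step: float = 0.005
    coef: float = 0.0
    max_coef: float = 1.0

    def update(self, measured_entropy):
        # Entropy too low: strengthen the entropy bonus.
        if measured_entropy < self.target_entropy:
            self.coef = min(self.coef + self.step, self.max_coef)
        # Entropy too high: weaken the bonus, never going below zero.
        elif measured_entropy > self.target_entropy:
            self.coef = max(self.coef - self.step, 0.0)
        return self.coef


def regularized_loss(policy_loss, entropy, coef):
    # Subtracting coef * entropy rewards higher entropy when the loss is minimized.
    return policy_loss - coef * entropy
```

In a trainer, `update` would presumably be called once per optimization step with the mean token entropy of the batch, and the returned weight would be used in the loss for that step.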

### Motivation

I just thought I'd point out an interesting enhancement from a post that is likely to be overlooked. I'm not in any way associated with Skywork and haven't tested this, but it makes sense in theory.

### Your contribution

Unlikely I'll be able to help, sorry.

Add Adaptive Entropy Control to GRPOTrainer #3320