Feature request
Proposed change: add an option to `PPOConfig` called `save_value_model: bool = False`. If true, `PPOTrainer`'s `save_model` will set `self.model` to the value model and save it after saving `self.model.policy` as normal.
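
To make the proposal concrete, here is a minimal sketch of what the change could look like. It assumes `self.model` wraps the policy (`self.model.policy`, as above) and the value model (`self.model.value_model`, an assumed attribute name), and that `super().save_model` is the plain `transformers.Trainer.save_model`; the `value_model` subfolder is also just an illustration, not a settled layout.

```python
import os

from transformers import Trainer


class PPOTrainer(Trainer):  # excerpt; all other methods omitted
    def save_model(self, output_dir=None, _internal_call=False):
        backup_model = self.model  # wrapper holding policy + value model

        # Save the policy, as save_model does today.
        self.model = backup_model.policy
        super().save_model(output_dir, _internal_call)

        # Proposed addition: also save the value model in a subfolder.
        if self.args.save_value_model:
            self.model = backup_model.value_model  # assumed attribute
            value_dir = os.path.join(output_dir or self.args.output_dir, "value_model")
            super().save_model(value_dir, _internal_call)

        self.model = backup_model  # restore the original wrapper
```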
Motivation
Right now, the PPOTrainer trains a classification model which takes a state and outputs an estimate of how much reward it expects the policy to earn from that state.
After training, the value model is discarded.
That seems really weird to me! It seems extremely useful to have a classifier model which can predict how well your text generation model will do on a given prompt. If the value model predicts a poor outcome before the policy even generates a response, you can take precautions.
For instance:
- If the value model predicts that `model1` would do poorly, you can switch to `model2`, which may do better (a pre-generation check along these lines is sketched after this list).
- You can show a content warning for responses expected to bother the user. For instance, if the user asks "What's the best religion and why?", you could have a pop-up window saying "You have asked a sensitive question. Please note that the views of the model do not reflect the views of `our_corp`, and that we do our best to satisfy all our customers and stakeholders." (The question has no good answer: if the AI refuses to answer, that bothers people, and if it gives an answer, many people will be upset, so the value model would give that prompt a low value.)
- The value model is an interesting object to research, and allowing users to save it would facilitate that research.
Your contribution
I'm eager to make contributions to trl, so if these changes would be helpful, I'm happy to implement them!