
Keep PPOTrainer Value Model (Critic Model) #3293

@AMindToThink

Feature request

Proposed change: add an option to PPOConfig called save_value_model: bool = False. If it is true, PPOTrainer's save_model will save self.model.policy as it does today, then temporarily set self.model to the value model and save that as well.
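
A rough sketch of what this could look like, written as subclasses so it doesn't depend on trl internals beyond what's described above. The save_value_model flag, the value_model/ output subdirectory, and the assumption that the critic is reachable as value_model on the wrapped training model are all proposals/placeholders here, not existing API:

```python
# Sketch only: `save_value_model` and the `value_model/` subdirectory are
# proposed names; the subclass form is just to illustrate the change.
import os
from dataclasses import dataclass, field

from trl import PPOConfig, PPOTrainer


@dataclass
class PPOConfigWithCritic(PPOConfig):
    # Proposed flag (default False keeps today's behaviour).
    save_value_model: bool = field(
        default=False,
        metadata={"help": "Also save the value model (critic) in save_model."},
    )


class PPOTrainerWithCritic(PPOTrainer):
    def save_model(self, output_dir=None, _internal_call=False):
        # Save the policy exactly as PPOTrainer does today.
        super().save_model(output_dir, _internal_call)

        if getattr(self.args, "save_value_model", False):
            output_dir = output_dir if output_dir is not None else self.args.output_dir
            # During PPO training the trainer wraps policy and critic together;
            # the critic is assumed to be exposed as `value_model` on that wrapper.
            value_model = getattr(self.accelerator.unwrap_model(self.model), "value_model", None)
            if value_model is not None:
                value_model.save_pretrained(os.path.join(output_dir, "value_model"))
```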

Motivation

Right now, PPOTrainer trains a value model: a classification-style model that takes a state and estimates how much reward the policy is expected to earn from that state.
After training, that value model is discarded.
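
For context, in current trl the critic is a sequence-classification model with a single scalar output, passed to PPOTrainer via its value_model argument and trained alongside the policy, but never written to disk by save_model. Schematically (the checkpoint name below is a placeholder):

```python
# Roughly how the critic is set up for PPO today: a scalar-output
# classification model that scores a state.
from transformers import AutoModelForSequenceClassification

value_model = AutoModelForSequenceClassification.from_pretrained(
    "my-sft-checkpoint",  # placeholder: usually the same base as the policy
    num_labels=1,         # single scalar "expected reward" output
)
# value_model is then handed to PPOTrainer and updated during training,
# but only the policy is saved at the end.
```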

That seems really weird to me! It seems extremely useful to have a model that can predict how well your text-generation model will do on a given prompt. If the value model predicts a bad outcome before the policy generates a response, you can take precautions.

For instance:

  • If the value model predicts that model1 would do poorly, you can switch to model2, which may do better.
  • You can have a content warning for responses expected to bother the user. For instance, if the user asks "What's the best religion and why?", then you could have a pop-up window saying "You have asked a sensitive question. Please note that the views of the model do not reflect the views of our_corp, and that we do our best to satisfy all our customers and stakeholders." (The question has no good answer because if the AI refuses to answer, that bothers people, and if it gives an answer, many people will be upset, so the value model would give that prompt a low value.)
  • The value model is an interesting research object in its own right, and letting users save it would facilitate that research.

Your contribution

I'm eager to make contributions to trl, so if this change would be helpful, I'm happy to implement it!
