feat: support resume training from ckpt #153
Conversation
Only some of the .yaml files have added the parameters `save_checkpoint` and `load_checkpoint`; for example, `align_anything/configs/train/text_to_text/dpo.yaml` does not have these two hyperparameters. It would be best to add these two parameters to the YAML files of every modality that supports resuming from checkpoints.
There are still many YAML files that haven't fully added the parameters
Additionally, do algorithms like PPO and GRPO support checkpoint resumption? I noticed that
don't have checkpoint resumption parameters. These are just partial results from my checks, and some files may not be listed here.
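Since the checks above were done by hand and may be incomplete, a small script could list every config that lacks the two parameters. The following is only a sketch under stated assumptions: the helper name `find_missing_keys` is hypothetical, it does a plain text scan instead of parsing YAML, and it assumes the keys appear as ordinary `key:` entries in each file.

```python
import re
import tempfile
from pathlib import Path

# Parameter names taken from the review comment above.
REQUIRED_KEYS = ("save_checkpoint", "load_checkpoint")


def find_missing_keys(config_root, required_keys=REQUIRED_KEYS):
    """Map each .yaml file under config_root to the required keys it lacks.

    Hypothetical helper: a line-based text scan, not a YAML parser.
    """
    missing = {}
    for path in sorted(Path(config_root).rglob("*.yaml")):
        text = path.read_text(encoding="utf-8")
        absent = [
            key
            for key in required_keys
            # Match "key:" at the start of a line (indentation allowed),
            # so commented-out entries do not count as present.
            if not re.search(rf"^[ \t]*{re.escape(key)}[ \t]*:", text, flags=re.MULTILINE)
        ]
        if absent:
            missing[path.relative_to(config_root).as_posix()] = absent
    return missing


if __name__ == "__main__":
    # Toy demo on a throwaway directory; the file contents are made up.
    with tempfile.TemporaryDirectory() as root:
        demo_dir = Path(root) / "demo"
        demo_dir.mkdir()
        (demo_dir / "complete.yaml").write_text(
            "save_checkpoint: True\nload_checkpoint: False\n", encoding="utf-8"
        )
        (demo_dir / "incomplete.yaml").write_text("epochs: 3\n", encoding="utf-8")
        for name, keys in find_missing_keys(root).items():
            print(f"{name}: missing {', '.join(keys)}")
```

Pointing the helper at the repository's config directory would give a complete list to review, though files for algorithms that intentionally do not support resumption would still need to be excluded by hand.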
Thanks @XuyaoWang, I have addressed those comments.
It seems that currently there is no support for PPO checkpoint retraining, but in the latest commit, checkpoint retraining parameters were added to
LGTM.
LGTM.
Description
This PR includes the following changes:
Usage details:
Use `save_checkpoint` to save the optimizer states and models.
We set the default value of `save_checkpoint` to `True` so that new users who are unaware of the option do not have to train all over again.
Motivation and Context
resolve #150
Types of changes
What types of changes does your code introduce? Put an `x` in all the boxes that apply:
Checklist
Go over all the following points, and put an `x` in all the boxes that apply.
If you are unsure about any of these, don't hesitate to ask. We are here to help!