@fegin fegin commented May 7, 2024

[ghstack-poisoned]

pytorch-bot bot commented May 7, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/125708

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure

As of commit 3d827e5 with merge base 196a0b1:

NEW FAILURE - The following job has failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

[ghstack-poisoned]

@wz337 wz337 left a comment


LGTM!

@fegin fegin added the ciflow/trunk Trigger trunk jobs on your pull request label May 7, 2024

fegin commented May 8, 2024

@pytorchbot merge -f "The failing tests are not related."

pytorchmergebot commented

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as a last resort and instead consider -i/--ignore-current to continue the merge while ignoring current failures. This allows currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here

pytorchmergebot pushed a commit that referenced this pull request May 8, 2024
…5338)

Summary:
This is useful if users would like to avoid CPU memory OOM when loading from a full state_dict.

Pull Request resolved: #125338
Approved by: https://github.com/weifengpy
ghstack dependencies: #125708
pytorchmergebot pushed a commit that referenced this pull request May 8, 2024
…25339)

Summary:
This is useful if users would like to avoid CPU memory OOM when loading from a full state_dict.

Pull Request resolved: #125339
Approved by: https://github.com/weifengpy
ghstack dependencies: #125708, #125338
@github-actions github-actions bot deleted the gh/fegin/239/head branch June 8, 2024 01:54

Craigacp commented Jun 18, 2024

Is there going to be a PyTorch 2.3.2, and if so, would it be possible to get this fix into it? I've spent all day running down slight parameter differences in my model when loading checkpoints, since the code this PR fixes is called in get_optimizer_state_dict, which is necessary to get the optimizer dictionary to load into with dist_cp.load. When I loaded the optimizer checkpoint, it changed the freshly loaded model checkpoint, because it stepped an empty optimizer and so applied weight decay to all my parameters.
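
The effect described in this comment can be illustrated with a minimal sketch in plain Python. It is not PyTorch's implementation: `decoupled_weight_decay_step` is a hypothetical helper, and the learning rate and weight decay values are assumptions chosen for illustration. It shows only that an optimizer step with all-zero gradients can still move parameters when weight decay is enabled.

```python
def decoupled_weight_decay_step(params, grads, lr=0.1, weight_decay=0.01):
    """Hypothetical SGD-style step with decoupled weight decay (illustration only)."""
    # Each parameter is first shrunk by lr * weight_decay, then moved along -grad.
    return [p * (1.0 - lr * weight_decay) - lr * g for p, g in zip(params, grads)]


# Stand-in for freshly loaded model parameters.
params = [1.0, -2.0, 0.5]
# Stand-in for the all-zero gradients of an optimizer that has not trained yet.
zero_grads = [0.0] * len(params)

stepped = decoupled_weight_decay_step(params, zero_grads)
# Every parameter shifted, even though no gradient carried any signal.
changed = [a != b for a, b in zip(params, stepped)]

# In this toy model, the same step with lr=0.0 leaves the parameters untouched.
unchanged = decoupled_weight_decay_step(params, zero_grads, lr=0.0)
```

In this toy model the drift per step is `lr * weight_decay * p`, which is small enough to show up only as slight parameter differences after a checkpoint load.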

Labels
ciflow/trunk (Trigger trunk jobs on your pull request)
Merged
oncall: distributed (Add this issue/PR to distributed oncall triage queue)