Root may not exist due to FSDP lazy initialization.
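
As a rough illustration of the description above, here is a minimal pure-Python sketch of lazy initialization. It does not use the real FSDP API: `LazyWrapper`, `is_root`, `lazy_init`, and `find_root` are hypothetical names invented for this example. The idea it tries to show is that a flag resolved only on first use is still unset when a checkpoint helper inspects the wrappers before any forward pass, so a check that requires a root can fail.

```python
from typing import List, Optional


class LazyWrapper:
    """Hypothetical stand-in for a wrapper that decides its root status lazily."""

    def __init__(self, name: str, children: Optional[List["LazyWrapper"]] = None) -> None:
        self.name = name
        self.children = children or []
        # None means "not decided yet"; the value is set only by lazy_init().
        self.is_root: Optional[bool] = None

    def descendants(self) -> List["LazyWrapper"]:
        found: List["LazyWrapper"] = []
        for child in self.children:
            found.append(child)
            found.extend(child.descendants())
        return found

    def lazy_init(self) -> None:
        # The first wrapper to initialize claims root; its descendants become non-root.
        if self.is_root is not None:
            return
        self.is_root = True
        for child in self.descendants():
            child.is_root = False

    def forward(self) -> None:
        # Root status is resolved on first use, not at construction time.
        self.lazy_init()


def find_root(wrappers: List[LazyWrapper]) -> Optional[LazyWrapper]:
    """Return the wrapper marked as root, or None if none is marked yet."""
    for wrapper in wrappers:
        if wrapper.is_root:
            return wrapper
    return None


inner = LazyWrapper("inner")
outer = LazyWrapper("outer", [inner])
all_wrappers = [outer, inner]

root_before = find_root(all_wrappers)  # inspected before any forward pass
outer.forward()
root_after = find_root(all_wrappers)   # inspected after lazy initialization
```

In this toy model, code that runs before the first `forward()` cannot rely on finding a root, which is the kind of situation the removed check would have rejected.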

[ghstack-poisoned]

pytorch-bot bot commented Mar 8, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/121544

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 0a21915 with merge base 34a28f0:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

fegin added a commit that referenced this pull request Mar 8, 2024
Root may not exist due to FSDP lazy initialization.

ghstack-source-id: d01961f
Pull Request resolved: #121544
github-actions bot added the "oncall: distributed" and "module: distributed_checkpoint" labels on Mar 8, 2024
fegin requested a review from wz337 on March 8, 2024 21:25
fegin added the "ciflow/trunk" and "release notes: distributed (checkpoint)" labels on Mar 8, 2024
fegin requested a review from LucasLLC on March 8, 2024 21:26

fegin commented Mar 12, 2024

@pytorchbot merge

@pytorchmergebot

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here

wz337 added a commit that referenced this pull request Mar 14, 2024


cc mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse H-Huang kwen2501 awgu penguinwu fegin XilunWu wanchaol fduwjj tianyu-l wconstab yf225 chauhang LucasLLC

Thanks @fegin for removing the FSDP root module check in DCP to unblock test updates (#121544).

This PR adds "optimizer_class" as a kwarg for the subtests of the following tests, so that AdamW is also exercised as an optimizer option.

- test_fsdp
- test_compiled_fsdp
- test_fsdp2
- test_ddp
- test_fsdp_ddp
- test_cpu_offload_full_state_dict

In addition, we temporarily remove the two _verify_osd_by_load calls in _test_save_load, as state dict loading seems to affect parameters. Issue #121186 was created to track this.
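
The subtest change described above can be sketched roughly as follows. This is an illustration only, not the code from the PR: `run_kwarg_combinations` and `fake_test_body` are hypothetical helpers, `Adam` and `AdamW` are empty stand-in classes rather than the `torch.optim` ones, and `use_orig_params` is an assumed second kwarg added to show how combinations multiply.

```python
import itertools
from typing import Any, Callable, Dict, List


class Adam:  # hypothetical stand-in, not torch.optim.Adam
    pass


class AdamW:  # hypothetical stand-in, not torch.optim.AdamW
    pass


def run_kwarg_combinations(
    subtest_config: Dict[str, List[Any]],
    test_fn: Callable[..., None],
) -> int:
    """Call test_fn once per combination of the listed kwarg values."""
    keys = list(subtest_config)
    count = 0
    for values in itertools.product(*(subtest_config[key] for key in keys)):
        test_fn(**dict(zip(keys, values)))
        count += 1
    return count


seen: List[str] = []


def fake_test_body(use_orig_params: bool, optimizer_class: type) -> None:
    # A real test would build a model and an optimizer_class instance here;
    # this sketch only records which combination was requested.
    seen.append(f"{use_orig_params}-{optimizer_class.__name__}")


num_runs = run_kwarg_combinations(
    {"use_orig_params": [True, False], "optimizer_class": [Adam, AdamW]},
    fake_test_body,
)
```

Adding one more optimizer class to the list doubles nothing by itself; it adds one run per combination of the other kwargs, which is why a single new kwarg value can widen coverage across every listed test.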

[ghstack-poisoned]
pytorchmergebot pushed a commit that referenced this pull request Mar 15, 2024
Thanks @fegin for removing the FSDP root module check in DCP to unblock test updates (#121544).

This PR adds "optimizer_class" as a kwarg for the subtests of the following tests, so that AdamW is also exercised as an optimizer option.

- test_fsdp
- test_compiled_fsdp
- test_fsdp2
- test_ddp
- test_fsdp_ddp
- test_cpu_offload_full_state_dict

In addition, we temporarily remove the two _verify_osd_by_load calls in _test_save_load, as state dict loading seems to affect parameters. Issue #121186 was created to track this.
Pull Request resolved: #121774
Approved by: https://github.com/Skylion007
ghstack dependencies: #121773
github-actions bot deleted the gh/fegin/219/head branch on April 12, 2024 01:52
mvpatel2000 pushed a commit to mvpatel2000/pytorch that referenced this pull request May 17, 2024
Root may not exist due to FSDP lazy initialization.

Pull Request resolved: pytorch#121544
Approved by: https://github.com/Skylion007
ghstack dependencies: pytorch#121273, pytorch#121276, pytorch#121290
atalman pushed a commit that referenced this pull request May 27, 2024
Root may not exist due to FSDP lazy initialization.

Pull Request resolved: #121544
Approved by: https://github.com/Skylion007
ghstack dependencies: #121273, #121276, #121290

Co-authored-by: Chien-Chin Huang <chienchin@fb.com>