@LucasLLC LucasLLC commented Mar 6, 2024

[DCP] Removes `no_dist` and `coordinator_rank` from public DCP APIs

Differential Revision: [D54591181](https://our.internmc.facebook.com/intern/diff/D54591181/)

[ghstack-poisoned]

pytorch-bot bot commented Mar 6, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/121317

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (3 Unrelated Failures)

As of commit baf785e with merge base 967dd31:

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot

This pull request was exported from Phabricator. Differential Revision: D54591181

LucasLLC added a commit that referenced this pull request Mar 6, 2024
[DCP] Removes `no_dist` and `coordinator_rank` from public DCP APIs

Differential Revision: [D54591181](https://our.internmc.facebook.com/intern/diff/D54591181/)

ghstack-source-id: 217653051
Pull Request resolved: #121317
@github-actions github-actions bot added the oncall: distributed (add this issue/PR to the distributed oncall triage queue) and module: distributed_checkpoint labels Mar 6, 2024
@LucasLLC LucasLLC requested review from wz337 and fegin and removed request for wz337 March 6, 2024 17:48
@LucasLLC LucasLLC self-assigned this Mar 6, 2024

@fegin fegin left a comment

LGTM

@fegin fegin added the suppress-bc-linter label (suppresses failures of the API backward-compatibility linter, Lint/bc_linter) Mar 7, 2024
… DCP API's"

[DCP] Removes `no_dist` and `coordinator_rank` from public DCP APIs

Differential Revision: [D54591181](https://our.internmc.facebook.com/intern/diff/D54591181/)

cc mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse H-Huang kwen2501 awgu penguinwu fegin XilunWu wanchaol fduwjj wz337 tianyu-l wconstab yf225 chauhang

[ghstack-poisoned]

@LucasLLC

LucasLLC commented Mar 7, 2024

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk label (trigger trunk jobs on your pull request) Mar 7, 2024
@pytorchmergebot

Merge failed

Reason: This PR needs a release notes: label
If your changes are user facing and intended to be a part of release notes, please use a label starting with release notes:.

If not, please add the topic: not user facing label.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "topic: not user facing"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

Details for Dev Infra team Raised by workflow job

@LucasLLC

LucasLLC commented Mar 7, 2024

@pytorchbot merge

@pytorchmergebot

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here

@pytorchmergebot

Merge failed

Reason: 3 jobs have failed, first few of them are: trunk, linux-binary-libtorch-pre-cxx11, linux-binary-manywheel

Details for Dev Infra team Raised by workflow job

@LucasLLC

LucasLLC commented Mar 8, 2024

@pytorchbot merge -i

@pytorchmergebot

Merge started

Your change will be merged while ignoring the following 3 checks: trunk, linux-binary-libtorch-pre-cxx11, linux-binary-manywheel

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here

pytorchmergebot pushed a commit that referenced this pull request Mar 8, 2024
Makes async_save public

Differential Revision: [D54593610](https://our.internmc.facebook.com/intern/diff/D54593610/)

Pull Request resolved: #121325
Approved by: https://github.com/wz337
ghstack dependencies: #121317
huydhn added a commit to pytorch/test-infra that referenced this pull request Mar 12, 2024
Fixes #4987

Dr.CI's logic for detecting `isInfraFlakyJob` and `isLogClassifierFailed` has
a false positive: it misclassifies GitHub's failure to dispatch the whole
workflow as flaky, for example in
pytorch/pytorch#121317.

This logic should apply only to workflow jobs, not workflow runs.
The two can be told apart by the `workflowId` field, which is
set to `null` whenever the record is a workflow run.
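
As a rough illustration of the check described above (not the actual Dr.CI code, which lives in pytorch/test-infra), the job/run split could be sketched as follows. The record shape and both function names are hypothetical; only the `workflowId` field and its `null` value for workflow runs come from the commit message.

```python
from typing import Optional, TypedDict


class Record(TypedDict):
    # Hypothetical record shape; only `workflowId` is taken from the commit message.
    name: str
    workflowId: Optional[int]


def is_workflow_run(record: Record) -> bool:
    """A record whose workflowId is null (None) is a workflow run, not a workflow job."""
    return record["workflowId"] is None


def may_classify_as_flaky(record: Record, looks_flaky: bool) -> bool:
    """Apply the flaky heuristics to workflow jobs only.

    `looks_flaky` stands in for whatever the existing heuristics report
    (an assumption made for this sketch). Workflow runs are never marked flaky.
    """
    return looks_flaky and not is_workflow_run(record)
```

With this guard, a whole-workflow dispatch failure (a run, so `workflowId` is `None`) stays a legitimate failure even when the heuristics would otherwise flag it.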

### Testing

A unit test and a local curl command confirm that these are now marked as legitimate failures:

```
curl --request POST \
--url "http://localhost:3000/api/drci/drci?prNumber=121317" \
--header "Authorization: TOKEN" \
--data 'repo=pytorch'
```
@github-actions github-actions bot deleted the gh/lucasllc/15/head branch April 12, 2024 01:52
Labels: ciflow/trunk, fb-exported, Merged, oncall: distributed, release notes: distributed (checkpoint), suppress-bc-linter