🐛 Describe the bug
A minimal reproducible example:
import torch
import torch.distributed as dist

dist.init_process_group(backend='gloo')
# dist.init_process_group(backend='nccl')
# torch.cuda.set_device(dist.get_rank())

with torch.inference_mode():
    data = [torch.ones((3, 3))] * dist.get_world_size()
    obj = data[dist.get_rank()]
    dist.all_gather(data, obj)
    # dist.broadcast(obj, src=0)
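For reference, a two-process run of the script (saved, for example, as repro.py, a hypothetical filename) can be launched with torchrun:

torchrun --nproc_per_node=2 repro.py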
The error is:
E RuntimeError: Inplace update to inference tensor outside InferenceMode is not allowed.You can make a clone to get a normal tensor before doing inplace update.See pytorch/rfcs#17 for more details.
Strangely, the nccl backend works in this case, and broadcast works as well; only all_gather fails.
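A possible workaround, a minimal sketch based only on the hint in the error message: allocate the output buffers outside the inference_mode block, so they are regular tensors that the backend is allowed to update in place.

import torch
import torch.distributed as dist

dist.init_process_group(backend='gloo')

# Output buffers created outside inference mode are regular tensors,
# so the in-place writes done by all_gather are permitted.
data = [torch.empty(3, 3) for _ in range(dist.get_world_size())]

with torch.inference_mode():
    obj = torch.ones((3, 3))
    dist.all_gather(data, obj)

This only sidesteps the check; it does not explain why the gloo all_gather path rejects inference tensors while nccl and broadcast do not.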
Versions
PyTorch 2.3.0
cc @mrshenli @pritamdamania87 @zhaojuanmao @satgera @gqchen @aazzolini @osalpekar @jiayisuse @H-Huang @kwen2501 @awgu @penguinwu @fegin @XilunWu @wanchaol @fduwjj @wz337 @tianyu-l @wconstab @yf225 @chauhang @d4l3k