Status: Closed
Labels: high priority, module: dynamo, oncall: pt2, triaged (this issue has been looked at by a team member, and triaged and prioritized into an appropriate module)
Description
🐛 Describe the bug
Models traced with torch.compile don't seem to free CUDA memory:
```python
import torch
import gc

def main():
    x = torch.randn(1000, 3000, device="cuda", requires_grad=True)
    model = torch.nn.Sequential(
        torch.nn.Linear(3000, 10000),
        torch.nn.ReLU(),
        torch.nn.Linear(10000, 50000),
        torch.nn.ReLU(),
        torch.nn.Linear(50000, 20000),
        torch.nn.ReLU(),
        torch.nn.Linear(20000, 1234),
    ).to("cuda")
    model = torch.compile(model, backend="eager")
    model(x)

if __name__ == "__main__":
    main()

    # tried clearing with a few ways
    torch.cuda.synchronize()
    torch.cuda.empty_cache()
    torch._C._cuda_clearCublasWorkspaces()
    gc.collect()
    print(f"{torch.cuda.memory_allocated()/1e9} GB!!")  # 6.219729408 GB!!
```
One high-priority use case for fixing this is compiled autograd, which calls torch.compile once for the compiled forward and once for the compiled backward, leading to 2x memory use.
cc @ezyang @gchanan @zou3519 @kadeng @msaroufim @bdhirsh @anijain2305 @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @chenyang78 @aakhundov @Chillee
Versions
2.2.0
trunk