CUDA Memory leak w/ torch.compile in both stable and trunk #119607

@xmfan

Description

🐛 Describe the bug

Models traced with torch.compile don't seem to free CUDA memory, even after every reference from the script has gone out of scope:

import torch
import gc

def main():
    x = torch.randn(1000, 3000, device="cuda", requires_grad=True)
    model = torch.nn.Sequential(
        torch.nn.Linear(3000, 10000),
        torch.nn.ReLU(),
        torch.nn.Linear(10000, 50000),
        torch.nn.ReLU(),
        torch.nn.Linear(50000, 20000),
        torch.nn.ReLU(),
        torch.nn.Linear(20000, 1234),
    ).to("cuda")
    model = torch.compile(model, backend="eager")
    model(x)

if __name__ == "__main__":
    main()

    # tried clearing the memory a few different ways
    torch.cuda.synchronize()
    torch.cuda.empty_cache()
    torch._C._cuda_clearCublasWorkspaces()
    gc.collect()

    print(f"{torch.cuda.memory_allocated()/1e9} GB!!")  # 6.219729408 GB!!
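
For comparison, a hedged diagnostic sketch that runs the same workload with and without torch.compile, drops every Python reference explicitly, and additionally tries torch._dynamo.reset(); whether reset() actually releases the cached compiled graphs is an assumption to verify, not confirmed behavior:

import gc
import torch

def run(compile_model: bool) -> int:
    x = torch.randn(1000, 3000, device="cuda", requires_grad=True)
    model = torch.nn.Sequential(
        torch.nn.Linear(3000, 10000),
        torch.nn.ReLU(),
        torch.nn.Linear(10000, 1234),
    ).to("cuda")
    if compile_model:
        model = torch.compile(model, backend="eager")
    model(x)
    del model, x              # drop every Python reference we hold
    torch._dynamo.reset()     # assumption: may release cached compiled graphs
    gc.collect()
    torch.cuda.synchronize()
    torch.cuda.empty_cache()
    return torch.cuda.memory_allocated()

if __name__ == "__main__":
    print(f"eager:    {run(False) / 1e9} GB")  # expected: ~0 GB
    print(f"compiled: {run(True) / 1e9} GB")   # anything left over points at the leak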

One high-priority use case for fixing this is compiled autograd, which calls torch.compile once for the compiled forward and once for the compiled backward, leading to 2x the memory usage (see the sketch below).
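
For context, a rough sketch of that scenario; enabling compiled autograd via torch._dynamo.config.compiled_autograd is an assumption for these versions, and the exact enabling mechanism may differ:

import torch

# Assumption: this config flag turns on compiled autograd here; the
# enabling mechanism may differ across builds.
torch._dynamo.config.compiled_autograd = True

model = torch.compile(torch.nn.Linear(3000, 1234).to("cuda"), backend="eager")
x = torch.randn(8, 3000, device="cuda", requires_grad=True)

# The forward graph and the backward graph are compiled separately, so any
# references leaked per compile would be held twice.
model(x).sum().backward()

print(f"{torch.cuda.memory_allocated() / 1e9} GB allocated")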

cc @ezyang @gchanan @zou3519 @kadeng @msaroufim @bdhirsh @anijain2305 @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @chenyang78 @aakhundov @Chillee

Versions

2.2.0
trunk

Metadata

Labels

high priority, module: dynamo, oncall: pt2, triaged
