Status: Closed
Labels: high priority, module: dynamo, oncall: pt2, triaged (this issue has been looked at by a team member, and triaged and prioritized into an appropriate module)
Description
🐛 Describe the bug
Models traced with torch.compile don't seem to free CUDA memory:
```python
import torch
import gc

def main():
    x = torch.randn(1000, 3000, device="cuda", requires_grad=True)
    model = torch.nn.Sequential(
        torch.nn.Linear(3000, 10000),
        torch.nn.ReLU(),
        torch.nn.Linear(10000, 50000),
        torch.nn.ReLU(),
        torch.nn.Linear(50000, 20000),
        torch.nn.ReLU(),
        torch.nn.Linear(20000, 1234),
    ).to("cuda")
    model = torch.compile(model, backend="eager")
    model(x)

if __name__ == "__main__":
    main()

    # tried clearing with a few ways
    torch.cuda.synchronize()
    torch.cuda.empty_cache()
    torch._C._cuda_clearCublasWorkspaces()
    gc.collect()
    print(f"{torch.cuda.memory_allocated()/1e9} GB!!")  # 6.219729408 GB!!
```
One high-priority use case for fixing this is compiled autograd, which calls torch.compile once for the compiled forward and once for the compiled backward, leading to 2x memory use.
cc @ezyang @gchanan @zou3519 @kadeng @msaroufim @bdhirsh @anijain2305 @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @chenyang78 @aakhundov @Chillee
Versions
2.2.0
trunk