
System memory leak on CUDA GPU backend #8147

@nttstar

Description


Describe the bug
System (host) memory keeps increasing while running inference with the CUDA GPU backend.

Urgency
very urgent

System information

  • OS Platform and Distribution: Linux Ubuntu 16.04
  • ONNX Runtime installed from (source or binary): pip install onnxruntime-gpu==1.8
  • ONNX Runtime version: 1.8
  • Python version: 3.7.10
  • Visual Studio version (if applicable): No
  • GCC/Compiler version (if compiling from source): -
  • CUDA/cuDNN version: 11.1
  • GPU model and memory: A30, 24GB

To Reproduce

Please download the detection model from https://1drv.ms/u/s!AswpsDO2toNKsTYUYsyy9kdSZSfe?e=KPHWCL (OneDrive link), then use the following code to reproduce the issue:

import numpy as np
import onnxruntime
import cv2

model_file = 'scrfd_10g_bnkps.onnx'
session = onnxruntime.InferenceSession(model_file, None)
input_cfg = session.get_inputs()[0]
input_shape = input_cfg.shape
input_name = input_cfg.name
outputs = session.get_outputs()
output_names = []
for o in outputs:
    output_names.append(o.name)

# Build a normalized NCHW blob from a random 640x640 BGR image.
img = np.random.randint(0, 255, size=(640, 640, 3), dtype=np.uint8)
input_std = 128.0
input_mean = 127.5
blob = cv2.dnn.blobFromImage(img, 1.0/input_std, (640, 640), (input_mean, input_mean, input_mean), swapRB=True)

# Run inference repeatedly; system (host) memory keeps growing.
for _ in range(1000000):
    net_outs = session.run(output_names, {input_name: blob})
    pred = net_outs[0]

The leak happens at the line pred = net_outs[0]; if we omit this line, there is no memory leak.
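To quantify the growth, the host-side RSS can be sampled inside the loop. A minimal sketch, assuming psutil is available (psutil is not part of the original report), reusing the session, input_name, output_names, and blob set up above:

import os
import psutil  # assumption: extra dependency, not used in the original repro

proc = psutil.Process(os.getpid())
for i in range(1000000):
    net_outs = session.run(output_names, {input_name: blob})
    pred = net_outs[0]
    if i % 1000 == 0:
        # Host RSS in MiB; with the CUDA backend this keeps climbing.
        print(i, proc.memory_info().rss / (1024 * 1024))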
Also,

  1. If we use the CPU backend by setting session.set_providers(['CPUExecutionProvider']), there is no memory leak (see the sketch after this list).
  2. If we use CUDA 10.2 with onnxruntime-gpu==1.6, there is no memory leak.
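For reference, a minimal sketch of how the CPU execution provider was selected (standard onnxruntime API; model path as in the repro above):

import onnxruntime

session = onnxruntime.InferenceSession('scrfd_10g_bnkps.onnx', None)
# Restrict execution to the CPU provider; with this setting the
# inference loop above shows no memory growth.
session.set_providers(['CPUExecutionProvider'])
print(session.get_providers())  # should print ['CPUExecutionProvider']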

Expected behavior
System memory usage stays stable while running the inference loop.
