Description
🐛 Bug
Calling the forward function of a Module from a separate thread allocates memory that is not freed when the thread exits.
To Reproduce
Steps to reproduce the behavior:
Module traced from Python as in the tutorial:
import torchvision
import torch
model = torchvision.models.resnet18()
example = torch.rand(1,3,224,224)
my_torchscript_module = torch.jit.trace(model, example)
torch.jit.save(my_torchscript_module, "sciptedModule.pt")
Loaded and run in C++ in a separate thread:
#include <thread>
#include "torch/script.h"
#include "torch/torch.h"

void runModel(at::Tensor, torch::jit::script::Module);

int main()
{
    torch::NoGradGuard no_guard;
    torch::jit::script::Module m_module = torch::jit::load("./sciptedModule.pt");
    m_module.eval();
    at::Tensor testTensor = torch::rand({ 1, 3, 224, 224 }, at::kFloat);
    testTensor = testTensor.div(testTensor.norm());
    // Start a fresh thread for every forward call and wait for it to finish.
    for (int i = 0; i < 10000; i++) {
        std::thread newThread(&runModel, testTensor, m_module);
        newThread.join();
    }
}

void runModel(at::Tensor testTensor, torch::jit::script::Module m_module) {
    // Disable autograd inside the worker thread as well.
    torch::NoGradGuard no_guard;
    at::Tensor out = m_module.forward({ testTensor }).toTensor().detach();
}
Expected behavior
Inference runs in a separate thread with no increase in memory usage.
Environment
PyTorch version: 1.2.0
Is debug build: No
CUDA used to build PyTorch: None
OS: Microsoft Windows 10 Home
GCC version: Could not collect
CMake version: version 3.12.2
Python version: 3.6
Is CUDA available: No
CUDA runtime version: No CUDA
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
Versions of relevant libraries:
[pip] numpy==1.16.2
[pip] numpydoc==0.8.0
[pip] torch==1.2.0
[pip] torchvision==0.4.0
[conda] _tflow_1100_select 0.0.3 mkl
[conda] _tflow_select 2.3.0 mkl
[conda] blas 1.0 mkl
[conda] cpuonly 1.0 0 pytorch
[conda] libmklml 2019.0.3 0
[conda] mkl 2019.1 144
[conda] mkl-include 2019.1 144
[conda] mkl-service 1.1.2 py36hb782905_5
[conda] mkl_fft 1.0.10 py36h14836fe_0
[conda] mkl_random 1.0.2 py36h343c172_0
[conda] pytorch 1.2.0 py3.6_cpu_1 [cpuonly] pytorch
[conda] tensorflow-base 1.10.0 mkl_py36h81393da_0
[conda] torchvision 0.4.0 py36_cpu [cpuonly] pytorch
Additional context
When running on the main thread, the memory seems to be allocated once on the first call and then reused.
Python threading doesn't have this problem.
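For comparison, here is a minimal sketch of the same pattern in Python: one freshly created thread per forward call, joined before the next one starts. The helper names `run_in_fresh_threads` and `python_repro` are made up for illustration and are not part of PyTorch. The sketch assumes the module was saved as `sciptedModule.pt` as in the steps above, and it is not necessarily the exact script used for the observation above.

```python
import threading


def run_in_fresh_threads(work, iterations):
    """Call work() once per newly created thread, joining each thread
    before starting the next (same shape as the C++ loop)."""
    results = []

    def target():
        results.append(work())

    for _ in range(iterations):
        thread = threading.Thread(target=target)
        thread.start()
        thread.join()
    return results


def python_repro(path="sciptedModule.pt", iterations=10000):
    """Hypothetical helper: run the traced module in fresh threads."""
    import torch  # imported lazily so run_in_fresh_threads works without PyTorch

    module = torch.jit.load(path)
    module.eval()
    x = torch.rand(1, 3, 224, 224)
    x = x / x.norm()

    def forward():
        # Disable autograd inside the worker thread itself.
        with torch.no_grad():
            return tuple(module(x).shape)

    return run_in_fresh_threads(forward, iterations)
```

Calling `python_repro()` while watching the process in Task Manager (or any other memory monitor) gives a rough way to compare against the C++ program.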