Status: Open
Labels: module: cuda, module: memory usage, triaged
Description
🐛 Bug
When training transformer models with the transformers library, PyTorch reports that my GPU is out of memory even though plenty of memory should be available. I have been trying to tackle this problem for some time now: I have tried switching OS, lowering the batch size, etc. Every time (both on my personal machine and on a cluster) it gives me an error like this:
RuntimeError: CUDA out of memory. Tried to allocate 24.00 MiB (GPU 0; 6.00 GiB total capacity; 4.26 GiB already allocated; 0 bytes free; 4.30 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
It always happens at the first or second step.
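To narrow down where the memory goes between steps, allocator statistics can be logged each iteration. A minimal sketch (the helper name `log_cuda_memory` is mine, not part of the tutorial):

```python
import torch

def log_cuda_memory(step: int) -> None:
    # memory_allocated: bytes currently held by live tensors;
    # memory_reserved: bytes held by PyTorch's caching allocator.
    alloc = torch.cuda.memory_allocated() / 2**20
    reserved = torch.cuda.memory_reserved() / 2**20
    print(f"step {step}: allocated={alloc:.0f} MiB, reserved={reserved:.0f} MiB")
    # torch.cuda.memory_summary() prints a more detailed breakdown.
```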
Similar problems have been reported before, but I did not find any solution for this one. I also described my problem here, but now I think it is more of a PyTorch problem.
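For reference, the error message suggests setting `max_split_size_mb` via `PYTORCH_CUDA_ALLOC_CONF`. A minimal sketch of how it can be applied (128 is an arbitrary example value; the variable must be set before the first CUDA allocation in the process):

```python
import os

# Must be set before the first CUDA allocation;
# 128 MiB is an arbitrary example value.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

import torch

x = torch.zeros(1, device="cuda")  # first allocation picks up the config
```

Equivalently, one can export `PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128` in the shell before launching Python.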
To Reproduce
Steps to reproduce the behavior:
- I followed the https://huggingface.co/transformers/training.html tutorial; a condensed version of its training loop is sketched below.
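A condensed sketch of the tutorial's fine-tuning loop, matching the traceback in Additional context below. Dataset preparation is omitted; it assumes `train_dataloader` yields tokenized batches with `input_ids`, `attention_mask`, and `labels`, and the model name and `num_labels` follow the tutorial:

```python
import torch
from transformers import AutoModelForSequenceClassification

device = torch.device("cuda")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-cased", num_labels=5
).to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

model.train()
for batch in train_dataloader:  # assumed: a DataLoader of tokenized batches
    batch = {k: v.to(device) for k, v in batch.items()}
    outputs = model(**batch)  # the OOM is raised during this forward pass
    loss = outputs.loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```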
Expected behavior
I would expect the tutorial to run without exhausting GPU memory, or at least for there to be a logical explanation for the failure.
Environment
PyTorch version: 1.10.0+cu102
Is debug build: False
CUDA used to build PyTorch: 10.2
ROCM used to build PyTorch: N/A
OS: Ubuntu 20.04.3 LTS (x86_64)
GCC version: (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Clang version: Could not collect
CMake version: Could not collect
Libc version: glibc-2.31
Python version: 3.8.11 (default, Nov 2 2021, 10:56:09) [GCC 9.3.0] (64-bit runtime)
Python platform: Linux-5.10.60.1-microsoft-standard-WSL2-x86_64-with-glibc2.29
Is CUDA available: True
CUDA runtime version: 10.1.243
GPU models and configuration: GPU 0: NVIDIA GeForce GTX 1660 Ti
Nvidia driver version: 510.06
cuDNN version: Probably one of the following:
/usr/local/cuda-11.2/targets/x86_64-linux/lib/libcudnn.so.8.1.1
/usr/local/cuda-11.2/targets/x86_64-linux/lib/libcudnn_adv_infer.so.8.1.1
/usr/local/cuda-11.2/targets/x86_64-linux/lib/libcudnn_adv_train.so.8.1.1
/usr/local/cuda-11.2/targets/x86_64-linux/lib/libcudnn_cnn_infer.so.8.1.1
/usr/local/cuda-11.2/targets/x86_64-linux/lib/libcudnn_cnn_train.so.8.1.1
/usr/local/cuda-11.2/targets/x86_64-linux/lib/libcudnn_ops_infer.so.8.1.1
/usr/local/cuda-11.2/targets/x86_64-linux/lib/libcudnn_ops_train.so.8.1.1
HIP runtime version: N/A
MIOpen runtime version: N/A
Versions of relevant libraries:
[pip3] mypy-extensions==0.4.3
[pip3] numpy==1.19.5
[pip3] torch==1.10.0
[pip3] torchaudio==0.8.2
[pip3] torchvision==0.10.1
[conda] Could not collect
Additional context
Full traceback, in case it helps:
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
/tmp/ipykernel_1721/1645165363.py in <module>
7 for batch in train_dataloader:
8 batch = {k: v.to(device) for k, v in batch.items()}
----> 9 outputs = model(**batch)
10 loss = outputs.loss
11 loss.backward()
~/.cache/pypoetry/virtualenvs/mars-48yr609M-py3.8/lib/python3.8/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
1100 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
1101 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1102 return forward_call(*input, **kwargs)
1103 # Do not call functions when jit is used
1104 full_backward_hooks, non_full_backward_hooks = [], []
~/.cache/pypoetry/virtualenvs/mars-48yr609M-py3.8/lib/python3.8/site-packages/transformers/models/bert/modeling_bert.py in forward(self, input_ids, attention_mask, token_type_ids, position_ids, head_mask, inputs_embeds, labels, output_attentions, output_hidden_states, return_dict)
1500 return_dict = return_dict if return_dict is not None else self.config.use_return_dict
1501
-> 1502 outputs = self.bert(
1503 input_ids,
1504 attention_mask=attention_mask,
~/.cache/pypoetry/virtualenvs/mars-48yr609M-py3.8/lib/python3.8/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
1100 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
1101 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1102 return forward_call(*input, **kwargs)
1103 # Do not call functions when jit is used
1104 full_backward_hooks, non_full_backward_hooks = [], []
~/.cache/pypoetry/virtualenvs/mars-48yr609M-py3.8/lib/python3.8/site-packages/transformers/models/bert/modeling_bert.py in forward(self, input_ids, attention_mask, token_type_ids, position_ids, head_mask, inputs_embeds, encoder_hidden_states, encoder_attention_mask, past_key_values, use_cache, output_attentions, output_hidden_states, return_dict)
969 past_key_values_length=past_key_values_length,
970 )
--> 971 encoder_outputs = self.encoder(
972 embedding_output,
973 attention_mask=extended_attention_mask,
~/.cache/pypoetry/virtualenvs/mars-48yr609M-py3.8/lib/python3.8/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
1100 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
1101 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1102 return forward_call(*input, **kwargs)
1103 # Do not call functions when jit is used
1104 full_backward_hooks, non_full_backward_hooks = [], []
~/.cache/pypoetry/virtualenvs/mars-48yr609M-py3.8/lib/python3.8/site-packages/transformers/models/bert/modeling_bert.py in forward(self, hidden_states, attention_mask, head_mask, encoder_hidden_states, encoder_attention_mask, past_key_values, use_cache, output_attentions, output_hidden_states, return_dict)
566 )
567 else:
--> 568 layer_outputs = layer_module(
569 hidden_states,
570 attention_mask,
~/.cache/pypoetry/virtualenvs/mars-48yr609M-py3.8/lib/python3.8/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
1100 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
1101 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1102 return forward_call(*input, **kwargs)
1103 # Do not call functions when jit is used
1104 full_backward_hooks, non_full_backward_hooks = [], []
~/.cache/pypoetry/virtualenvs/mars-48yr609M-py3.8/lib/python3.8/site-packages/transformers/models/bert/modeling_bert.py in forward(self, hidden_states, attention_mask, head_mask, encoder_hidden_states, encoder_attention_mask, past_key_value, output_attentions)
454 # decoder uni-directional self-attention cached key/values tuple is at positions 1,2
455 self_attn_past_key_value = past_key_value[:2] if past_key_value is not None else None
--> 456 self_attention_outputs = self.attention(
457 hidden_states,
458 attention_mask,
~/.cache/pypoetry/virtualenvs/mars-48yr609M-py3.8/lib/python3.8/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
1100 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
1101 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1102 return forward_call(*input, **kwargs)
1103 # Do not call functions when jit is used
1104 full_backward_hooks, non_full_backward_hooks = [], []
~/.cache/pypoetry/virtualenvs/mars-48yr609M-py3.8/lib/python3.8/site-packages/transformers/models/bert/modeling_bert.py in forward(self, hidden_states, attention_mask, head_mask, encoder_hidden_states, encoder_attention_mask, past_key_value, output_attentions)
385 output_attentions=False,
386 ):
--> 387 self_outputs = self.self(
388 hidden_states,
389 attention_mask,
~/.cache/pypoetry/virtualenvs/mars-48yr609M-py3.8/lib/python3.8/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
1100 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
1101 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1102 return forward_call(*input, **kwargs)
1103 # Do not call functions when jit is used
1104 full_backward_hooks, non_full_backward_hooks = [], []
~/.cache/pypoetry/virtualenvs/mars-48yr609M-py3.8/lib/python3.8/site-packages/transformers/models/bert/modeling_bert.py in forward(self, hidden_states, attention_mask, head_mask, encoder_hidden_states, encoder_attention_mask, past_key_value, output_attentions)
317 # This is actually dropping out entire tokens to attend to, which might
318 # seem a bit unusual, but is taken from the original Transformer paper.
--> 319 attention_probs = self.dropout(attention_probs)
320
321 # Mask heads if we want to
~/.cache/pypoetry/virtualenvs/mars-48yr609M-py3.8/lib/python3.8/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
1100 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
1101 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1102 return forward_call(*input, **kwargs)
1103 # Do not call functions when jit is used
1104 full_backward_hooks, non_full_backward_hooks = [], []
~/.cache/pypoetry/virtualenvs/mars-48yr609M-py3.8/lib/python3.8/site-packages/torch/nn/modules/dropout.py in forward(self, input)
56
57 def forward(self, input: Tensor) -> Tensor:
---> 58 return F.dropout(input, self.p, self.training, self.inplace)
59
60
~/.cache/pypoetry/virtualenvs/mars-48yr609M-py3.8/lib/python3.8/site-packages/torch/nn/functional.py in dropout(input, p, training, inplace)
1167 if p < 0.0 or p > 1.0:
1168 raise ValueError("dropout probability has to be between 0 and 1, " "but got {}".format(p))
-> 1169 return _VF.dropout_(input, p, training) if inplace else _VF.dropout(input, p, training)
1170
1171
RuntimeError: CUDA out of memory. Tried to allocate 24.00 MiB (GPU 0; 6.00 GiB total capacity; 4.26 GiB already allocated; 0 bytes free; 4.30 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
cc @ngimel