Labels: `module: complex`, `module: cuda`, `module: half`, `triaged`
🐛 Bug
Using `torch.stft` with 16-bit precision causes the error below on any version of torch newer than 1.6.0.
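For reference, the missing kernel can be exercised directly, without the training script. The snippet below is a minimal sketch (the shape and fill value are arbitrary, and the exact dispatch path is an assumption on my part): `torch.complex32` is the `ComplexHalf` dtype, and on torch 1.10 constructing a filled CUDA tensor with it should raise the same error, since `torch.full` dispatches to `fill_cuda`.

```python
import torch

# Minimal sketch, not taken from the VITS code. The shape and fill value
# are arbitrary; torch.complex32 is the 'ComplexHalf' dtype.
# On torch 1.10 this should raise:
#   RuntimeError: "fill_cuda" not implemented for 'ComplexHalf'
t = torch.full((4,), complex(1.0, 0.0), dtype=torch.complex32, device="cuda")
```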
To Reproduce
Steps to reproduce the behavior:
- Follow the instructions to train the model from the VITS repo
2021-06-29 10:29:09.537741: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py:481: UserWarning: This DataLoader will create 8 worker processes in total. Our suggested max number of worker in current system is 4, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
cpuset_checked))
/usr/local/lib/python3.7/dist-packages/torch/functional.py:472: UserWarning: stft will soon require the return_complex parameter be given for real inputs, and will further require that return_complex=True in a future PyTorch release. (Triggered internally at /pytorch/aten/src/ATen/native/SpectralOps.cpp:664.)
normalized, onesided, return_complex)
Traceback (most recent call last):
File "train_ms.py", line 294, in <module>
main()
File "train_ms.py", line 50, in main
mp.spawn(run, nprocs=n_gpus, args=(n_gpus, hps,))
File "/usr/local/lib/python3.7/dist-packages/torch/multiprocessing/spawn.py", line 230, in spawn
return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
File "/usr/local/lib/python3.7/dist-packages/torch/multiprocessing/spawn.py", line 188, in start_processes
while not context.join():
File "/usr/local/lib/python3.7/dist-packages/torch/multiprocessing/spawn.py", line 150, in join
raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:
-- Process 0 terminated with the following error:
Traceback (most recent call last):
File "/usr/local/lib/python3.7/dist-packages/torch/multiprocessing/spawn.py", line 59, in _wrap
fn(i, *args)
File "/content/vits/train_ms.py", line 118, in run
train_and_evaluate(rank, epoch, hps, [net_g, net_d], [optim_g, optim_d], [scheduler_g, scheduler_d], scaler, [train_loader, eval_loader], logger, [writer, writer_eval])
File "/content/vits/train_ms.py", line 192, in train_and_evaluate
scaler.scale(loss_gen_all).backward()
File "/usr/local/lib/python3.7/dist-packages/torch/_tensor.py", line 255, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
File "/usr/local/lib/python3.7/dist-packages/torch/autograd/__init__.py", line 149, in backward
allow_unreachable=True, accumulate_grad=True) # allow_unreachable flag
RuntimeError: "fill_cuda" not implemented for 'ComplexHalf'
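Until `ComplexHalf` kernels are implemented, a common workaround is to keep the STFT in float32 even when the rest of the model runs in FP16. Below is a hedged sketch assuming the training loop uses `torch.cuda.amp.autocast`; the helper name and parameters are illustrative, not from the VITS code:

```python
import torch

def spectrogram_fp32(wav: torch.Tensor, n_fft: int = 1024, hop: int = 256) -> torch.Tensor:
    # Illustrative helper, not part of VITS: run the stft outside the autocast
    # region and in float32, so no ComplexHalf tensors are created in either
    # the forward or the backward pass.
    with torch.cuda.amp.autocast(enabled=False):
        window = torch.hann_window(n_fft, device=wav.device, dtype=torch.float32)
        spec = torch.stft(wav.float(), n_fft=n_fft, hop_length=hop,
                          window=window, return_complex=True)
    return spec.abs()
```

Casting the result back to float16 afterwards, if needed, leaves the rest of the autocast region unchanged.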
Environment
FP16 works (PyTorch 1.6.0):
PyTorch version: 1.6.0
Is debug build: False
CUDA used to build PyTorch: 10.2
ROCM used to build PyTorch: N/A
OS: Ubuntu 20.04.3 LTS (x86_64)
GCC version: (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Clang version: Could not collect
CMake version: Could not collect
Libc version: glibc-2.31
Python version: 3.8.10 (default, Sep 28 2021, 16:10:42) [GCC 9.3.0] (64-bit runtime)
Python platform: Linux-5.11.0-37-generic-x86_64-with-glibc2.29
Is CUDA available: True
CUDA runtime version: Could not collect
GPU models and configuration:
GPU 0: NVIDIA GeForce RTX 2080 Ti
GPU 1: NVIDIA GeForce RTX 2080 Ti
GPU 2: NVIDIA GeForce RTX 2080 Ti
GPU 3: NVIDIA GeForce RTX 2080 Ti
Nvidia driver version: 470.63.01
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Versions of relevant libraries:
[pip3] mypy==0.910
[pip3] mypy-extensions==0.4.3
[pip3] numpy==1.21.2
[pip3] pytorch-lightning==1.4.9
[pip3] torch==1.6.0
[pip3] torchaudio==0.6.0
[pip3] torchmetrics==0.5.1
[conda] Could not collect
FP16 does not work (PyTorch 1.10.0):
PyTorch version: 1.10.0+cu102
Is debug build: False
CUDA used to build PyTorch: 10.2
ROCM used to build PyTorch: N/A
OS: Ubuntu 20.04.3 LTS (x86_64)
GCC version: (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Clang version: Could not collect
CMake version: Could not collect
Libc version: glibc-2.31
Python version: 3.8.10 (default, Sep 28 2021, 16:10:42) [GCC 9.3.0] (64-bit runtime)
Python platform: Linux-5.11.0-37-generic-x86_64-with-glibc2.29
Is CUDA available: True
CUDA runtime version: Could not collect
GPU models and configuration:
GPU 0: NVIDIA GeForce RTX 2080 Ti
GPU 1: NVIDIA GeForce RTX 2080 Ti
GPU 2: NVIDIA GeForce RTX 2080 Ti
GPU 3: NVIDIA GeForce RTX 2080 Ti
Nvidia driver version: 470.63.01
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Versions of relevant libraries:
[pip3] mypy==0.910
[pip3] mypy-extensions==0.4.3
[pip3] numpy==1.21.2
[pip3] pytorch-lightning==1.4.9
[pip3] torch==1.10.0
[pip3] torchaudio==0.10.0
[pip3] torchmetrics==0.5.1
[conda] Could not collect
cc @ngimel @ezyang @anjali411 @dylanbespalko @mruberry @lezcano @nikitaved