Spectrogram transform with float16 precision #2097

@Guillaume-oso

🚀 The feature

The spectrogram transforms do not appear to work with float16 precision. MelSpectrogram fails with a RuntimeError on both CPU and GPU.

Versions:
python: 3.8
torch: 1.10.1
torchaudio: 0.10.1
typing-extensions: 3.10.0.2
OS: Ubuntu 20.04

Code to reproduce:

import torch
import torchaudio

batch = torch.rand(20, 1, 153600)

precision = "float16"

spec_transform = torchaudio.transforms.MelSpectrogram(
    sample_rate=16000,
    n_fft=1024,
    win_length=600,
    hop_length=320,
    f_min=20,
    f_max=8000,
    n_mels=128,
)
batch = batch.to(getattr(torch, precision))
spec_transform = spec_transform.to(getattr(torch, precision))

spec_transform(batch)

Error on CPU:

Traceback (most recent call last):
  File "test_half.py", line 20, in <module>
    spec_transform(batch)
  File "/home/guillaume/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/guillaume/.local/lib/python3.8/site-packages/torchaudio/transforms.py", line 587, in forward
    specgram = self.spectrogram(waveform)
  File "/home/guillaume/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/guillaume/.local/lib/python3.8/site-packages/torchaudio/transforms.py", line 124, in forward
    return F.spectrogram(
  File "/home/guillaume/.local/lib/python3.8/site-packages/torchaudio/functional/functional.py", line 113, in spectrogram
    spec_f = torch.stft(
  File "/home/guillaume/.local/lib/python3.8/site-packages/torch/functional.py", line 570, in stft
    input = F.pad(input.view(extended_shape), [pad, pad], pad_mode)
  File "/home/guillaume/.local/lib/python3.8/site-packages/torch/nn/functional.py", line 4179, in _pad
    return torch._C._nn.reflection_pad1d(input, pad)
RuntimeError: "reflection_pad1d" not implemented for 'Half'

Error with GPU execution enabled:

Traceback (most recent call last):
  File "test_half.py", line 20, in <module>
    spec_transform(batch)
  File "/home/guillaume/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/guillaume/.local/lib/python3.8/site-packages/torchaudio/transforms.py", line 587, in forward
    specgram = self.spectrogram(waveform)
  File "/home/guillaume/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/guillaume/.local/lib/python3.8/site-packages/torchaudio/transforms.py", line 124, in forward
    return F.spectrogram(
  File "/home/guillaume/.local/lib/python3.8/site-packages/torchaudio/functional/functional.py", line 134, in spectrogram
    return spec_f.abs().pow(power)
RuntimeError: "abs_cuda" not implemented for 'ComplexHalf'
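Until half precision is supported, one possible workaround is to run the transform in float32 and cast only the result back to the caller's dtype. The sketch below is a minimal illustration, not part of torchaudio: the wrapper name ComputeInFloat32 is hypothetical, and it assumes the wrapped transform is an nn.Module such as MelSpectrogram. It does not save any computation inside the transform itself, and values above the float16 range (about 65504) become inf on the cast back, so a log or scaling step before the cast may be needed.

```python
import torch
from torch import nn


class ComputeInFloat32(nn.Module):
    """Hypothetical helper: run a wrapped transform in float32 and
    return the result in the dtype of the incoming waveform."""

    def __init__(self, transform: nn.Module):
        super().__init__()
        # Keep the transform's buffers (e.g. window, filterbank) in float32.
        # Note: calling .half() on this wrapper afterwards would undo this.
        self.transform = transform.float()

    def forward(self, waveform: torch.Tensor) -> torch.Tensor:
        # Upcast the input, compute in float32, then downcast the output.
        out = self.transform(waveform.float())
        return out.to(waveform.dtype)
```

With the reproduction code above, this would be used as ComputeInFloat32(spec_transform)(batch), leaving out the spec_transform.to(...) call so the transform stays in float32.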

Motivation, pitch

Reduce training and inference computation cost in an audio deep learning context.

Alternatives

No response

Additional context

No response
