[Bug] AttributeError: 'NoneType' object has no attribute 'load_wav' when using tts_with_vc_to_file

### Describe the bug

The change introduced in #3108 breaks `tts_with_vc_to_file`, at least with VITS models.

See: https://github.com/coqui-ai/TTS/blob/6fef4f9067c0647258e0cd1d2998716565f59330/TTS/api.py#L463

Changing the line from:
`self.tts_to_file(text=text, speaker=None, language=language, file_path=fp.name, speaker_wav=speaker_wav)`

back to its pre-0.19.1 version:
`self.tts_to_file(text=text, speaker=None, language=language, file_path=fp.name)`

resolves the issue.
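
One possible way to keep both behaviours is to forward `speaker_wav` only when the loaded model can actually compute a speaker embedding from it. The sketch below illustrates that idea with plain Python and no Coqui imports. The helper name `build_tts_kwargs` and the `has_speaker_encoder` flag are hypothetical and not part of the TTS API, so treat this as an illustration rather than a proposed patch.

```python
from typing import Optional


def build_tts_kwargs(
    text: str,
    language: Optional[str],
    file_path: str,
    speaker_wav: Optional[str],
    has_speaker_encoder: bool,
) -> dict:
    """Assemble keyword arguments for a tts_to_file-style call.

    Hypothetical helper: `has_speaker_encoder` stands in for whatever check
    tells the caller that the loaded model can embed a reference clip.
    """
    kwargs = {
        "text": text,
        "speaker": None,
        "language": language,
        "file_path": file_path,
    }
    # Forward the reference clip only when the model can embed it. Otherwise
    # fall back to the pre-0.19.1 argument list, which omits speaker_wav.
    if speaker_wav is not None and has_speaker_encoder:
        kwargs["speaker_wav"] = speaker_wav
    return kwargs


# A model without a speaker encoder (assumed to be the VITS case reported here).
vits_kwargs = build_tts_kwargs("olá", None, "out.wav", "ref.wav", has_speaker_encoder=False)
# A model that does have a speaker encoder.
cloning_kwargs = build_tts_kwargs("olá", None, "out.wav", "ref.wav", has_speaker_encoder=True)

print(sorted(vits_kwargs))
print(sorted(cloning_kwargs))
```

With a guard like this, models that need `speaker_wav` at synthesis time would still receive it, while models without a speaker encoder would get the same arguments as before 0.19.1.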

Please take a look at the script below for reproduction.

### To Reproduce

Clone the Coqui TTS repository and install the dependencies as specified in the README file.
Then, run the following script from TTS's root directory, but replace `speaker_wav` with any audio file you have at hand:

```python3
#!/usr/bin/env python3

import torch
from TTS.api import TTS

device = "cuda" if torch.cuda.is_available() else "cpu"

tts = TTS("tts_models/pt/cv/vits").to(device)

tts.tts_with_vc_to_file(
    text="A radiografia apresentou algumas lesões no fêmur esquerdo ponto parágrafo",
    speaker_wav="test_audios/1693678335_24253176-processed.wav",
    file_path="test_audios/output.wav",
)
```

### Expected behavior

The output audio file defined in `file_path` is generated, saying the sentence in `text` with the voice cloned from `speaker_wav`.

### Logs

```shell
> tts_models/pt/cv/vits is already downloaded.
 > Using model: vits
 > Setting up Audio Processor...
 | > sample_rate:22050
 | > resample:False
 | > num_mels:80
 | > log_func:np.log10
 | > min_level_db:0
 | > frame_shift_ms:None
 | > frame_length_ms:None
 | > ref_level_db:None
 | > fft_size:1024
 | > power:None
 | > preemphasis:0.0
 | > griffin_lim_iters:None
 | > signal_norm:None
 | > symmetric_norm:None
 | > mel_fmin:0
 | > mel_fmax:None
 | > pitch_fmin:None
 | > pitch_fmax:None
 | > spec_gain:20.0
 | > stft_pad_mode:reflect
 | > max_norm:1.0
 | > clip_norm:True
 | > do_trim_silence:False
 | > trim_db:60
 | > do_sound_norm:False
 | > do_amp_to_db_linear:True
 | > do_amp_to_db_mel:True
 | > do_rms_norm:False
 | > db_level:None
 | > stats_path:None
 | > base:10
 | > hop_length:256
 | > win_length:1024
 > initialization of speaker-embedding layers.
 > initialization of language-embedding layers.
/home/probst/.pyenv/versions/coqui-tts/lib/python3.11/site-packages/torch/nn/utils/weight_norm.py:30: UserWarning: torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.
  warnings.warn("torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.")
 > Text splitted to sentences.
['A radiografia apresentou algumas lesões no fêmur esquerdo ponto parágrafo']
Traceback (most recent call last):
  File "/home/probst/Projects/TTS-iara/./test.py", line 15, in <module>
    tts.tts_with_vc_to_file(
  File "/home/probst/Projects/TTS-iara/TTS/api.py", line 488, in tts_with_vc_to_file
    wav = self.tts_with_vc(text=text, language=language, speaker_wav=speaker_wav)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/probst/Projects/TTS-iara/TTS/api.py", line 463, in tts_with_vc
    self.tts_to_file(text=text, speaker=None, language=language, file_path=fp.name, speaker_wav=speaker_wav)
  File "/home/probst/Projects/TTS-iara/TTS/api.py", line 403, in tts_to_file
    wav = self.tts(text=text, speaker=speaker, language=language, speaker_wav=speaker_wav, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/probst/Projects/TTS-iara/TTS/api.py", line 341, in tts
    wav = self.synthesizer.tts(
          ^^^^^^^^^^^^^^^^^^^^^
  File "/home/probst/Projects/TTS-iara/TTS/utils/synthesizer.py", line 362, in tts
    speaker_embedding = self.tts_model.speaker_manager.compute_embedding_from_clip(speaker_wav)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/probst/Projects/TTS-iara/TTS/tts/utils/managers.py", line 365, in compute_embedding_from_clip
    embedding = _compute(wav_file)
                ^^^^^^^^^^^^^^^^^^
  File "/home/probst/Projects/TTS-iara/TTS/tts/utils/managers.py", line 342, in _compute
    waveform = self.encoder_ap.load_wav(wav_file, sr=self.encoder_ap.sample_rate)
               ^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'load_wav'
```
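
The last frame of the traceback shows that `self.encoder_ap` is `None` when `_compute` runs. The sketch below reproduces only that failure mode with a stand-in class. `StubSpeakerManager` is hypothetical and mirrors just the attribute and method names visible in the traceback. The reason given for `encoder_ap` being unset, that this VITS checkpoint is loaded without a speaker encoder, is an assumption.

```python
class StubSpeakerManager:
    """Stand-in that mirrors names from the traceback; not the real manager."""

    def __init__(self, encoder_ap=None):
        # Assumption: no speaker encoder was loaded, so this stays None.
        self.encoder_ap = encoder_ap

    def compute_embedding_from_clip(self, wav_file):
        # Same attribute access as the failing line in the traceback.
        return self.encoder_ap.load_wav(wav_file, sr=self.encoder_ap.sample_rate)


manager = StubSpeakerManager()
try:
    manager.compute_embedding_from_clip("reference.wav")
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

print(error_message)
```

The error comes from the attribute lookup on `None`, before the audio file is ever opened, which is consistent with the failure not depending on which `speaker_wav` file is used.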


### Environment

```shell
- 🐸TTS Version: 0.19.1
- PyTorch Version: 2.1.0+cu121
- OS: Artix Linux

Not using GPU.
Installed everything through pip in a virtual environment created with pyenv.
```


### Additional context

_No response_
