Description
Describe the bug
Running inference with a custom (fine-tuned) model fails for several reasons: the language is not recognized, audio cannot be synthesized, paths resolve unexpectedly, and there are JSON errors.
To Reproduce
1. Fine-tune/train a model on the LJSpeech dataset.
2. Run `tts --text "Text for TTS" --model_path path/to/model --config_path path/to/config.json --out_path speech.wav --language en`
3. Observe one of two errors: `Language None is not supported.` or `TypeError("Invalid file: {0!r}".format(self.name))` (see the sketch after this list).
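For triage, here is a minimal sketch of what I would expect to work, using the high-level TTS.api interface instead of the CLI (the model, config, and reference-wav paths are placeholders). Two things stand out from the tracebacks below: XTTS needs a speaker reference wav, and the language passed on the CLI apparently never reaches the model (possibly because local models read `--language_idx` rather than `--language`, though I am not certain of that):

```python
# Minimal sketch using the high-level Coqui TTS API (all paths are placeholders).
# XTTS needs both a language id and a speaker reference wav, so both are
# passed explicitly here.
from TTS.api import TTS

tts = TTS(
    model_path="path/to/model",          # directory containing the checkpoint
    config_path="path/to/config.json",   # matching config
).to("cuda")

tts.tts_to_file(
    text="Text for TTS",
    speaker_wav="path/to/reference.wav",  # placeholder reference clip
    language="en",
    file_path="speech.wav",
)
```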
Expected behavior
Produces a voice file with which to evaluate the model.
Logs
(coqui) alpha78@----------:/mnt/q/Utilities/CUDA/TTS/TTS/server$ tts --text "Text for TTS" --model_path ./tts_models/en/ljspeech/ --config_path ./tts_models/en/ljspeech/config.json --out_path speech.wav --language en
> Using model: xtts
> Text: Text for TTS
> Text splitted to sentences.
['Text for TTS']
Traceback (most recent call last):
File "/home/alpha78/anaconda3/envs/coqui/bin/tts", line 8, in <module>
sys.exit(main())
File "/mnt/q/Utilities/CUDA/TTS/TTS/bin/synthesize.py", line 515, in main
wav = synthesizer.tts(
File "/mnt/q/Utilities/CUDA/TTS/TTS/utils/synthesizer.py", line 374, in tts
outputs = self.tts_model.synthesize(
File "/mnt/q/Utilities/CUDA/TTS/TTS/tts/models/xtts.py", line 392, in synthesize
return self.inference_with_config(text, config, ref_audio_path=speaker_wav, language=language, **kwargs)
File "/mnt/q/Utilities/CUDA/TTS/TTS/tts/models/xtts.py", line 400, in inference_with_config
"zh-cn" if language == "zh" else language in self.config.languages
AssertionError: ❗ Language None is not supported. Supported languages are ['en', 'es', 'fr', 'de', 'it', 'pt', 'pl', 'tr', 'ru', 'nl', 'cs', 'ar', 'zh-cn', 'hu', 'ko', 'ja']
(coqui) alpha78@----------:/mnt/q/Utilities/CUDA/TTS/TTS/server$ tts --text "Text for TTS" --model_path ./tts_models/en/ljspeech/ --config_path ./tts_models/en/ljspeech/config.json --out_path speech.wav --language en
> Using model: xtts
> Text: Text for TTS
> Text splitted to sentences.
['Text for TTS']
Traceback (most recent call last):
File "/home/alpha78/anaconda3/envs/coqui/bin/tts", line 8, in <module>
sys.exit(main())
File "/mnt/q/Utilities/CUDA/TTS/TTS/bin/synthesize.py", line 515, in main
wav = synthesizer.tts(
File "/mnt/q/Utilities/CUDA/TTS/TTS/utils/synthesizer.py", line 374, in tts
outputs = self.tts_model.synthesize(
File "/mnt/q/Utilities/CUDA/TTS/TTS/tts/models/xtts.py", line 392, in synthesize
return self.inference_with_config(text, config, ref_audio_path=speaker_wav, language=language, **kwargs)
File "/mnt/q/Utilities/CUDA/TTS/TTS/tts/models/xtts.py", line 415, in inference_with_config
return self.full_inference(text, ref_audio_path, language, **settings)
File "/home/alpha78/anaconda3/envs/coqui/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/mnt/q/Utilities/CUDA/TTS/TTS/tts/models/xtts.py", line 476, in full_inference
(gpt_cond_latent, speaker_embedding) = self.get_conditioning_latents(
File "/home/alpha78/anaconda3/envs/coqui/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/mnt/q/Utilities/CUDA/TTS/TTS/tts/models/xtts.py", line 351, in get_conditioning_latents
audio = load_audio(file_path, load_sr)
File "/mnt/q/Utilities/CUDA/TTS/TTS/tts/models/xtts.py", line 72, in load_audio
audio, lsr = torchaudio.load(audiopath)
File "/home/alpha78/anaconda3/envs/coqui/lib/python3.10/site-packages/torchaudio/_backend/utils.py", line 204, in load
return backend.load(uri, frame_offset, num_frames, normalize, channels_first, format, buffer_size)
File "/home/alpha78/anaconda3/envs/coqui/lib/python3.10/site-packages/torchaudio/_backend/soundfile.py", line 27, in load
return soundfile_backend.load(uri, frame_offset, num_frames, normalize, channels_first, format)
File "/home/alpha78/anaconda3/envs/coqui/lib/python3.10/site-packages/torchaudio/_backend/soundfile_backend.py", line 221, in load
with soundfile.SoundFile(filepath, "r") as file_:
File "/home/alpha78/anaconda3/envs/coqui/lib/python3.10/site-packages/soundfile.py", line 658, in __init__
self._file = self._open(file, mode_int, closefd)
File "/home/alpha78/anaconda3/envs/coqui/lib/python3.10/site-packages/soundfile.py", line 1212, in _open
raise TypeError("Invalid file: {0!r}".format(self.name))
TypeError: Invalid file: None
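The second failure can be reproduced in isolation: `soundfile` raises exactly this error when handed `None`, which suggests the speaker reference path is never set (presumably because no `--speaker_wav` argument was passed, so `ref_audio_path` stays `None` all the way down to `load_audio`):

```python
# Isolated reproduction of the second error: soundfile raises the same
# TypeError when the file argument is None, which is what reaches
# torchaudio.load() via load_audio() in xtts.py.
import soundfile

try:
    soundfile.SoundFile(None, "r")
except TypeError as err:
    print(err)  # -> Invalid file: None
```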
Environment
{
  "CUDA": {
    "GPU": [
      "NVIDIA GeForce RTX 3090"
    ],
    "available": true,
    "version": "12.1"
  },
  "Packages": {
    "PyTorch_debug": false,
    "PyTorch_version": "2.1.1+cu121",
    "TTS": "0.20.6",
    "numpy": "1.22.0"
  },
  "System": {
    "OS": "Linux",
    "architecture": [
      "64bit",
      "ELF"
    ],
    "processor": "x86_64",
    "python": "3.10.13",
    "version": "#1 SMP Thu Oct 5 21:02:42 UTC 2023"
  }
}
Additional context
The documentation shows two different ways to run inference with the model; neither worked. A sketch of the lower-level route follows.
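For reference, the lower-level route from the docs looks roughly like this (a sketch from memory; the checkpoint directory and reference wav are placeholders, and keyword names may differ slightly across versions):

```python
# Sketch of the lower-level XTTS inference path (paths are placeholders).
from TTS.tts.configs.xtts_config import XttsConfig
from TTS.tts.models.xtts import Xtts

config = XttsConfig()
config.load_json("path/to/config.json")

model = Xtts.init_from_config(config)
model.load_checkpoint(config, checkpoint_dir="path/to/model", eval=True)
model.cuda()

# synthesize() is the method the CLI hits in the tracebacks above;
# speaker_wav and language must both be provided.
outputs = model.synthesize(
    "Text for TTS",
    config,
    speaker_wav="path/to/reference.wav",
    language="en",
)
```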