
[Bug] Custom model inference error [Unresolved] #3291

@78Alpha

Description

Describe the bug

Inference with a custom model fails for several reasons (unsupported language, inability to synthesize audio, unexpected path handling, JSON errors).

To Reproduce

1.) Fine-tune or train a model on the LJSpeech dataset
2.) Run: tts --text "Text for TTS" --model_path path/to/model --config_path path/to/config.json --out_path speech.wav --language en (a Python-API equivalent is sketched below)
3.) Observe one of the errors [Language None is not supported. | raise TypeError("Invalid file: {0!r}".format(self.name))]
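
For reference, here is a rough Python-API equivalent of step 2. This is a sketch only: it assumes the high-level TTS wrapper from TTS 0.20.x, and the paths and reference clip are placeholders, not taken from this report:

# Sketch of what step 2 does via the Python API (TTS 0.20.x).
# Paths and the reference clip are placeholders.
from TTS.api import TTS

tts = TTS(
    model_path="path/to/model",         # fine-tuned checkpoint directory
    config_path="path/to/config.json",  # matching config
).to("cuda")

# XTTS needs both a language and a reference clip; the CLI runs in the
# logs below appear to lose them on the way to the model.
tts.tts_to_file(
    text="Text for TTS",
    language="en",
    speaker_wav="path/to/reference.wav",
    file_path="speech.wav",
)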

Expected behavior

Produces a voice file with which to evaluate the model.

Logs

(coqui) alpha78@----------:/mnt/q/Utilities/CUDA/TTS/TTS/server$ tts --text "Text for TTS" --model_path ./tts_models/en/ljspeech/ --config_path ./tts_models/en/ljspeech/config.json --out_path speech.wav --language en
 > Using model: xtts
 > Text: Text for TTS
 > Text splitted to sentences.
['Text for TTS']
Traceback (most recent call last):
  File "/home/alpha78/anaconda3/envs/coqui/bin/tts", line 8, in <module>
    sys.exit(main())
  File "/mnt/q/Utilities/CUDA/TTS/TTS/bin/synthesize.py", line 515, in main
    wav = synthesizer.tts(
  File "/mnt/q/Utilities/CUDA/TTS/TTS/utils/synthesizer.py", line 374, in tts
    outputs = self.tts_model.synthesize(
  File "/mnt/q/Utilities/CUDA/TTS/TTS/tts/models/xtts.py", line 392, in synthesize
    return self.inference_with_config(text, config, ref_audio_path=speaker_wav, language=language, **kwargs)
  File "/mnt/q/Utilities/CUDA/TTS/TTS/tts/models/xtts.py", line 400, in inference_with_config
    "zh-cn" if language == "zh" else language in self.config.languages
AssertionError:  ❗ Language None is not supported. Supported languages are ['en', 'es', 'fr', 'de', 'it', 'pt', 'pl', 'tr', 'ru', 'nl', 'cs', 'ar', 'zh-cn', 'hu', 'ko', 'ja']
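
The assertion fires because language reaches Xtts.inference_with_config as None, even though --language en was passed on the command line. A minimal reproduction of the check, paraphrased from the expression shown in the traceback (xtts.py:400):

# language is None by the time it reaches the model, despite --language en.
language = None
supported = ['en', 'es', 'fr', 'de', 'it', 'pt', 'pl', 'tr', 'ru',
             'nl', 'cs', 'ar', 'zh-cn', 'hu', 'ko', 'ja']
# When language == "zh" the expression evaluates to the truthy string
# "zh-cn"; for everything else it is a membership test, so None fails it.
assert ("zh-cn" if language == "zh" else language in supported), \
    f" ❗ Language {language} is not supported. Supported languages are {supported}"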


(coqui) alpha78@----------:/mnt/q/Utilities/CUDA/TTS/TTS/server$ tts --text "Text for TTS" --model_path ./tts_models/en/ljspeech/ --config_path ./tts_models/en/ljspeech/config.json --out_path speech.wav --language en
 > Using model: xtts
 > Text: Text for TTS
 > Text splitted to sentences.
['Text for TTS']
Traceback (most recent call last):
  File "/home/alpha78/anaconda3/envs/coqui/bin/tts", line 8, in <module>
    sys.exit(main())
  File "/mnt/q/Utilities/CUDA/TTS/TTS/bin/synthesize.py", line 515, in main
    wav = synthesizer.tts(
  File "/mnt/q/Utilities/CUDA/TTS/TTS/utils/synthesizer.py", line 374, in tts
    outputs = self.tts_model.synthesize(
  File "/mnt/q/Utilities/CUDA/TTS/TTS/tts/models/xtts.py", line 392, in synthesize
    return self.inference_with_config(text, config, ref_audio_path=speaker_wav, language=language, **kwargs)
  File "/mnt/q/Utilities/CUDA/TTS/TTS/tts/models/xtts.py", line 415, in inference_with_config
    return self.full_inference(text, ref_audio_path, language, **settings)
  File "/home/alpha78/anaconda3/envs/coqui/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/mnt/q/Utilities/CUDA/TTS/TTS/tts/models/xtts.py", line 476, in full_inference
    (gpt_cond_latent, speaker_embedding) = self.get_conditioning_latents(
  File "/home/alpha78/anaconda3/envs/coqui/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/mnt/q/Utilities/CUDA/TTS/TTS/tts/models/xtts.py", line 351, in get_conditioning_latents
    audio = load_audio(file_path, load_sr)
  File "/mnt/q/Utilities/CUDA/TTS/TTS/tts/models/xtts.py", line 72, in load_audio
    audio, lsr = torchaudio.load(audiopath)
  File "/home/alpha78/anaconda3/envs/coqui/lib/python3.10/site-packages/torchaudio/_backend/utils.py", line 204, in load
    return backend.load(uri, frame_offset, num_frames, normalize, channels_first, format, buffer_size)
  File "/home/alpha78/anaconda3/envs/coqui/lib/python3.10/site-packages/torchaudio/_backend/soundfile.py", line 27, in load
    return soundfile_backend.load(uri, frame_offset, num_frames, normalize, channels_first, format)
  File "/home/alpha78/anaconda3/envs/coqui/lib/python3.10/site-packages/torchaudio/_backend/soundfile_backend.py", line 221, in load
    with soundfile.SoundFile(filepath, "r") as file_:
  File "/home/alpha78/anaconda3/envs/coqui/lib/python3.10/site-packages/soundfile.py", line 658, in __init__
    self._file = self._open(file, mode_int, closefd)
  File "/home/alpha78/anaconda3/envs/coqui/lib/python3.10/site-packages/soundfile.py", line 1212, in _open
    raise TypeError("Invalid file: {0!r}".format(self.name))
TypeError: Invalid file: None
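
This second traceback is the same missing-argument problem one step later: ref_audio_path (the speaker_wav) reaches get_conditioning_latents as None, is handed to torchaudio.load, and soundfile rejects it. Supplying a reference clip on the command line should sidestep this particular failure, assuming the tts CLI's --speaker_wav flag is honored for a local --model_path (untested here):

tts --text "Text for TTS" --model_path ./tts_models/en/ljspeech/ \
    --config_path ./tts_models/en/ljspeech/config.json \
    --out_path speech.wav --language en --speaker_wav path/to/reference.wav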

Environment

{
    "CUDA": {
        "GPU": [
            "NVIDIA GeForce RTX 3090"
        ],
        "available": true,
        "version": "12.1"
    },
    "Packages": {
        "PyTorch_debug": false,
        "PyTorch_version": "2.1.1+cu121",
        "TTS": "0.20.6",
        "numpy": "1.22.0"
    },
    "System": {
        "OS": "Linux",
        "architecture": [
            "64bit",
            "ELF"
        ],
        "processor": "x86_64",
        "python": "3.10.13",
        "version": "#1 SMP Thu Oct 5 21:02:42 UTC 2023"
    }
}

Additional context

The documentation shows two different ways to run inference with a custom model; neither worked.
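
For completeness, one of the documented paths is the low-level XTTS API; a minimal sketch of it (assuming TTS 0.20.x; the checkpoint directory and reference clip are placeholders):

# Hedged sketch of the low-level XTTS path from the TTS 0.20.x docs.
from TTS.tts.configs.xtts_config import XttsConfig
from TTS.tts.models.xtts import Xtts

config = XttsConfig()
config.load_json("./tts_models/en/ljspeech/config.json")

model = Xtts.init_from_config(config)
model.load_checkpoint(config, checkpoint_dir="./tts_models/en/ljspeech/", eval=True)
model.cuda()

# synthesize() is the entry point seen in the tracebacks above; language
# and speaker_wav are passed explicitly here instead of arriving as None.
outputs = model.synthesize(
    "Text for TTS",
    config,
    speaker_wav="path/to/reference.wav",
    language="en",
)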
