-
Notifications
You must be signed in to change notification settings - Fork 5.5k
Description
Describe the bug
I'm getting strange noises at the end of the generation after fine-tuning the XTTS model. It is
happening with both versions 1.1 and 2.0. It sounds like it's saying 'A' or some random mumbling.
To Reproduce
Im using the inference example in docs
https://tts.readthedocs.io/en/dev/models/xtts.html
import os
import torch
import torchaudio
from TTS.tts.configs.xtts_config import XttsConfig
from TTS.tts.models.xtts import Xtts
Add here the xtts_config path
CONFIG_PATH = "recipes/ljspeech/xtts_v1/run/training/GPT_XTTS_LJSpeech_FT-October-23-2023_10+36AM-653f2e75/config.json"
Add here the vocab file that you have used to train the model
TOKENIZER_PATH = "recipes/ljspeech/xtts_v1/run/training/XTTS_v2_original_model_files/vocab.json"
Add here the checkpoint that you want to do inference with
XTTS_CHECKPOINT = "recipes/ljspeech/xtts_v1/run/training/GPT_XTTS_LJSpeech_FT/best_model.pth"
Add here the speaker reference
SPEAKER_REFERENCE = "LjSpeech_reference.wav"
output wav path
OUTPUT_WAV_PATH = "xtts-ft.wav"
print("Loading model...")
config = XttsConfig()
config.load_json(CONFIG_PATH)
model = Xtts.init_from_config(config)
model.load_checkpoint(config, checkpoint_path=XTTS_CHECKPOINT, vocab_path=TOKENIZER_PATH, use_deepspeed=False)
model.cuda()
print("Computing speaker latents...")
gpt_cond_latent, speaker_embedding = model.get_conditioning_latents(audio_path=[SPEAKER_REFERENCE])
print("Inference...")
out = model.inference(
"It took me quite a long time to develop a voice and now that I have it I am not going to be silent.",
"en",
gpt_cond_latent,
speaker_embedding,
temperature=0.7, # Add custom parameters here
)
torchaudio.save(OUTPUT_WAV_PATH, torch.tensor(out["wav"]).unsqueeze(0), 24000)
Expected behavior
No response
Logs
No response
Environment
TTS v0.20.3
Python 3.10
Additional context
No response