Poor audio quality after fine-tuning

I'm trying to fine-tune the LibriTTS checkpoint on ~1 hour of LJSpeech, but the results are poor. Could you give me some direction or help me spot the issue?

How I fine-tuned:
1. Pulled the latest changes from the repo
2. Replaced `Data/train_list.txt` with a copy that only has the first 1000 lines (~1 hour for training)
3. Changed `batch_size` to 4 and `max_len` to 100; otherwise training doesn't fit into the memory of my 4090 (24GB).
4. After training it for 50-100 epochs, I tested new checkpoints with both `Inference_LibriTTS.ipynb` and `Inference_LJSpeech.ipynb` notebooks by changing the `multispeaker` parameter in the config to true/false.
5. `Inference_LJSpeech.ipynb` produces very noisy output with poor pronunciation.
6. `Inference_LibriTTS.ipynb` with reference audio from LJSpeech has good pronunciation, but there is noticeable noise (example - https://voca.ro/1nQ8Ltjhsh9y)
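For reference, step 2 amounts to roughly the following. This is only a minimal sketch of how the subset list was made: the helper name `write_subset` and the backup file name `Data/train_list_full.txt` are hypothetical and not part of the repo.

```python
from itertools import islice
from pathlib import Path


def write_subset(src: Path, dst: Path, n_lines: int = 1000) -> int:
    """Copy the first n_lines lines of src to dst and return how many were written.

    src and dst must be different files: opening dst for writing truncates it.
    """
    count = 0
    with src.open(encoding="utf-8") as fin, dst.open("w", encoding="utf-8") as fout:
        # islice stops after n_lines, or earlier if the file is shorter.
        for line in islice(fin, n_lines):
            fout.write(line)
            count += 1
    return count


# Example (assumes the original list was first copied to a hypothetical backup path):
# write_subset(Path("Data/train_list_full.txt"), Path("Data/train_list.txt"), 1000)
```

The ~1 hour figure is an estimate for the first 1000 lines; the sketch does not measure audio duration.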

Thank you again for the awesome project!

Poor audio quality after fine-tuning #49
