Hi, to build the fine-tuning dataset from a list of audio files, should we transcribe them with Whisper and then run the transcripts through Phonemizer?
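
To make the question concrete, this is roughly the pipeline I mean. It is only a sketch: the `path|phonemes|speaker` manifest layout, the function names `build_manifest` and `make_default_backends`, the `base` model size, and the `en-us` language are my assumptions, not something the project confirms.

```python
from typing import Callable, Iterable, List, Tuple


def build_manifest(
    audio_paths: Iterable[str],
    transcribe: Callable[[str], str],
    phonemize: Callable[[str], str],
    speaker_id: int = 0,
) -> List[str]:
    """Return one 'path|phonemes|speaker' line per audio file (assumed layout)."""
    lines = []
    for path in audio_paths:
        text = transcribe(path).strip()
        if not text:
            continue  # skip clips where no speech was recognized
        phonemes = phonemize(text).strip()
        lines.append(f"{path}|{phonemes}|{speaker_id}")
    return lines


def make_default_backends(
    model_size: str = "base", language: str = "en-us"
) -> Tuple[Callable[[str], str], Callable[[str], str]]:
    """Wire up Whisper and Phonemizer.

    Both libraries are imported lazily so the sketch loads without them.
    """
    import whisper  # pip install openai-whisper
    from phonemizer import phonemize as _phonemize  # needs espeak-ng installed

    model = whisper.load_model(model_size)

    def transcribe(path: str) -> str:
        # Whisper returns a dict; the full transcript is under "text".
        return model.transcribe(path)["text"]

    def phonemize(text: str) -> str:
        return _phonemize(text, language=language, backend="espeak", strip=True)

    return transcribe, phonemize
```

Usage would then be something like `transcribe, phonemize = make_default_backends()` followed by `build_manifest(["clip1.wav", "clip2.wav"], transcribe, phonemize)`, where the file names are placeholders. Passing the two steps in as callables keeps the manifest logic independent of the models, so either step could be swapped out if Whisper or Phonemizer turns out not to be the right choice.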