Closed
Description
A lot of useful information is found in the other (closed) issues, but these questions come to mind.
- How does `max_len` impact the training/finetuning process exactly?
  In the LJS dataset, there are audio files far longer than the `max_len: 400` (= 5 seconds) specified in the example config file. Many files are 10 seconds long, and the great majority are longer than 5 seconds. They are also included in `train_list.txt`. Was this intentional?
- Are audio files truncated once the maximum number of frames is reached?
  Should the datasets be carefully edited so that audio files do not exceed the maximum duration set in the config file? Is there a detrimental effect on adherence to punctuation or spelling when the model only sees short or clipped speech?
- Is there a maximum permissible length, or does the architecture impose restrictions? Could `max_len` be set to something like `1200` and thus make full use of long audio files? (Ignoring the VRAM requirements in the current DP implementation.)
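
To make the frame arithmetic in these questions concrete, here is a minimal sketch. It assumes only the frame rate implied by the figures above (400 frames = 5 seconds, i.e. 80 frames per second). The helper names are hypothetical and not taken from the repository, and the sketch says nothing about how the training code actually handles clips longer than `max_len`.

```python
import math

# Assumption: frame rate derived from the figures in this issue
# (max_len: 400 corresponds to 5 seconds), not read from the repository.
FRAMES_PER_SECOND = 400 / 5  # 80 frames per second


def frames_to_seconds(n_frames: int) -> float:
    """Convert a frame count to a duration in seconds."""
    return n_frames / FRAMES_PER_SECOND


def seconds_to_frames(seconds: float) -> int:
    """Convert a duration in seconds to a frame count, rounding up."""
    return math.ceil(seconds * FRAMES_PER_SECOND)


def exceeds_max_len(duration_s: float, max_len: int) -> bool:
    """Return True if a clip of this duration has more frames than max_len."""
    return seconds_to_frames(duration_s) > max_len


if __name__ == "__main__":
    print(frames_to_seconds(400))      # duration covered by max_len: 400
    print(frames_to_seconds(1200))     # duration covered by max_len: 1200
    print(exceeds_max_len(10.0, 400))  # a 10-second clip vs. max_len: 400
```

Under this assumption, `max_len: 1200` would cover 15 seconds, which is longer than the 10-second clips mentioned above.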