(Question) Max Length and datasets

The other (closed) issues contain a lot of useful information, but a few questions remain.

- **How does `max_len` impact the training/finetuning process exactly?**

In the LJS dataset, there are audio files far longer than the `max_len: 400` (= 5 seconds) specified in the example config file. Many files are 10 seconds long, and the great majority are longer than 5 seconds. They are also included in `train_list.txt`. Was this intentional?

- **Are audio files truncated once the maximum number of frames is reached?**

Should the datasets be carefully edited so that audio files do not exceed the maximum duration set in the config file? Is there a detrimental effect on adherence to punctuation or spelling when the model only sees short or clipped speech?

- **Is there a maximum permissible length / does the architecture impose restrictions?** Could `max_len` be set to something like `1200` to make full use of long audio files (ignoring the VRAM requirements of the current DP implementation)?
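
For reference, the frame arithmetic behind these questions can be sketched as follows. The sample rate (24 kHz) and hop length (300 samples) are assumptions on my part, picked only because they make `max_len: 400` come out to the 5 seconds mentioned above; the real values depend on the preprocessing settings. The helper names and the example durations are hypothetical and not taken from the repo.

```python
# Assumed preprocessing values (NOT confirmed in this issue): they are chosen
# so that 400 frames correspond to 5 seconds of audio.
SAMPLE_RATE = 24_000  # samples per second (assumption)
HOP_LENGTH = 300      # samples between consecutive mel frames (assumption)


def frames_to_seconds(n_frames, sample_rate=SAMPLE_RATE, hop_length=HOP_LENGTH):
    """Convert a number of mel frames to a duration in seconds."""
    return n_frames * hop_length / sample_rate


def seconds_to_frames(seconds, sample_rate=SAMPLE_RATE, hop_length=HOP_LENGTH):
    """Convert a duration in seconds to a whole number of mel frames."""
    return int(seconds * sample_rate / hop_length)


def clips_over_limit(durations, max_len):
    """Return the names of clips whose duration (seconds) exceeds max_len frames."""
    limit = frames_to_seconds(max_len)
    return [name for name, seconds in durations.items() if seconds > limit]


if __name__ == "__main__":
    print(frames_to_seconds(400))   # duration covered by max_len: 400
    print(frames_to_seconds(1200))  # duration covered by max_len: 1200
    # Hypothetical clip durations in seconds, for illustration only.
    example = {"clip_a.wav": 4.2, "clip_b.wav": 9.8}
    print(clips_over_limit(example, max_len=400))
```

Under these assumptions, `max_len: 1200` would cover 15 seconds, and a 10-second clip spans 800 frames. A check like `clips_over_limit` only reports which files exceed the limit; it says nothing about what the training code does with them.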
