(Question) Max Length and datasets #52

@Kreevoz

The other (closed) issues contain a lot of useful information, but a few questions remain.

  • How does max_len impact the training/finetuning process exactly?

The LJSpeech dataset contains audio files far longer than the max_len: 400 (= 5 seconds) specified in the example config file. Many files are 10 seconds long, and the great majority are longer than 5 seconds. These files are also included in train_list.txt. Was this intentional?
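For reference, a minimal sketch of the frame-to-seconds conversion behind the "400 = 5 seconds" figure. The sample rate (24000 Hz) and hop length (300 samples) are assumptions chosen because they give 80 frames per second, which matches the figure quoted above; the function name is hypothetical and not taken from the repository.

```python
def frames_to_seconds(n_frames: int, sample_rate: int = 24000, hop_length: int = 300) -> float:
    """Convert a number of spectrogram frames to a duration in seconds.

    sample_rate and hop_length are assumed values (80 frames per second),
    not values read from the project's config.
    """
    return n_frames * hop_length / sample_rate


print(frames_to_seconds(400))   # 5.0
print(frames_to_seconds(1200))  # 15.0
```

Under these assumptions, a max_len of 1200 would correspond to 15 seconds of audio.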

  • Are audio files truncated once the maximum number of frames is reached?

Should datasets be carefully edited so that no audio file exceeds the maximum duration set in the config file? Is there a detrimental effect on adherence to punctuation or spelling when the model only sees short or clipped speech?
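To gauge how many clips would be affected, a rough sketch like the following could list the WAV files that are longer than the configured limit. It uses only the standard-library wave module. The 80 frames-per-second rate is an assumption derived from "400 = 5 seconds", and both function names are hypothetical helpers, not part of the repository.

```python
import wave


def wav_duration_seconds(path: str) -> float:
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as w:
        return w.getnframes() / w.getframerate()


def clips_over_limit(wav_paths, max_len: int = 400, frames_per_second: float = 80.0):
    """Return (path, duration) pairs for clips longer than max_len frames.

    frames_per_second is an assumed value, not read from the config.
    """
    limit_seconds = max_len / frames_per_second
    over = []
    for path in wav_paths:
        duration = wav_duration_seconds(path)
        if duration > limit_seconds:
            over.append((path, duration))
    return over
```

Passing the audio paths listed in train_list.txt to clips_over_limit would show which clips exceed the limit; how the training code actually treats those clips is exactly what the question asks.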

  • Is there a maximum permissible length, or does the architecture impose restrictions? Could max_len be set to something like 1200 to make full use of long audio files (ignoring the VRAM requirements of the current DP implementation)?
