Bug during Second Stage Training #112

@asusdisciple

Description

So I tried to follow the two-stage (first, then second) training approach. The first stage went through with few problems, but when I run the second training stage, after a few iterations I get the message below. I set the first_stage_path parameter to the model I trained in the first stage, and I set second_stage_load_pretrained to False. I use an LJSpeech-style dataset with a single speaker; the directory structure is exactly the same as in LJSpeech. Any ideas what leads to this kind of behaviour?
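For context, a minimal sanity check of the two settings mentioned above before launching train_second.py; the config path Configs/config.yml and the log_dir key are assumptions here, while first_stage_path and second_stage_load_pretrained are the keys referenced above:

```python
import os
import yaml  # pip install pyyaml

# Load the second-stage training config (the path is an assumption).
with open("Configs/config.yml") as f:
    config = yaml.safe_load(f)

first_stage_path = config.get("first_stage_path", "")
log_dir = config.get("log_dir", "")

# Depending on the repo version, first_stage_path may be given as an absolute
# path or relative to log_dir, so check both candidates.
candidates = [p for p in (first_stage_path, os.path.join(log_dir, first_stage_path)) if p]
print("first_stage_path resolves to an existing file:",
      any(os.path.isfile(p) for p in candidates))
print("second_stage_load_pretrained:", config.get("second_stage_load_pretrained"))
```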

Traceback (most recent call last):
  File "/raid/nils/projects/StyleTTS2/train_second.py", line 788, in <module>
    main()
  File "/raid/nils/projects/StyleTTS2/venv/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/raid/nils/projects/StyleTTS2/venv/lib/python3.10/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/raid/nils/projects/StyleTTS2/venv/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/raid/nils/projects/StyleTTS2/venv/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/raid/nils/projects/StyleTTS2/train_second.py", line 308, in main
    bert_dur = model.bert(texts, attention_mask=(~text_mask).int())
  File "/raid/nils/projects/StyleTTS2/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/raid/nils/projects/StyleTTS2/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/raid/nils/projects/StyleTTS2/venv/lib/python3.10/site-packages/torch/nn/parallel/data_parallel.py", line 185, in forward
    outputs = self.parallel_apply(replicas, inputs, module_kwargs)
  File "/raid/nils/projects/StyleTTS2/venv/lib/python3.10/site-packages/torch/nn/parallel/data_parallel.py", line 200, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "/raid/nils/projects/StyleTTS2/venv/lib/python3.10/site-packages/torch/nn/parallel/parallel_apply.py", line 110, in parallel_apply
    output.reraise()
  File "/raid/nils/projects/StyleTTS2/venv/lib/python3.10/site-packages/torch/_utils.py", line 694, in reraise
    raise exception
RuntimeError: Caught RuntimeError in replica 0 on device 0.
Original Traceback (most recent call last):
  File "/raid/nils/projects/StyleTTS2/venv/lib/python3.10/site-packages/torch/nn/parallel/parallel_apply.py", line 85, in _worker
    output = module(*input, **kwargs)
  File "/raid/nils/projects/StyleTTS2/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/raid/nils/projects/StyleTTS2/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/raid/nils/projects/StyleTTS2/Utils/PLBERT/util.py", line 9, in forward
    outputs = super().forward(*args, **kwargs)
  File "/raid/nils/projects/StyleTTS2/venv/lib/python3.10/site-packages/transformers/models/albert/modeling_albert.py", line 719, in forward
    buffered_token_type_ids_expanded = buffered_token_type_ids.expand(batch_size, seq_length)
RuntimeError: The expanded size of the tensor (780) must match the existing size (512) at non-singleton dimension 1.  Target sizes: [2, 780].  Tensor sizes: [1, 512]
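The size mismatch in the last line suggests the padded phoneme sequence in this batch (780 tokens) is longer than the 512 positions the PLBERT/ALBERT text encoder buffers for token type IDs. Below is a minimal sketch for spotting train-list entries that would exceed that limit; the "path|phoneme text|speaker" line format, the Data/train_list.txt path, and the rough one-token-per-character estimate are assumptions and may not match train_second.py's tokenization exactly.

```python
# Minimal sketch: flag train-list entries whose phonemized text would exceed
# PLBERT's 512-position limit. File path, line format, and the character-count
# length estimate are assumptions, not taken from the training script.
MAX_TOKENS = 512

def too_long_entries(list_path, max_tokens=MAX_TOKENS):
    flagged = []
    with open(list_path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            parts = line.rstrip("\n").split("|")
            if len(parts) < 2:
                continue
            text = parts[1]
            if len(text) > max_tokens:
                flagged.append((lineno, len(text), parts[0]))
    return flagged

if __name__ == "__main__":
    for lineno, n_tokens, wav_path in too_long_entries("Data/train_list.txt"):
        print(f"line {lineno}: ~{n_tokens} tokens ({wav_path})")
```

Entries flagged this way would be the first candidates to trim or split before retrying, assuming the rough length estimate above tracks the real tokenizer.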
