
FineTuning under Windows Issue #79

@FlareP1

Hi, thanks for this amazing TTS system. Its inference gives the best quality of any open-source system I have heard, and it works well and runs very fast under Windows. However, the fine-tuning script does not appear to work unmodified in a Windows environment. I am trying to get train_finetune.py to run locally under Windows, and I have made a couple of fixes (below) that resolved some errors.

  1. Python needs to be run with -Xutf8 to force UTF-8 mode.
  2. In _load_tensor(self, data) (~line 142), the path needs to be built as osp.join(self.root_path, wave_path).replace("\\", "/") so that forward slashes are used in the file path when loading the wav files.
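Both fixes can be sketched in a few lines. The helper names utf8_mode_enabled and build_wav_path are hypothetical, not functions from the StyleTTS2 code base. The first reads sys.flags.utf8_mode, which is set by -Xutf8 or, equivalently, by the PYTHONUTF8=1 environment variable. The second mirrors the one-line path change described in item 2.

```python
import os.path as osp
import sys


def utf8_mode_enabled() -> bool:
    """Return True when Python runs in UTF-8 mode (python -Xutf8 or PYTHONUTF8=1)."""
    return bool(sys.flags.utf8_mode)


def build_wav_path(root_path: str, wave_path: str) -> str:
    """Join the dataset root and a relative wav path, then turn every
    backslash into a forward slash (the same change as item 2)."""
    return osp.join(root_path, wave_path).replace("\\", "/")


if __name__ == "__main__":
    if not utf8_mode_enabled():
        # Without UTF-8 mode, files opened without an explicit encoding
        # use the Windows locale code page instead of UTF-8.
        print("warning: start Python with -Xutf8 (or set PYTHONUTF8=1)", file=sys.stderr)
    print(build_wav_path("Data\\wavs", "LJ045-0051.wav"))  # Data/wavs/LJ045-0051.wav
```

Setting PYTHONUTF8=1 once in the shell or venv has the same effect as passing -Xutf8 on every invocation.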

However, I am now stuck on the error below. Does anyone know what it might indicate? I can run the code in a debugger, but I am not familiar enough with Python to understand what is causing this error or what the correct behaviour should be.

Thanks in advance

(venv) C:\Users\xxxx\Documents\StyleTTS2>python -Xutf8 train_finetune.py --config_path ./Configs/config_ft.yml
Some weights of the model checkpoint at microsoft/wavlm-base-plus were not used when initializing WavLMModel: ['encoder.pos_conv_embed.conv.weight_v', 'encoder.pos_conv_embed.conv.weight_g']
- This IS expected if you are initializing WavLMModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing WavLMModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of WavLMModel were not initialized from the model checkpoint at microsoft/wavlm-base-plus and are newly initialized: ['encoder.pos_conv_embed.conv.parametrizations.weight.original0', 'encoder.pos_conv_embed.conv.parametrizations.weight.original1']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
bert loaded
bert_encoder loaded
predictor loaded
decoder loaded
text_encoder loaded
predictor_encoder loaded
style_encoder loaded
diffusion loaded
text_aligner loaded
pitch_extractor loaded
mpd loaded
msd loaded
wd loaded
BERT AdamW (
Parameter Group 0
    amsgrad: False
    base_momentum: 0.85
    betas: (0.9, 0.99)
    capturable: False
    differentiable: False
    eps: 1e-09
    foreach: None
    fused: None
    initial_lr: 1e-05
    lr: 1e-05
    max_lr: 2e-05
    max_momentum: 0.95
    maximize: False
    min_lr: 0
    weight_decay: 0.01
)
decoder AdamW (
Parameter Group 0
    amsgrad: False
    base_momentum: 0.85
    betas: (0.0, 0.99)
    capturable: False
    differentiable: False
    eps: 1e-09
    foreach: None
    fused: None
    initial_lr: 0.0001
    lr: 0.0001
    max_lr: 0.0002
    max_momentum: 0.95
    maximize: False
    min_lr: 0
    weight_decay: 0.0001
)
Data/wavs/LJ045-0051.wav
Data/wavs/LJ034-0213.wav
Data/wavs/LJ038-0268.wav
Data/wavs/LJ004-0067.wav
Data/wavs/LJ049-0084.wav
Data/wavs/LJ003-0198.wav
Data/wavs/LJ022-0011.wav
Data/wavs/LJ028-0352.wav
Data/wavs/LJ047-0047.wav
Data/wavs/LJ008-0175.wav
Data/wavs/LJ015-0273.wav
Data/wavs/LJ004-0067.wav
Data/wavs/LJ015-0100.wav
Data/wavs/LJ032-0052.wav
Data/wavs/LJ011-0105.wav
Data/wavs/LJ012-0036.wav
Data/wavs/LJ049-0118.wav
Data/wavs/LJ028-0352.wav
Data/wavs/LJ006-0132.wav
Data/wavs/LJ034-0114.wav
Traceback (most recent call last):
  File "C:\Users\Chris\Documents\StyleTTS2\train_finetune.py", line 707, in <module>
    main()
  File "C:\Users\Chris\Documents\StyleTTS2\venv\lib\site-packages\click\core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "C:\Users\Chris\Documents\StyleTTS2\venv\lib\site-packages\click\core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "C:\Users\Chris\Documents\StyleTTS2\venv\lib\site-packages\click\core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "C:\Users\Chris\Documents\StyleTTS2\venv\lib\site-packages\click\core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "C:\Users\Chris\Documents\StyleTTS2\train_finetune.py", line 396, in main
    y_rec_gt_pred = model.decoder(en, F0_real, N_real, s)
  File "C:\Users\Chris\Documents\StyleTTS2\venv\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\Users\Chris\Documents\StyleTTS2\venv\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\Chris\Documents\StyleTTS2\venv\lib\site-packages\torch\nn\parallel\data_parallel.py", line 185, in forward
    outputs = self.parallel_apply(replicas, inputs, module_kwargs)
  File "C:\Users\Chris\Documents\StyleTTS2\venv\lib\site-packages\torch\nn\parallel\data_parallel.py", line 200, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "C:\Users\Chris\Documents\StyleTTS2\venv\lib\site-packages\torch\nn\parallel\parallel_apply.py", line 110, in parallel_apply
    output.reraise()
  File "C:\Users\Chris\Documents\StyleTTS2\venv\lib\site-packages\torch\_utils.py", line 694, in reraise
    raise exception
RuntimeError: Caught RuntimeError in replica 0 on device 0.
Original Traceback (most recent call last):
  File "C:\Users\Chris\Documents\StyleTTS2\venv\lib\site-packages\torch\nn\parallel\parallel_apply.py", line 85, in _worker
    output = module(*input, **kwargs)
  File "C:\Users\Chris\Documents\StyleTTS2\venv\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\Users\Chris\Documents\StyleTTS2\venv\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\Chris\Documents\StyleTTS2\Modules\hifigan.py", line 458, in forward
    F0 = self.F0_conv(F0_curve.unsqueeze(1))
  File "C:\Users\Chris\Documents\StyleTTS2\venv\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\Users\Chris\Documents\StyleTTS2\venv\lib\site-packages\torch\nn\modules\module.py", line 1568, in _call_impl
    result = forward_call(*args, **kwargs)
  File "C:\Users\Chris\Documents\StyleTTS2\venv\lib\site-packages\torch\nn\modules\conv.py", line 310, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "C:\Users\Chris\Documents\StyleTTS2\venv\lib\site-packages\torch\nn\modules\conv.py", line 306, in _conv_forward
    return F.conv1d(input, weight, bias, self.stride,
RuntimeError: Given groups=1, weight of size [1, 1, 3], expected input[1, 100, 1] to have 1 channels, but got 100 channels instead
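The last line of the traceback can be decoded from shapes alone. A conv1d weight of size [1, 1, 3] means F0_conv expects 1 input channel, so its input should be (batch, 1, frames). Calling unsqueeze(1) on a 2-D (batch, frames) F0_curve gives exactly that. A reported input of [1, 100, 1] can only come from a 1-D F0_curve of 100 values: unsqueeze(1) turns it into (100, 1), which conv1d treats as an unbatched (channels, length) input and reports with a leading 1. In other words, the batch dimension seems to have been lost before model.decoder was called. One possibility worth checking (unconfirmed) is whether a squeeze() upstream removes a batch axis of size 1, for example when DataParallel hands a replica a single sample. The sketch below models these shape rules in plain Python so the arithmetic can be checked without a GPU. unsqueeze and conv1d_input_error are hypothetical helpers, not PyTorch functions.

```python
from typing import Optional, Tuple

Shape = Tuple[int, ...]


def unsqueeze(shape: Shape, dim: int) -> Shape:
    """Shape produced by tensor.unsqueeze(dim), for a non-negative dim."""
    return shape[:dim] + (1,) + shape[dim:]


def conv1d_input_error(input_shape: Shape, weight_shape: Shape,
                       groups: int = 1) -> Optional[str]:
    """Return the channel-mismatch message conv1d would raise, or None.

    Models two PyTorch rules: a 2-D input is treated as unbatched
    (channels, length) and reported with a leading batch size of 1, and
    the input must have weight_shape[1] * groups channels.
    """
    if len(input_shape) == 2:
        input_shape = (1,) + input_shape
    expected = weight_shape[1] * groups
    got = input_shape[1]
    if got == expected:
        return None
    return (f"Given groups={groups}, weight of size {list(weight_shape)}, "
            f"expected input{list(input_shape)} to have {expected} channels, "
            f"but got {got} channels instead")


weight = (1, 1, 3)  # F0_conv weight from the traceback: 1 out, 1 in, kernel 3

# Expected case: F0_curve is (batch, frames), so unsqueeze(1) adds the channel axis.
print(conv1d_input_error(unsqueeze((1, 100), 1), weight))  # None

# Batch axis lost: F0_curve is 1-D, so unsqueeze(1) yields (100, 1).
print(conv1d_input_error(unsqueeze((100,), 1), weight))  # same text as the RuntimeError above
```

In the real script, printing F0_real.shape just before the model.decoder(en, F0_real, N_real, s) call at line 396 of train_finetune.py would show whether it still has a leading batch dimension.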

