
Conversation

@kmn1024 (Contributor) commented Nov 24, 2023

SLM joint training bug in finetuning code: #72 (comment)

@yl4579 (Owner) commented Nov 24, 2023

It’s probably related to this: #15
I couldn’t reproduce it because my PyTorch version doesn’t have this problem. Does the order of backward matter, though? Do you also have to change the order of the generator loss backward?
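For context, the order-of-backward issue being discussed is characteristic of the in-place modification error that PyTorch 1.5 and later raise when an optimizer step lands between a forward pass and a later backward pass that still needs the stepped parameters. A minimal sketch with hypothetical stand-in modules and losses (not the actual StyleTTS2 code):

import torch

G = torch.nn.Linear(4, 4)   # stand-in for the generator
D = torch.nn.Linear(4, 1)   # stand-in for the SLM discriminator
opt_d = torch.optim.SGD(D.parameters(), lr=0.01)
opt_g = torch.optim.SGD(G.parameters(), lr=0.01)

fake = G(torch.randn(8, 4))
loss_d = D(fake.detach()).mean()   # stand-in discriminator loss
loss_g = -D(fake).mean()           # stand-in adversarial generator loss

# Failing order on PyTorch >= 1.5: opt_d.step() updates D's weights in
# place, invalidating the tensors saved for loss_g's backward pass:
#   loss_d.backward(); opt_d.step(); loss_g.backward()
#   -> RuntimeError: one of the variables needed for gradient computation
#      has been modified by an inplace operation

# Working order: run every backward pass before any optimizer step.
# (Note loss_g also deposits gradients into D here; managing that
# accumulation is what the follow-up comments below are about.)
loss_d.backward()
loss_g.backward()
opt_d.step()
opt_g.step()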

@kmn1024 (Contributor, Author) commented Nov 24, 2023

Sorry, you are correct! The order of the backward calls also needs to be changed. The run now works on my setup.

For completeness, this is my setup:

> python -c "import torch; print(torch.version.cuda)"
12.1
> python -c "import torch; print(torch.__version__)"
2.1.1
> nvidia-smi
Fri Nov 24 04:49:56 2023       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 545.23.06              Driver Version: 545.23.06    CUDA Version: 12.3     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA RTX A6000               On  | 00000000:05:00.0 Off |                  Off |
| 30%   51C    P2             118W / 300W |  32545MiB / 49140MiB |     28%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   1  NVIDIA RTX A6000               On  | 00000000:45:00.0 Off |                  Off |
| 30%   53C    P2             111W / 300W |  27637MiB / 49140MiB |     30%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   2  NVIDIA RTX A6000               On  | 00000000:85:00.0 Off |                  Off |
| 30%   55C    P2             113W / 300W |  27617MiB / 49140MiB |     22%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   3  NVIDIA RTX A6000               On  | 00000000:C5:00.0 Off |                  Off |
| 30%   52C    P2             107W / 300W |  27529MiB / 49140MiB |     26%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

yl4579 merged commit 23c16b7 into yl4579:main Nov 24, 2023
@yl4579 (Owner) commented Nov 24, 2023

I actually found a bug in this fix. It calls optimizer.zero_grad() twice when the discriminator loss isn’t 0, so it implicitly overwrites the gradients from that iteration and optimizes against the generator loss. I think we have to move the optimizer step lines before the discriminator loss line as well.
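For reference, a minimal sketch of the shape being described, with hypothetical stand-in names rather than the actual training-loop variables: a second zero_grad() in the same iteration discards whatever gradients were accumulated before it, so the final step() only reflects the backward pass that ran after the second call.

import torch

model = torch.nn.Linear(4, 1)   # stand-in for the trained modules
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
x = torch.randn(8, 4)

def gen_loss():        # stand-in generator loss
    return model(x).mean()

def slm_disc_loss():   # stand-in SLM discriminator loss
    return (model(x) ** 2).mean()

# Buggy shape: the second zero_grad() silently discards the gradients
# accumulated by the first backward(), so step() applies only the
# gradients computed after it.
optimizer.zero_grad()
gen_loss().backward()
d_loss = slm_disc_loss()
if d_loss != 0:
    optimizer.zero_grad()       # wipes the gradients accumulated above
    d_loss.backward()
optimizer.step()

# Proposed shape: step before the discriminator branch so each
# zero_grad()/backward()/step() cycle is self-contained.
optimizer.zero_grad()
gen_loss().backward()
optimizer.step()
d_loss = slm_disc_loss()
if d_loss != 0:
    optimizer.zero_grad()
    d_loss.backward()
    optimizer.step()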

@yl4579 (Owner) commented Nov 24, 2023

Can you please make the change and test that it doesn’t cause any problems in your setup?

@kmn1024 (Contributor, Author) commented Nov 24, 2023

Ah, another good catch. Trying...

This issue also seems to be in train_second.py?

@yl4579 (Owner) commented Nov 24, 2023

Yes, so if you could verify it has no problems running on your system, I’ll change that too. I couldn’t reproduce it in my environment.

yl4579 added a commit that referenced this pull request Nov 24, 2023:
Continued fix of SLM training (see #74)
nawed2611 pushed a commit to team-listnr/StyleTTS2 that referenced this pull request Feb 8, 2024