
Conversation

lym0302
Contributor

@lym0302 lym0302 commented Jan 16, 2023

PR types
New feature

PR changes
Add DiffSinger opencpop baseline (fft training)

Describe
Add DiffSinger opencpop baseline (fft training)

fix #2821

lym0302 and others added 30 commits August 26, 2022 06:58
@lym0302 lym0302 requested a review from yt605155624 January 16, 2023 02:43
@lym0302 lym0302 marked this pull request as draft January 16, 2023 02:44
@yt605155624 yt605155624 added this to the r1.4.0 milestone Jan 19, 2023
@yt605155624 yt605155624 changed the title Diffsinger opencpop baseline [TTS]Diffsinger opencpop baseline Jan 19, 2023
@lym0302 lym0302 force-pushed the diffsinger branch 2 times, most recently from 9888edc to 4d5a6b4 Compare February 1, 2023 08:30
@yt605155624 yt605155624 marked this pull request as ready for review February 1, 2023 09:25
@@ -0,0 +1,12 @@
#!/bin/bash
Collaborator

This could actually be a direct symlink to csmsc/tts3/local/train.sh (everything is the same except ngpu); we may change this later.
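As a sketch of that suggestion, the duplicated script could be replaced by a relative symlink. The directory layout below is a scratch-directory stand-in for the real recipe paths, which may differ:

```shell
# Demonstration in a scratch directory (real paths would be the recipes' local/ dirs).
demo=$(mktemp -d)
mkdir -p "$demo/csmsc/tts3/local" "$demo/opencpop/local"
printf '#!/bin/bash\necho training\n' > "$demo/csmsc/tts3/local/train.sh"
# -s: symbolic, -n: do not dereference an existing link, -f: replace if present
ln -snf ../../csmsc/tts3/local/train.sh "$demo/opencpop/local/train.sh"
```

The link is relative, so it keeps working if the whole examples tree is moved; only ngpu would still have to be overridden by the caller.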

return outs[0], d_outs[0], p_outs[0], e_outs[0]


class FastSpeech2MIDILoss(nn.Layer):
Collaborator

This could inherit from FastSpeech2Loss and reuse the parent class's __init__ directly.
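The suggested pattern looks like the sketch below. BaseLoss is a stub standing in for FastSpeech2Loss (whose real constructor takes masking options); the point is that the subclass defines no __init__ of its own and only adds its MIDI-specific parts:

```python
class BaseLoss:
    """Stub standing in for FastSpeech2Loss."""

    def __init__(self, use_masking: bool = True,
                 use_weighted_masking: bool = False):
        self.use_masking = use_masking
        self.use_weighted_masking = use_weighted_masking


class MIDILoss(BaseLoss):
    """Inherits BaseLoss.__init__ unchanged; only MIDI-specific logic is added."""

    def midi_term(self, pred: float, target: float) -> float:
        # Illustrative extra loss term for the MIDI variant.
        return abs(pred - target)


loss = MIDILoss(use_masking=False)
```

If the subclass later needs extra constructor arguments, it can still call super().__init__(...) rather than duplicating the parent's setup.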

self.fs2 = FastSpeech2MIDI(
idim=idim,
odim=odim,
fastspeech2_config=fastspeech2_params,
Collaborator

One is fastspeech2_config and the other is fastspeech2_params; should the naming be kept consistent here?

optimizers: Dict[str, Optimizer],
criterions: Dict[str, Layer],
dataloader: DataLoader,
fs2_train_start_steps: int=0,
Collaborator

This parameter seems unnecessary; is there ever a case where fs2_train_start_steps is not 0?

spk_id = paddle.cast(spk_id, 'int64')
# forward propagation
before_outs, after_outs, d_outs, p_outs, e_outs, spk_logits = self._forward(
xs,
Collaborator

Suggest passing keyword arguments at the call site; with this many parameters, positional calls are easy to mix up.
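The suggestion can be illustrated with a cut-down _forward; the signature here is hypothetical, not the model's real one:

```python
def _forward(xs, ilens, ys=None, ds=None, ps=None, es=None,
             is_inference=False):
    """Toy stand-in for a model _forward with many optional inputs."""
    return {"xs": xs, "ilens": ilens, "is_inference": is_inference}


# Fragile: a long positional call silently misassigns arguments if the
# signature's order ever changes.
out_positional = _forward([1, 2], [2], None, None, None, None, True)

# Explicit keywords keep each argument attached to its parameter name.
out_keyword = _forward(xs=[1, 2], ilens=[2], is_inference=True)
```

Both calls are equivalent today, but only the keyword form stays correct if a new parameter is inserted in the middle of the signature.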

es = e.unsqueeze(0) if e is not None else None

# (1, L, odim)
_, outs, d_outs, p_outs, e_outs, _ = self._forward(
Collaborator

Suggest passing keyword arguments at the call site.

is_inference=True)
else:
# (1, L, odim)
_, outs, d_outs, p_outs, e_outs, _ = self._forward(
Collaborator

Suggest passing keyword arguments at the call site.


# (1, L, odim)
# use *_ to avoid bug in dygraph to static graph
hs, h_masks = self._forward(
Collaborator

Suggest passing keyword arguments at the call site.


# (1, L, odim)
# use *_ to avoid bug in dygraph to static graph
hs, _ = self._forward(
Collaborator

Suggest passing keyword arguments at the call site.


report("train/loss_ds", float(loss_ds))
report("train/l1_loss_ds", float(l1_loss_ds))
losses_dict["l1_loss_ds"] = float(l1_loss_ds)
Collaborator

If these two losses are the same, do they need to be reported twice?

self.normalizer = normalizer
self.acoustic_model = model

def forward(self, text, note, note_dur, is_slur, get_mel_fs2: bool=False):
Collaborator

Add type hints to these parameters.
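A sketch of what the annotated signature could look like; Tensor here is a placeholder alias, since paddle is not imported in this snippet and the real code would annotate with paddle.Tensor:

```python
from typing import Any

Tensor = Any  # placeholder for paddle.Tensor in this sketch


def forward(self: Any, text: Tensor, note: Tensor, note_dur: Tensor,
            is_slur: Tensor, get_mel_fs2: bool = False) -> Tensor:
    """Annotated version of the reviewed signature."""
    ...
```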

from paddlespeech.t2s.training.trainer import Trainer
from paddlespeech.t2s.utils import str2bool

# from paddlespeech.t2s.models.fastspeech2 import FastSpeech2Loss
Collaborator

This commented-out import can be deleted.

dataset (str): dataset name
Returns:
Dict: the information of sentence, include [phone id (int)], [the frame of phone (int)], [note id (int)], [note duration (float)], [is slur (int)], text(str), speaker name (str)
tunple: speaker name
Collaborator

tunple is a typo; it should be tuple.

print("========Config========")
print(config)
print(
f"master see the word size: {dist.get_world_size()}, from pid: {os.getpid()}"
Collaborator

Typo: word should be world (world size).

@yt605155624 yt605155624 mentioned this pull request Feb 9, 2023
mel_fs2 = mel_fs2.unsqueeze(0).transpose((0, 2, 1))
cond_fs2 = self.fs2.encoder_infer(text, note, note_dur, is_slur)
cond_fs2 = cond_fs2.transpose((0, 2, 1))
mel, _ = self.diffusion(mel_fs2, cond_fs2)
Collaborator

@yt605155624 yt605155624 Feb 9, 2023

Should this call self.diffusion.inference() instead? If so, a num_inference_steps parameter should be added to control the number of steps; the default of 1000 is too large.
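The step-count idea can be sketched as follows. inference and num_inference_steps mirror the reviewer's suggestion, while the denoiser here is a toy callable rather than the real diffusion network:

```python
def inference(noisy, denoise_fn, num_inference_steps: int = 100):
    """Run the reverse diffusion loop for a configurable number of steps
    instead of a fixed full schedule (e.g. 1000)."""
    x = noisy
    for t in reversed(range(num_inference_steps)):
        x = denoise_fn(x, t)
    return x


# Toy denoiser: each reverse step halves the sample, pulling it toward 0.
mel = inference(1024.0, lambda x, t: x / 2, num_inference_steps=10)
```

Exposing the step count at the API surface lets callers trade synthesis quality for latency instead of always paying for the full training-time schedule.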

@mergify

mergify bot commented Feb 16, 2023

This pull request is now in conflict :(

@mergify mergify bot added the conflicts label Feb 16, 2023
@mergify mergify bot removed the conflicts label Mar 13, 2023
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

[TTS] DiffSinger
2 participants