
Conversation

lym0302
Contributor

@lym0302 lym0302 commented Jan 16, 2023

PR types
New feature

PR changes
Add DiffSinger opencpop baseline (fft training)

Describe
Add DiffSinger opencpop baseline (fft training)

fix #2821

lym0302 and others added 30 commits August 26, 2022 06:58
@lym0302 lym0302 requested a review from yt605155624 January 16, 2023 02:43
@lym0302 lym0302 marked this pull request as draft January 16, 2023 02:44
@yt605155624 yt605155624 added this to the r1.4.0 milestone Jan 19, 2023
@yt605155624 yt605155624 changed the title Diffsinger opencpop baseline [TTS]Diffsinger opencpop baseline Jan 19, 2023
@lym0302 lym0302 force-pushed the diffsinger branch 2 times, most recently from 9888edc to 4d5a6b4 Compare February 1, 2023 08:30
@yt605155624 yt605155624 marked this pull request as ready for review February 1, 2023 09:25
@@ -0,0 +1,12 @@
#!/bin/bash
Collaborator

This could actually be a direct symlink to csmsc/tts3/local/train.sh (everything is the same except ngpu); we may change this later.
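As a sketch of that suggestion, the duplicated script could be replaced by a relative symlink. The directory layout below is a scratch-directory stand-in for the real recipe paths, which may differ:

```shell
# Demonstration in a scratch directory (real paths would be the recipes' local/ dirs).
demo=$(mktemp -d)
mkdir -p "$demo/csmsc/tts3/local" "$demo/opencpop/local"
printf '#!/bin/bash\necho training\n' > "$demo/csmsc/tts3/local/train.sh"
# -s: symbolic, -n: do not dereference an existing link, -f: replace if present
ln -snf ../../csmsc/tts3/local/train.sh "$demo/opencpop/local/train.sh"
```

The link is relative, so it keeps working if the whole examples tree is moved; only ngpu would still have to be overridden by the caller.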

return outs[0], d_outs[0], p_outs[0], e_outs[0]


class FastSpeech2MIDILoss(nn.Layer):
Collaborator

This could inherit from FastSpeech2Loss and reuse the parent class's __init__ directly.
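The suggested pattern looks like the sketch below. BaseLoss is a stub standing in for FastSpeech2Loss (whose real constructor takes masking options); the point is that the subclass defines no __init__ of its own and only adds its MIDI-specific parts:

```python
class BaseLoss:
    """Stub standing in for FastSpeech2Loss."""

    def __init__(self, use_masking: bool = True,
                 use_weighted_masking: bool = False):
        self.use_masking = use_masking
        self.use_weighted_masking = use_weighted_masking


class MIDILoss(BaseLoss):
    """Inherits BaseLoss.__init__ unchanged; only MIDI-specific logic is added."""

    def midi_term(self, pred: float, target: float) -> float:
        # Illustrative extra loss term for the MIDI variant.
        return abs(pred - target)


loss = MIDILoss(use_masking=False)
```

If the subclass later needs extra constructor arguments, it can still call super().__init__(...) rather than duplicating the parent's setup.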

self.fs2 = FastSpeech2MIDI(
idim=idim,
odim=odim,
fastspeech2_config=fastspeech2_params,
Collaborator

One is fastspeech2_config and the other is fastspeech2_params; should the naming be kept consistent here?

optimizers: Dict[str, Optimizer],
criterions: Dict[str, Layer],
dataloader: DataLoader,
fs2_train_start_steps: int=0,
Collaborator

This parameter seems unnecessary; is there ever a case where fs2_train_start_steps is not 0?

spk_id = paddle.cast(spk_id, 'int64')
# forward propagation
before_outs, after_outs, d_outs, p_outs, e_outs, spk_logits = self._forward(
xs,
Collaborator

Suggest passing keyword arguments at the call site; with this many parameters, positional calls are easy to mix up.
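The suggestion can be illustrated with a cut-down _forward; the signature here is hypothetical, not the model's real one:

```python
def _forward(xs, ilens, ys=None, ds=None, ps=None, es=None,
             is_inference=False):
    """Toy stand-in for a model _forward with many optional inputs."""
    return {"xs": xs, "ilens": ilens, "is_inference": is_inference}


# Fragile: a long positional call silently misassigns arguments if the
# signature's order ever changes.
out_positional = _forward([1, 2], [2], None, None, None, None, True)

# Explicit keywords keep each argument attached to its parameter name.
out_keyword = _forward(xs=[1, 2], ilens=[2], is_inference=True)
```

Both calls are equivalent today, but only the keyword form stays correct if a new parameter is inserted in the middle of the signature.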

es = e.unsqueeze(0) if e is not None else None

# (1, L, odim)
_, outs, d_outs, p_outs, e_outs, _ = self._forward(
Collaborator

Suggest passing keyword arguments at the call site.

is_inference=True)
else:
# (1, L, odim)
_, outs, d_outs, p_outs, e_outs, _ = self._forward(
Collaborator

Suggest passing keyword arguments at the call site.


# (1, L, odim)
# use *_ to avoid bug in dygraph to static graph
hs, h_masks = self._forward(
Collaborator

Suggest passing keyword arguments at the call site.


# (1, L, odim)
# use *_ to avoid bug in dygraph to static graph
hs, _ = self._forward(
Collaborator

Suggest passing keyword arguments at the call site.


report("train/loss_ds", float(loss_ds))
report("train/l1_loss_ds", float(l1_loss_ds))
losses_dict["l1_loss_ds"] = float(l1_loss_ds)
Collaborator

If these two losses are the same, do they need to be reported twice?

self.normalizer = normalizer
self.acoustic_model = model

def forward(self, text, note, note_dur, is_slur, get_mel_fs2: bool=False):
Collaborator

Add type hints to these parameters.
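A sketch of what the annotated signature could look like; Tensor here is a placeholder alias, since paddle is not imported in this snippet and the real code would annotate with paddle.Tensor:

```python
from typing import Any

Tensor = Any  # placeholder for paddle.Tensor in this sketch


def forward(self: Any, text: Tensor, note: Tensor, note_dur: Tensor,
            is_slur: Tensor, get_mel_fs2: bool = False) -> Tensor:
    """Annotated version of the reviewed signature."""
    ...
```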

from paddlespeech.t2s.training.trainer import Trainer
from paddlespeech.t2s.utils import str2bool

# from paddlespeech.t2s.models.fastspeech2 import FastSpeech2Loss
Collaborator

This commented-out import can be deleted.

dataset (str): dataset name
Returns:
Dict: the information of sentence, include [phone id (int)], [the frame of phone (int)], [note id (int)], [note duration (float)], [is slur (int)], text(str), speaker name (str)
tunple: speaker name
Collaborator

tunple is a typo; it should be tuple.

print("========Config========")
print(config)
print(
f"master see the word size: {dist.get_world_size()}, from pid: {os.getpid()}"
Collaborator

Typo: word should be world (world size).

@yt605155624 yt605155624 mentioned this pull request Feb 9, 2023
mel_fs2 = mel_fs2.unsqueeze(0).transpose((0, 2, 1))
cond_fs2 = self.fs2.encoder_infer(text, note, note_dur, is_slur)
cond_fs2 = cond_fs2.transpose((0, 2, 1))
mel, _ = self.diffusion(mel_fs2, cond_fs2)
Collaborator

@yt605155624 yt605155624 Feb 9, 2023

Should this call self.diffusion.inference() instead? If so, a num_inference_steps parameter should be added to control the number of steps; the default of 1000 is too large.
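The step-count idea can be sketched as follows. inference and num_inference_steps mirror the reviewer's suggestion, while the denoiser here is a toy callable rather than the real diffusion network:

```python
def inference(noisy, denoise_fn, num_inference_steps: int = 100):
    """Run the reverse diffusion loop for a configurable number of steps
    instead of a fixed full schedule (e.g. 1000)."""
    x = noisy
    for t in reversed(range(num_inference_steps)):
        x = denoise_fn(x, t)
    return x


# Toy denoiser: each reverse step halves the sample, pulling it toward 0.
mel = inference(1024.0, lambda x, t: x / 2, num_inference_steps=10)
```

Exposing the step count at the API surface lets callers trade synthesis quality for latency instead of always paying for the full training-time schedule.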

@mergify

mergify bot commented Feb 16, 2023

This pull request is now in conflict :(

@mergify mergify bot added the conflicts label Feb 16, 2023
@mergify mergify bot removed the conflicts label Mar 13, 2023
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

[TTS] DiffSinger
2 participants