[Hackathon 7th] Fix the spk_emb dimension issue in vctk #3916

Merged (2 commits, Nov 29, 2024)

megemini
Collaborator

PR types

Bug fixes

PR changes

Others

Describe

Fix the spk_emb dimension issue in vctk.

Without changing the dim, the following command fails with a dimension error:

> CUDA_VISIBLE_DEVICES=0,1 ./local/synthesize.sh conf/default.yaml ./output snapshot_iter_1332.pdz
...
Traceback (most recent call last):
  File "/home/aistudio/PaddleSpeech/paddlespeech/t2s/exps/fastspeech2/../synthesize.py", line 281, in <module>
    main()
  File "/home/aistudio/PaddleSpeech/paddlespeech/t2s/exps/fastspeech2/../synthesize.py", line 277, in main
    evaluate(args)
  File "/home/aistudio/PaddleSpeech/paddlespeech/t2s/exps/fastspeech2/../synthesize.py", line 100, in evaluate
    mel = am_inference(
  File "/home/aistudio/.local/lib/python3.8/site-packages/paddle/nn/layer/layers.py", line 1532, in __call__
    return self.forward(*inputs, **kwargs)
  File "/home/aistudio/PaddleSpeech/paddlespeech/t2s/models/fastspeech2/fastspeech2.py", line 943, in forward
    normalized_mel, d_outs, p_outs, e_outs = self.acoustic_model.inference(
  File "/home/aistudio/PaddleSpeech/paddlespeech/t2s/models/fastspeech2/fastspeech2.py", line 815, in inference
    _, outs, d_outs, p_outs, e_outs, _ = self._forward(
  File "/home/aistudio/PaddleSpeech/paddlespeech/t2s/models/fastspeech2/fastspeech2.py", line 623, in _forward
    hs = self._integrate_with_spk_embed(hs, spk_emb)
  File "/home/aistudio/PaddleSpeech/paddlespeech/t2s/models/fastspeech2/fastspeech2.py", line 859, in _integrate_with_spk_embed
    spk_emb = F.normalize(spk_emb).unsqueeze(1).expand(
  File "/home/aistudio/.local/lib/python3.8/site-packages/paddle/nn/functional/norm.py", line 107, in normalize
    out = _C_ops.p_norm(x, float(p), axis, epsilon, True, False)
ValueError: (InvalidArgument) Attr(axis) value should be in range [-R, R-1], R is the rank of Input(X). But received axis: 1, R: 1. Current Input(X)'s shape is=[256].
  [Hint: Expected axis < x_rank, but received axis:1 >= x_rank:1.] (at ../paddle/phi/infermeta/unary.cc:3354)
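In the traceback, `F.normalize` runs with `axis` 1 (the default of `paddle.nn.functional.normalize`), but the speaker embedding arrives as a rank-1 tensor of shape [256], which has no axis 1. The sketch below reproduces the same range check with NumPy as a stand-in for Paddle; it is an illustration under that assumption, not Paddle's implementation.

```python
import numpy as np


def normalize(x, p=2, axis=1, epsilon=1e-12):
    """NumPy stand-in for an Lp-normalize along `axis` (default axis=1)."""
    x = np.asarray(x, dtype=np.float64)
    rank = x.ndim
    # Mirror the check from the traceback: axis must lie in [-R, R-1].
    if not -rank <= axis <= rank - 1:
        raise ValueError(
            f"axis value should be in range [-R, R-1], received axis: {axis}, R: {rank}"
        )
    norm = np.linalg.norm(x, ord=p, axis=axis, keepdims=True)
    # Divide by max(norm, epsilon) to avoid division by zero.
    return x / np.maximum(norm, epsilon)


spk_emb = np.random.rand(256)  # shape [256], as in the failing synthesize run

try:
    normalize(spk_emb)
except ValueError as err:
    print("rank-1 input fails:", err)

# Adding a leading batch axis gives shape [1, 256], where axis=1 is valid.
fixed = normalize(spk_emb[np.newaxis, :])
print(fixed.shape)  # (1, 256)
```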

After the fix, the following commands run normally (training works, and synthesize completes):

> CUDA_VISIBLE_DEVICES=0,1 ./local/train.sh conf/default.yaml ./output
[2024-11-28 05:35:12] [INFO] [fastspeech2_updater.py:236] Evaluate: l1_loss: 0.764606, duration_loss: 0.075403, pitch_loss: 0.073118, energy_loss: 0.066784, loss: 0.979911
[2024-11-28 05:35:12] [INFO] [fastspeech2_updater.py:236] Evaluate: l1_loss: 0.764558, duration_loss: 0.092114, pitch_loss: 0.190829, energy_loss: 0.133041, loss: 1.180542
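In both Evaluate lines above, the total `loss` matches the plain sum of the four component losses. Whether the updater applies any weighting is not stated in this PR; the check below only confirms the logged numbers add up.

```python
# Component losses copied from the two "Evaluate" lines in the training log.
evals = [
    {"l1_loss": 0.764606, "duration_loss": 0.075403,
     "pitch_loss": 0.073118, "energy_loss": 0.066784, "loss": 0.979911},
    {"l1_loss": 0.764558, "duration_loss": 0.092114,
     "pitch_loss": 0.190829, "energy_loss": 0.133041, "loss": 1.180542},
]

for e in evals:
    components = e["l1_loss"] + e["duration_loss"] + e["pitch_loss"] + e["energy_loss"]
    # Logged values have 6 decimals, so compare at that precision.
    print(f"sum={components:.6f} logged={e['loss']:.6f}")
```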

> CUDA_VISIBLE_DEVICES=0,1 ./local/synthesize.sh conf/default.yaml ./output snapshot_iter_1332.pdz
...
s5_397 done!
s5_398, mel: [180, 80], wave: 54000, time: 86s, Hz: 627.9069767441861, RTF: 38.22222222222222.
s5_398 done!
s5_399, mel: [234, 80], wave: 70200, time: 96s, Hz: 731.25, RTF: 32.82051282051282.
s5_399 done!
s5_400, mel: [157, 80], wave: 47100, time: 85s, Hz: 554.1176470588235, RTF: 43.31210191082803.
s5_400 done!
generation speed: 458.7905774202815Hz, RTF: 52.31144923452616
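The per-utterance figures in the synthesize log are consistent with `Hz = samples / time` and `RTF = time / (samples / fs)` with fs = 24000, and every utterance has exactly 300 samples per mel frame. The sample rate is an assumption inferred from these numbers, not read from `conf/default.yaml`.

```python
# Per-utterance figures from the synthesize log: (name, mel frames, wave samples, time).
rows = [
    ("s5_398", 180, 54000, 86),
    ("s5_399", 234, 70200, 96),
    ("s5_400", 157, 47100, 85),
]

FS = 24000  # assumed sample rate, inferred because it reproduces the logged RTF

for name, frames, samples, t in rows:
    hz = samples / t            # samples generated per unit of logged time
    rtf = t / (samples / FS)    # logged time divided by audio duration
    hop = samples // frames     # samples per mel frame
    print(f"{name}: Hz={hz:.4f} RTF={rtf:.4f} hop={hop}")
```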

@zxcd @Liyulingyue

paddle-bot (bot) commented on Nov 28, 2024

Thanks for your contribution!

mergify (bot) added the T2S label on Nov 28, 2024
@@ -842,6 +842,8 @@ def _integrate_with_spk_embed(self, hs, spk_emb):
            hs = hs + spk_emb.unsqueeze(1)
        elif self.spk_embed_integration_type == "concat":
            # concat hidden states with spk embeds and then apply projection
            if spk_emb.dim() < 2:
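The diff excerpt ends at the new `if spk_emb.dim() < 2:` guard; the line it guards is not shown. A minimal NumPy sketch of the `concat` path with such a guard, assuming the hidden line adds a leading batch axis (similar to `spk_emb.unsqueeze(0)`). `integrate_concat` and `projection_w` are hypothetical stand-ins for the model's method and projection layer, and the sizes are illustrative.

```python
import numpy as np


def integrate_concat(hs, spk_emb, projection_w):
    """Concat speaker embeddings onto hidden states, then project.

    hs: [B, T, adim] hidden states; spk_emb: [B, spk_dim] or [spk_dim].
    projection_w: hypothetical [adim + spk_dim, adim] projection matrix.
    """
    if spk_emb.ndim < 2:
        # Assumed fix: promote a single [spk_dim] embedding to [1, spk_dim].
        spk_emb = spk_emb[np.newaxis, :]
    # L2-normalize along axis 1, as F.normalize does by default.
    norm = np.linalg.norm(spk_emb, axis=1, keepdims=True)
    spk_emb = spk_emb / np.maximum(norm, 1e-12)
    # [B, spk_dim] -> [B, 1, spk_dim] -> [B, T, spk_dim]
    b, t, _ = hs.shape
    spk_emb = np.broadcast_to(spk_emb[:, np.newaxis, :], (b, t, spk_emb.shape[-1]))
    # Concatenate on the feature axis, then project back to adim.
    return np.concatenate([hs, spk_emb], axis=-1) @ projection_w


rng = np.random.default_rng(0)
adim, spk_dim = 384, 256  # illustrative sizes
hs = rng.standard_normal((1, 50, adim))
w = rng.standard_normal((adim + spk_dim, adim))
emb = rng.standard_normal(spk_dim)  # rank-1, like the [256] input in the failing run

out_1d = integrate_concat(hs, emb, w)
out_2d = integrate_concat(hs, emb[np.newaxis, :], w)
print(out_1d.shape, np.allclose(out_1d, out_2d))  # (1, 50, 384) True
```

With the guard in place, a single rank-1 embedding and the equivalent [1, spk_dim] batch produce identical outputs.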
Collaborator

Are there also cases where it is > 2?

Collaborator Author

There is a case where it is 1.

When running CUDA_VISIBLE_DEVICES=0,1 ./local/synthesize.sh conf/default.yaml ./output snapshot_iter_1332.pdz, the shape passed in here is [256].

Collaborator

zxcd left a comment

LGTM

zxcd merged commit 3e53497 into PaddlePaddle:develop on Nov 29, 2024
5 checks passed

2 participants