Echo-Nie
Contributor

@Echo-Nie Echo-Nie commented Mar 15, 2025

PR types

Function optimization, Docs

PR changes

Docs, Others

Describe

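This change mainly includes:

  • Modified the README.md documents and run.sh scripts under tts0, tts2, and tts3 in examples/csmsc/, and under examples/csmsc/tts3_rhy/. While making these changes I found that tts3 also has a README_cn.md document, so I updated it as well.

  • Script optimization: added the --stage parameter to the synthesis stages in run.sh, set according to the synthesis stages defined in the corresponding sh files.

  • Documentation improvement: added a description of the stage parameter to README.md, clarified the vocoder selection logic, and tightened the wording, e.g. changing 0 or 1 or 2 or 3 ... to 0-4.

Issue link: #3997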

@luotao1 @zxcd


paddle-bot bot commented Mar 15, 2025

Thanks for your contribution!

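@Echo-Nie Echo-Nie changed the title from "PaddleSpeech Happy Open Source event [Task 2: No.7-10]" to "[Doc] Fill in missing parameters in the synthesis-series scripts No.7-10" Mar 16, 2025
@luotao1 luotao1 changed the title from "[Doc] Fill in missing parameters in the synthesis-series scripts No.7-10" to "[PaddleSpeech No.7-10] Fill in missing parameters in the synthesis-series scripts" Mar 17, 2025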
@luotao1 luotao1 added the HappyOpenSource 快乐开源活动issue与PR label Mar 17, 2025
@zxcd
Collaborator

zxcd commented Mar 18, 2025

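Suggestion: could developers use the Chinese documentation under tts3 as a reference to build similar Chinese documentation for the other files in the project?

If you notice any missing documentation, please point it out and I will add it to the Happy Open Source tasks.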

fi

if [ ${stage} -le 3 ] && [ ${stop_stage} -ge 3 ]; then
# synthesize_e2e, vocoder is pwgan by default
CUDA_VISIBLE_DEVICES=${gpus} ./local/synthesize_e2e.sh ${conf_path} ${train_output_path} ${ckpt_name} || exit -1
# synthesize_e2e, vocoder is pwgan by default stage 0, stage 1 will use hifigan as vocoder
Collaborator

Same as above.

```
`--stage` controls the vocoder model during synthesis; it can be `0` or `1`, which selects `pwgan` or `hifigan` respectively as the vocoder.
Collaborator

Same as above. Please check all files.

Collaborator

Also change this.

```
`--stage` controls the vocoder model during synthesis, which can be `0` or `1` or `2` or `3`, use `pwgan` or `multi band melgan` or `style melgan` or `hifigan`model as vocoder.
Collaborator

@zxcd zxcd Mar 19, 2025

Why not use a dict-style listing to present this information?
For example: use stage 0-4 to select the vocoder to use {pwgan, multi band melgan, ....}
The current wording is a bit cumbersome.
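
The dict-style mapping suggested above could be sketched roughly as follows. This is only an illustration: `vocoder_for_stage` is a hypothetical helper, not something from the repository, and the stage numbers are pieced together from the README lines quoted in this thread (`0`-`3` from one, `4` as `wavernn` from another). Which stages a given example's script actually accepts may differ; one README quoted here lists only `1,3,4`.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of PaddleSpeech): map a --stage value
# to the vocoder name, as assumed from the README text in this thread.
vocoder_for_stage() {
    case "$1" in
        0) echo "pwgan" ;;
        1) echo "multi band melgan" ;;
        2) echo "style melgan" ;;
        3) echo "hifigan" ;;
        4) echo "wavernn" ;;
        *) echo "unknown stage: $1" >&2; return 1 ;;
    esac
}

# Print the mapping as one compact table instead of a chain of "or"s.
for s in 0 1 2 3 4; do
    printf 'stage %s -> %s\n' "$s" "$(vocoder_for_stage "$s")"
done
```

Keeping the mapping in one place like this makes the README sentence a direct readout of the table, which is the compact form the comment asks for.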

Contributor Author

OK, I checked the README and sh files in the four folders under csmsc and believe there should be no issues.

Contributor Author

@Echo-Nie Echo-Nie left a comment

@zxcd please review

1. **source path**.
2. preprocess the dataset.
3. train the model.
4. synthesize wavs.
- synthesize waveform from `metadata.jsonl`.
- use stage `1,3,4` to select the vocoder to use {`multi band melgan`, `hifigan`, `wavernn`}
Collaborator

Add the usage of synthesize.sh and synthesize_e2e.sh, as in the other files.

@@ -14,11 +14,13 @@ Remember in our repo, you should add `--rhy-with-duration` flag to obtain the rh
Assume the path to the dataset is `~/datasets/BZNSYP`.
Assume the path to the MFA result of CSMSC is `./baker_alignment_tone`.
Run the command below to

Collaborator

extra space

Collaborator

Could you also change this file to add the stage information?

@@ -28,11 +28,12 @@ if [ ${stage} -le 1 ] && [ ${stop_stage} -ge 1 ]; then
fi

if [ ${stage} -le 2 ] && [ ${stop_stage} -ge 2 ]; then
# synthesize, vocoder is pwgan by default
CUDA_VISIBLE_DEVICES=${gpus} ./local/synthesize.sh ${conf_path} ${train_output_path} ${ckpt_name} || exit -1
# synthesize, vocoder is pwgan by default stage 0, stage 1 will use hifigan as vocoder
Collaborator

stage 0-4

```
`--stage` controls the vocoder model during synthesis; it can be `0` or `1`, which selects `pwgan` or `hifigan` respectively as the vocoder.
Collaborator

Also change this.

@Echo-Nie
Contributor Author

Echo-Nie commented Apr 3, 2025

This PR has too much content and is a bit messy, so I am closing it.

@Echo-Nie Echo-Nie closed this Apr 3, 2025
@Echo-Nie Echo-Nie deleted the csmscUpdate branch April 9, 2025 15:46
3 participants