Why is the English speech generated from Chinese with ERNIE-SAT voice cloning completely unintelligible? #2586

@guo453585719


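Hello, I am trying the voice cloning example at https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/aishell3_vctk/ernie_sat and found that the English speech generated from my own Chinese recording is completely unintelligible; it sounds like garbled noise.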

My input is:
--old_str="除此之外,在看电影生肉时,中文字幕搭配中文原声,让你更加声临其境。"
--new_str="In addition, when watching the movie raw meat, Chinese subtitles with the Chinese original sound, let you more immersive."

I printed with_dur_outs in synthesize_e2e.py and did not see anything wrong:
with_dur_outs:{'new_wav': array([0., 0., 0., ..., 0., 0., 0.], dtype=float32), 'new_phns': ['sp', 'ch', 'u2', 'c', 'ii3', 'zh', 'iii1', 'uai4', 'sp', 'z', 'ai4', 'k', 'an4', 'd', 'ian4', 'ing3', 'sh', 'eng1', 'r', 'ou4', 'sh', 'iii2', 'sp', 'zh', 'ong1', 'uen2', 'z', 'ii4', 'm', 'u4', 'd', 'a1', 'p', 'ei4', 'zh', 'ong1', 'uen2', 'van2', 'sh', 'eng1', 'sp', 'r', 'ang4', 'n', 'i3', 'g', 'eng4', 'j', 'ia1', 'sh', 'eng1', 'l', 'in2', 'q', 'i2', 'j', 'ing4', 'sp', 'IH0', 'N', 'AH0', 'D', 'IH1', 'SH', 'AH0', 'N', 'W', 'EH1', 'N', 'W', 'AA1', 'CH', 'IH0', 'NG', 'DH', 'AH0', 'M', 'UW1', 'V', 'IY0', 'R', 'AA1', 'M', 'IY1', 'T', 'CH', 'AY0', 'N', 'IY1', 'Z', 'S', 'AH1', 'B', 'T', 'AY2', 'T', 'AH0', 'L', 'Z', 'W', 'IH1', 'DH', 'DH', 'AH0', 'CH', 'AY0', 'N', 'IY1', 'Z', 'ER0', 'IH1', 'JH', 'AH0', 'N', 'AH0', 'L', 'S', 'AW1', 'N', 'D', 'L', 'EH1', 'T', 'Y', 'UW1', 'M', 'AO1', 'R', 'spn'], 'new_mfa_start': [0, 75, 84, 91, 102, 109, 116, 120, 163, 168, 183, 196, 210, 222, 232, 249, 253, 263, 270, 279, 288, 300, 327, 345, 356, 370, 379, 387, 394, 404, 416, 425, 437, 448, 459, 468, 481, 493, 511, 522, 551, 569, 575, 588, 594, 604, 610, 620, 627, 648, 660, 671, 675, 689, 706, 711, 719, 741, 767, 779, 788, 800, 812, 824, 841, 850, 862, 871, 878, 894, 905, 926, 943, 949, 960, 966, 975, 989, 1006, 1017, 1034, 1050, 1064, 1080, 1099, 1105, 1122, 1146, 1157, 1174, 1190, 1206, 1217, 1226, 1243, 1259, 1268, 1275, 1287, 1303, 1314, 1320, 1331, 1338, 1347, 1364, 1383, 1392, 1408, 1425, 1449, 1458, 1470, 1476, 1482, 1489, 1500, 1526, 1547, 1556, 1562, 1576, 1587, 1601, 1610, 1624, 1635, 1644, 1653], 'new_mfa_end': [75, 84, 91, 102, 109, 116, 120, 163, 168, 183, 196, 210, 222, 232, 249, 253, 263, 270, 279, 288, 300, 327, 345, 356, 370, 379, 387, 394, 404, 416, 425, 437, 448, 459, 468, 481, 493, 511, 522, 551, 569, 575, 588, 594, 604, 610, 620, 627, 648, 660, 671, 675, 689, 706, 711, 719, 741, 767, 779, 788, 800, 812, 824, 841, 850, 862, 871, 878, 894, 905, 926, 943, 949, 960, 966, 975, 989, 
1006, 1017, 1034, 1050, 1064, 1080, 1099, 1105, 1122, 1146, 1157, 1174, 1190, 1206, 1217, 1226, 1243, 1259, 1268, 1275, 1287, 1303, 1314, 1320, 1331, 1338, 1347, 1364, 1383, 1392, 1408, 1425, 1449, 1458, 1470, 1476, 1482, 1489, 1500, 1526, 1547, 1556, 1562, 1576, 1587, 1601, 1610, 1624, 1635, 1644, 1653, 1785], 'old_span_bdy': [767, 767], 'new_span_bdy': [767, 1785]}
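
For anyone reproducing this, a dump like the one above is hard to verify by eye, so here is a minimal sketch of a sanity check. It makes assumptions that are not confirmed against the PaddleSpeech source: that `new_mfa_start` / `new_mfa_end` are per-phoneme boundaries with one entry per item in `new_phns`, that consecutive segments are contiguous, and that the span boundary must lie inside the aligned range. `check_alignment` is a hypothetical helper, not part of PaddleSpeech, and the data in the example is a toy, not the values from the dump.

```python
def check_alignment(phns, starts, ends, span_bdy):
    """Return a list of human-readable problems found in an alignment dump.

    Assumed (not confirmed from the library source): one start/end pair per
    phoneme, contiguous segments, and a span boundary within the aligned range.
    """
    problems = []
    if not (len(phns) == len(starts) == len(ends)):
        problems.append(
            f"length mismatch: {len(phns)} phns, "
            f"{len(starts)} starts, {len(ends)} ends"
        )
        return problems  # per-phoneme checks are meaningless if lengths differ
    for i, (s, e) in enumerate(zip(starts, ends)):
        if e < s:
            problems.append(f"{phns[i]!r} at index {i} ends before it starts")
        if i > 0 and s != ends[i - 1]:
            problems.append(f"gap or overlap before {phns[i]!r} at index {i}")
    lo, hi = span_bdy
    if lo > hi or (ends and hi > ends[-1]):
        problems.append(f"span boundary {span_bdy} is outside the aligned range")
    return problems


# Toy data only: three phonemes with contiguous boundaries.
ok = check_alignment(["sp", "IH0", "N"], [0, 75, 84], [75, 84, 91], [75, 91])
# Same phonemes, but one end boundary is missing.
bad = check_alignment(["sp", "IH0", "N"], [0, 75, 84], [75, 84], [75, 91])
```

An empty list means the assumed invariants hold. Passing the `new_phns`, `new_mfa_start`, `new_mfa_end`, and `new_span_bdy` values from `with_dur_outs` would show whether the three lists actually line up.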

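My procedure should be correct, because cloning from English speech to Chinese speech works: the timbre is not very similar, but the quality of the synthesized speech is quite good. I don't know why cloning from Chinese to English fails. Could you please help me?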
