【Hackathon 7th】Fundable Projects No.7 #3870

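# Overview
PaddleSpeech is an open-source speech toolkit built on PaddlePaddle. It supports common speech tasks, including speech recognition, speech synthesis, keyword spotting (voice wake-up), and speaker verification. Recent PaddlePaddle releases introduced incompatible changes (for example, the complete removal of the paddle.fluid API, the PIR + predictor upgrade, 0-D tensors, and changed view behavior), so the models in PaddleSpeech need to be adapted and regression-tested to keep the toolkit working.

This issue documents the changes made to PaddleSpeech and the verification and support status of the existing tutorials, documentation, and models.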

## Docker improvements
To support the latest PaddlePaddle release (version 3.0.0), the Docker image was upgraded: https://github.com/PaddlePaddle/PaddleSpeech/pull/3871

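## Demos
This section records the results of running the demos. Status codes: N = no issues, E = error, W = warning, U = not run.
### Test method
1. In an AI Studio V100 32G environment with paddlepaddle-gpu 3.0, clone this repository.
2. Manually remove the paddlepaddle-gpu dependency from setup.py.
3. Install PaddleSpeech with pip install . --user
4. Run the commands documented for each demo.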
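
Step 2 is done by hand, but the same filtering can be sketched in code. The snippet below is a minimal illustration, assuming the dependencies are available as a plain list of requirement strings; the function name `drop_requirement` and the example list are hypothetical and do not reflect the actual contents of setup.py.

```python
import re


def drop_requirement(requirements, package):
    """Return the requirement strings that do not refer to `package`.

    An entry matches when it is the bare package name or the name followed
    by a version specifier, extras, or an environment marker.
    """
    pattern = re.compile(
        rf"^\s*{re.escape(package)}\s*([<>=!~\[;].*)?$", re.IGNORECASE
    )
    return [req for req in requirements if not pattern.match(req)]


# Hypothetical requirement list, for illustration only.
example = ["numpy<1.24", "paddlepaddle-gpu>=2.5", "paddlenlp", "librosa==0.8.1"]
filtered = drop_requirement(example, "paddlepaddle-gpu")
print(filtered)
```

Matching on the full package name keeps similarly named packages such as `paddlenlp` untouched, so the already installed paddlepaddle-gpu 3.0 is not replaced during `pip install . --user`.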

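### Test conclusions and records
Most Python API calls work as expected. The issues found are:
1. The speech_ssl demo fails with TypeError: Wav2vec2ASR.forward() missing 3 required positional arguments: 'wavs_lens_rate', 'target', and 'target_lens'
2. The style_fs2 demo runs but emits a 0-D tensor warning.
3. The whisper demo fails when the input file's sample rate is not 16000 Hz, because a Paddle operator does not support the resulting data type.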
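
For the whisper issue, it can help to rule out the sample rate before running the demo. The following is a minimal sketch using only the Python standard library; the helper names are hypothetical, and the `wave` module reads PCM WAV files only, so other formats would need a different reader.

```python
import wave

EXPECTED_RATE = 16000  # sample rate (Hz) that the whisper demo expects, per the note above


def wav_sample_rate(path):
    """Read the sample rate (Hz) from the header of a PCM WAV file."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


def is_expected_rate(path, expected=EXPECTED_RATE):
    """Return True when the file's sample rate equals `expected`."""
    return wav_sample_rate(path) == expected
```

A file for which `is_expected_rate` returns False would need to be resampled to 16000 Hz first. As the table below records, an operator error was still observed after transcoding, so this check only isolates one cause.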

|Name|Notes|Status|PR|
|---|---|---|---|
|TTSAndroid|No suitable environment; not run|U||
|TTSArmLinux|CMake build failed in the AI Studio environment; not run|U||
|TTSCppFrontend|CMake build failed in the AI Studio environment; not run|U||
|asr_deployment|Based on SpeechX; not verified for now|U||
|audio_content_search|Not run|U||
|audio_searching|Not run|U||
|audio_tagging|Python runs successfully|N||
|automatic_video_subtitiles|Python runs successfully|N||
|custom_streaming_asr|Not run|U||
|keyword_spotting|Python runs successfully|N||
|metaverse|Not run; the script is tied to PaddleGAN and may conflict|U||
|punctuation_restoration|Python runs successfully|N||
|speaker_verification|Python runs successfully|N||
|speech_recognition|Python runs successfully|N||
|speech_server|Not run|U||
|speech_ssl|TypeError: Wav2vec2ASR.forward() missing 3 required positional arguments: 'wavs_lens_rate', 'target', and 'target_lens'|E|https://github.com/PaddlePaddle/PaddleSpeech/pull/3872|
|speech_translation|Python runs successfully|N||
|speech_web|Not run|U||
|story_talker|Fails because of the NumPy version: AttributeError: module 'numpy' has no attribute 'complex'.|E||
|streaming_asr_server|Not run|U||
|streaming_tts_server|Not run|U||
|streaming_tts_serving_fastdeploy|Not run|U||
|style_fs2|Runs successfully with a warning: /opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/paddle/nn/layer/layers.py:2082: UserWarning: Skip loading for encoder.embed.1.alpha. encoder.embed.1.alpha receives a shape [1], but the expected shape is [].|W||
|text_to_speech|Python runs successfully|N||
|whisper|Did not run successfully because no 16000 Hz WAV file was prepared; after transcoding, the code still fails with a Paddle operator error|E||
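
The story_talker error matches the removal of the deprecated `np.complex` alias in NumPy 1.24. One possible workaround, sketched below, is to restore the alias before the code that references it is imported. The helper name `restore_alias` is hypothetical, and a stand-in module object is used so the sketch runs without NumPy installed.

```python
import types


def restore_alias(module, name, fallback):
    """Set `module.<name>` to `fallback` only when the attribute is missing.

    Returns True when the alias was added and False when it already existed.
    """
    if not hasattr(module, name):
        setattr(module, name, fallback)
        return True
    return False


# Stand-in for the real numpy module (assumption, for illustration only).
fake_numpy = types.ModuleType("fake_numpy")
patched = restore_alias(fake_numpy, "complex", complex)
```

With the real library, the equivalent call would be `restore_alias(np, "complex", complex)`, executed before importing whichever dependency still uses `np.complex`. Pinning `numpy<1.24` is the other common workaround.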

## Examples
To be added.

## Models
To be added.