[websocket] added online asr engine #1627

WilliamZhang06 · 2022-03-30T08:48:07Z

PR types

New features

PR changes

Models

Describe

Added a websocket framework and an online ASR engine.

#1623

paddlespeech/server/conf/application.yaml

zh794390558 · 2022-03-30T09:05:14Z

paddlespeech/server/utils/buffer.py

+        shift_n = int(self.sample_rate *
+                      (self.shift_ms / 1000.0) * self.sample_width)
+        offset = 0
+        timestamp = 0.0


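timestamp should keep accumulating and be reset explicitly; otherwise the timestamps are not continuous.

timestamp += shift_duration does keep adding the step size.

Each call re-initializes timestamp. It should be a global variable.

This is essentially a chunk buffer that is responsible for a single chunk only, which follows the single-responsibility principle for classes. In fact, timestamp is not used yet. Later we will consider making timestamp a class variable.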
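To make the point of this thread concrete, here is a minimal sketch of a frame buffer whose timestamp lives on the instance, so it continues across calls and is cleared only by an explicit reset. This is not the class from the PR: the name ChunkBuffer, the window_ms parameter, the carried-over remainder, and the method names are all assumptions for illustration. Only the shift_n formula and the timestamp += shift_duration step come from the diff and the discussion above.

```python
class ChunkBuffer:
    """Hypothetical sketch: splits PCM chunks into overlapping frames."""

    def __init__(self, sample_rate=16000, sample_width=2,
                 window_ms=20, shift_ms=10):
        # Frame length and hop size, both in bytes.
        self.window_n = int(sample_rate * (window_ms / 1000.0) * sample_width)
        self.shift_n = int(sample_rate * (shift_ms / 1000.0) * sample_width)
        self.shift_duration = shift_ms / 1000.0
        self.remained = b""    # bytes left over from the previous chunk
        self.timestamp = 0.0   # instance state, so it survives across calls

    def reset(self):
        """Explicit reset, e.g. when a new utterance starts."""
        self.remained = b""
        self.timestamp = 0.0

    def frames(self, chunk):
        """Yield (timestamp, frame_bytes) for every full frame available."""
        audio = self.remained + chunk
        offset = 0
        while offset + self.window_n <= len(audio):
            yield self.timestamp, audio[offset:offset + self.window_n]
            self.timestamp += self.shift_duration
            offset += self.shift_n
        # Keep the tail so the next chunk continues where this one stopped.
        self.remained = audio[offset:]


buf = ChunkBuffer()
first = list(buf.frames(b"\x00" * 1600))   # 50 ms of 16 kHz, 16-bit audio
second = list(buf.frames(b"\x00" * 1600))  # timestamps continue from the first call
```

With a local timestamp = 0.0 inside frames(), the second call would start again at zero, which is the discontinuity the reviewer describes.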

paddlespeech/server/utils/buffer.py

zh794390558 · 2022-03-31T02:16:13Z

paddlespeech/server/utils/vad.py

+                self.ring_buffer.append((frame, is_speech))
+                num_voiced = len(
+                    [f for f, speech in self.ring_buffer if speech])
+                if num_voiced > self._ratio * self.ring_buffer.maxlen:


With the default ratio of 0.9, the leading silence segment is lost when VAD starts. The ratio used for the start could be lowered.

zh794390558 · 2022-03-31T02:17:52Z

paddlespeech/server/utils/vad.py

+                self.ring_buffer.append((frame, is_speech))
+                num_unvoiced = len(
+                    [f for f, speech in self.ring_buffer if not speech])
+                if num_unvoiced > self._ratio * self.ring_buffer.maxlen:


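A ratio of 0.9 for the end is fine. But the length of ring_buffer needs to be considered; it mainly depends on how long a short pause lasts.

This could be split out into a separate variable, so the start and end are set independently and give a fade-in/fade-out effect.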
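The suggestion in the two vad.py threads can be sketched as a collector with separate start and end thresholds. This is a simplified stand-in, not the PR's code: it takes precomputed (frame, is_speech) pairs instead of calling a VAD library, and the parameter names maxlen, start_ratio, and end_ratio are assumptions.

```python
import collections


def vad_collector(frames, maxlen=10, start_ratio=0.5, end_ratio=0.9):
    """Yield one bytes object per detected speech segment.

    frames: iterable of (frame_bytes, is_speech) pairs.
    """
    ring_buffer = collections.deque(maxlen=maxlen)
    triggered = False
    voiced = []
    for frame, is_speech in frames:
        ring_buffer.append((frame, is_speech))
        if not triggered:
            num_voiced = sum(1 for _, speech in ring_buffer if speech)
            if num_voiced > start_ratio * ring_buffer.maxlen:
                # Segment starts: keep the buffered context, including
                # any silence frames still in the ring buffer.
                triggered = True
                voiced.extend(f for f, _ in ring_buffer)
                ring_buffer.clear()
        else:
            voiced.append(frame)
            num_unvoiced = sum(1 for _, speech in ring_buffer if not speech)
            if num_unvoiced > end_ratio * ring_buffer.maxlen:
                # Segment ends once the buffer is almost entirely silence.
                triggered = False
                yield b"".join(voiced)
                voiced = []
                ring_buffer.clear()
    if voiced:
        yield b"".join(voiced)


# Two silence frames, four speech frames, four silence frames.
stream = [(b"s", False)] * 2 + [(b"v", True)] * 4 + [(b"s", False)] * 4
low_start = list(vad_collector(stream, maxlen=4, start_ratio=0.5))
high_start = list(vad_collector(stream, maxlen=4, start_ratio=0.9))
```

In this toy input the lower start ratio triggers one frame earlier, while a silence frame is still in the ring buffer, so part of the leading silence is kept. With 0.9 the segment begins only once the buffer holds nothing but speech. How long a pause ends a segment is governed by maxlen together with end_ratio, which is the ring_buffer length concern raised above.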

paddlespeech/server/ws/asr_socket.py

zh794390558 · 2022-03-31T02:20:01Z

paddlespeech/server/ws/asr_socket.py

+
+                # vad for input bytes audio
+                vad.add_audio(message)
+                message = b''.join(f for f in vad.vad_collector() if f is not None)


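vad_collector returning None should mean the end of a VAD segment, so the decoder needs to be reset.

This logic needs fine-tuning. The poor VAD results are probably related to it.

Resetting the engine like this turns it into pseudo-streaming, just like that Zhihu post last time. Personally, I think the engine should provide a single standalone function, with the business logic wrapped separately on top of it.

It is not pseudo-streaming. If the silence in the middle is long, what follows is a new utterance.

The VAD part still needs more thought.

Yes, this part has a large impact on quality.

Isn't that application-level wrapping? Does the engine need to do this as well?

Yes, it does.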
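A minimal sketch of the behavior requested in this thread, assuming (as the reviewer reads it) that a None from the collector marks the end of a VAD segment. Instead of filtering None out and joining the rest, the loop finalizes the current utterance and resets the decoder. EchoDecoder and decode_stream are hypothetical placeholders, not PaddleSpeech APIs.

```python
class EchoDecoder:
    """Stand-in for a streaming ASR decoder; it only counts bytes."""

    def __init__(self):
        self.received = 0

    def accept(self, audio):
        self.received += len(audio)

    def result(self):
        return "<%d bytes>" % self.received

    def reset(self):
        self.received = 0


def decode_stream(items, decoder):
    """items yields audio bytes, or None at the end of a VAD segment."""
    results = []
    has_audio = False
    for item in items:
        if item is None:
            # End of segment: emit the utterance, then start a fresh one.
            if has_audio:
                results.append(decoder.result())
            decoder.reset()
            has_audio = False
        else:
            decoder.accept(item)
            has_audio = True
    return results


results = decode_stream([b"ab", b"cd", None, None, b"e", None], EchoDecoder())
```

Audio that arrives after the last None stays in the decoder as a pending utterance. Whether this loop belongs in the engine or in an application wrapper is the open question in the discussion above.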

zh794390558

LGTM

added online asr engine , test=doc

d847fe2

WilliamZhang06 added the S2T asr/st label Mar 30, 2022

WilliamZhang06 added this to the r0.2.0 milestone Mar 30, 2022

mergify bot added the Server label Mar 30, 2022

zh794390558 modified the milestones: r0.2.0, r1.0.0 Mar 30, 2022

lym0302 reviewed Mar 30, 2022

paddlespeech/server/conf/application.yaml (outdated, resolved)

lym0302 reviewed Mar 30, 2022

paddlespeech/server/conf/application.yaml (resolved)

zh794390558 reviewed Mar 31, 2022

fixed comments, test=doc

2ec8d60

zh794390558 approved these changes Mar 31, 2022

zh794390558 merged commit 61941d1 into PaddlePaddle:develop Mar 31, 2022

WilliamZhang06 commented Mar 30, 2022 • edited by zh794390558