[websocket] added online asr engine #1627

Merged (2 commits) on Mar 31, 2022

Conversation

WilliamZhang06 (Collaborator) commented Mar 30, 2022

PR types

New features

PR changes

Models

Describe

Added a websocket framework and an online ASR engine.

#1623

@WilliamZhang06 added the S2T asr/st label Mar 30, 2022
@WilliamZhang06 added this to the r0.2.0 milestone Mar 30, 2022
@mergify bot added the Server label Mar 30, 2022
@zh794390558 modified the milestones: r0.2.0, r1.0.0 Mar 30, 2022

shift_n = int(self.sample_rate *
              (self.shift_ms / 1000.0) * self.sample_width)
offset = 0
timestamp = 0.0

Collaborator:

The timestamp should keep accumulating and only be reset explicitly; otherwise the timestamps are not continuous.

Collaborator Author:

`timestamp += shift_duration` already keeps adding the shift step on every iteration.

Collaborator:

timestamp is re-initialized on every call. It should be a global variable.

Collaborator Author:

> timestamp is re-initialized on every call. It should be a global variable.

This is essentially a chunk buffer that is responsible for a single chunk only, which follows the single-responsibility principle for classes. In practice timestamp is not used yet. Later we will consider making timestamp a class variable.
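One way to reconcile the two positions above is to keep the timestamp as state on the buffer object, so it accumulates across calls without becoming a module-level global. The following is only a minimal sketch under that assumption: the class name `ChunkBuffer`, the `frames` and `reset` methods, and the default window and shift sizes are hypothetical and are not taken from the PR.

```python
class ChunkBuffer:
    """Splits raw PCM bytes into overlapping windows (hypothetical sketch).

    The timestamp is stored on the instance, so it keeps accumulating
    across successive calls until reset() is called explicitly.
    """

    def __init__(self, sample_rate=16000, sample_width=2,
                 window_ms=20, shift_ms=10):
        # Window and shift sizes in bytes.
        self.window_n = int(sample_rate * (window_ms / 1000.0) * sample_width)
        self.shift_n = int(sample_rate * (shift_ms / 1000.0) * sample_width)
        self.shift_duration = shift_ms / 1000.0
        self.remained = b""    # bytes left over from the previous call
        self.timestamp = 0.0   # seconds; instance state, not a local

    def reset(self):
        """Explicitly start a new stream."""
        self.remained = b""
        self.timestamp = 0.0

    def frames(self, audio):
        """Yield (timestamp, window_bytes) pairs for one incoming chunk."""
        data = self.remained + audio
        offset = 0
        while offset + self.window_n <= len(data):
            yield self.timestamp, data[offset:offset + self.window_n]
            self.timestamp += self.shift_duration
            offset += self.shift_n
        # Keep the unconsumed tail for the next call.
        self.remained = data[offset:]
```

Calling `frames` twice on the same instance continues the timestamps from where the first call stopped, while `reset()` gives the explicit restart the reviewer asked for.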

self.ring_buffer.append((frame, is_speech))
num_voiced = len(
    [f for f, speech in self.ring_buffer if speech])
if num_voiced > self._ratio * self.ring_buffer.maxlen:

Collaborator:

With the default ratio of 0.9, the leading silence segment is lost when VAD starts. The ratio used for the start trigger could be lowered.

self.ring_buffer.append((frame, is_speech))
num_unvoiced = len(
    [f for f, speech in self.ring_buffer if not speech])
if num_unvoiced > self._ratio * self.ring_buffer.maxlen:

Collaborator:

A ratio of 0.9 for the end trigger is fine. However, the length of ring_buffer needs to be considered; it mainly depends on how long a short pause lasts.

Collaborator Author:

This could be split into a separate variable so the two ratios can be set independently, which would give a fade-in/fade-out effect.
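The idea of separate start and end thresholds can be sketched as follows. This is an illustration only, not the code in the PR: the function name `collect_segments` and the parameters `start_ratio` and `end_ratio` are hypothetical, and the per-frame speech flags are assumed to come from some upstream VAD classifier.

```python
from collections import deque


def collect_segments(flagged_frames, maxlen=10,
                     start_ratio=0.5, end_ratio=0.9):
    """Group frames into voiced segments using two thresholds.

    flagged_frames: iterable of (frame_bytes, is_speech) pairs.
    start_ratio:    fraction of voiced frames in the ring buffer that
                    triggers the start of a segment (hypothetical name).
    end_ratio:      fraction of unvoiced frames that ends the segment.
    """
    ring = deque(maxlen=maxlen)
    triggered = False
    voiced = []
    for frame, is_speech in flagged_frames:
        ring.append((frame, is_speech))
        if not triggered:
            num_voiced = sum(1 for _, speech in ring if speech)
            if num_voiced > start_ratio * ring.maxlen:
                triggered = True
                # Keep the buffered lead-in so the onset is not cut off.
                voiced.extend(f for f, _ in ring)
                ring.clear()
        else:
            voiced.append(frame)
            num_unvoiced = sum(1 for _, speech in ring if not speech)
            if num_unvoiced > end_ratio * ring.maxlen:
                triggered = False
                yield b"".join(voiced)
                voiced = []
                ring.clear()
    if voiced:
        # Flush a segment that was still open when the input ended.
        yield b"".join(voiced)
```

With a short burst of speech, a lower `start_ratio` lets the segment trigger where a start threshold of 0.9 would never fire, and `maxlen` controls how long a pause must last before the segment is closed.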


# vad for input bytes audio
vad.add_audio(message)
message = b''.join(f for f in vad.vad_collector() if f is not None)
Collaborator:

When vad_collector returns None, that should mark the end of a VAD segment, and the decoder needs to be reset.

Collaborator:

This logic needs fine-tuning; the poor VAD results are probably related to it.

Collaborator Author:

Resetting the engine like this turns it into pseudo-streaming, just like that Zhihu case last time. Personally, I think the engine should provide one standalone function, with the business logic wrapped separately on top of it.

Collaborator:

It is not pseudo-streaming. If the silence in the middle is long enough, what follows is a new utterance.

Collaborator Author:

The VAD part needs more thought.

Collaborator:

Yes, this part has a big impact on the results.

Collaborator Author:

> It is not pseudo-streaming. If the silence in the middle is long enough, what follows is a new utterance.

That belongs in the application wrapper, doesn't it? Does the engine need to do this as well?

Collaborator:

Yes, it does.
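The behaviour the reviewer asks for, treating a `None` from the collector as the end of an utterance and resetting the decoder there, might look roughly like the sketch below. `DummyDecoder` and `run_stream` are hypothetical stand-ins used only for illustration; they are not the engine's actual interface.

```python
class DummyDecoder:
    """Hypothetical stand-in for a streaming decoder."""

    def __init__(self):
        self.chunks = []

    def decode(self, chunk):
        # A real decoder would advance its internal state here.
        self.chunks.append(chunk)

    def result(self):
        return b"".join(self.chunks)

    def reset(self):
        self.chunks = []


def run_stream(events, decoder):
    """Feed frames to the decoder; None marks the end of an utterance.

    Returns the list of per-utterance results.
    """
    finished = []
    for item in events:
        if item is None:
            # End of a VAD segment: finalize, then start a new utterance.
            finished.append(decoder.result())
            decoder.reset()
        else:
            decoder.decode(item)
    return finished
```

Compared with joining every non-`None` frame into one message, this keeps the segment boundary visible to the caller, so each utterance is decoded from a clean state.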

@zh794390558 (Collaborator) left a comment:

LGTM

@zh794390558 merged commit 61941d1 into PaddlePaddle:develop Mar 31, 2022
3 participants