[websocket] added online asr engine #1627
Conversation
shift_n = int(self.sample_rate *
              (self.shift_ms / 1000.0) * self.sample_width)
offset = 0
timestamp = 0.0
timestamp should keep accumulating and only be reset explicitly; otherwise the timestamps are not continuous.
timestamp += shift_duration does keep accumulating the shift step.
timestamp is re-initialized on every call. It should be a global variable.
> timestamp is re-initialized on every call. It should be a global variable.
This is essentially a chunk buffer that is responsible for a single chunk only, which follows the single-responsibility principle for classes. In fact, timestamp is not used anywhere yet. We will consider making timestamp a class variable later.
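One way to keep timestamps continuous across calls, as discussed above, is to hold the counter on the instance rather than in a local variable. The following is only a minimal sketch under assumptions: `ChunkBuffer`, `Frame`, `frames`, `reset`, and the `window_ms` parameter are hypothetical names for illustration, not the class in this PR.

```python
from dataclasses import dataclass
from typing import Iterator


@dataclass
class Frame:
    data: bytes
    timestamp: float  # start time of the frame, in seconds
    duration: float   # frame length, in seconds


class ChunkBuffer:
    """Hypothetical framing buffer; not the implementation from this PR."""

    def __init__(self, sample_rate=16000, sample_width=2,
                 window_ms=20, shift_ms=10):
        # Window and shift sizes in bytes.
        self.window_n = int(sample_rate * (window_ms / 1000.0) * sample_width)
        self.shift_n = int(sample_rate * (shift_ms / 1000.0) * sample_width)
        self.window_duration = window_ms / 1000.0
        self.shift_duration = shift_ms / 1000.0
        self.remained = b""   # bytes left over from the previous chunk
        self.num_frames = 0   # instance state, so it survives across calls

    def frames(self, chunk: bytes) -> Iterator[Frame]:
        # The generator must be fully consumed for `remained` to be updated.
        audio = self.remained + chunk
        offset = 0
        while offset + self.window_n <= len(audio):
            # Derive the timestamp from a frame count to avoid float drift.
            timestamp = self.num_frames * self.shift_duration
            yield Frame(audio[offset:offset + self.window_n],
                        timestamp, self.window_duration)
            self.num_frames += 1
            offset += self.shift_n
        self.remained = audio[offset:]

    def reset(self) -> None:
        # Explicit reset, e.g. when a new utterance starts.
        self.remained = b""
        self.num_frames = 0


buf = ChunkBuffer()
first = list(buf.frames(b"\x00" * 1280))
second = list(buf.frames(b"\x00" * 1280))
print(len(first), len(second), second[0].timestamp)
```

The second call continues from where the first one stopped, and the timestamp only returns to zero when `reset()` is called.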
self.ring_buffer.append((frame, is_speech))
num_voiced = len(
    [f for f, speech in self.ring_buffer if speech])
if num_voiced > self._ratio * self.ring_buffer.maxlen:
With the default ratio of 0.9, the leading silence segment is lost when VAD starts. The ratio used for the start could be lowered.
self.ring_buffer.append((frame, is_speech))
num_unvoiced = len(
    [f for f, speech in self.ring_buffer if not speech])
if num_unvoiced > self._ratio * self.ring_buffer.maxlen:
A ratio of 0.9 is fine for the end. But the length of ring_buffer needs to be considered; it mainly depends on how long a short pause lasts.
This could be split out into a separate variable so the start and end can be set independently, giving a fade-in/fade-out effect.
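The suggestion to use separate thresholds for speech start and speech end can be sketched roughly as below. This is an illustration under assumptions, not the PR's code: `HysteresisTrigger`, `start_ratio`, and `end_ratio` are hypothetical names, and the per-frame `is_speech` flag is assumed to come from whatever VAD is in use.

```python
from collections import deque


class HysteresisTrigger:
    """Hypothetical start/end trigger with two independent ratios."""

    def __init__(self, maxlen=10, start_ratio=0.5, end_ratio=0.9):
        self.ring_buffer = deque(maxlen=maxlen)
        self.start_ratio = start_ratio  # lower: trigger earlier at speech onset
        self.end_ratio = end_ratio      # higher: tolerate short pauses
        self.triggered = False

    def update(self, is_speech: bool) -> bool:
        """Feed one frame decision; return True while inside a speech segment."""
        self.ring_buffer.append(is_speech)
        num_voiced = sum(self.ring_buffer)
        if not self.triggered:
            if num_voiced > self.start_ratio * self.ring_buffer.maxlen:
                self.triggered = True
                self.ring_buffer.clear()
        else:
            num_unvoiced = len(self.ring_buffer) - num_voiced
            if num_unvoiced > self.end_ratio * self.ring_buffer.maxlen:
                self.triggered = False
                self.ring_buffer.clear()
        return self.triggered


trigger = HysteresisTrigger(maxlen=10, start_ratio=0.5, end_ratio=0.9)
states = [trigger.update(True) for _ in range(6)]
print(states)
```

With these example values the trigger fires once more than half of the window is voiced, but it only releases after the whole window is unvoiced. How long a pause that corresponds to depends on `maxlen` and the frame duration.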
paddlespeech/server/ws/asr_socket.py
Outdated
# vad for input bytes audio
vad.add_audio(message)
message = b''.join(f for f in vad.vad_collector() if f is not None)
vad_collector returning None should mark the end of a VAD segment, at which point the decoder needs to be reset.
This logic needs fine-tuning; the poor VAD results are probably related to this part.
Resetting the engine like this turns it into pseudo-streaming, just like that Zhihu case last time. Personally, I think the engine should provide a single, standalone function, and the downstream business logic should be wrapped separately.
It is not pseudo-streaming. When the silence in the middle is long, what follows is a new utterance.
The VAD part needs more thought.
Yes, this part has a large impact on the results.
> It is not pseudo-streaming. When the silence in the middle is long, what follows is a new utterance.
That is application-level wrapping, isn't it? Does the engine need to do this as well?
Yes, it does.
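The behavior debated in this thread (treating a None from the VAD as the end of an utterance and resetting the decoder, instead of filtering None out) could look roughly like the sketch below. Everything here is an assumption for illustration: `DummyDecoder` and its `accept`/`result`/`reset` methods are stand-ins, not the PR's ASR engine API, and `None` is assumed to be the end-of-segment marker.

```python
from typing import Iterable, List, Optional


class DummyDecoder:
    """Stand-in for a streaming decoder; only accumulates the bytes it sees."""

    def __init__(self):
        self.buffer = b""

    def accept(self, audio: bytes) -> None:
        self.buffer += audio

    def result(self) -> bytes:
        return self.buffer

    def reset(self) -> None:
        self.buffer = b""


def decode_segments(frames: Iterable[Optional[bytes]],
                    decoder: DummyDecoder) -> List[bytes]:
    """Feed frames to the decoder; on None, finalize the utterance and reset."""
    finished = []
    for frame in frames:
        if frame is None:
            # End of a speech segment: keep the result, then start fresh.
            finished.append(decoder.result())
            decoder.reset()
        else:
            decoder.accept(frame)
    return finished


decoder = DummyDecoder()
done = decode_segments([b"ab", b"cd", None, b"ef"], decoder)
print(done, decoder.buffer)
```

Whether this reset lives inside the engine or in an application-level wrapper is exactly the open design question in the comments above; the sketch only shows the control flow.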
LGTM
PR types
New features
PR changes
Models
Describe
added websocket framework and added online asr engine
#1623