Speech: encoding for speech to text? #4360

@amgsharma

API: Speech
OS: Mac OS X
Python: 3.5

I'm trying to set up a basic example for speech to text.
I used ffmpeg to extract the audio from an mp4, then converted that audio from mp3 to flac.
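
For anyone reproducing this, the conversion step could be sketched roughly as below. The exact ffmpeg command used here is not shown in this report, so the helper names (`build_ffmpeg_command`, `convert_to_flac`), the file names, and the choice to force one channel at 48000 Hz are all assumptions for illustration. The flags themselves (`-y`, `-i`, `-vn`, `-ac`, `-ar`) are standard ffmpeg options.

```python
import subprocess


def build_ffmpeg_command(src, dst, channels=1, sample_rate=48000):
    """Build an ffmpeg argument list that extracts audio from src into dst.

    Hypothetical helper: the defaults (mono, 48000 Hz) are assumptions,
    not values confirmed by the original report.
    """
    return [
        "ffmpeg",
        "-y",                     # overwrite dst if it already exists
        "-i", src,                # input file (e.g. the mp4)
        "-vn",                    # drop any video stream
        "-ac", str(channels),     # number of output audio channels
        "-ar", str(sample_rate),  # output sample rate in Hz
        dst,                      # output; ffmpeg infers FLAC from ".flac"
    ]


def convert_to_flac(src, dst, **kwargs):
    """Run ffmpeg; raises CalledProcessError if the conversion fails."""
    subprocess.check_call(build_ffmpeg_command(src, dst, **kwargs))


# Usage (requires ffmpeg on PATH; file names are placeholders):
# convert_to_flac("input.mp4", "output.flac")
```

Going straight from the mp4 to flac in one step would also avoid the intermediate mp3 re-encode, if that matters for the use case.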

My code is as follows (as per the example in the Speech API documentation):

import io
import os

# Imports the Google Cloud client library
from google.cloud import speech
from google.cloud.speech import enums
from google.cloud.speech import types

# Instantiates a client
client = speech.SpeechClient()

# The name of the audio file to transcribe
file_name = os.path.join(
    os.path.dirname(__file__),
    'data', 'mp4s', 'audio',
    '0BuayZmFrINBZHBG7uHMAI4U6xx4MkRC.flac')

# Loads the audio into memory
with io.open(file_name, 'rb') as audio_file:
    content = audio_file.read()
    audio = types.RecognitionAudio(content=content)

config = types.RecognitionConfig(
    # encoding=enums.RecognitionConfig.AudioEncoding.LINEAR16,
    encoding='FLAC',
    sample_rate_hertz=48000,
    language_code='en-US')
import pdb;pdb.set_trace()

# Detects speech in the audio file
response = client.recognize(config, audio)

for result in response.results:
    print('Transcript: {}'.format(result.alternatives[0].transcript))

The current error I'm trying to debug is as follows:
google.gax.errors.RetryError: RetryError(Exception occurred in retry method that was not classified as transient, caused by <_Rendezvous of RPC that terminated with (StatusCode.INVALID_ARGUMENT, Invalid audio channel count)>)
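
Since the error mentions the channel count, one way to narrow it down is to check what the FLAC file itself declares. The sketch below reads the STREAMINFO block from the start of a FLAC file using only the standard library. The function name `read_flac_streaminfo` is made up for this example, and it assumes STREAMINFO is the first metadata block, as the FLAC format requires. Whether the channel count is actually the cause here is not confirmed in this report.

```python
def read_flac_streaminfo(data):
    """Return (sample_rate_hz, channels, bits_per_sample) from FLAC bytes.

    `data` must hold at least the first 26 bytes of the file.
    """
    if len(data) < 26:
        raise ValueError("need at least 26 bytes of FLAC data")
    if data[:4] != b"fLaC":
        raise ValueError("not a FLAC stream (missing 'fLaC' marker)")
    # Byte 4: 1-bit "last block" flag followed by a 7-bit block type.
    if data[4] & 0x7F != 0:
        raise ValueError("first metadata block is not STREAMINFO")
    # Bytes 8-17 hold min/max block size and min/max frame size.
    # Bytes 18-25 pack: 20 bits sample rate, 3 bits (channels - 1),
    # 5 bits (bits per sample - 1), 36 bits total sample count.
    packed = int.from_bytes(data[18:26], "big")
    sample_rate = packed >> 44
    channels = ((packed >> 41) & 0x7) + 1
    bits_per_sample = ((packed >> 36) & 0x1F) + 1
    return sample_rate, channels, bits_per_sample


# Usage (file_name is the path from the code above):
# with open(file_name, "rb") as f:
#     print(read_flac_streaminfo(f.read(26)))
```

If the reported channel count is greater than 1, or the sample rate differs from the `sample_rate_hertz` passed in the config, that mismatch would be the first thing to rule out.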

Haven't seen anything about this on the googles, so pardon if it's a repeat.
