Description
API: Speech
Mac OS X
Python 3.5
I'm trying to set up a basic example for speech to text.
I've used ffmpeg to extract audio from an mp4, then convert this audio from mp3 to flac.
My code is as follows (as per the example in the Speech API documentation):
import io
import os

# Imports the Google Cloud client library
from google.cloud import speech
from google.cloud.speech import enums
from google.cloud.speech import types

# Instantiates a client
client = speech.SpeechClient()

# The name of the audio file to transcribe
file_name = os.path.join(
    os.path.dirname(__file__),
    'data', 'mp4s', 'audio',
    '0BuayZmFrINBZHBG7uHMAI4U6xx4MkRC.flac')

# Loads the audio into memory
with io.open(file_name, 'rb') as audio_file:
    content = audio_file.read()
    audio = types.RecognitionAudio(content=content)

config = types.RecognitionConfig(
    # encoding=enums.RecognitionConfig.AudioEncoding.LINEAR16,
    encoding='FLAC',
    sample_rate_hertz=48000,
    language_code='en-US')

import pdb;pdb.set_trace()

# Detects speech in the audio file
response = client.recognize(config, audio)

for result in response.results:
    print('Transcript: {}'.format(result.alternatives[0].transcript))
The current error I'm trying to debug is as follows:
google.gax.errors.RetryError: RetryError(Exception occurred in retry method that was not classified as transient, caused by <_Rendezvous of RPC that terminated with (StatusCode.INVALID_ARGUMENT, Invalid audio channel count)>)
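Since the error mentions the channel count, one thing that might help narrow it down is checking what the FLAC file itself declares. The sketch below reads the STREAMINFO metadata block at the start of a FLAC stream using only the standard library. The function name `flac_stream_info` is made up for illustration, and it assumes the file begins directly with the `fLaC` marker (no ID3 tag or other data prepended).

```python
def flac_stream_info(header):
    """Return (sample_rate_hz, channels, bits_per_sample) from a FLAC header.

    `header` must hold at least the first 26 bytes of the file.
    """
    if len(header) < 26:
        raise ValueError('need at least 26 bytes of the file')
    if header[:4] != b'fLaC':
        raise ValueError('not a FLAC stream (missing fLaC marker)')
    # Byte 4 is the first metadata block header; the low 7 bits hold the
    # block type, and type 0 is STREAMINFO.
    if header[4] & 0x7F != 0:
        raise ValueError('first metadata block is not STREAMINFO')
    # The STREAMINFO body starts at offset 8. Its first 10 bytes are block
    # and frame size fields; the next 8 bytes pack:
    #   20 bits sample rate | 3 bits (channels - 1) |
    #   5 bits (bits per sample - 1) | 36 bits total samples
    packed = int.from_bytes(header[18:26], 'big')
    sample_rate = packed >> 44
    channels = ((packed >> 41) & 0x7) + 1
    bits_per_sample = ((packed >> 36) & 0x1F) + 1
    return sample_rate, channels, bits_per_sample


# Example usage with the file from the script above:
# with open(file_name, 'rb') as f:
#     print(flac_stream_info(f.read(26)))
```

If the reported channel count is 2, the file is stereo, which would at least be consistent with the error message.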
I haven't found anything about this error by searching, so pardon me if it's a repeat.