Speech: encoding for speech to text? #4360



API: Speech
Mac OS X
Python 3.5

I'm trying to set up a basic example for speech to text.
I used ffmpeg to extract the audio from an mp4, then converted that audio from MP3 to FLAC.
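The exact ffmpeg commands are not given above, so the following is only a sketch of what such a conversion step might look like when driven from Python. The helper name `ffmpeg_to_flac_cmd` and the file names are hypothetical, and it goes straight from the mp4 to FLAC rather than through an intermediate MP3. The `-vn`, `-ac` and `-ar` options are standard ffmpeg flags for dropping video, setting the channel count and setting the sample rate, which is useful when the file has to match the `sample_rate_hertz` declared in the recognition config.

```python
def ffmpeg_to_flac_cmd(src, dst, sample_rate=None, channels=None):
    """Build an ffmpeg argument list that writes the audio of `src` to `dst`.

    ffmpeg picks the FLAC encoder from the `.flac` extension of `dst`.
    """
    cmd = ["ffmpeg", "-i", src, "-vn"]  # -vn: ignore any video stream
    if channels is not None:
        cmd += ["-ac", str(channels)]  # -ac: number of audio channels
    if sample_rate is not None:
        cmd += ["-ar", str(sample_rate)]  # -ar: sample rate in Hz
    cmd.append(dst)
    return cmd


# Hypothetical file names, for illustration only.
cmd = ffmpeg_to_flac_cmd("input.mp4", "output.flac", sample_rate=48000, channels=1)
print(" ".join(cmd))
```

To actually run the conversion, the list can be passed to `subprocess.run(cmd, check=True)`, assuming ffmpeg is installed and on the PATH.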

My code is as follows (as per the example in the Speech API documentation):

import io
import os

# Imports the Google Cloud client library
from google.cloud import speech
from google.cloud.speech import enums
from google.cloud.speech import types

# Instantiates a client
client = speech.SpeechClient()

# The name of the audio file to transcribe
file_name = os.path.join(
    os.path.dirname(__file__),
    'data','mp4s', 'audio',
    '0BuayZmFrINBZHBG7uHMAI4U6xx4MkRC.flac')

# Loads the audio into memory
with io.open(file_name, 'rb') as audio_file:
    content = audio_file.read()
    audio = types.RecognitionAudio(content=content)

config = types.RecognitionConfig(
    # encoding=enums.RecognitionConfig.AudioEncoding.LINEAR16,
    encoding='FLAC',
    sample_rate_hertz=48000,
    language_code='en-US')
import pdb; pdb.set_trace()  # debugging breakpoint
# Detects speech in the audio file
response = client.recognize(config, audio)

for result in response.results:
    print('Transcript: {}'.format(result.alternatives[0].transcript))


The current error I'm trying to debug is as follows:
google.gax.errors.RetryError: RetryError(Exception occurred in retry method that was not classified as transient, caused by <_Rendezvous of RPC that terminated with (StatusCode.INVALID_ARGUMENT, Invalid audio channel count)>)
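Since the error message points at the channel count, it may help to check what the FLAC file itself declares. Below is a minimal sketch, assuming the file starts with the standard `fLaC` marker followed directly by a STREAMINFO metadata block, as laid out in the FLAC format specification. `read_flac_streaminfo` is a hypothetical helper, not part of google-cloud-speech. Whether this particular file is stereo is an assumption; the post does not say.

```python
import struct


def read_flac_streaminfo(data):
    """Return sample rate, channel count and bit depth from a FLAC header.

    `data` must hold at least the first 26 bytes of the file.
    """
    if data[:4] != b"fLaC":
        raise ValueError("missing fLaC marker; not a raw FLAC stream")
    if data[4] & 0x7F != 0:
        raise ValueError("first metadata block is not STREAMINFO")
    # STREAMINFO starts at byte 8. After 10 bytes of block/frame sizes comes
    # a 64-bit big-endian field: 20 bits sample rate, 3 bits (channels - 1),
    # 5 bits (bits per sample - 1), 36 bits total sample count.
    (packed,) = struct.unpack(">Q", data[18:26])
    return {
        "sample_rate": packed >> 44,
        "channels": ((packed >> 41) & 0x7) + 1,
        "bits_per_sample": ((packed >> 36) & 0x1F) + 1,
    }


# Synthetic 48 kHz, 2-channel, 16-bit header so the sketch runs without a file.
packed = (48000 << 44) | ((2 - 1) << 41) | ((16 - 1) << 36)
header = (
    b"fLaC"
    + bytes([0x80])                    # last-block flag set, block type 0
    + (34).to_bytes(3, "big")          # STREAMINFO length
    + struct.pack(">HH", 4096, 4096)   # min/max block size
    + b"\x00" * 6                      # min/max frame size (unknown)
    + struct.pack(">Q", packed)
    + b"\x00" * 16                     # MD5 signature placeholder
)
info = read_flac_streaminfo(header)
print(info["sample_rate"], info["channels"], info["bits_per_sample"])
```

On a real file, the same function can be called on the bytes already read into `content` in the script above. If it reports more than one channel, or a sample rate other than the configured 48000, the file and the `RecognitionConfig` disagree.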

Haven't seen anything about this on the googles, so pardon if it's a repeat.


