Stream audio to Speech to Text


I’m trying to stream real-time audio to the Google Speech-to-Text engine. Is that possible?

Google’s V2 API has an option to stream audio from a file or from a buffer.
Since we do not have access to the audio buffer in Asterisk, I tried to stream the recording file.
But it looks like the API's file streaming option does not follow the file while it is still being recorded. If I start streaming during the recording, the stream only goes up to the point that had been written when I started; it does not pick up the audio appended afterwards.
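To illustrate the behaviour described above, here is a minimal sketch of what "following" a growing file would involve: keep polling after reaching end-of-file instead of stopping there. The function name `follow_file`, the chunk size, and the idle timeout used to decide that the recording has ended are all assumptions made for illustration; nothing here is part of Google's client library, and a real recording file may also start with a container header that has to be skipped.

```python
import time


def follow_file(path, chunk_size=3200, poll_s=0.05, idle_timeout_s=1.0):
    """Yield bytes as they are appended to `path`.

    Stops once no new data has appeared for roughly idle_timeout_s
    (an assumed way of detecting that the recording has finished).
    """
    idle = 0.0
    with open(path, "rb") as f:
        while idle < idle_timeout_s:
            data = f.read(chunk_size)
            if data:
                idle = 0.0          # new audio arrived, reset the idle timer
                yield data
            else:
                time.sleep(poll_s)  # at EOF for now: wait and try again
                idle += poll_s
```

Each yielded chunk could then be handed to a streaming request, but this still polls the disk and adds its own delay.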

Has anyone had success doing this?

It works fine when I let the recording finish on silence, then send the full recording to the API and get the text response, but that gives me a latency of about 3 seconds after the recording has finished. I would like to reduce this latency.

Btw, just in case someone is wondering, I do not want a two-speaker transcription. The conversation is with an IVR system, so I only want the transcription of the person who is talking to it.

Best regards,

EAGI has access to the streaming audio.
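As a rough sketch of what that looks like: an EAGI script receives the usual AGI header lines on stdin, and the channel's inbound audio on file descriptor 3 as raw 16-bit signed linear, 8 kHz mono. The function names and the 100 ms chunk size below are illustrative assumptions, and `send_chunk` is a hypothetical callback standing in for whichever streaming recognition client you use.

```python
import os
import sys

SAMPLE_RATE = 8000      # EAGI audio is 8 kHz
BYTES_PER_SAMPLE = 2    # 16-bit signed linear, mono
EAGI_AUDIO_FD = 3       # Asterisk exposes the inbound audio on fd 3


def read_agi_env(stream):
    """Consume the 'agi_key: value' header lines, up to the first blank line."""
    env = {}
    for line in stream:
        line = line.strip()
        if not line:
            break
        key, _, value = line.partition(":")
        env[key.strip()] = value.strip()
    return env


def audio_chunks(raw, chunk_ms=100):
    """Yield chunks of up to chunk_ms of raw PCM until the stream ends."""
    size = SAMPLE_RATE * BYTES_PER_SAMPLE * chunk_ms // 1000
    while True:
        data = raw.read(size)
        if not data:
            break
        yield data


def main(send_chunk):
    """Read the AGI header, then pass live audio to send_chunk (hypothetical)."""
    read_agi_env(sys.stdin)
    with os.fdopen(EAGI_AUDIO_FD, "rb", buffering=0) as audio:
        for chunk in audio_chunks(audio):
            send_chunk(chunk)
```

The script would be started from the dialplan with the EAGI application and would call `main(...)` with a function that forwards each chunk to the recognizer.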

The ChanSpy family of applications can be used to get a copy of the audio stream onto another channel, on which you could run EAGI.

Note that recognition will be less accurate the closer you get to real-time transcription. You are presumably dealing with unknown speakers and 3.1 kHz (telephone-bandwidth) audio.
