Live Speech Recognition using Google or AWS

Hi all :slight_smile:

I have an asterisk box up & running with our stasis app, and we want to use Google Speech API or AWS Transcribe service.

I found some topics on how to set up speech recognition, but here’s the catch: we need real-time transcription of the call, not just recognition of a single sentence.

Is there something that I can use to send the call audio via HTTP/2 to Google Speech API or AWS Transcribe in realtime?

I didn’t find anything that can do the “realtime” part yet.


There is nothing built in explicitly to do this. You’d need to piece things together (Chanspy and UnicastRTP maybe) and do it externally.

I have used a speech recognition API with Asterisk and it works wonderfully, but only in synchronous mode. Asynchronous mode is possible, but there are some limitations; the following links should help you:


Thanks for your answers,

@jcolp: could you explain your idea a bit further? We already use Chanspy, but I don’t get how it can help with live speech recognition.

Also, I found that UniMRCP can do speech recognition via the Google Speech plugin, but it seems that it can’t do “live” speech recognition. Is anyone using this?

As there is nothing built into Asterisk to do live transcription as you desire, the only real sensible way is to send the audio outside of Asterisk to a third-party application, which can then read it and do whatever it needs to do. The UnicastRTP module, combined with a Local channel and Chanspy, can be used for this: it sends the RTP to a given IP address and port[1].
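To illustrate the receiving side of that approach: a minimal sketch of an external application that listens for the RTP sent by UnicastRTP and strips the RTP header to get at the raw audio payload, which you could then feed to a streaming recognizer. The listen address/port are placeholders, and the parser assumes plain RTP with no CSRC list or header extension.

```python
import socket
import struct

RTP_HEADER_LEN = 12  # fixed RTP header size (assuming no CSRC list or extension)

def parse_rtp(packet: bytes):
    """Parse a basic RTP packet and return (sequence, timestamp, payload).

    Assumes no CSRC list and no header extension, i.e. a plain 12-byte header.
    """
    if len(packet) < RTP_HEADER_LEN:
        raise ValueError("packet too short to be RTP")
    # !BBHII = version/flags, marker/payload type, sequence, timestamp, SSRC
    _, _, seq, timestamp, _ssrc = struct.unpack("!BBHII", packet[:RTP_HEADER_LEN])
    return seq, timestamp, packet[RTP_HEADER_LEN:]

def audio_payloads(host="0.0.0.0", port=9999):
    """Receive RTP over UDP and yield the raw audio payload of each packet."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    while True:
        packet, _addr = sock.recvfrom(2048)
        _seq, _ts, payload = parse_rtp(packet)
        yield payload  # e.g. push each chunk into a streaming ASR request
```

From there it is up to the external application to decode the payload (e.g. ulaw) and forward it to Google Speech or AWS Transcribe over their streaming APIs.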


Hi Jcolp,

Thanks for the hint and the link, I’ll try that :slight_smile:

I created a repo for using GCP to do text-to-speech and speech-to-text with Python, and for dockerizing Asterisk.
I think you could do it by starting from my work and using EAGI instead of AGI:

You don’t need to stream audio to AWS for that; you can process the audio with an offline ASR engine like Vosk instead. See the project here:

Vosk supports telephony speech transcription in many languages.
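As a sketch of what that looks like: transcribing a telephony recording with the `vosk` Python package (installed via pip, with a model downloaded separately). The function names and the 4000-frame read size are my choices, not anything prescribed by Vosk.

```python
import json

def merge_results(result_jsons):
    """Join the JSON strings returned by a Vosk recognizer into one transcript."""
    texts = [json.loads(r).get("text", "") for r in result_jsons]
    return " ".join(t for t in texts if t)

def transcribe(wav_path, model_path):
    """Transcribe a telephony WAV file (e.g. 8 kHz mono, 16-bit PCM) with Vosk."""
    import wave
    from vosk import Model, KaldiRecognizer  # pip install vosk

    wf = wave.open(wav_path, "rb")
    rec = KaldiRecognizer(Model(model_path), wf.getframerate())
    results = []
    while True:
        data = wf.readframes(4000)
        if not data:
            break
        if rec.AcceptWaveform(data):  # True when an utterance is finalized
            results.append(rec.Result())
    results.append(rec.FinalResult())
    return merge_results(results)
```

For live use you would feed the recognizer chunks as they arrive (e.g. from EAGI or an RTP stream) instead of reading a file, and poll `rec.PartialResult()` for interim text.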