Live Speech Recognition using Google or AWS

Hi all :)

I have an Asterisk box up and running with our Stasis app, and we want to use the Google Speech API or the AWS Transcribe service.

I found some topics on how to set up speech recognition, but here’s the catch: we need real-time transcription of the whole call, not just recognition of a single sentence.

Is there something I can use to send the call audio via HTTP/2 to the Google Speech API or AWS Transcribe in real time?

I didn’t find anything that can do the “realtime” part yet.

Thanks!

There is nothing built in that does this explicitly. You’d need to piece things together (ChanSpy and UnicastRTP, maybe) and do the recognition externally.

I have used the speech recognition API with Asterisk and it works wonderfully, but only in synchronous mode. Asynchronous mode is possible, but there are some limitations. The following links will help you:

https://cloud.google.com/speech-to-text/docs/async-recognize

https://cloud.google.com/speech-to-text/quotas

Hello,

Thanks for your answers,

@jcolp: could you explain your idea a bit further? We already use ChanSpy, but I don’t see how it can help with live speech recognition.

Also, I found that UniMRCP can do speech recognition via the Google Speech plugin, but it seems that it can’t do “live” speech recognition. Is anyone using this?

As there is nothing built into Asterisk to do live recognition the way you want, the only really sensible approach is to send the audio out of Asterisk to a third-party application, which can then read it and do whatever it needs to do. The UnicastRTP module, combined with a Local channel and ChanSpy, can be used for this. The UnicastRTP module sends the RTP to a given IP address and port[1].

[1] https://www.joshua-colp.com/2014/broadcasting-asterisk-conferences/
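To illustrate the receiving side of that idea, here is a minimal sketch of an external application that listens for the RTP stream and strips the RTP header (RFC 3550) to get at the raw audio. The address 127.0.0.1:5000 and the names parse_rtp and receive_audio are made up for the example, and nothing here is specific to Asterisk. What the payload bytes contain depends on the codec negotiated for the channel, so decoding and forwarding to a speech service are left out.

```python
import socket
import struct
from typing import Iterator, NamedTuple


class RtpPacket(NamedTuple):
    payload_type: int
    sequence: int
    timestamp: int
    ssrc: int
    payload: bytes


def parse_rtp(packet: bytes) -> RtpPacket:
    """Split one RTP datagram (RFC 3550) into header fields and audio payload."""
    if len(packet) < 12:
        raise ValueError("packet shorter than the fixed 12-byte RTP header")
    first, second, sequence, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    if first >> 6 != 2:
        raise ValueError("not an RTP version 2 packet")
    # Skip the optional CSRC identifiers (4 bytes each).
    offset = 12 + 4 * (first & 0x0F)
    if first & 0x10:
        # Header extension: 2-byte profile id, 2-byte length in 32-bit words.
        if len(packet) < offset + 4:
            raise ValueError("truncated RTP header extension")
        (ext_words,) = struct.unpack("!H", packet[offset + 2:offset + 4])
        offset += 4 + 4 * ext_words
    end = len(packet)
    if first & 0x20:
        # Padding flag: the last byte says how many trailing bytes to drop.
        end -= packet[-1]
    if offset > end:
        raise ValueError("RTP header runs past the end of the packet")
    return RtpPacket(second & 0x7F, sequence, timestamp, ssrc, packet[offset:end])


def receive_audio(host: str = "127.0.0.1", port: int = 5000) -> Iterator[RtpPacket]:
    """Yield parsed packets from a UDP socket (hypothetical address and port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind((host, port))
        while True:
            data, _ = sock.recvfrom(2048)
            yield parse_rtp(data)
```

A consumer would loop over receive_audio(), reorder or drop packets by sequence number if needed, and pass each payload on to whatever does the recognition.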

Hi Jcolp,

Thanks for the hint and the link, I’ll try that :)

I created a repo that uses GCP for text-to-speech and speech-to-text, with Python and a dockerized Asterisk.
I think that, starting from my work and using EAGI instead of AGI, you should be able to do it:

You don’t need to stream audio to AWS for that; you can process it locally with an offline ASR engine such as Vosk. See the project here:

Vosk supports telephony speech transcription in many languages.

Hi nshmyrev. I’m trying to use the vosk-asterisk module, with no luck. The text files it produces are always empty.

Sorry, I’m not sure which text file you are looking at. Vosk-asterisk processes audio on the fly.

If you are trying to recognize a file, you probably need to convert the audio to the proper format before recognition.
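As a rough way to check a file before feeding it to a recognizer, the sketch below uses Python’s standard wave module to report anything that differs from 16-bit mono PCM. That target format, the default rate of 8000 Hz and the function name wav_problems are assumptions for the example; use whatever your model and setup actually expect.

```python
import wave
from typing import List


def wav_problems(path: str, expected_rate: int = 8000) -> List[str]:
    """List the ways a WAV file differs from 16-bit mono PCM at expected_rate.

    The target format is an assumption for this sketch, not a documented
    requirement of any particular recognizer.
    """
    problems = []
    # wave.open raises wave.Error for files that are not uncompressed PCM
    # (for example GSM-encoded WAV), so those fail before any check runs.
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != 1:
            problems.append(f"{wav.getnchannels()} channels, expected mono")
        if wav.getsampwidth() != 2:
            problems.append(f"{8 * wav.getsampwidth()}-bit samples, expected 16-bit")
        if wav.getframerate() != expected_rate:
            problems.append(f"{wav.getframerate()} Hz, expected {expected_rate} Hz")
    return problems
```

An empty list means the file already matches; otherwise the messages say what to convert with a tool of your choice.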

You might also try the recently added vosk-ari setup: https://github.com/alphacep/vosk-server/tree/master/client-samples/asterisk-ari

Let us know the details.

Hi nshmyrev, thanks for the reply.

My goal is to have a real-time speech-to-text system with my Asterisk.
I followed the instructions in the alphacep/vosk-asterisk repository (Speech Recognition in Asterisk with Vosk Server), and as a result new files started to appear in the /var/spool/asterisk/voicemail/default//INBOX folder, for example msg0000.wav, msg0000.gsm and msg0000.txt.

I assumed the txt files were the call transcripts, but they are almost empty. They contain only information about the files, no transcriptions.

Did I misunderstand the scope of the module?

It seems so. Vosk-asterisk implements an engine module for the Asterisk Speech API (res_speech). You can use it in the dialplan with the SpeechBackground application. It has no relation to voicemail.

Ok.
Is there an example or tutorial that shows the module in action, along with the code to use it?

This is my dialplan.

[internal]
exten = 1,1,Answer
same = n,Wait(1)
same = n,SpeechCreate
same = n,SpeechBackground(hello)
same = n,Verbose(0,Result was ${SPEECH_TEXT(0)})

Your dialplan is OK. You can now call the corresponding extension and check the log for results.

If I call “1”, nothing happens on my MicroSIP (no error message).

If I call “1234” (for example), I can see this error message in the Asterisk CLI:

res_speech_vosk.c: No such configuration file res_speech_vosk.conf

So you need to install the module properly and make sure res_speech_vosk.conf is in the Asterisk configuration directory (typically /etc/asterisk). Then the module will load.