Live Speech Recognition using Google or AWS

Cyril.r · January 4, 2019, 3:43pm

Hi all

I have an Asterisk box up and running with our Stasis app, and we want to use the Google Speech API or the AWS Transcribe service.

I found some topics on how to set up speech recognition, but here’s the catch: we need real-time transcription of the whole call, not just recognition of a single sentence.

Is there something that I can use to send the call audio via HTTP/2 to Google Speech API or AWS Transcribe in realtime?

I didn’t find anything that can do the “realtime” part yet.

Thanks!

jcolp · January 4, 2019, 6:25pm

There is nothing built in explicitly to do this. You’d need to piece things together (Chanspy and UnicastRTP maybe) and do it externally.

ambiorixg12 · January 5, 2019, 1:59am

I have used a speech recognition API with Asterisk and it works wonderfully, but only in synchronous mode. Asynchronous mode is possible, but there are some limitations. The following links will help you:

https://cloud.google.com/speech-to-text/docs/async-recognize

https://cloud.google.com/speech-to-text/quotas

Cyril.r · January 7, 2019, 9:23am

Hello,

Thanks for your answers,

@jcolp: could you explain your idea a bit further? We already use Chanspy, but I don’t see how it can help with live speech recognition.

Also, I found that UniMRCP can do speech recognition via the Google Speech plugin, but it seems that it can’t do “live” speech recognition. Is anyone using this?

jcolp · January 7, 2019, 11:14am

As there is nothing built into Asterisk to do live recognition as you desire, the only sensible way is to send the audio outside of Asterisk to a third-party application, which can then read it and do what it needs to do. The UnicastRTP module, when combined with a Local channel and Chanspy, can be used to do this. The UnicastRTP module sends the RTP to a given IP address and port[1].

[1] https://www.joshua-colp.com/2014/broadcasting-asterisk-conferences/
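
For anyone wondering what the "third party application" end might look like, here is a minimal Python sketch of a UDP listener that strips the RTP header (RFC 3550) and hands the raw audio payload to a callback. It is only an illustration, not something from the linked article: the names `parse_rtp`, `receive`, and `handle_audio` are made up here, and the payload is in whatever codec the channel negotiated (payload type 0 is PCMU/ulaw), so it may need transcoding before a speech service will accept it.

```python
import socket
import struct

RTP_HEADER_LEN = 12  # fixed part of the RTP header (RFC 3550)


def parse_rtp(packet: bytes):
    """Split one RTP packet into (payload_type, sequence, timestamp, payload)."""
    if len(packet) < RTP_HEADER_LEN:
        raise ValueError("packet too short for an RTP header")
    b0, b1, seq, ts, _ssrc = struct.unpack("!BBHII", packet[:RTP_HEADER_LEN])
    if b0 >> 6 != 2:
        raise ValueError("not RTP version 2")
    # Skip the optional CSRC list (4 bytes per entry).
    offset = RTP_HEADER_LEN + 4 * (b0 & 0x0F)
    if b0 & 0x10:  # header extension present
        if len(packet) < offset + 4:
            raise ValueError("truncated header extension")
        ext_words = struct.unpack("!H", packet[offset + 2:offset + 4])[0]
        offset += 4 + 4 * ext_words
    payload = packet[offset:]
    if b0 & 0x20 and payload:  # padding: the last byte holds the padding length
        pad = payload[-1]
        if 0 < pad <= len(payload):
            payload = payload[:len(payload) - pad]
    return b1 & 0x7F, seq, ts, payload


def receive(host, port, handle_audio, max_packets=None):
    """Listen for RTP on host:port and pass each audio payload to handle_audio."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    count = 0
    try:
        while max_packets is None or count < max_packets:
            packet, _addr = sock.recvfrom(2048)
            try:
                _pt, _seq, _ts, payload = parse_rtp(packet)
            except ValueError:
                continue  # ignore anything that is not valid RTP
            handle_audio(payload)
            count += 1
    finally:
        sock.close()
```

Something like `receive("0.0.0.0", 5004, forward_to_recognizer)` would then run the loop, where the port is whatever you pointed UnicastRTP at and `forward_to_recognizer` is your own (hypothetical) function that feeds the streaming API. This sketch does no reordering or loss handling, which a real application would probably want.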

Cyril.r · January 7, 2019, 2:58pm

Hi Jcolp,

Thanks for the hint and the link, I’ll try that

pierpa · August 20, 2021, 8:34am

I created a repo for using GCP to do text-to-speech and speech-to-text, using Python and a dockerized Asterisk.
I think that, starting from my work and using EAGI instead of AGI, you should be able to do it:

nshmyrev · September 18, 2021, 11:02pm

You don’t need to stream audio to AWS for that; you can process things with an offline ASR engine like Vosk. See the project here:

Vosk supports telephony speech transcription in many languages.

harding · October 20, 2023, 2:34pm

Hi nshmyrev. I’m trying to use the vosk-asterisk module with no luck. The text files produced are always empty.

nshmyrev · October 20, 2023, 11:41pm

Sorry, I’m not sure which text file you are looking for. Vosk-asterisk processes audio on the fly.

If you are trying to recognize a file, you probably need to convert the audio to the proper format before recognition.
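
As a quick way to see whether a file needs converting, a small check like the following may help. It is a sketch under assumptions: the thread does not spell out the "proper format", so this assumes the common case of uncompressed mono 16-bit PCM WAV, and the default `expected_rate` of 8000 Hz is only a guess at a telephony setup. The function name `wav_format_problems` is made up for this example.

```python
import wave


def wav_format_problems(path, expected_rate=8000):
    """Return a list of format mismatches; an empty list means none were found."""
    problems = []
    try:
        with wave.open(path, "rb") as wav:
            channels = wav.getnchannels()
            width = wav.getsampwidth()
            rate = wav.getframerate()
    except wave.Error as exc:
        # The wave module only reads uncompressed PCM WAV files.
        return [f"not a readable PCM WAV file: {exc}"]
    if channels != 1:
        problems.append(f"expected mono, found {channels} channels")
    if width != 2:
        problems.append(f"expected 16-bit samples, found {8 * width}-bit")
    if rate != expected_rate:
        problems.append(f"expected {expected_rate} Hz, found {rate} Hz")
    return problems
```

If the list comes back non-empty, converting the file with your usual audio tool before recognition is probably the first thing to try.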

You might also try recently added vosk-ari setup: https://github.com/alphacep/vosk-server/tree/master/client-samples/asterisk-ari

Let us know the details

harding · October 21, 2023, 8:13am

Hi nshmyrev, thanks for the reply.

My goal is to have a real-time speech-to-text system with my Asterisk.
I followed the instructions in the alphacep/vosk-asterisk repository on GitHub (Speech Recognition in Asterisk with Vosk Server), and the result is that in the /var/spool/asterisk/voicemail/default//INBOX folder I started to get new files called, for example, msg0000.wav, msg0000.gsm, and msg0000.txt.

I assumed the txt files were the call transcripts but they are almost empty. There was only info about the files. No transcriptions.

Did I misunderstand the scope of the module?

nshmyrev · October 22, 2023, 8:58pm

It seems so. Vosk-asterisk implements an engine for the Asterisk Speech API (res_speech). You can use it in the dialplan with the SpeechBackground application. It has no relation to voicemail.

harding · October 23, 2023, 7:05am

Ok.
Is there an example or tutorial to view the module in action and the code to use it?

This is my dialplan.

[internal]
exten = 1,1,Answer
same = n,Wait(1)
same = n,SpeechCreate
same = n,SpeechBackground(hello)
same = n,Verbose(0,Result was ${SPEECH_TEXT(0)})

nshmyrev · October 23, 2023, 8:00am

Your dialplan is ok. You can now call the corresponding extension and check the log for results.

harding · October 23, 2023, 9:22am

If I call “1”, nothing happens on my MicroSIP (no error message).

If I call “1234” (for example), I can see this error message in the Asterisk CLI:

res_speech_vosk.c: No such configuration file res_speech_vosk.conf

nshmyrev · October 23, 2023, 8:02pm

So you need to install the module properly and make sure res_speech_vosk.conf is in the Asterisk configuration directory (typically /etc/asterisk). Then the module will load.
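
A trivial way to confirm the file is where Asterisk will look for it is sketched below. The helper name `find_config` is made up here, and the default directory `/etc/asterisk` is an assumption about a standard install; adjust it if your configuration lives elsewhere.

```python
from pathlib import Path


def find_config(filename, config_dirs=("/etc/asterisk",)):
    """Return the first existing path to filename in config_dirs, or None."""
    for directory in config_dirs:
        candidate = Path(directory) / filename
        if candidate.is_file():
            return candidate
    return None
```

Calling `find_config("res_speech_vosk.conf")` returns `None` when the file is missing from the listed directories, which would match the "No such configuration file" message above.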
