Streaming from an ARI snoop channel for speech recognition

Hi all,

I’d like to stream audio in real time to an external speech engine (Google, Watson). I am using the ARI API to get the recording path of the audio file, and I am reading from that file continuously while Asterisk is still writing to it in parallel.
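For reference, the file-tailing approach described above can be sketched as a polling reader. This is a minimal sketch under two assumptions: that the recording is a WAV file with a 44-byte header (check what your Asterisk version actually writes), and that a couple of seconds without growth means the recording has finished. `read_new_bytes` and `follow` are hypothetical helper names.

```python
import time
from typing import Iterator

# Assumption: canonical 44-byte PCM WAV header; verify against your recordings.
WAV_HEADER_BYTES = 44


def read_new_bytes(path: str, offset: int) -> tuple[bytes, int]:
    """Return the bytes appended to `path` since `offset`, plus the new offset."""
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read()
    return data, offset + len(data)


def follow(path: str, start: int = WAV_HEADER_BYTES, poll_s: float = 0.02,
           idle_limit_s: float = 2.0) -> Iterator[bytes]:
    """Yield audio chunks as another process (Asterisk) appends to `path`.

    Stops once the file has not grown for about `idle_limit_s` seconds,
    a hypothetical end-of-recording heuristic.
    """
    offset, idle = start, 0.0
    while idle < idle_limit_s:
        data, offset = read_new_bytes(path, offset)
        if data:
            idle = 0.0
            yield data
        else:
            time.sleep(poll_s)
            idle += poll_s
```

Usage would be something like `for chunk in follow(path): send_to_engine(chunk)`, where `send_to_engine` stands in for your STT client. Polling adds up to `poll_s` of latency on top of however Asterisk buffers its writes, and the reader cannot tell a long silence from the end of the call.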

Is this a good approach to streaming audio for real-time transcription? Is anybody doing real-time transcription from Asterisk? I know there are multiple approaches to speech transcription; if possible, could you share yours? I’m talking about streaming to cloud-based speech engines like Google, Watson, Nuance, etc.

Hi,

I’m trying to do exactly what you were having trouble with.
I’m currently struggling to get the audio stream from the Node.js ARI client and I can’t figure out how to do it.
I imagine you may have found a way to solve your problem; whatever happened, if you could share your work it would be a great help to me!

Thank you in advance.

ARI itself does not currently provide a mechanism for getting the audio stream.

Hi, thank you for this quick answer.
Does ARI provide one for getting the audio file instead?

Not ARI itself, but I’ve heard that it may be possible to configure Asterisk’s HTTP server to allow downloading of such files. I don’t have any experience with that, though.

Maybe I found a solution :wink:
This example shows how to send a POST request with a file stored on disk: https://gist.github.com/alepez/9205394
In my environment, recordings are stored in /var/spool/asterisk/recording
So I just had to replace the "filename" variable with '/var/spool/asterisk/recording/' + recording.name + '.wav'
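Building the path and the upload request in Python could look like the sketch below. It follows the path convention described above; the upload URL is a placeholder, the path-escape check is an extra precaution rather than something ARI requires, and it sends the raw WAV bytes as the request body, which may not match the upload format used in the linked gist.

```python
import os
import urllib.request

# Spool directory reported above; adjust if your Asterisk spool dir differs.
RECORDING_DIR = "/var/spool/asterisk/recording"


def recording_path(name: str, fmt: str = "wav", base: str = RECORDING_DIR) -> str:
    """Equivalent of '/var/spool/asterisk/recording/' + recording.name + '.wav'."""
    path = os.path.normpath(os.path.join(base, f"{name}.{fmt}"))
    # Extra precaution: refuse names that would resolve outside the spool dir.
    if not path.startswith(os.path.normpath(base) + os.sep):
        raise ValueError(f"recording name escapes spool dir: {name!r}")
    return path


def build_upload(path: str, url: str) -> urllib.request.Request:
    """Build (but do not send) a POST whose body is the raw WAV file."""
    with open(path, "rb") as f:
        body = f.read()
    return urllib.request.Request(
        url, data=body, method="POST", headers={"Content-Type": "audio/wav"}
    )
```

Sending it is then `urllib.request.urlopen(build_upload(recording_path(name), upload_url))`, with `upload_url` being wherever your service accepts the file.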

To get live data, I think you need to start here:
https://wiki.asterisk.org/wiki/display/AST/External+Media+and+ARI
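With External Media, Asterisk sends the channel’s audio to the `external_host` you give it, so the receiving side mostly has to strip the RTP header (RFC 3550) before passing the payload on. A minimal sketch, assuming `encapsulation=rtp` and `transport=udp`; the port and the `handle` callback are placeholders, not values from the wiki page. With `format=ulaw`, payload type 0 and 160-byte payloads per 20 ms packet would be typical.

```python
import socket
import struct
from typing import Callable, NamedTuple


class RtpPacket(NamedTuple):
    payload_type: int
    sequence: int
    timestamp: int
    ssrc: int
    payload: bytes


def parse_rtp(packet: bytes) -> RtpPacket:
    """Parse one RTP packet and return its header fields and audio payload."""
    if len(packet) < 12:
        raise ValueError("packet shorter than RTP fixed header")
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    if b0 >> 6 != 2:
        raise ValueError("not RTP version 2")
    offset = 12 + 4 * (b0 & 0x0F)  # skip any CSRC identifiers
    if b0 & 0x10:  # header extension: 4-byte header + N 32-bit words
        if len(packet) < offset + 4:
            raise ValueError("truncated header extension")
        (ext_words,) = struct.unpack("!H", packet[offset + 2:offset + 4])
        offset += 4 + 4 * ext_words
    end = len(packet)
    if b0 & 0x20:  # padding: the last byte holds the padding length
        end -= packet[-1]
    if offset > end:
        raise ValueError("malformed RTP packet")
    return RtpPacket(b1 & 0x7F, seq, ts, ssrc, packet[offset:end])


def serve(host: str = "0.0.0.0", port: int = 9999,
          handle: Callable[[bytes], None] = print) -> None:
    """Receive RTP on host:port (must match external_host) and hand off payloads."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind((host, port))
        while True:
            data, _addr = sock.recvfrom(2048)
            handle(parse_rtp(data).payload)
```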

@yeya Yeah, and here’s an example project showing how to use it to connect to Dialogflow. There’s another one in the Nimble Ape GitHub org that takes audio and sends it to Google.

This one and its associated ARI bridge project actually use snoop.
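A snoop-based pipeline of this kind can be outlined as a short sequence of ARI REST calls per channel: a snoop channel on the caller, an External Media channel pointing at your RTP receiver, and a mixing bridge joining the two. The sketch below only builds the requests (method and URL) and does not send them; the base URL, the Stasis app name, the `snoop-`/`media-`/`bridge-` ID prefixes and the RTP target are assumptions, and the linked projects may wire things differently. A real client also has to authenticate (for example with the `api_key` parameter or HTTP basic auth). `spy=in` asks only for the audio coming from the snooped channel; `both` would include the other party too.

```python
from urllib.parse import quote, urlencode


def stt_plan(channel_id: str, app: str, rtp_target: str,
             base: str = "http://localhost:8088/ari") -> list[tuple[str, str]]:
    """Return the (method, URL) ARI calls that fork one channel's audio to RTP."""
    snoop_id = f"snoop-{channel_id}"    # hypothetical ID scheme
    media_id = f"media-{channel_id}"
    bridge_id = f"bridge-{channel_id}"
    cid = quote(channel_id, safe="")
    return [
        # 1. Snoop channel carrying only the audio received from the caller.
        ("POST", f"{base}/channels/{cid}/snoop?"
                 + urlencode({"app": app, "spy": "in", "snoopId": snoop_id})),
        # 2. External Media channel that sends audio as RTP to rtp_target.
        ("POST", f"{base}/channels/externalMedia?"
                 + urlencode({"app": app, "channelId": media_id,
                              "external_host": rtp_target, "format": "ulaw"})),
        # 3. Mixing bridge, then put both channels in it so audio flows between them.
        ("POST", f"{base}/bridges?"
                 + urlencode({"type": "mixing", "bridgeId": bridge_id})),
        ("POST", f"{base}/bridges/{quote(bridge_id, safe='')}/addChannel?"
                 + urlencode({"channel": f"{snoop_id},{media_id}"})),
    ]
```

Because every call gets its own snoop channel, External Media channel and bridge, the same plan repeated per channel is one way to stream each call separately.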

@danjenkins Could you (or anyone else) tell us whether we can do the following?

We have an Asterisk server that receives calls from several users at the same time. For each call we create a sound file and record only the Tx audio from the caller. After hang-up, we transfer the audio file to Google Cloud and use the Speech-to-Text (STT) API to get the transcription.

Could we do this with real-time streaming, for each call channel separately?
Also, if we need that for 20 simultaneous calls, what processor/memory resources will we need?

Thank you in advance.

Yes, 100% possible with streaming.

Resources: very little, because you’re just passing media from A to B.
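To put a rough number on the 20-call question, here is a back-of-the-envelope estimate of network bandwidth per RTP audio stream. It assumes 20 ms packetization, IPv4 without options, Ethernet framing ignored, and one stream per call (the caller’s audio only); CPU and memory depend on the relay and STT client code and are not estimated here.

```python
IP_UDP_RTP_HEADER_BYTES = 20 + 8 + 12  # IPv4 + UDP + RTP fixed headers


def stream_kbps(sample_rate_hz: int, bytes_per_sample: int, ptime_ms: int = 20) -> float:
    """Approximate bitrate of one RTP audio stream, headers included."""
    payload = sample_rate_hz * bytes_per_sample * ptime_ms // 1000
    packets_per_s = 1000 / ptime_ms
    return (payload + IP_UDP_RTP_HEADER_BYTES) * 8 * packets_per_s / 1000


for name, rate, width in [("ulaw", 8000, 1), ("slin16", 16000, 2)]:
    total = 20 * stream_kbps(rate, width)
    print(f"{name}: {total:.0f} kbit/s for 20 calls")
# prints "ulaw: 1600 kbit/s for 20 calls" and "slin16: 5440 kbit/s for 20 calls"
```

Even with 16 kHz linear audio this stays at a few Mbit/s for 20 calls, which is consistent with the point that relaying media is cheap.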

Is Dialogflow a good solution for this?

If you just want transcription, use Google’s Speech-to-Text engine directly. You can find an example of how to do that at https://github.com/nimbleape/dana-tsg-rtp-stt-audioserver
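When feeding Google’s streaming recognition from RTP, the small per-packet payloads are usually regrouped into somewhat larger chunks; Google’s documentation recommends frames of around 100 ms. A minimal sketch of that buffering step, assuming 8 kHz μ-law payloads (1 byte per sample) as in an External Media setup with `format=ulaw`; `rechunk` is a hypothetical helper, not part of the linked project.

```python
from typing import Iterable, Iterator

ULAW_BYTES_PER_MS = 8  # 8000 samples/s * 1 byte/sample / 1000 ms


def rechunk(payloads: Iterable[bytes], chunk_ms: int = 100) -> Iterator[bytes]:
    """Regroup small RTP payloads into fixed-duration chunks for a streaming STT API."""
    size = chunk_ms * ULAW_BYTES_PER_MS
    buf = bytearray()
    for p in payloads:
        buf.extend(p)
        while len(buf) >= size:
            yield bytes(buf[:size])
            del buf[:size]
    if buf:
        yield bytes(buf)  # flush whatever is left when the call ends
```

Each chunk would then go into the `audio_content` of a streaming recognize request, with the recognition config declaring MULAW encoding at 8000 Hz; check the google-cloud-speech client documentation for the exact request types.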

Thank you very much for your reply. I’ll check it.