Asterisk 15, Jack, streams, speech recognition... so many questions!

It’s likely because Asterisk modules are written in C, and it’s more difficult to do things in that fashion. Using Record and shipping it off using Python, etc., is just easier and gets the job done for a lot of people, to the point where they find it acceptable.


Ah, OK, I understand now. Thanks - so as I understand (correct me if wrong)

If I want to do “realtime” passing of audio to an API, then the Asterisk Speech Recognition API might do it, Jack would probably not, but for ease of use, I should just stick with the current AGI scheme that everyone else is using? As long as there’s some kind of comforting “we’re working on it” noise playing, the delay isn’t too bad; it’s just that I was trying to almost eliminate it.

(Or, of course, I could hire someone who knows a bit of C to write one - it shouldn’t be rocket science, right? - for example, there are already official libraries for Google Speech-to-Text in C#, Go, Java, Node.js, PHP, Python and Ruby!)

The Asterisk Speech Recognition does provide a stream of audio as received to the implementation. So it will most certainly do that. But yes, the AGI approach is the easiest.


OK, continuing on to the next stage…

Briefly: I want to be able to have “press or say (number)”, with Asterisk listening for a spoken number, but accepting a DTMF digit, too.

I’m posting everything I found so far, here, partly to show working, but also in case anyone else finds it useful. So, moving on…

This looked hopeful for a moment until I realised that it doesn’t do DTMF:
https://wiki.asterisk.org/wiki/display/AST/Asterisk+15+Application_SpeechBackground

So then there’s https://wiki.asterisk.org/wiki/display/AST/Asterisk+15+Application_Record, which can terminate on any DTMF key with “y”, but according to the docs, “RECORD_STATUS” only sets a flag of “DTMF” (A terminating DTMF was received (’#’ or ‘*’, depending upon option ‘t’)).
So, I don’t get to know which key was pressed via that method, either.

There’s very little information I can find about the built-in functions for speech recognition.
https://wiki.asterisk.org/wiki/display/AST/Speech+Recognition+API doesn’t actually explain how to integrate the actual speech engines.

In this previous forum post (Asterisk 15, Jack, streams, speech recognition... so many questions!), jcolp explained that most people don’t use the speech interface anyway, because
“Asterisk modules are written in C, and it’s more difficult to do things in that fashion. Using Record and shipping it off using Python, etc., is just easier and gets the job done for a lot of people, to the point where they find it acceptable.”
So, AGI it is! But I’m still stuck on how I record for speech AND get a DTMF if it was dialled.
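One way that might square that circle, sketched below as an assumption rather than tested fact: plain AGI’s RECORD FILE command accepts a list of escape digits, and when one of them stops the recording, the response line carries that digit’s ASCII code in `result=`, so the script can tell which key was pressed. The helper names here are mine, not from any library:

```python
import re
import sys

def agi_command(cmd):
    """Send one AGI command on stdout and return the raw response line."""
    sys.stdout.write(cmd + "\n")
    sys.stdout.flush()
    return sys.stdin.readline().strip()

def parse_record_result(response):
    """Extract the terminating DTMF digit from a RECORD FILE response.

    AGI answers e.g. '200 result=49 (dtmf) endpos=12345' when the caller
    pressed '1' (ASCII 49); result=0 means the recording simply ended
    without an escape digit, and result=-1 signals hangup/failure.
    """
    m = re.search(r"result=(-?\d+)", response)
    if not m:
        return None
    code = int(m.group(1))
    return chr(code) if code > 0 else None

# Usage (inside a real AGI session):
#   resp = agi_command('RECORD FILE /tmp/utterance wav "0123456789#*" 5000')
#   digit = parse_record_result(resp)  # '1' if the caller pressed 1, else None
```

If a digit comes back, skip the speech engine entirely; if not, ship the recording off for recognition as usual.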

Regarding speech in general, even “Asterisk - The Definitive Guide” just says:

“Asterisk does not have speech recognition built in, but there are many third-party speech
recognition packages that integrate with Asterisk. Much of that is outside of the scope
of this book, as those applications are external to Asterisk” - helpful!

The asterisk-speech-rec mailing list at http://lists.digium.com/pipermail/asterisk-speech-rec/ hasn’t been posted to since 2013.

Someone else asked about speech recognition and unimrcp in this post:
http://lists.digium.com/pipermail/asterisk-users/2017-February/290875.html

UniMRCP: https://mojolingo.com/blog/2015/speech-rec-asterisk-get-started/
http://www.unimrcp.org/manuals/html/AsteriskManual.html#_Toc424230605
This has a Google Speech Recognition plugin, but it’s $50 per channel: http://www.unimrcp.org/gsr

Reasons to use Lex over Google Speech-to-Text:
• Has just been released in eu-west-1: https://forums.aws.amazon.com/ann.jspa?annID=5186
• Supports 8 kHz telephony: https://forums.aws.amazon.com/ann.jspa?annID=4775
• Is in the core AWS SDK: http://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/LexRuntime.html
• Has a number slot type: http://docs.aws.amazon.com/lex/latest/dg/built-in-slot-number.html
  • this means no accidental recognition of “won”, “one” or “juan” instead of 1!

The pricing is definitely right: “The cost for 1,000 speech requests would be $4.00, and 1,000 text requests would cost $0.75. From the date you get started with Amazon Lex, you can process up to 10,000 text requests and 5,000 speech requests per month for free for the first year”.

Amazon Transcribe looks promising too, but is only available by invitation at this time:
https://aws.amazon.com/transcribe/ https://aws.amazon.com/transcribe/pricing/

But all I need now is the simplest way to send Lex a short 8 kHz recording and get a single digit back, as quickly and reliably as possible.
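For reference, here is a rough sketch of what that Lex round trip might look like with boto3. The bot name, alias and slot name are placeholders for a bot you would have to create yourself (with one intent holding an AMAZON.NUMBER slot), and the exact content type string for 8 kHz PCM should be checked against the Lex docs:

```python
import json

def extract_number_slot(response, slot_name="Number"):
    """Pull a single slot value out of a Lex PostContent response.

    In PostContent responses the 'slots' field arrives as a
    JSON-encoded string, so it is decoded before the slot is read.
    """
    slots = json.loads(response.get("slots") or "{}")
    return slots.get(slot_name)

def recognise_digit(audio_bytes, bot_name="DigitBot", bot_alias="prod",
                    user_id="caller-1"):
    """Send raw 8 kHz 16-bit mono PCM to Lex, return the recognised number.

    'DigitBot'/'prod' are placeholder names, not real resources.
    boto3 is imported lazily so the parsing helper above is usable
    without the AWS SDK installed.
    """
    import boto3

    client = boto3.client("lex-runtime")
    response = client.post_content(
        botName=bot_name,
        botAlias=bot_alias,
        userId=user_id,
        contentType=("audio/lpcm; sample-rate=8000; sample-size-bits=16; "
                     "channel-count=1; is-big-endian=false"),
        accept="text/plain; charset=utf-8",
        inputStream=audio_bytes,
    )
    return extract_number_slot(response)
```

The slot-extraction helper can be exercised without AWS credentials, e.g. `extract_number_slot({"slots": '{"Number": "5"}'})` gives `"5"`.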

Before I travel too far down this road, can someone point me in the right direction and possibly steer me away from the wrong path?!

Hi,

@lardconcepts Did you make any progress on your project? I am trying to do the same with the Google Speech API; I have something working with AGI, but it introduces some delay and noise, which makes the solution not good enough to be usable.

To be honest, I gave up for the time being. But what I DID discover is that silence takes as long to process as speech. If you have gaps at the start, the end, and between words, you can dramatically speed up processing by removing them - see here: https://unix.stackexchange.com/questions/293376/remove-silence-from-audio-files-while-leaving-gaps
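To illustrate the idea, here is a rough pure-Python sketch of trimming leading and trailing silence from raw 8 kHz 16-bit mono PCM. The frame size and RMS threshold are guesses that would need tuning against real call audio, and it assumes the buffer length is a multiple of the sample size:

```python
import array
import math

def frame_rms(frame_bytes):
    """RMS level of one frame of 16-bit little-endian PCM."""
    samples = array.array("h", frame_bytes)
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def trim_silence(pcm, frame_bytes=320, threshold=200.0):
    """Drop silent frames from the start and end of raw PCM audio.

    320 bytes = 160 samples = 20 ms at 8 kHz/16-bit; both the frame
    size and the threshold are placeholder values to tune.
    """
    frames = [pcm[i:i + frame_bytes] for i in range(0, len(pcm), frame_bytes)]
    loud = [i for i, f in enumerate(frames) if frame_rms(f) > threshold]
    if not loud:
        return b""  # nothing above the threshold: all silence
    return b"".join(frames[loud[0]:loud[-1] + 1])
```

In practice a sox one-liner (as in the linked answer) does the same job; this just shows how little code the idea needs if you want it inside the AGI script itself.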


@lardconcepts I have a question - I am new to Asterisk and I am deep in the same cycle you went through in this topic.
I am trying to get my hands on the incoming audio stream from the callee and process it, this time in my own application rather than an external service like Google/Amazon speech recognition, but that shouldn’t really matter.
Giving up is not an option!
Can you share some insight regarding the AGI/EAGI implementation you did?
I am trying to do it with Python, and so far I am only able to execute the Python script I found in https://github.com/ederwander/Asterisk-Google-Speech-Recognition, but I am just starting. I found your post somehow and am happy to see that I am following the same path you did through all the different options available in Asterisk.
Any tips for doing that will help!

Thanks

Hi @dannyvoca - well, basically, I’m at the same stage as I was in April, which is that I gave up trying to do the streams thing, and just kept to removing the silence and sticking with plain old AGI.
One change I was going to make was to use Node instead, and take advantage of its far simpler async operations, in order to be able to play something to the caller while the audio is processed.

I’ve had to take a step back from Asterisk development this month, but I’ll try and post something up for you soon. If I’ve not replied in 4 days, nudge me!

Thanks @lardconcepts
I am also trying with Asterisk WebRTC and a Node.js WebRTC client.
Another way is to use EAGI, which, according to the Asterisk documentation, makes the call audio available to the executing script. I hope I will have something new for you too in a couple of days.

Thanks
Danny

Hi, @lardconcepts and anyone else that is interested - I got what I need with EAGI using a Python script.
For every incoming call, the (E)AGI app executes a Python script in a separate process, from which you can control the sequence of the call with AGI commands.
See this for what it is capable of:
https://wiki.asterisk.org/wiki/display/AST/Asterisk+15+AGI+Commands
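For anyone new to (E)AGI, the startup handshake is simple: when Asterisk launches the script it first writes a block of `agi_*` variables on stdin, terminated by a blank line, and only then does command/response traffic begin. A minimal sketch of reading that block (real scripts usually lean on a library like asterisk-python instead):

```python
import sys

def read_agi_environment(stream=None):
    """Read the 'agi_variable: value' header block Asterisk sends first.

    Asterisk ends the block with an empty line; everything after that
    is AGI command/response traffic.
    """
    stream = stream or sys.stdin
    env = {}
    for line in stream:
        line = line.strip()
        if not line:
            break  # blank line marks the end of the environment block
        key, _, value = line.partition(": ")
        env[key] = value
    return env

# Typical keys include agi_request, agi_channel, agi_callerid, etc.
```

Once the environment is consumed, commands like those on the wiki page above go out on stdout, one per line.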

The E in EAGI enables you to read the incoming audio of the call from a file descriptor, which you can do in an infinite loop for as long as the call goes on, doing whatever you like with the audio data. I am planning to feed it to our system.
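Concretely, EAGI hands the call audio to the script on file descriptor 3 (signed linear PCM by default). A sketch of that read loop, with an arbitrary chunk size, might look like this:

```python
import os

EAGI_AUDIO_FD = 3  # EAGI convention: call audio arrives on fd 3

def audio_chunks(fd=EAGI_AUDIO_FD, chunk_size=320):
    """Yield successive chunks of raw call audio until the call ends.

    os.read() returns b'' on EOF, i.e. once the channel hangs up,
    which cleanly terminates the loop.
    """
    while True:
        data = os.read(fd, chunk_size)
        if not data:
            break
        yield data

# Inside the loop you could buffer the chunks, detect silence,
# or forward them to an external recognition service.
```

The generator form keeps the read loop separate from whatever processing you bolt on afterwards.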

In addition, it’s pretty easy (also available in plain AGI) to stream your own audio back to the call - either from a file, via Asterisk Playback, audio returned from another system, etc.
I used the https://github.com/rdegges/asterisk-python package as the Asterisk Python API and got some inspiration for the audio-streaming code from both of these:
GitHub - ederwander/Asterisk-Google-Speech-Recognition
GitHub - phatjmo/eagi_lex: simple EAGI script that interacts with AWS Lex.
Although I am doing something else with the audio data itself.

I hope that helps
Danny


and anyone else that is interested

There are definitely others interested. I have this very informative and useful thread bookmarked. Many thanks to you and @lardconcepts for your contributions.

I have the same request here: @dannyvoca and @lardconcepts, have you accomplished anything?

@nosenicomomellamo read my comment from Jul 2. I wrote there what I did, and it has been working for me since then.

Yes, I have read that, but I have some questions for you :slight_smile::

Hi, @lardconcepts and anyone else that is interested, I got what I need with EAGI using python script.

You mean your custom one, right?

I used https://github.com/rdegges/asterisk-python package as Asterisk python API and got some inspiration for audio streaming processing code from both of those:
https://github.com/ederwander/Asterisk-Google-Speech-Recognition
https://github.com/phatjmo/eagi_lex

About these: you used only asterisk-python and then wrote a custom EAGI script, taking some inspiration from the other two projects, right?

Can you share some other details?

What? I don’t understand your question here. Is it about the EAGI? Or about “anyone else”? Anyhow, I am using EAGI, and I explain it after that sentence.

My question was about whether you are using a freely available Python script or made a custom one.

@nosenicomomellamo
Yes, this is right (“About these: you used only asterisk-python and then wrote a custom EAGI script, taking some inspiration from the other two projects, right?”)
As explained, with EAGI you can get the audio data from a file descriptor while the call is going on.
The https://github.com/ederwander/Asterisk-Google-Speech-Recognition project shows how to do that.

A custom one, of course. I am using it in a similar way to what https://github.com/ederwander/Asterisk-Google-Speech-Recognition does, but I don’t send the audio directly to speech recognition - I send it to another system, which is our core business system.

OK, now it is clearer!
I will try it and post how I did it when done :slight_smile:

Thank youuuu

I want to do something similar, but I’m unable to pass the live audio to another file. If I run the script from my home directory it works perfectly, but when it’s in agi-bin it doesn’t work.