I would like to develop a phone application in which the system calls the user and engages in a conversation with them. The system talks using pre-recorded audio files, while the user’s input should be continuously monitored and analyzed by a speech recognition engine. I have some experience with Sphinx, and I have seen that someone got it to work with Asterisk. Still, I haven’t found a way to handle all incoming audio as a stream in order to recognize continuous speech. Is this possible? I’d like to launch an instance of a custom program each time a call starts, and send the audio data to it through sockets or pipes.
Looking forward to your comments about the question, and about the possible limits of the system.
Wow, it sounds like Artificial Intelligence. I don’t know if this might help: zaf.github.io/asterisk-speech-recog/ but it’s a really interesting project, and I was thinking of something similar.
Thanks for your answer. There wouldn’t be much natural language processing, mostly just a different grammar for each state of the system, that is, each time the user is prompted for an interaction. The first thing I noticed in your link is “Records from the current channel untill 3 seconds of silence are detected”. That’s no good for continuous recognition.
I have found this that looks very promising
The question could be more general, though: say that I know how to do speech recognition given a stream of audio data. Is it possible in Asterisk to have some custom code that processes the audio data of the phone call in real time? That is basically all I need; then it’s a matter of adapting Sphinx or PocketSphinx.
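To make the "custom code that processes the audio in real time" part concrete, here is a minimal sketch of the receiving side only: a program that reads raw audio from a pipe in small fixed-size frames and hands each frame to a callback as soon as it arrives. Everything specific here is an assumption, not something confirmed in this thread: the audio format (8 kHz, 16-bit signed little-endian mono), the 20 ms frame size, and the names read_frames, frame_energy and run, which are made up for illustration. One mechanism worth checking is Asterisk’s EAGI, which as far as I know exposes the caller’s audio to the script on file descriptor 3; please verify that and the exact format against the documentation for your version. The energy function is only a stand-in for the point where a recognizer such as PocketSphinx would be fed.

```python
import struct
from typing import BinaryIO, Callable, Iterator

# Assumed audio format, not confirmed by the thread: 8 kHz, 16-bit signed LE, mono.
SAMPLE_RATE = 8000
BYTES_PER_SAMPLE = 2
FRAME_MS = 20
FRAME_BYTES = SAMPLE_RATE * BYTES_PER_SAMPLE * FRAME_MS // 1000  # 320 bytes


def read_frames(stream: BinaryIO, frame_bytes: int = FRAME_BYTES) -> Iterator[bytes]:
    """Yield fixed-size frames until the stream closes.

    Pipes and sockets may return short reads, so bytes are accumulated
    until a full frame is available. A trailing partial frame is dropped.
    """
    buf = b""
    while True:
        chunk = stream.read(frame_bytes - len(buf))
        if not chunk:  # EOF: the other side closed the pipe (call ended)
            return
        buf += chunk
        if len(buf) == frame_bytes:
            yield buf
            buf = b""


def frame_energy(frame: bytes) -> float:
    """Mean squared amplitude of one frame (placeholder for real decoding)."""
    samples = struct.unpack("<%dh" % (len(frame) // BYTES_PER_SAMPLE), frame)
    return sum(s * s for s in samples) / len(samples)


def run(stream: BinaryIO, on_frame: Callable[[bytes], None]) -> int:
    """Feed every frame to on_frame as it arrives; return the frame count."""
    count = 0
    for frame in read_frames(stream):
        on_frame(frame)  # hypothetical hook: pass the frame to the recognizer here
        count += 1
    return count


# Possible usage, reading from standard input:
#   import sys
#   run(sys.stdin.buffer, lambda f: print(frame_energy(f), file=sys.stderr))
# If the audio really does arrive on file descriptor 3, the stream could be
# opened with os.fdopen(3, "rb") instead.
```

The point of reading in small frames is that the recognizer sees audio with about 20 ms of latency instead of waiting for a silence timeout, which is the limitation of the record-then-recognize approach mentioned above.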
You might want to look at this: zaf.github.io/asterisk-speech-recog/ We have used it; on single words and very short sentences it’s OK, but it starts to trip up on longer ones.
There is a reason for this, and that’s to do with the way Google does speech recognition. They use a predictive model, which can lead to odd results. googleresearch.blogspot.co.uk/20 … ng-in.html
This might help a bit. It’s for CentOS, but the theory is the same:
cyber-cottage.co.uk/en/2013/02/i … entos-6-3/
UniMRCP is exactly the thing you need: it handles a continuous stream and processes the data as soon as it arrives. UniMRCP has a PocketSphinx plugin which is easy to set up and which provides the decoding you need. It’s not free from issues, but once you integrate an important patch from
it will give you good accuracy.
If you have any trouble with UniMRCP or PocketSphinx, feel free to ask.