Hello everyone. We have just released a module that adds speech recognition to Asterisk using Vosk server:
Vosk server is an open-source speech recognition server that supports several protocols (WebSocket, gRPC). You can start it with a single Docker command and transcribe speech in English, Chinese or Russian. For example, for English:
docker run -d -p 2700:2700 alphacep/kaldi-en:latest
Models for other languages, such as Spanish, are available on request. Other advantages of Vosk:
- Very accurate speech recognition based on modern neural networks, much more accurate than pocketsphinx or other publicly available ASR toolkits, whose models are usually trained on wideband audio and perform poorly on telephone speech.
- A streaming API for the best user experience: you can process partial results as they arrive and answer users instantly.
- Vocabulary and grammars can be reconfigured quickly for the best accuracy.
- Speaker identification in addition to speech recognition.
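To show what processing partial results can look like, here is a minimal Python sketch. It assumes the JSON result format used by the vosk-server reference clients: {"partial": "..."} while the caller is still speaking and {"text": "..."} once an utterance is final. The functions parse_result and react_early are hypothetical names for illustration, and no server is involved: the messages are passed in as strings.

```python
import json
from typing import Iterable, Optional, Tuple


def parse_result(message: str) -> Tuple[str, str]:
    """Classify one server message as ("partial", text) or ("final", text).

    Assumed format: {"partial": "..."} while the caller is still speaking,
    {"text": "...", ...} once the recognizer has finalized the utterance.
    """
    obj = json.loads(message)
    if "partial" in obj:
        return "partial", obj["partial"]
    return "final", obj.get("text", "")


def react_early(messages: Iterable[str], keywords: Iterable[str]) -> Optional[Tuple[str, str]]:
    """Return (keyword, kind) for the first keyword heard, or None.

    Partial results are checked too, so the application can respond
    before the utterance is finished instead of waiting for the final text.
    """
    wanted = {k.lower() for k in keywords}
    for message in messages:
        kind, text = parse_result(message)
        for word in text.lower().split():
            if word in wanted:
                return word, kind
    return None


if __name__ == "__main__":
    stream = [
        '{"partial": ""}',
        '{"partial": "i want"}',
        '{"partial": "i want sales"}',
        '{"text": "i want sales please"}',
    ]
    print(react_early(stream, ["sales", "support"]))  # ('sales', 'partial')
```

In this example the keyword is detected on a partial result, one message before the final transcript arrives, which is where the instant answer comes from.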
Unlike UniMRCP, Vosk server needs very little configuration and works over a simple WebSocket protocol.
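To illustrate the WebSocket protocol, here is a Python sketch loosely modeled on the test client shipped with vosk-server. It assumes the third-party websockets package, a mono 16-bit WAV file, the server from the docker command listening on ws://localhost:2700, and a protocol in which the client sends a {"config": {"sample_rate": ...}} text message, then binary audio chunks (receiving one JSON reply per chunk), then {"eof" : 1}. The names read_wav, chunk_audio and transcribe are illustrative only and not part of Vosk.

```python
import asyncio
import json
import sys
import wave
from typing import List, Tuple


def read_wav(path: str) -> Tuple[int, bytes]:
    """Return (sample_rate, raw PCM bytes) for a mono 16-bit WAV file."""
    with wave.open(path, "rb") as wf:
        if wf.getnchannels() != 1 or wf.getsampwidth() != 2:
            raise ValueError("expected mono 16-bit PCM audio")
        return wf.getframerate(), wf.readframes(wf.getnframes())


def chunk_audio(pcm: bytes, sample_rate: int, seconds: float = 0.2) -> List[bytes]:
    """Split 16-bit mono PCM into chunks of about `seconds` of audio each."""
    size = int(sample_rate * seconds) * 2  # 2 bytes per 16-bit sample
    return [pcm[i:i + size] for i in range(0, len(pcm), size)]


async def transcribe(uri: str, path: str) -> str:
    """Stream a WAV file to a Vosk server and return the joined transcript."""
    # Third-party dependency (pip install websockets), imported here so the
    # helpers above can be used without it.
    import websockets

    sample_rate, pcm = read_wav(path)
    texts: List[str] = []
    async with websockets.connect(uri) as ws:
        # Assumed config message: tell the server the audio sample rate
        # (8000 for typical telephone recordings).
        await ws.send(json.dumps({"config": {"sample_rate": sample_rate}}))
        for chunk in chunk_audio(pcm, sample_rate):
            await ws.send(chunk)  # binary frame with raw PCM
            result = json.loads(await ws.recv())  # one JSON reply per chunk
            if result.get("text"):  # a finished utterance; partial replies have no "text"
                texts.append(result["text"])
        # End of stream; the reference test client sends exactly this string.
        await ws.send('{"eof" : 1}')
        result = json.loads(await ws.recv())  # final result for the remaining audio
        if result.get("text"):
            texts.append(result["text"])
    return " ".join(texts)


if __name__ == "__main__":
    print(asyncio.run(transcribe("ws://localhost:2700", sys.argv[1])))
```

Usage: save it under any name, for example client.py, and run python client.py recording.wav against the server started with the docker command. There is nothing to configure on the server side beyond the model baked into the image.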
It is also possible to forward audio through AMI/ARI/AGI and process it in a separate web application, but in the long term you would end up reimplementing all of Asterisk's logic in Stasis yourself, so we don't consider this a practical way to build a voice interface.
In the long term, the best way to handle user input with a natural user experience is to process it asynchronously. Asynchronous processing requires an event-based interface, which is more complex than the current Asterisk speech API, so we may implement more advanced speech modules in the future.
The module integrates Vosk with the Asterisk Speech API, so using it from the dialplan is straightforward:
[internal]
exten = 1,1,Answer
same = n,Wait(1)
same = n,SpeechCreate
same = n,SpeechBackground(hello)
same = n,Verbose(0,Result was ${SPEECH_TEXT(0)})
Comments, opinions and test reports are welcome.