I am using res_speech_vosk.so, but I am getting the error "(vosk) Got error result -1".
Can anyone tell me what the issue might be? The codec is alaw and I am using Asterisk 21.
/etc/asterisk/res_speech_vosk.conf:
[general]
url = ws://0.0.0.0:2700
/etc/asterisk/extensions.conf:
exten => 9093,1,Answer()
same => n,Wait(1)
same => n,SpeechCreate()
same => n,SpeechBackground(hello)
same => n,Verbose(0,Result was ${SPEECH_TEXT(0)})
CLI LOG:
*CLI> module load res_speech_vosk.so
Loaded res_speech_vosk.so
[2025-02-24 12:25:48] NOTICE[5917]: res_speech_vosk.c:297 load_module: Load res_speech_vosk module
[2025-02-24 12:25:48] DEBUG[5917]: res_speech_vosk.c:284 vosk_engine_config_load: general.url=ws://0.0.0.0:2700
> Loaded res_speech_vosk.so => (Vosk Speech Engine)
-- Executing [9093@outgoing:2] Answer("PJSIP/1002-00000007", "") in new stack
> 0x7f2fdc14dae0 -- Strict RTP learning after remote address set to: 10.0.0.3:5104
-- Executing [9093@outgoing:3] Wait("PJSIP/1002-00000007", "1") in new stack
> 0x7f2fdc14dae0 -- Strict RTP switching to RTP target address 10.0.0.3:5104 as source
-- Executing [9093@outgoing:4] SpeechCreate("PJSIP/1002-00000007", "") in new stack
-- Executing [9093@outgoing:5] Wait("PJSIP/1002-00000007", "1") in new stack
-- Executing [9093@outgoing:6] SpeechBackground("PJSIP/1002-00000007", "hello") in new stack
[2025-02-24 12:25:58] NOTICE[11657][C-00000008]: res_speech_vosk.c:193 vosk_recog_write: (vosk) Got error result -1
> 0x7f2fdc14dae0 -- Strict RTP learning complete - Locking on source address 10.0.0.3:5104
== WebSocket connection to '0.0.0.0:2700' closed <-- this happens when call is ended
EDIT: I found that removing "same => n,SpeechBackground(hello)" made the error go away, but now the result is:
-- Executing [9093@outgoing:1] Answer("PJSIP/1002-00000003", "") in new stack
> 0x7fa68844adb0 -- Strict RTP learning after remote address set to: 10.0.0.3:5126
-- Executing [9093@outgoing:2] Wait("PJSIP/1002-00000003", "1") in new stack
> 0x7fa68844adb0 -- Strict RTP switching to RTP target address 10.0.0.3:5126 as source
-- Executing [9093@outgoing:3] SpeechCreate("PJSIP/1002-00000003", "") in new stack
-- Executing [9093@outgoing:5] Verbose("PJSIP/1002-00000003", "0,Result was ") in new stack
If I speak, no result appears in the logs. The spoken words should show up there.
In the docker logs I see:
INFO:root:Connection from ('172.17.0.1', 54996)
ERROR:websockets.server:Error in connection handler
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/websockets/server.py", line 191, in handler
    await self.ws_handler(self, path)
  File "/opt/vosk-server/websocket/./asr_server.py", line 70, in recognize
    await websocket.send(response)
  File "/usr/lib/python3/dist-packages/websockets/protocol.py", line 555, in send
    await self.ensure_open()
  File "/usr/lib/python3/dist-packages/websockets/protocol.py", line 803, in ensure_open
    raise self.connection_closed_exc()
websockets.exceptions.ConnectionClosedOK: code = 1000 (OK), no reason
-- Executing [9093@outgoing:1] Answer("PJSIP/1002-00000001", "") in new stack
> 0x7ff0944c8be0 -- Strict RTP learning after remote address set to: 10.0.0.3:5318
-- Executing [9093@outgoing:2] Wait("PJSIP/1002-00000001", "1") in new stack
> 0x7ff0944c8be0 -- Strict RTP switching to RTP target address 10.0.0.3:5318 as source
-- Executing [9093@outgoing:3] SpeechCreate("PJSIP/1002-00000001", "") in new stack
-- Executing [9093@outgoing:5] SpeechBackground("PJSIP/1002-00000001", "") in new stack
[2025-02-24 15:45:27] NOTICE[23557][C-00000002]: res_speech_vosk.c:192 vosk_recog_write: (vosk) Got error result -1
== WebSocket connection to '127.0.0.1:2700' closed
It should still be 8000, and mfcc.conf should have --allow-upsample=true --allow-downsample=true. You can simply use our latest model; it has the required data.
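To make the 8000 Hz point concrete: as far as I know, the sample client in the vosk-server repository first sends a small JSON config message carrying the sample rate, then streams raw audio chunks, and finishes with an eof message. Below is a minimal sketch of building those messages in Python. The helper names, the 0.2 second chunk size, and the assumption of 16-bit mono linear PCM are mine for illustration; this is not taken from res_speech_vosk.c.

```python
import json

SAMPLE_RATE = 8000      # telephony sample rate discussed in this thread
BYTES_PER_SAMPLE = 2    # assumption: 16-bit signed linear PCM, mono


def config_message(sample_rate: int = SAMPLE_RATE) -> str:
    """JSON text message announcing the sample rate before audio is sent."""
    return json.dumps({"config": {"sample_rate": sample_rate}})


def eof_message() -> str:
    """JSON text message marking the end of the audio stream."""
    return json.dumps({"eof": 1})


def chunk_audio(pcm: bytes, seconds_per_chunk: float = 0.2,
                sample_rate: int = SAMPLE_RATE) -> list:
    """Split raw PCM bytes into chunks of roughly seconds_per_chunk each."""
    chunk_bytes = int(sample_rate * seconds_per_chunk) * BYTES_PER_SAMPLE
    return [pcm[i:i + chunk_bytes] for i in range(0, len(pcm), chunk_bytes)]
```

Each chunk would be sent as a binary WebSocket frame between the config message and the eof message. If the rate announced here does not match the audio actually sent, recognition quality will suffer, which is why the rate matters.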
I only have 2048 MB of memory on my server, so I can only use the small model. With Docker running, this is the memory I have left:
~# free -h
               total        used        free      shared  buff/cache   available
Mem:           1,9Gi       449Mi       257Mi       4,0Mi       1,2Gi       1,5Gi
Swap:          975Mi       123Mi       852Mi
Could you tell me why the recognition stops after a few seconds? Is there any way to solve this?
My current workaround is (not best practice):
[outgoing]
exten => 9000,1,Answer()
same => n,Set(DENOISE(rx)=on)
same => n,Wait(1)
; Start speech processing
same => n(start),SpeechCreate()
same => n,SpeechStart()
same => n,Set(TIMEOUT(digit)=0)
same => n,SpeechBackground()
same => n,Verbose(1,Recognized text: ${SPEECH_TEXT(0)})
; Save the recognized text & convert to lowercase
same => n,Set(SPEECH_RESULT=${SPEECH_TEXT(0)})
same => n,Set(SPEECH_RESULT=${TOLOWER(${SPEECH_RESULT})})
same => n,SpeechDestroy()
; Check if the result matches a greeting
same => n,Set(MATCH_RESULT=0)
same => n,GotoIf($["${SPEECH_RESULT}" =~ "(hello|welcome|can you hear me|phone number|good morning|good day|good evening|my name is|you are speaking with|help|hear|beautiful|what is this about|some other prompts)"]?success)
; Fallback check on MATCH_RESULT (note: it is never set to 1 above, so the GotoIf is what actually jumps to success)
same => n,ExecIf($["${MATCH_RESULT}" = "1"]?Goto(success))
; If no match - continue with regular processing
same => n,Goto(start)
; Success case: forward when a keyword is recognized
same => n(success),Playback(/etc/asterisk/sounds/greeting)
same => n,Dial(PJSIP/1000)
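For anyone who wants to try out a keyword list before putting it into the dialplan, here is a rough Python sketch of the same lowercase-then-regex check. It assumes the dialplan's =~ operator matches anywhere in the string, which I believe is the case, and it uses a shortened keyword list; the function and constant names are made up for this example.

```python
import re

# Shortened, illustrative subset of the keywords from the dialplan above.
GREETING_PATTERN = re.compile(
    r"(hello|welcome|can you hear me|good morning|good day|good evening|help)"
)


def matches_greeting(speech_text: str) -> bool:
    """Lowercase the recognized text, then search it for any greeting keyword."""
    return GREETING_PATTERN.search(speech_text.lower()) is not None


if __name__ == "__main__":
    for sample in ["Hello, who is this?", "GOOD MORNING", "goodbye"]:
        print(sample, "->", matches_greeting(sample))
```

Python's re module and the regex engine Asterisk uses are not identical, but for a plain alternation of literal phrases like this one they should behave the same.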
Can you tell me why the small model wouldn't work for telephony? In my case, it does work. When you say that I can delete the folders for a large model, are you referring to disk space or to RAM usage while Vosk is running?
I used this Dockerfile:
FROM alphacep/kaldi-vosk-server:latest
ENV MODEL_VERSION=vosk-model-small-de-0.15
RUN mkdir /opt/vosk-model-de \
    && cd /opt/vosk-model-de \
    && wget -q https://alphacephei.com/vosk/models/${MODEL_VERSION}.zip \
    && unzip ${MODEL_VERSION}.zip \
    && mv ${MODEL_VERSION} model \
    && rm ${MODEL_VERSION}.zip
EXPOSE 2700
WORKDIR /opt/vosk-server/websocket
CMD [ "python3", "./asr_server.py", "/opt/vosk-model-de/model" ]
This happens after about 2 minutes. How can I solve this?
EDIT:
Damn, I spent a whole day banging my head against the wall. This change in /usr/src/asterisk-22.1.1/res/res_speech_vosk.c solved my problem:
After this change I ran make and make install in the directory /usr/src/asterisk-22.1.1
If someone has the same issue, make sure to make these changes in your Asterisk build directory: /usr/src/asterisk-<version>
I doubt this change is correct either. It looks like the system is too slow to process the incoming audio, so the buffer fills up. You probably need a faster CPU. Overall, ASR is a very resource-intensive task.
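One rough way to check the "too slow" theory is to compare how long the recognizer takes with how much audio it was given. The sketch below computes that ratio (often called the real-time factor). It assumes 8000 Hz, 16-bit mono linear PCM, i.e. 16000 bytes per second of audio; the function names and the example numbers are hypothetical, not measurements from this thread.

```python
def audio_seconds(num_bytes: int, sample_rate: int = 8000,
                  bytes_per_sample: int = 2) -> float:
    """Duration of raw mono PCM audio, in seconds."""
    return num_bytes / (sample_rate * bytes_per_sample)


def real_time_factor(processing_seconds: float, num_bytes: int) -> float:
    """Processing time divided by audio duration; above 1.0 means falling behind."""
    return processing_seconds / audio_seconds(num_bytes)


def keeps_up(processing_seconds: float, num_bytes: int) -> bool:
    """True if the recognizer processes audio at least as fast as it arrives."""
    return real_time_factor(processing_seconds, num_bytes) <= 1.0


if __name__ == "__main__":
    # Hypothetical example: 10 s of audio (160000 bytes) taking 12 s to process.
    print(real_time_factor(12.0, 160000))
    print(keeps_up(12.0, 160000))
```

If the factor stays above 1.0 for a sustained period, unprocessed audio accumulates and any fixed-size buffer between Asterisk and the recognizer will eventually fill, no matter how large it is made.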