Vosk result error

Hello everyone,

I am using res_speech_vosk.so, but I am getting the error "(vosk) Got error result -1".
Can anyone tell me what the issue might be? The codec is alaw and I am using Asterisk 21.

/etc/asterisk/res_speech_vosk.conf:

[general]
url = ws://0.0.0.0:2700

/etc/asterisk/extensions.conf:

exten => 9093,1,Answer()
same = n,Wait(1)
same = n,SpeechCreate
same = n,SpeechBackground(hello)
same = n,Verbose(0,Result was ${SPEECH_TEXT(0)})

CLI LOG:

*CLI> module load res_speech_vosk.so
Loaded res_speech_vosk.so
[2025-02-24 12:25:48] NOTICE[5917]: res_speech_vosk.c:297 load_module: Load res_speech_vosk module
[2025-02-24 12:25:48] DEBUG[5917]: res_speech_vosk.c:284 vosk_engine_config_load: general.url=ws://0.0.0.0:2700
       > Loaded res_speech_vosk.so => (Vosk Speech Engine)
    -- Executing [9093@outgoing:2] Answer("PJSIP/1002-00000007", "") in new stack
       > 0x7f2fdc14dae0 -- Strict RTP learning after remote address set to: 10.0.0.3:5104
    -- Executing [9093@outgoing:3] Wait("PJSIP/1002-00000007", "1") in new stack
       > 0x7f2fdc14dae0 -- Strict RTP switching to RTP target address 10.0.0.3:5104 as source
    -- Executing [9093@outgoing:4] SpeechCreate("PJSIP/1002-00000007", "") in new stack
    -- Executing [9093@outgoing:5] Wait("PJSIP/1002-00000007", "1") in new stack
    -- Executing [9093@outgoing:6] SpeechBackground("PJSIP/1002-00000007", "hello") in new stack
[2025-02-24 12:25:58] NOTICE[11657][C-00000008]: res_speech_vosk.c:193 vosk_recog_write: (vosk) Got error result -1
       > 0x7f2fdc14dae0 -- Strict RTP learning complete - Locking on source address 10.0.0.3:5104
  == WebSocket connection to '0.0.0.0:2700' closed <-- this happens when call is ended

$/usr/src/asterisk-22.1.1# lsof -i :2700
COMMAND     PID USER   FD   TYPE    DEVICE SIZE/OFF NODE NAME
docker-pr 18743 root    4u  IPv4 484691004      0t0  TCP *:2700 (LISTEN)
docker-pr 18748 root    4u  IPv6 484690381      0t0  TCP *:2700 (LISTEN)

Thank you in advance for your help!

EDIT: Found out that removing "same => n,SpeechBackground(hello)" solved the problem, but the result is:

    -- Executing [9093@outgoing:1] Answer("PJSIP/1002-00000003", "") in new stack
       > 0x7fa68844adb0 -- Strict RTP learning after remote address set to: 10.0.0.3:5126
    -- Executing [9093@outgoing:2] Wait("PJSIP/1002-00000003", "1") in new stack
       > 0x7fa68844adb0 -- Strict RTP switching to RTP target address 10.0.0.3:5126 as source
    -- Executing [9093@outgoing:3] SpeechCreate("PJSIP/1002-00000003", "") in new stack
    -- Executing [9093@outgoing:5] Verbose("PJSIP/1002-00000003", "0,Result was ") in new stack

The WebSocket URL should be the address of the service, not 0.0.0.0. Usually it's 127.0.0.1. 0.0.0.0 is only valid as a listening address, not as a connection target.
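
To illustrate the point about listening versus target addresses, here is a minimal sketch that flags a wildcard address in a configured URL. The helper name `is_usable_target` is hypothetical and not part of Asterisk or Vosk; it only uses the Python standard library.

```python
import ipaddress
from urllib.parse import urlsplit


def is_usable_target(url: str) -> bool:
    """Return False when the URL host is a wildcard (unspecified) address."""
    host = urlsplit(url).hostname
    if host is None:
        return False
    try:
        # 0.0.0.0 and :: are "unspecified" addresses: fine for bind(), not for connect().
        return not ipaddress.ip_address(host).is_unspecified
    except ValueError:
        # Not an IP literal (for example "localhost"); assume it resolves to something usable.
        return True


for url in ("ws://0.0.0.0:2700", "ws://127.0.0.1:2700"):
    verdict = "ok" if is_usable_target(url) else "wildcard address, not a valid target"
    print(url, "->", verdict)
```

This is only a sanity check on the configured value; it does not test whether the server is actually reachable.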


Thanks, I've changed it, but what about the main problem?

    -- Executing [9093@outgoing:3] SpeechCreate("PJSIP/1002-00000003", "") in new stack

What problem do you see exactly? It is just a log message.

If I speak there is no result in the logs. There should be the spoken words in the logs.

In the docker logs I see:

INFO:root:Connection from ('172.17.0.1', 54996)
ERROR:websockets.server:Error in connection handler
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/websockets/server.py", line 191, in handler
    await self.ws_handler(self, path)
  File "/opt/vosk-server/websocket/./asr_server.py", line 70, in recognize
    await websocket.send(response)
  File "/usr/lib/python3/dist-packages/websockets/protocol.py", line 555, in send
    await self.ensure_open()
  File "/usr/lib/python3/dist-packages/websockets/protocol.py", line 803, in ensure_open
    raise self.connection_closed_exc()
websockets.exceptions.ConnectionClosedOK: code = 1000 (OK), no reason

SpeechBackground is required, if you removed it and did not put it back that would not work.

Hi Jacob,

got it, thanks. With SpeechBackground I get the following result:

*CLI> module load res_speech_vosk.so
Loaded res_speech_vosk.so
[2025-02-24 12:25:48] NOTICE[5917]: res_speech_vosk.c:297 load_module: Load res_speech_vosk module
[2025-02-24 12:25:48] DEBUG[5917]: res_speech_vosk.c:284 vosk_engine_config_load: general.url=ws://0.0.0.0:2700
       > Loaded res_speech_vosk.so => (Vosk Speech Engine)
    -- Executing [9093@outgoing:2] Answer("PJSIP/1002-00000007", "") in new stack
       > 0x7f2fdc14dae0 -- Strict RTP learning after remote address set to: 10.0.0.3:5104
    -- Executing [9093@outgoing:3] Wait("PJSIP/1002-00000007", "1") in new stack
       > 0x7f2fdc14dae0 -- Strict RTP switching to RTP target address 10.0.0.3:5104 as source
    -- Executing [9093@outgoing:4] SpeechCreate("PJSIP/1002-00000007", "") in new stack
    -- Executing [9093@outgoing:5] Wait("PJSIP/1002-00000007", "1") in new stack
    -- Executing [9093@outgoing:6] SpeechBackground("PJSIP/1002-00000007", "hello") in new stack
[2025-02-24 12:25:58] NOTICE[11657][C-00000008]: res_speech_vosk.c:193 vosk_recog_write: (vosk) Got error result -1
       > 0x7f2fdc14dae0 -- Strict RTP learning complete - Locking on source address 10.0.0.3:5104
  == WebSocket connection to '0.0.0.0:2700' closed <-- this happens when call is ended

You still have 0.0.0.0 in the log, address should be 127.0.0.1

Sorry I copied and pasted from the first post!

Current:

    -- Executing [9093@outgoing:1] Answer("PJSIP/1002-00000001", "") in new stack
       > 0x7ff0944c8be0 -- Strict RTP learning after remote address set to: 10.0.0.3:5318
    -- Executing [9093@outgoing:2] Wait("PJSIP/1002-00000001", "1") in new stack
       > 0x7ff0944c8be0 -- Strict RTP switching to RTP target address 10.0.0.3:5318 as source
    -- Executing [9093@outgoing:3] SpeechCreate("PJSIP/1002-00000001", "") in new stack
    -- Executing [9093@outgoing:5] SpeechBackground("PJSIP/1002-00000001", "") in new stack
[2025-02-24 15:45:27] NOTICE[23557][C-00000002]: res_speech_vosk.c:192 vosk_recog_write: (vosk) Got error result -1
  == WebSocket connection to '127.0.0.1:2700' closed

Maybe the problem is here?

root@82dba04a2b51:/opt/vosk-model-de/model/conf# cat mfcc.conf
--use-energy=false
--sample-frequency=16000
--num-mel-bins=30
--num-ceps=30
--low-freq=100
--high-freq=7600

Asterisk SpeechBackground uses 8000Hz and not 16000Hz

Damn, I found the solution!

This line in /opt/vosk-server/asr_server.py was:

args.sample_rate = float(os.environ.get('VOSK_SAMPLE_RATE', 8000))

and should be

args.sample_rate = float(os.environ.get('VOSK_SAMPLE_RATE', 16000))

It should still be 8000, and mfcc.conf should contain --allow-upsample=true and --allow-downsample=true. You can simply use our latest model, which has the required data.
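
As a rough illustration of the advice above, the sketch below parses an mfcc.conf in the `--key=value` form shown earlier and checks whether audio at a given rate would be accepted. The function names are hypothetical, and the rule it encodes (upsampling is needed when the input rate is below `--sample-frequency`, downsampling when it is above) is my reading of those two options, not something taken from the Vosk source.

```python
def parse_kaldi_conf(text: str) -> dict:
    """Parse '--key=value' lines into a dict, ignoring anything else."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("--"):
            continue
        key, _, value = line[2:].partition("=")
        options[key] = value
    return options


def accepts_rate(options: dict, input_rate: int) -> bool:
    """Check whether audio at input_rate fits the configured model rate."""
    model_rate = int(float(options["sample-frequency"]))
    if input_rate == model_rate:
        return True
    # Lower input rate needs upsampling, higher input rate needs downsampling.
    flag = "allow-upsample" if input_rate < model_rate else "allow-downsample"
    return options.get(flag, "false") == "true"


posted = "--use-energy=false\n--sample-frequency=16000\n--num-mel-bins=30\n"
patched = posted + "--allow-upsample=true\n--allow-downsample=true\n"

print(accepts_rate(parse_kaldi_conf(posted), 8000))
print(accepts_rate(parse_kaldi_conf(patched), 8000))
```

With the configuration as posted, 8000 Hz input is rejected by this check; with the two extra flags it passes.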


Thanks, that also worked. I've changed the config back to

args.sample_rate = float(os.environ.get('VOSK_SAMPLE_RATE', 8000))
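
Since that line reads the rate from the VOSK_SAMPLE_RATE environment variable and only falls back to the hard-coded default, the value can presumably be set on the container (for example with `docker run -e VOSK_SAMPLE_RATE=8000`) instead of editing the file. A small sketch of the same lookup, with `resolve_sample_rate` as a hypothetical stand-in for the server's own code:

```python
import os


def resolve_sample_rate(environ=os.environ, default=8000) -> float:
    """Mirror the lookup above: the environment variable wins, else the default."""
    return float(environ.get("VOSK_SAMPLE_RATE", default))


# No variable set: the default applies.
print(resolve_sample_rate({}))
# Variable set: it overrides the default without touching the source file.
print(resolve_sample_rate({"VOSK_SAMPLE_RATE": "16000"}))
```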

I have only 2048 MB of memory on my server, so I can only use the small model. With Docker running, this is the memory I have left:

~# free -h
              total        used        free      shared  buff/cache   available
Mem:          1,9Gi       449Mi       257Mi       4,0Mi       1,2Gi       1,5Gi
Swap:         975Mi       123Mi       852Mi

Could you tell me why the recognition stops after a few seconds? Is there any way to solve this?

My current workaround is (not best practise):

[outgoing]
exten = 9000,1,Answer()
same => n,Set(DENOISE(rx)=on)
same => n,Wait(1)

; Start speech processing
same => n(start),SpeechCreate()
same => n,SpeechStart()
same => n,Set(TIMEOUT(digit)=0)
same => n,SpeechBackground()
same => n,Verbose(1,Recognized text: ${SPEECH_TEXT(0)})

; Save the recognized text & convert to lowercase
same => n,Set(SPEECH_RESULT=${SPEECH_TEXT(0)})
same => n,Set(SPEECH_RESULT=${TOLOWER(${SPEECH_RESULT})})
same => n,SpeechDestroy()

; Check if the result matches a greeting
same => n,Set(MATCH_RESULT=0)

same => n,GotoIf($["${SPEECH_RESULT}" =~ "(hello|welcome|can you hear me|phone number|good morning|good day|good evening|my name is|you are speaking with|help|hear|beautiful|what is this about|some other prompts)"]?success)
; If MATCH_RESULT is 1 (Successful regex match), go to success
same => n,ExecIf($["${MATCH_RESULT}" = "1"]?Goto(success))

; If no match - continue with regular processing
same => n,Goto(start)

; Success case: forward when a keyword is recognized
same => n(success),Playback(/etc/asterisk/sounds/greeting)
same => n,Dial(PJSIP/1000)
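
The keyword check in this dialplan can be tried offline before touching Asterisk. The sketch below expresses the same idea in Python: lowercase the recognized text, then search it for any of the phrases. It assumes an unanchored search, which is how I understand `=~` to behave; the shortened keyword list and the name `matches_greeting` are only for illustration.

```python
import re

# Shortened, illustrative subset of the phrases used in the dialplan above.
KEYWORDS = ["hello", "welcome", "can you hear me", "good morning", "my name is", "help"]

# Escape each phrase so it is matched literally, then join into one alternation.
PATTERN = re.compile("|".join(re.escape(k) for k in KEYWORDS))


def matches_greeting(text: str) -> bool:
    """Return True if any keyword occurs anywhere in the lowercased text."""
    return PATTERN.search(text.lower()) is not None


for sample in ("Hello, who is this?", "GOOD MORNING everyone", "goodbye"):
    print(repr(sample), "->", matches_greeting(sample))
```

Note that this is a substring match, so a word such as "helpless" would also count as a hit for "help"; adding word boundaries would tighten it if that matters.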

The small model is not going to work for telephony. You can use the big model but delete the 'rnnlm' and 'rescore' folders from it. Then it will fit in 1 GB.

Can you tell me why the small model wouldn't work for telephony? In my case, it does work. When you say that I can delete the folders from the large model, are you referring to disk space or to RAM usage while Vosk is running?

I used this Dockerfile:

FROM alphacep/kaldi-vosk-server:latest

ENV MODEL_VERSION vosk-model-small-de-0.15
RUN mkdir /opt/vosk-model-de \
   && cd /opt/vosk-model-de \
   && wget -q https://alphacephei.com/vosk/models/${MODEL_VERSION}.zip \
   && unzip ${MODEL_VERSION}.zip \
   && mv ${MODEL_VERSION} model \
   && rm ${MODEL_VERSION}.zip

EXPOSE 2700
WORKDIR /opt/vosk-server/websocket
CMD [ "python3", "./asr_server.py", "/opt/vosk-model-de/model" ]

I ran into another issue:

...
       > (vosk) Got result: '{
       >   "partial" : "some words"
       > }'
       > (vosk) Got result: '{
       >   "partial" : "and so on"
       > }'
...
[2025-02-25 18:48:31] WARNING[4174][C-00000019]: channel.c:1105 __ast_queue_frame: Exceptionally long voice queue length (97 voice / 97 total) queuing to Local/9000@vosk

This happens after about 2 minutes. How can I solve this?

EDIT:

Damn, I spent a whole day banging my head against the wall. This change in /usr/src/asterisk-22.1.1/res/res_speech_vosk.c solved my problem:

After this change I ran make and make install in the directory /usr/src/asterisk-22.1.1.
If someone has the same issue, make sure to make these changes in your Asterisk build directory → /usr/src/asterisk-<version>

I doubt this change is correct either. It looks like the system is too slow to process the incoming audio, so the buffer fills up. You probably need a faster CPU. Overall, ASR is a very resource-intensive task.
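
To put the warning in perspective, here is a back-of-the-envelope sketch of how much audio a queue of 97 voice frames represents and how a real-time factor is read. The 20 ms frame duration is an assumption (a common packetization time for alaw); the thread does not state it, and both function names are hypothetical.

```python
FRAME_MS = 20  # assumption: typical RTP packetization time, not confirmed in this thread


def backlog_seconds(queued_frames: int, frame_ms: int = FRAME_MS) -> float:
    """Seconds of audio waiting in the queue."""
    return queued_frames * frame_ms / 1000


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Above 1.0 the recognizer is slower than real time and the queue keeps growing."""
    return processing_seconds / audio_seconds


print("backlog for 97 queued frames:", backlog_seconds(97), "s")
print("RTF if 3 s of CPU time is needed per 2 s of audio:", real_time_factor(3.0, 2.0))
```

Under that assumption the queue holds roughly two seconds of unprocessed audio, which fits the explanation that the consumer is not keeping up with the incoming stream.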

Since the change (see last post), the issue no longer exists.