Looking for technical support regarding Asterisk + Vosk customization; need anytime STT

Hello. We are Helping Hand India, an NGO working in India for poor students' education.
We are using Asterisk 20 with Vosk STT for multilingual transcription (English and Hindi), and everything is working well. We now need real-time (always-on) STT within our Asterisk environment. An expert already helped us in the past to make it multilingual. I am sharing the existing configuration here for context.

same => n,SpeechCreate(vosk^en)
same => n,SpeechCreate(vosk^hi)
same => n,SpeechCreate(vosk^enin)
;same => n,SpeechBackground(silence1,4,p,en^hi^enin)
same => n,SpeechBackground(quizivr/2025/${LSelect}Main-Menu-April25E,0,p,en^hi^enin)
same => n,Verbose(0,Result was ${SPEECH_TEXT(0^en)})
same => n,Verbose(0,Result was ${SPEECH_TEXT(0^hi)})
same => n,Verbose(0,Result was ${SPEECH_TEXT(0^enin)})
same => n,Set(EnglishVoice=${SPEECH_TEXT(0^en)})
same => n,Set(HindiVoice=${SPEECH_TEXT(0^hi)})
same => n,Set(EnglishVoice2=${SPEECH_TEXT(0^enin)})
same => n,SpeechDestroy(vosk^en)
same => n,SpeechDestroy(vosk^hi)
same => n,SpeechDestroy(vosk^enin)

cat /etc/asterisk/res_speech_vosk.conf
[general]
[en]
type=horse
url = ws://localhost:2702
[hi]
type=horse
url = ws://localhost:2700
[enin]
type=horse
url = ws://localhost:2701

docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
3eb6ceef0959 alphacep/kaldi-en-in:latest "python3 ./asr_serve…" 3 hours ago Up 3 hours 0.0.0.0:2701->2700/tcp, :::2701->2700/tcp sleepy_spence
860d807f84a8 alphacep/kaldi-en:latest "python3 ./asr_serve…" 3 hours ago Up 3 hours 0.0.0.0:2702->2700/tcp, :::2702->2700/tcp brave_bassi
b08767cb1e33 alphacep/kaldi-hi:latest "python3 ./asr_serve…" 3 hours ago Up 3 hours 0.0.0.0:2700->2700/tcp, :::2700->2700/tcp funny_austin

You can reply/quote with your charges on our email.
office@helpinghandindiango.org

I think you’re going to need to be more specific on what exactly this means and what you’re looking for.

OK. In the current situation, the user is able to speak only once, at the start. We need the user to be able to speak at any time during their session, say a number from one to eighty, and go directly to the desired module. We have eighty modules.
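One way to approximate "anytime" input with the existing SpeechBackground setup is to re-arm the recognizer in a loop against a silence prompt, so a new utterance can be captured at any point. This is only a sketch: the context name, the silence sound file, and the module-N target contexts are placeholders, not part of the original configuration, and a spoken word like "twenty" would still need translating to the digits "20" before the Goto.

```
; Sketch only: loop SpeechBackground so the recognizer is re-armed
; after every utterance. Names below are hypothetical.
[anytime-stt]
exten => s,1,Answer()
 same => n,SpeechCreate(vosk^en)
 same => n(listen),SpeechBackground(silence/10,0,p,en)   ; listen, then fall through
 same => n,Set(Heard=${SPEECH_TEXT(0^en)})
 same => n,GotoIf($["${Heard}" = ""]?listen)             ; nothing recognized, listen again
 ; A word-to-number step would be needed here; the target contexts
 ; module-1 ... module-80 are assumed, not shown in the original post.
 same => n,Goto(module-${Heard},s,1)
```

Note the limitation: SpeechBackground only listens while that application is running, so this loop re-arms between results rather than being truly always-on, and any other prompt playback has to happen inside the same loop.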

Excellent!

To clarify, STT is Speech To Text. This is still a thing. So is Automatic Speech Recognition (ASR). And Text To Speech (TTS)!

However, there’s been lots of software developer growth in this area recently – see the forums, natch! – and the current nomenclature gravitates towards the more generic phrases like Artificial Intelligence (AI) and Large Language Models (LLM). Not to turn this reply into a Public Service Announcement, but it is definitely a note to self: it would be nice :rainbow: :sun: :smile: to revive STT/ASR/TTS in the VoIP space that Asterisk lives in, if only to help regain some focus on the problems that folks are trying to solve.

I don’t think this is close to the current crop of speech recognition requirements. Whilst I suppose it could be done with a continuous speech recognition system and the resulting text post-processed, this seems to be limited-vocabulary, isolated speech. The unusual feature is that it isn’t listening for an immediate response.

At least one consequence of using a current-generation recognizer is that recognition is likely to be delayed, as such systems require significant look-ahead to decode the speech reliably. Whereas, if one knows the input is one of eighty numbers, one already has a lot of context, and the technology can be, maybe, 20 years old.

I also think a lot more problem analysis needs to be provided here, as I suspect the media channel is also being used for, at the very least, DTMF input.

It would probably be better if there were a DTMF attention signal that triggered an attempt to read speech, rather than using the speech itself as the trigger.

I appreciate your guidance. I have already tried, but could not find helpful content. I think experts can give a quality, reliable solution.

Please suggest a good way to do continuous/anytime voice activity detection in the above scenario; BackgroundDetect() or the TALK_DETECT function seems not useful. Any good reference for implementing an Alexa-type solution, or helpful links related to the above requirements, would be appreciated.
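For reference on why TALK_DETECT does not help on its own: it performs no recognition at all, it only raises ChannelTalkingStart/ChannelTalkingStop events over AMI/ARI, so an external application would still have to catch those events and start STT itself. A fragment for illustration:

```
; TALK_DETECT only emits AMI/ARI events (ChannelTalkingStart /
; ChannelTalkingStop); it does no transcription. An external AMI/ARI
; application must react to the events and drive recognition.
same => n,Set(TALK_DETECT(set)=)   ; enable talking events on this channel
```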

Alexa uses an attention word and, I believe, detects it in the device or speaker. Only when it detects that trigger does it go to the cloud to decipher the full request. The analogy for phones would be the phone detecting the trigger word.

(Asterisk can detect a trigger DTMF, although some phone systems need a stronger attention signal, in the form of a hook flash or, in analogue systems, the earthing of one of the wires.)
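The DTMF-attention idea can be sketched in the dialplan: Background() lets the caller dial an extension while audio plays, so a key such as * can arm the recognizer only when pressed. This is an illustrative sketch; the context, prompt names, and the choice of * are assumptions, not from the original post.

```
; Sketch: * is a DTMF "attention key" that arms speech recognition,
; instead of speech itself being the trigger. Names are placeholders.
[menu]
exten => s,1,Answer()
 same => n(loop),Background(silence/10)   ; caller can press a key any time
 same => n,Goto(loop)

exten => *,1,SpeechCreate(vosk^en)        ; * pressed: start listening
 same => n,SpeechBackground(beep,10,p,en)
 same => n,Verbose(0,Heard: ${SPEECH_TEXT(0^en)})
 same => n,SpeechDestroy(vosk^en)
 same => n,Goto(s,loop)
```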

Try using AudioSocket.
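For context: AudioSocket streams the channel's raw audio over a plain TCP connection, where each frame is a 1-byte kind, a 2-byte big-endian payload length, and the payload (documented kinds: 0x00 hangup, 0x01 UUID, 0x10 signed-linear 16-bit/8 kHz mono audio, 0xff error). A minimal sketch of the receiving side's framing logic, under those assumptions:

```python
import struct

# AudioSocket frame kinds (per the Asterisk AudioSocket protocol).
KIND_HANGUP, KIND_UUID, KIND_AUDIO, KIND_ERROR = 0x00, 0x01, 0x10, 0xFF

def parse_frames(buf: bytes):
    """Split a raw TCP byte stream into (kind, payload) AudioSocket frames.

    Returns (frames, remainder), where remainder holds any incomplete
    trailing frame to be prepended to the next recv() chunk.
    """
    frames = []
    i = 0
    while i + 3 <= len(buf):
        kind = buf[i]
        (length,) = struct.unpack_from("!H", buf, i + 1)  # big-endian u16
        if i + 3 + length > len(buf):
            break  # partial frame; wait for more data
        frames.append((kind, buf[i + 3 : i + 3 + length]))
        i += 3 + length
    return frames, buf[i:]
```

On the dialplan side this would be driven by the AudioSocket application (a UUID plus a host:port service argument); the audio payloads collected here could then be forwarded to the existing Vosk websocket servers by a custom bridge, which is the part that would need building.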

Regards
CJ