Problem with SPEECH_ENGINE usage

Hi everyone,

I'm working on a new speech recognition service, and I'm trying to use AEAP for it.

I need to reliably identify each call on the websocket side.
To do that, I want to inject an ID parameter into the setup request.

My dialplan:

[test_aeap]
exten => _X.,1,NoOp()
 same => n,Answer()
 same => n,SpeechCreate(test-speech-to-text)
 same => n,Set(SPEECH_ENGINE(callid)=16999999)
 same => n,Verbose(ENGINE CALL ID : ${SPEECH_ENGINE(callid)})
 same => n,SpeechStart()
 same => n,SpeechBackground()
 same => n,SpeechDestroy()
 same => n,Hangup()

My aeap.conf:

[test-speech-to-text]
type=client
codecs=!all,alaw
url=ws://localhost:9099
protocol=speech_to_text
@language=fr_FR
@callid={callid}

But
 same => n,Verbose(ENGINE CALL ID : ${SPEECH_ENGINE(callid)})
always prints None.

In the message my websocket server receives, the default values come through correctly.
I have read this post, but I don't see what I'm doing wrong…

So is the problem just that you’re trying to access the value in the dialplan and can’t? If so, I don’t believe those values are readable. You’d need to set a normal dialplan variable if you want to have access.

If that’s not the problem then you’ll need to clarify further.

Sorry, I will try to be more specific.

I want to send a custom value from the dialplan to the websocket server so I can correctly identify my call.
I have tried using SPEECH_ENGINE to set a custom variable, but my websocket server doesn't receive it.

I have tried printing it right after setting it, to check that the variable is initialized, but it already prints None at that point.

I have also tried copying it into a normal dialplan variable before printing, but with no better result:

[test_aeap]
exten => _X.,1,Verbose(TEST AEAP)
 same => n,Answer()
 same => n,SpeechCreate(test-speech-to-text)
 same => n,Set(SPEECH_ENGINE(callid)=16999999)
 same => n,Set(dialplan_call_id=${SPEECH_ENGINE(callid)})
 same => n,Verbose(ENGINE CALL ID : ${dialplan_call_id})
 same => n,SpeechStart()
 same => n,SpeechBackground()
 same => n,SpeechDestroy()
 same => n,Hangup()

I don't believe it works like you want. Using SPEECH_ENGINE will, I think, result in a "set" request going over the websocket. Have you captured all the JSON requests to see?

Oh, my bad.

I misunderstood the posts I read about this; I thought SPEECH_ENGINE allowed updating the initial configuration that is sent.

By doing the SPEECH_ENGINE set after SpeechStart(), I now receive a set request, which I can map to my websocket ID.
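
For reference, the reordered dialplan looks roughly like this (same extension as above, only with the Set() moved after SpeechStart(); 16999999 is still just my test ID):

[test_aeap]
exten => _X.,1,NoOp()
 same => n,Answer()
 same => n,SpeechCreate(test-speech-to-text)
 same => n,SpeechStart()
 same => n,Set(SPEECH_ENGINE(callid)=16999999) ; now goes out as a set request over the websocket
 same => n,SpeechBackground()
 same => n,SpeechDestroy()
 same => n,Hangup()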

Many thanks to you.

If I may, one additional question.

I have not yet decided whether I should use AEAP or externalMedia.

I'm already in an ARI Stasis application, so externalMedia would seem more appropriate, but identifying the call would be more complex (there is no control over the RTP frames to add a header carrying an ID from ARI?), and I risk having heavier interactions with Asterisk in the long term.

It depends on the goal, and the identification of the RTP stream isn’t that difficult. You either use individual receiving ports and associate based on that, or you could examine the source IP address+port and map based on that. Further identifier usage would be in the ARI application.
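
For illustration, a minimal sketch of the second approach (one shared listening port, associating each stream by its source IP+port) might look like this in Python; the port number and how call_map gets filled in are assumptions for the example, not anything Asterisk provides:

import socket

# Hypothetical mapping, filled in by the ARI application when each external
# media channel is created: (source_ip, source_port) -> call identifier.
call_map = {}

RTP_PORT = 4000  # example port the recognition server listens on

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind(("0.0.0.0", RTP_PORT))

while True:
    packet, source = sock.recvfrom(2048)  # source is the (ip, port) the stream comes from
    call_id = call_map.get(source)
    if call_id is None:
        continue  # unknown stream: ignore it (or learn it here)
    payload = packet[12:]  # strip the fixed 12-byte RTP header (assumes no CSRCs/extensions)
    # ... hand `payload` to the recognition pipeline for call_id ...

The first approach mentioned above would instead bind one socket per call and skip the lookup entirely.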

We use Docker, and I was afraid that opening RTP ports on our recognition server would conflict with the Asterisk container in the same stack.
After discussing with my team, it seems that running the container in host network mode would make this possible, and an IP+port mapping as you suggest would then work.

I’ll explain our project more precisely.
We have a call managed with ARI. The goal is, once the call is answered, to start speech recognition, perform automatic translation, and then play the translated audio to our agent.

I have a little trouble seeing the fundamental differences between AEAP and ExternalMedia in terms of advantages. I am in the R&D phase on both solutions, and since I don't intend to process the recognition results inside Asterisk, ExternalMedia seems more appropriate to me, but I would like your opinion on this and more explanation of the differences between the two.

AEAP is strictly for speech to text. External media is arbitrary and bidirectional. External media gives far greater control and, if ARI is involved, is generally the best course of action.

Thanks for the explanation.

One last question:
“External media is arbitrary and bidirectional”

To return audio to my channel, when I receive a connection on my RTP server, do I have to capture the source IP and port used by the channel and send the return audio back to that address?

That is one way to do it, or the creation of the external media channel will provide the information in its response.
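
As a rough sketch of that second option (untested; the ARI credentials, application name and addresses are made up, and the UNICASTRTP_* variable names should be checked against the documentation for your Asterisk version):

import requests

ARI = "http://localhost:8088/ari"
AUTH = ("asterisk", "secret")  # example ARI credentials

# Create the external media channel, pointing Asterisk at our RTP server.
channel = requests.post(
    f"{ARI}/channels/externalMedia",
    auth=AUTH,
    params={
        "app": "my-stasis-app",            # hypothetical Stasis application name
        "external_host": "10.0.0.5:4000",  # hypothetical recognition-server address
        "format": "alaw",
    },
).json()

# Read back where Asterisk itself receives RTP, i.e. where the return audio
# should be sent (variable names as I understand them; to be verified).
local_addr = requests.get(
    f"{ARI}/channels/{channel['id']}/variable",
    auth=AUTH,
    params={"variable": "UNICASTRTP_LOCAL_ADDRESS"},
).json()["value"]
local_port = requests.get(
    f"{ARI}/channels/{channel['id']}/variable",
    auth=AUTH,
    params={"variable": "UNICASTRTP_LOCAL_PORT"},
).json()["value"]

Depending on the Asterisk version, the creation response itself may already carry the same address information, as mentioned above.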

Many thanks for this very interesting exchange :slight_smile: