Problem with SPEECH_ENGINE usage

Hi everyone,

I'm working on a new speech recognition service, and I'm trying to use AEAP for it.

I need to reliably identify each call on the websocket side.
To do that, I want to inject an ID parameter into the setup request.

My dialplan:

[test_aeap]
exten => _X.,1,NoOp()
 same => n,Answer()
 same => n,SpeechCreate(test-speech-to-text)
 same => n,Set(SPEECH_ENGINE(callid)=16999999)
 same => n,Verbose(ENGINE CALL ID : ${SPEECH_ENGINE(callid)})
 same => n,SpeechStart()
 same => n,SpeechBackground()
 same => n,SpeechDestroy()
 same => n,Hangup()

My aeap.conf:

[test-speech-to-text]
type=client
codecs=!all,alaw
url=ws://localhost:9099
protocol=speech_to_text
@language=fr_FR
@callid={callid}

But
 same => n,Verbose(ENGINE CALL ID : ${SPEECH_ENGINE(callid)})
always prints None.

In the message my websocket server receives, the default values come through correctly.
I have read this post, but I don't see what I'm doing wrong…

So is the problem just that you’re trying to access the value in the dialplan and can’t? If so, I don’t believe those values are readable. You’d need to set a normal dialplan variable if you want to have access.

If that’s not the problem then you’ll need to clarify further.

Sorry, I will try to be more specific.

I want to send a custom value from the dialplan to the websocket server so I can correctly identify my call.
I have tried using SPEECH_ENGINE to set a custom variable, but my websocket server doesn't receive it.

I have tried printing it right after setting it, to check that the variable is initialized, but it already prints None at that point.

I have also tried copying it into a normal dialplan variable before printing, but with no better result:

[test_aeap]
exten => _X.,1,Verbose(TEST AEAP)
 same => n,Answer()
 same => n,SpeechCreate(test-speech-to-text)
 same => n,Set(SPEECH_ENGINE(callid)=16999999)
 same => n,Set(dialplan_call_id=${SPEECH_ENGINE(callid)})
 same => n,Verbose(ENGINE CALL ID : ${dialplan_call_id})
 same => n,SpeechStart()
 same => n,SpeechBackground()
 same => n,SpeechDestroy()
 same => n,Hangup()

I don't believe it works like you want. Using SPEECH_ENGINE will, I think, result in a "set" request going over the websocket. Have you captured all the JSON requests to see?

Oh, my bad.

I misunderstood the posts I read about this; I thought SPEECH_ENGINE allowed updating the initial configuration that is sent.

By doing the SPEECH_ENGINE set after SpeechStart(), I now receive a set request, which I can map to my websocket ID.
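
For reference, the reordered dialplan looks roughly like this (same extension as above, only with the Set() moved after SpeechStart(); 16999999 is still just my test ID):

[test_aeap]
exten => _X.,1,NoOp()
 same => n,Answer()
 same => n,SpeechCreate(test-speech-to-text)
 same => n,SpeechStart()
 same => n,Set(SPEECH_ENGINE(callid)=16999999) ; now goes out as a set request over the websocket
 same => n,SpeechBackground()
 same => n,SpeechDestroy()
 same => n,Hangup()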

Many thanks to you.

If I may, one additional question.

I have not yet decided whether I should use AEAP or externalMedia.

I'm already in an ARI Stasis application, so externalMedia would seem more appropriate, but identifying the call would be more complex (there is no control over the RTP frames to add a header carrying an ID from ARI?), and I risk having heavier interactions with Asterisk in the long term.

It depends on the goal, and the identification of the RTP stream isn’t that difficult. You either use individual receiving ports and associate based on that, or you could examine the source IP address+port and map based on that. Further identifier usage would be in the ARI application.
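
For illustration, a minimal sketch of the second approach (one shared listening port, associating each stream by its source IP+port) might look like this in Python; the port number and how call_map gets filled in are assumptions for the example, not anything Asterisk provides:

import socket

# Hypothetical mapping, filled in by the ARI application when each external
# media channel is created: (source_ip, source_port) -> call identifier.
call_map = {}

RTP_PORT = 4000  # example port the recognition server listens on

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind(("0.0.0.0", RTP_PORT))

while True:
    packet, source = sock.recvfrom(2048)  # source is the (ip, port) the stream comes from
    call_id = call_map.get(source)
    if call_id is None:
        continue  # unknown stream: ignore it (or learn it here)
    payload = packet[12:]  # strip the fixed 12-byte RTP header (assumes no CSRCs/extensions)
    # ... hand `payload` to the recognition pipeline for call_id ...

The first approach mentioned above would instead bind one socket per call and skip the lookup entirely.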

We use Docker, and I was afraid that opening RTP ports on our recognition server would conflict with the Asterisk container in the same stack.
After discussing with my team, it seems that running the container in host network mode would make this possible, and an IP+port mapping as you suggest would then work.

I’ll explain our project more precisely.
We have a call managed with ARI. The goal is, once the call is answered, to start speech recognition, perform automatic translation, and then play the translated audio to our agent.

I have a little trouble seeing the fundamental differences between AEAP and ExternalMedia in terms of advantages. I am in the R&D phase on both solutions, and since I don't intend to process the recognition results inside Asterisk, ExternalMedia seems more appropriate to me, but I would like your opinion on this and more explanation of the differences between the two.

AEAP is strictly for speech to text. External media is arbitrary and bidirectional. External media gives far greater control and, if ARI is involved, is generally the best course of action.

Thanks for the explanation.

One last question:
“External media is arbitrary and bidirectional”

To return audio to my channel, when I receive a connection on my RTP server, do I have to capture the source IP and port used by the channel and send the return audio back to that address?

That is one way to do it, or the creation of the external media channel will provide the information in its response.
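
As a rough sketch of that second option (untested; the ARI credentials, application name and addresses are made up, and the UNICASTRTP_* variable names should be checked against the documentation for your Asterisk version):

import requests

ARI = "http://localhost:8088/ari"
AUTH = ("asterisk", "secret")  # example ARI credentials

# Create the external media channel, pointing Asterisk at our RTP server.
channel = requests.post(
    f"{ARI}/channels/externalMedia",
    auth=AUTH,
    params={
        "app": "my-stasis-app",            # hypothetical Stasis application name
        "external_host": "10.0.0.5:4000",  # hypothetical recognition-server address
        "format": "alaw",
    },
).json()

# Read back where Asterisk itself receives RTP, i.e. where the return audio
# should be sent (variable names as I understand them; to be verified).
local_addr = requests.get(
    f"{ARI}/channels/{channel['id']}/variable",
    auth=AUTH,
    params={"variable": "UNICASTRTP_LOCAL_ADDRESS"},
).json()["value"]
local_port = requests.get(
    f"{ARI}/channels/{channel['id']}/variable",
    auth=AUTH,
    params={"variable": "UNICASTRTP_LOCAL_PORT"},
).json()["value"]

Depending on the Asterisk version, the creation response itself may already carry the same address information, as mentioned above.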

Many thanks for this very interesting exchange :slight_smile: