How to implement speech recognition barge-in with ARI

We have developed an ARI application that uses Google’s Speech Recognition API (we need Hebrew speech recognition, so there is no other option). The app records the user’s speech and sends it to Google for recognition. Can you give us some tips on how to implement barge-in in this configuration?


Maybe I’ll ask that in this way: Is there a way to get an event from ARI when voice is detected?

The TALK_DETECT dialplan function[1] can be used to detect when a party on a channel starts and stops talking. This raises ARI events (ChannelTalkingStarted and ChannelTalkingFinished), and it can be set on a channel in ARI via the normal channel variable route.
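As a hedged sketch of that channel-variable route: the snippet below builds the ARI request that sets TALK_DETECT(set) on a channel. The base URL and thresholds are assumptions for illustration (authentication is omitted for brevity); adjust them for your ari.conf and call flow.

```python
# Sketch: enabling TALK_DETECT from ARI via the channel-variable endpoint.
# ARI_BASE and the default thresholds are assumptions, not canonical values.
import urllib.parse
import urllib.request

ARI_BASE = "http://localhost:8088/ari"  # assumed ARI HTTP address

def talk_detect_request(channel_id, silence_ms=2500, talk_threshold=256):
    """Build the URL for POST /channels/{id}/variable setting TALK_DETECT(set).

    The value is "<silence_ms>,<talk_threshold>": milliseconds of silence
    that end a talking burst, then the DSP energy level counted as talking.
    Once set, Asterisk raises ChannelTalkingStarted / ChannelTalkingFinished
    events on the application's ARI WebSocket.
    """
    query = urllib.parse.urlencode({
        "variable": "TALK_DETECT(set)",
        "value": f"{silence_ms},{talk_threshold}",
    })
    return f"{ARI_BASE}/channels/{channel_id}/variable?{query}"

def enable_talk_detect(channel_id, opener=urllib.request.urlopen):
    # POST with an empty body; ARI takes variable/value as query parameters.
    req = urllib.request.Request(
        talk_detect_request(channel_id), data=b"", method="POST")
    return opener(req)
```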


Seems exactly what I’m looking for. I’ll try and update. Thanks!

A little bit tricky - the main problem is that if I start recording on the ChannelTalkingStarted event, I lose the start of the user’s sentence. I can reduce the loss if I increase the sensitivity of the detection, but then I receive more false detections. Is there a way to buffer the last second or so?

BTW: I think the documentation on the wiki page above has a mistake in the descriptions of the parameters. If I understand correctly, the first is the length of silence to be treated as the end of talking, and the second is the energy level to be considered as talking.

There is no way to buffer the last second. The dialplan function is strictly to know when talking starts and ends.

As for the documentation please leave a comment on the wiki page and we’ll look into it.

I am trying to record the user from call start, and to recognize from one second before the ChannelTalkingStarted event. The strange thing is that when I start recording the user, playing prompts stops working. What’s going on here?

You can’t do two things at once to a channel. Record in ARI is just that: it records the channel as if you were calling Record() in the dialplan. It is not a MixMonitor equivalent. The foundation is there to implement such a thing, though, using a Snoop channel and Record on the Snoop channel.
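A minimal sketch of that Snoop-channel approach: spy on the caller’s incoming audio with a snoop channel, then run Record on the snoop channel, leaving the real channel free for prompt playback. ARI_BASE, the APP name, and the id/name scheme are assumptions for illustration (authentication is omitted).

```python
# Sketch: record via a snoop channel so the original channel stays free.
# ARI_BASE and APP are placeholder assumptions.
import urllib.parse
import urllib.request

ARI_BASE = "http://localhost:8088/ari"
APP = "myapp"  # assumed Stasis application name

def snoop_request(channel_id, snoop_id):
    """POST /channels/{id}/snoop: spy on the caller's incoming audio only."""
    query = urllib.parse.urlencode(
        {"spy": "in", "app": APP, "snoopId": snoop_id})
    return f"{ARI_BASE}/channels/{channel_id}/snoop?{query}"

def record_request(snoop_id, name):
    """POST /channels/{snoopId}/record: record the snoop channel in ulaw."""
    query = urllib.parse.urlencode(
        {"name": name, "format": "ulaw", "ifExists": "overwrite"})
    return f"{ARI_BASE}/channels/{snoop_id}/record?{query}"

def start_snoop_recording(channel_id, opener=urllib.request.urlopen):
    """Create the snoop channel, then start recording it."""
    snoop_id = f"snoop-{channel_id}"
    for url in (snoop_request(channel_id, snoop_id),
                record_request(snoop_id, f"rec-{channel_id}")):
        opener(urllib.request.Request(url, data=b"", method="POST"))
    return snoop_id
```

With this in place, Playback on the original channel and Record on the snoop channel run independently.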

I see. Is there any preference between creating and recording a snoop channel or a holding bridge?

I don’t understand the question. They are separate things. While in a bridge you also have limited control over the channel.

I meant that I can record the user using a snoop channel, or by adding the channel to a bridge and recording the bridge. I am wondering which is better in terms of resource usage.

You’d need to profile and see for your use case what would be better.

O.K., I’ll try and see. Thanks.

Working well now with snoop channel. Thanks!


Would it be possible for you to share the source code of your application? I am curious to know how it is done. You can e-mail it to me on


Do you have any advice on how to do the same thing?

Basically, what we do is create a snoop channel, which immediately starts recording the user, and we save the time the recording started. Then, when we get ChannelTalkingStarted on the snoop channel, we save that time too. When ChannelTalkingFinished finally arrives, we stop recording and copy the recorded file from one second before the ChannelTalkingStarted event (since we record in ulaw, this is straightforward). Then we send this file to Google speech recognition.
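The "one second before" copy is straightforward because ulaw at 8 kHz is exactly one byte per sample, i.e. 8000 bytes per second, so the trim point is plain byte arithmetic on the raw file. A small sketch of that step (function names and the one-second lead-in are illustrative):

```python
# Sketch: trim a raw ulaw recording to start one second before talking began.
ULAW_BYTES_PER_SEC = 8000  # 8 kHz, one byte per sample: offsets are exact

def barge_in_offset(recording_started, talking_started, lead_in=1.0):
    """Byte offset into the recording, lead_in seconds before talking began.

    Both arguments are timestamps in seconds (e.g. from time.time()),
    saved when recording started and when ChannelTalkingStarted arrived.
    """
    seconds = max(0.0, (talking_started - recording_started) - lead_in)
    return int(seconds * ULAW_BYTES_PER_SEC)

def trim_ulaw(raw, recording_started, talking_started, lead_in=1.0):
    """Return the tail of the raw ulaw bytes, ready to send for recognition."""
    return raw[barge_in_offset(recording_started, talking_started, lead_in):]
```

If talking started less than a second into the recording, the offset clamps to zero and the whole file is kept.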

I think the secret is to jump into the Asterisk internals and see if you can build access to the dialplan speech applications through ARI. That would be ideal.