I have a file that detects the ChannelTalkingStart event. When this event is triggered, I send a DTMF signal. However, I want to ensure that the DTMF is sent only when a real conversation is detected, not when there is noise, knocks, or other unintended sounds.
I already have code that can distinguish between noise and conversations, but my main challenge is accessing the audio in real time. Without real-time audio processing, I cannot determine whether the detected sound is actual speech before sending the DTMF.
I’m looking for a solution to reliably process real-time audio and filter out unwanted noise before triggering the DTMF signal.
There are various options for getting at the audio, but it cannot be done over AMI. The most complex one is probably to use a SIP client with ChanSpy to feed the audio to an external process. ARI with External Media, using a Snoop channel, is another option, and so is AudioSocket with ChanSpy. Of course, you could also modify the C code and extend func_talkdetect.
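To give an idea of what the receiving side of the AudioSocket option involves, here is a rough Python sketch of a frame parser. It assumes the framing as I understand it (a 1-byte type, a 2-byte big-endian payload length, then the payload, with audio sent as 16-bit signed linear PCM at 8 kHz); please check this against the AudioSocket documentation for your Asterisk version. The names `parse_frames`, `HEADER` and the `KIND_*` constants are my own, not part of any library.

```python
import struct

# Frame types as I understand the AudioSocket protocol (assumption: verify
# against the documentation for your Asterisk version).
KIND_HANGUP = 0x00
KIND_UUID = 0x01
KIND_AUDIO = 0x10
KIND_ERROR = 0xFF

# 1-byte frame type followed by a 2-byte big-endian payload length.
HEADER = struct.Struct("!BH")


def parse_frames(buffer: bytes):
    """Split buffer into complete (kind, payload) frames.

    Returns the list of complete frames and any leftover bytes, which
    should be prepended to the next chunk read from the TCP socket.
    """
    frames = []
    offset = 0
    while len(buffer) - offset >= HEADER.size:
        kind, length = HEADER.unpack_from(buffer, offset)
        end = offset + HEADER.size + length
        if end > len(buffer):
            break  # payload not fully received yet
        frames.append((kind, buffer[offset + HEADER.size:end]))
        offset = end
    return frames, buffer[offset:]
```

In a real server you would read from the TCP connection in a loop, pass each chunk (plus the previous leftover) to `parse_frames`, and hand every `KIND_AUDIO` payload to your existing speech/noise classifier. The DTMF itself would still be sent through whatever control interface you already use.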
What does real time mean? RTP will normally have a serialisation delay of at least 20 ms. Avoiding false positives will require considerably longer delays, maybe as much as a second.
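To illustrate the delay trade-off, here is a minimal sketch of a gate that only fires after a sustained run of frames classified as speech. It assumes 20 ms frames and a 1 second threshold, taken from the figures above; `SpeechGate` and its parameters are hypothetical names, and the per-frame speech/noise decision is expected to come from your own classifier.

```python
class SpeechGate:
    """Fire once after min_speech_ms of uninterrupted speech frames."""

    def __init__(self, frame_ms: int = 20, min_speech_ms: int = 1000):
        # Number of consecutive speech frames required (rounded up).
        self.frames_needed = -(-min_speech_ms // frame_ms)
        self.run = 0        # current run of consecutive speech frames
        self.fired = False  # ensures the DTMF is triggered only once

    def feed(self, is_speech: bool) -> bool:
        """Feed one frame's classification; return True when DTMF should be sent."""
        if not is_speech:
            self.run = 0  # a knock or noise burst ends here and resets the run
            return False
        self.run += 1
        if self.run >= self.frames_needed and not self.fired:
            self.fired = True
            return True
        return False
```

With these defaults the gate needs 50 consecutive speech frames, so the DTMF cannot go out sooner than about one second after speech starts. Lowering `min_speech_ms` reduces the delay but lets shorter noises through.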