We have a Python-based audio server connected to our Asterisk IVR system using AudioSocket. The call flow works as follows:
same => n,Wait(2)
same => n,Read(EXTID,${VOICE_DIR}/pls-enter-phone-ext,6,3,300)
same => n,Wait(1)
same => n,Playback(${VOICE_DIR}/you-have-entered)
same => n,SayDigits(${EXTID})
same => n,Wait(1)
same => n,Set(CIVR_HOST=ipconfig)
same => n,Log(NOTICE, Starting AudioSocket connection to ${CIVR_HOST}:3000)
same => n,Set(CALL_UUID=${UUID()})
same => n,Log(NOTICE, Notifying Python app - CallerId: ${CALLERID(num)}, UUID: ${CALL_UUID}, DNID: ${CALLERID(DNID)}, EXTID: ${EXTID})
same => n,System(curl -s "http://${CIVR_HOST}:1650/api/call-start?callerId=${CALLERID(num)}&uuid=${CALL_UUID}&dnid=${CALLERID(DNID)}&lext=${EXTID}" >/dev/null 2>&1 || wget -q -O - "http://${CIVR_HOST}:1650/api/call-start?callerId=${CALLERID(num)}&uuid=${CALL_UUID}&dnid=${CALLERID(DNID)}&lext=${EXTID}" >/dev/null 2>&1)
same => n,AudioSocket(${CALL_UUID},${CIVR_HOST}:3000)
same => n,Log(NOTICE, AudioSocket connection ended)
Once the AudioSocket connection is established:
- User speech is streamed to the Python server.
- The server performs Speech-to-Text (STT).
- The transcribed text is sent to an agentic system for generating a response.
- The response text is sent to an external TTS service.
- TTS audio chunks are streamed back through AudioSocket and played to the caller via Asterisk.
- After playback finishes, the system starts listening for user input again.
This loop continues throughout the call.
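For readers unfamiliar with the wire format behind this loop, here is a minimal sketch of how the Python side might split the AudioSocket TCP stream into frames. It assumes the framing as we understand it from the AudioSocket protocol description: a 1-byte kind, a 2-byte big-endian payload length, then the payload, with audio carried as 16-bit signed linear PCM at 8 kHz mono. The function names `encode_frame` and `decode_frames` are illustrative, not taken from our server.

```python
import struct

# Message kinds as we understand them from the AudioSocket protocol.
KIND_HANGUP = 0x00  # call is being terminated
KIND_UUID = 0x01    # 16-byte call UUID, sent first by Asterisk
KIND_DTMF = 0x03    # one ASCII DTMF digit
KIND_AUDIO = 0x10   # 16-bit signed linear PCM, 8 kHz mono
KIND_ERROR = 0xFF

# 1-byte kind followed by a 2-byte big-endian payload length.
HEADER = struct.Struct("!BH")


def encode_frame(kind: int, payload: bytes = b"") -> bytes:
    """Build one frame: header plus payload."""
    return HEADER.pack(kind, len(payload)) + payload


def decode_frames(buffer: bytearray):
    """Yield (kind, payload) for every complete frame in buffer.

    Consumed bytes are removed from buffer; an incomplete trailing
    frame is left in place until more data arrives from the socket.
    """
    while len(buffer) >= HEADER.size:
        kind, length = HEADER.unpack_from(buffer, 0)
        end = HEADER.size + length
        if len(buffer) < end:
            break  # wait for the rest of this frame
        payload = bytes(buffer[HEADER.size:end])
        del buffer[:end]
        yield kind, payload
```

In use, each `recv()` result would be appended to one `bytearray` per connection and `decode_frames` called on it. Frames of kind `KIND_AUDIO` go to STT, and TTS audio goes back to Asterisk wrapped with `encode_frame(KIND_AUDIO, pcm)`.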
Current Problem
The current implementation behaves like a walkie-talkie or half-duplex system.
While the bot is speaking (during TTS playback), microphone/input processing is disabled to avoid:
- Background noise
- Echo from bot audio
- TTS audio being reprocessed as user speech
Because of this, the system cannot hear the caller while the bot is speaking.
If a caller interrupts mid-response, for example to clarify or correct something, their speech is ignored completely.
Attempted Solution
To support full-duplex conversations and barge-in, we tried:
- WebRTC-based echo cancellation (AEC)
- Voice Activity Detection (VAD)
- Continuous audio processing during TTS playback
All these changes were implemented on the Python server side.
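To make "continuous audio processing during TTS playback" concrete, here is a rough sketch of how playback could be kept interruptible on the Python side. It is not our actual code. It assumes TTS audio is cut into fixed 20 ms frames (320 bytes at 8 kHz, 16-bit mono) and that a separate sender loop pulls one frame roughly every 20 ms. The class `PlaybackQueue` and its methods are hypothetical names.

```python
import collections
import threading

FRAME_BYTES = 320  # assumed: 20 ms of 8 kHz, 16-bit mono PCM


class PlaybackQueue:
    """Thread-safe queue of outgoing TTS frames that can be flushed."""

    def __init__(self):
        self._frames = collections.deque()
        self._lock = threading.Lock()

    def enqueue_tts(self, pcm: bytes) -> int:
        """Split a TTS chunk into fixed-size frames and queue them.

        The last frame is padded with silence. Returns the number of
        frames queued.
        """
        count = 0
        with self._lock:
            for start in range(0, len(pcm), FRAME_BYTES):
                frame = pcm[start:start + FRAME_BYTES]
                self._frames.append(frame.ljust(FRAME_BYTES, b"\x00"))
                count += 1
        return count

    def next_frame(self):
        """Return the next frame to send, or None when nothing is queued."""
        with self._lock:
            return self._frames.popleft() if self._frames else None

    def flush(self) -> int:
        """Drop all pending frames (called on barge-in); return how many."""
        with self._lock:
            dropped = len(self._frames)
            self._frames.clear()
            return dropped

    def is_playing(self) -> bool:
        """True while there is still TTS audio waiting to be sent."""
        with self._lock:
            return bool(self._frames)
```

The idea is that the sender loop never blocks input handling. The frame returned by `next_frame()` is both written to AudioSocket and handed to the echo canceller as the far-end reference, and a confirmed barge-in simply calls `flush()`.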
Issues With Current Full-Duplex Attempt
Even after significant tuning, we are facing two major issues:
- Missed Barge-Ins
  - The system often fails to detect when the user is speaking over the bot audio.
- False Barge-Ins
  - The system frequently triggers a barge-in immediately when TTS playback starts, even when the user is completely silent.
This makes the experience unstable and unreliable.
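For discussion, here is a simplified sketch of the kind of decision logic involved. It is not our tuned implementation, and it swaps the WebRTC VAD for a plain RMS energy threshold to stay self-contained. It illustrates two common mitigations: a short grace period after playback starts (adaptive echo cancellers usually need some time to converge, which is one plausible cause of the false triggers) and a debounce that requires several consecutive voiced frames. The class name and all threshold values are hypothetical and would need tuning.

```python
import math
import struct


def rms(frame: bytes) -> float:
    """RMS level of a little-endian 16-bit PCM frame."""
    n = len(frame) // 2
    if n == 0:
        return 0.0
    samples = struct.unpack(f"<{n}h", frame[:n * 2])
    return math.sqrt(sum(s * s for s in samples) / n)


class BargeInDetector:
    """Debounced barge-in decision on post-AEC microphone frames."""

    def __init__(self, threshold=1500.0, min_voiced_frames=8, grace_frames=15):
        self.threshold = threshold                  # hypothetical RMS threshold
        self.min_voiced_frames = min_voiced_frames  # e.g. 8 x 20 ms = 160 ms
        self.grace_frames = grace_frames            # e.g. 15 x 20 ms = 300 ms
        self._frames_since_start = 0
        self._voiced_run = 0

    def playback_started(self):
        """Reset state each time a new TTS response begins playing."""
        self._frames_since_start = 0
        self._voiced_run = 0

    def process(self, frame: bytes) -> bool:
        """Feed one mic frame during playback; True means barge-in."""
        self._frames_since_start += 1
        if self._frames_since_start <= self.grace_frames:
            return False  # ignore input while the canceller settles
        if rms(frame) >= self.threshold:
            self._voiced_run += 1
        else:
            self._voiced_run = 0  # a single quiet frame resets the run
        return self._voiced_run >= self.min_voiced_frames
```

The trade-off is visible in the parameters: a longer grace period or debounce suppresses false barge-ins at playback start, but makes missed or late barge-ins more likely, which matches the two failure modes listed above.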
What We Want To Understand
We are exploring whether there is a better approach from the Asterisk side itself.
Specifically:
- Can full-duplex/barge-in handling be implemented more effectively using Asterisk dialplan features?
- Does Asterisk provide any native support for interruptible playback or duplex audio handling with AudioSocket?
- Are there recommended architectural patterns for implementing reliable barge-in with Asterisk + AudioSocket + external STT/TTS systems?
- Would moving some logic from Python into Asterisk help improve detection stability?