Enabling barge in with audiosocket connection between asterisk and python server

shaan754 · May 18, 2026, 9:40am

We have a Python-based audio server connected to our Asterisk IVR system using AudioSocket. The call flow works as follows:

same => n,Wait(2)
same => n,Read(EXTID,${VOICE_DIR}/pls-enter-phone-ext,6,3,300)
same => n,Wait(1)
same => n,Playback(${VOICE_DIR}/you-have-entered)
same => n,SayDigits(${EXTID})
same => n,Wait(1)

same => n,Set(CIVR_HOST=ipconfig)
same => n,Log(NOTICE, Starting AudioSocket connection to ${CIVR_HOST}:3000)

same => n,Set(CALL_UUID=${UUID()})

same => n,Log(NOTICE, Notifying Python app - CallerId: ${CALLERID(num)}, UUID: ${CALL_UUID}, DNID: ${CALLERID(DNID)}, EXTID: ${EXTID})

same => n,System(curl -s "http://${CIVR_HOST}:1650/api/call-start?callerId=${CALLERID(num)}&uuid=${CALL_UUID}&dnid=${CALLERID(DNID)}&lext=${EXTID}" >/dev/null 2>&1 || wget -q -O - "http://${CIVR_HOST}:1650/api/call-start?callerId=${CALLERID(num)}&uuid=${CALL_UUID}&dnid=${CALLERID(DNID)}&lext=${EXTID}" >/dev/null 2>&1)

same => n,AudioSocket(${CALL_UUID},${CIVR_HOST}:3000)

same => n,Log(NOTICE, AudioSocket connection ended)

Once the AudioSocket connection is established:

User speech is streamed to the Python server.
The server performs Speech-to-Text (STT).
The transcribed text is sent to an agentic system for generating a response.
The response text is sent to an external TTS service.
TTS audio chunks are streamed back through AudioSocket and played to the caller via Asterisk.
After playback finishes, the system starts listening for user input again.

This loop continues throughout the call.
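For readers who have not worked with the wire format, here is a minimal sketch of AudioSocket framing as I understand it from the Asterisk protocol description: each message is a 1-byte kind, a 2-byte big-endian payload length, and the payload. The function names (`encode_frame`, `decode_frames`) are my own, not part of any library. Since the TCP socket carries audio in both directions at once, nothing in the transport itself forces the half-duplex behavior described below.

```python
import struct
import uuid

# Message kinds as listed in the Asterisk AudioSocket protocol description.
KIND_HANGUP = 0x00
KIND_UUID = 0x01    # 16-byte call UUID, sent once after connect
KIND_DTMF = 0x03    # one ASCII digit
KIND_AUDIO = 0x10   # signed 16-bit, 8 kHz, mono PCM
KIND_ERROR = 0xFF


def encode_frame(kind: int, payload: bytes = b"") -> bytes:
    """Build one message: 1-byte kind, 2-byte big-endian length, payload."""
    return struct.pack("!BH", kind, len(payload)) + payload


def decode_frames(buffer: bytearray):
    """Yield (kind, payload) for each complete message and remove it from buffer.

    Incomplete trailing bytes stay in the buffer for the next socket read.
    """
    while len(buffer) >= 3:
        kind, length = struct.unpack_from("!BH", buffer, 0)
        if len(buffer) < 3 + length:
            break  # wait for the rest of this message
        payload = bytes(buffer[3:3 + length])
        del buffer[:3 + length]
        yield kind, payload


if __name__ == "__main__":
    call_id = uuid.uuid4()
    # 320 bytes = 20 ms of 8 kHz 16-bit mono audio.
    data = bytearray(encode_frame(KIND_UUID, call_id.bytes)
                     + encode_frame(KIND_AUDIO, b"\x00" * 320))
    for kind, payload in decode_frames(data):
        print(hex(kind), len(payload))
```

In a real server, `decode_frames` would be fed from the receive loop while a separate task writes `KIND_AUDIO` frames for TTS, so inbound audio keeps arriving during playback.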

Current Problem

The current implementation behaves like a walkie-talkie or half-duplex system.

While the bot is speaking (during TTS playback), microphone/input processing is disabled to avoid:

Background noise
Echo from bot audio
TTS audio being reprocessed as user speech

Because of this, the system cannot hear the caller while the bot is speaking.

If a user interrupts mid-response — for example to clarify something or barge in — their speech is ignored completely.

Attempted Solution

To support full-duplex conversations and barge-in, I tried:

WebRTC-based echo cancellation (AEC)
Voice Activity Detection (VAD)
Continuous audio processing during TTS playback

All these changes were implemented on the Python server side.

Issues With Current Full-Duplex Attempt

Even after significant tuning, we are facing two major issues:

Missed Barge-Ins
- The system often fails to detect when the user is speaking over the bot audio.
False Barge-Ins
- The system frequently triggers a barge-in immediately when TTS playback starts, even when the user is completely silent.

This makes the experience unstable and unreliable.

What We Want To Understand

We are exploring whether there is a better approach from the Asterisk side itself.

Specifically:

Can full-duplex/barge-in handling be implemented more effectively using Asterisk dialplan features?
Does Asterisk provide any native support for interruptible playback or duplex audio handling with AudioSocket?
Are there recommended architectural patterns for implementing reliable barge-in with Asterisk + AudioSocket + external STT/TTS systems?
Would moving some logic from Python into Asterisk help improve detection stability?

david551 · May 18, 2026, 1:23pm

That’s probably because the echo canceller needs to receive some echoes in order to calibrate itself. I suppose you might be able to send a calibration signal as soon as the call is answered, and before barge in detection is enabled, but that might annoy the other party, and they may be in an environment where the acoustics are continually changing.

One problem with this is that the correct place to cancel far-end echoes is the far end, and you don’t control the algorithm at that end. Doing echo cancellation at both ends is likely to cause conflicts and prevent correct operation.

hkjarral · May 18, 2026, 5:52pm

If you are open to exploring other options, I handled exactly these problems in my project. You can also use my VAD and barge-in logic; it’s open source and Python-based.

ldo · May 18, 2026, 7:53pm

A different approach might be to use ExternalIVR instead of AudioSocket. This allows the callee to press DTMF buttons to interrupt the playback.

This would likely mean breaking up the audio response into small temporary files for insertion in the playback queue. Note however that I have found Asterisk’s handling of the playback queue to be unreliable: luckily, you get notification as each file is played, so you can ensure the queue never has more than one file waiting.

For an example, see the ivr_dtmf_demo script here: Lawrence D’Oliveiro / seaskirt_examples · GitLab

shaan754 · May 19, 2026, 6:35am

Thanks for the suggestion, but we are required not to use voice agents. We have a different setup, in which we have to process audio chunks from an external TTS service and send audio chunks from the user to an STT service.

shaan754 · May 19, 2026, 7:10am

Thanks for the reply! ExternalIVR only uses button presses to interrupt, and according to the docs it does not offer real-time streaming, which our case requires. Our project is essentially a conversational IVR.

shaan754 · May 19, 2026, 7:11am

Yes, that’s the main issue. Is there any way to prevent Asterisk and the Python server from treating the bot’s voice as input for barge-in?

ldo · May 19, 2026, 7:54am

I think there may be a way around that. If you look at the ami+agi_audio_player_async example script in that repo, you will see that it tells Asterisk to play an audio file via the AGI STREAM FILE command, and passes it the name of a pipe into which audio is being streamed in real time.

I think the same trick would work with ExternalIVR. Assuming that Asterisk uses the same audio-file-playing code in both places, of course … (whyever would it not do that?) …

shaan754 · May 20, 2026, 5:57am

This just creates ARI and AMI event handlers. Asterisk emits events when we speak, so nothing starts until I speak. In our case, however, the bot has to play a greeting message first, and we also need to interact with a database at the start for tracking purposes.

Setting the other requirements aside, I tried listening for these events anyway, but even when I spoke, no AMI or ARI events were fired by Asterisk.

Asterisk Dialplan

same => n(civrAsteriskMode),Log(NOTICE, CIVR asterisk/FIFO mode UUID=${CALL_UUID} caller=${CALLERID(num)} ext=${EXTID})
same => n,Set(TALK_DETECT(set)=1200,384)
; MixMonitor writes caller audio (rx only, flag 'r') to the FIFO Python created in call-start.
; Python opens the read end after creating the FIFO, feeds it to the STT pump.
same => n,MixMonitor(/tmp/civr-rx-${CALL_UUID}.sln,r)
; Notify ARI listener that AsyncAGI is about to start so channel↔UUID is mapped.
same => n,UserEvent(CIVRStart,UUID: ${CALL_UUID},CallerID: ${CALLERID(num)},DNID: ${CALLERID(DNID)},EXTID: ${EXTID})
same => n,Log(NOTICE, CIVR AGI starting UUID=${CALL_UUID})
; Enter AsyncAGI: Python drives STREAM FILE calls via AMI; exits via ASYNCAGI BREAK.
same => n,AGI(agi:async,civr)
same => n,StopMixMonitor()
same => n,Set(TALK_DETECT(remove)=)
same => n,UserEvent(CIVREnd,UUID: ${CALL_UUID},CallerID: ${CALLERID(num)})
same => n,Log(NOTICE, CIVR AGI ended UUID=${CALL_UUID})

shaan754 · May 28, 2026, 6:20am

We have barge-in enabled in our telephony voice bot (Asterisk AudioSocket + Python STT/TTS loop), but we see inconsistent behavior:

Sometimes barge-in does not trigger at all.
The user speaks while the bot is talking, but the bot continues speaking until the prompt finishes.
Sometimes barge-in works, but later the bot’s own audio gets transcribed as user speech.
Bot audio seems to leak into STT after or between turns.

Current approach:

While TTS is playing, incoming audio is analyzed for barge-in, but silence is fed to STT to avoid echo.
A frame is treated as possible user speech only if:
- TTS is actively transmitting,
- cooldown after TTS start has passed,
- inbound RMS crosses an adaptive threshold,
- echo correlation check says it is not bot audio.
Uses a sliding-window qualification instead of strict consecutive frames.
On barge-in:
- cancel TTS,
- flush playback queue,
- clear TTS-active state,
- send preroll speech frames to STT,
- add a short silence tail to reduce residual echo.
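The barge-in steps above could be organized roughly like the sketch below. Everything here is hypothetical naming (`TurnState`, `tts_generation`, the frame counts). Two assumptions are mine and not stated in the post: the "silence tail" is read as silence queued on the outbound side, and a generation counter is added so that TTS chunks arriving late from the external service after a cancel are dropped instead of being played into the next turn.

```python
from collections import deque

FRAME_BYTES = 320                 # 20 ms of 8 kHz 16-bit mono audio
SILENCE = b"\x00" * FRAME_BYTES


class TurnState:
    """Tracks outbound TTS frames and a rolling pre-roll of inbound frames."""

    def __init__(self, preroll_frames=15, tail_frames=5):
        self.tx_queue = deque()                      # TTS frames waiting to go to Asterisk
        self.preroll = deque(maxlen=preroll_frames)  # most recent inbound frames
        self.tail_frames = tail_frames
        self.tts_active = False
        self.tts_generation = 0   # bumped on cancel; stale chunks carry an old value

    def on_inbound(self, frame: bytes) -> bytes:
        """Remember the inbound frame; return what STT should receive for it."""
        self.preroll.append(frame)
        return SILENCE if self.tts_active else frame

    def enqueue_tts(self, generation: int, frame: bytes) -> None:
        """Queue a TTS frame unless it belongs to a cancelled response."""
        if generation != self.tts_generation:
            return  # late chunk from a response that was already interrupted
        self.tx_queue.append(frame)
        self.tts_active = True

    def barge_in(self) -> list:
        """Cancel playback and return the pre-roll frames to send to STT now."""
        self.tts_generation += 1          # cancel: invalidate in-flight TTS chunks
        self.tx_queue.clear()             # flush playback queue
        self.tts_active = False           # clear TTS-active state
        to_stt = list(self.preroll)       # pre-roll speech captured before the trigger
        self.preroll.clear()
        # Assumed meaning of "silence tail": a few outbound silence frames.
        self.tx_queue.extend([SILENCE] * self.tail_frames)
        return to_stt
```

The caller would capture `state.tts_generation` when it starts a TTS request and pass that value with every chunk, so a response cancelled mid-stream cannot re-set `tts_active` after the interruption.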

Detection logic is roughly:

if tts_active and bot_is_transmitting:
    rms = inbound_rms(frame)

    if rms > adaptive_threshold:
        echo_corr = correlate_with_recent_tx(frame)

        if echo_corr < threshold:
            qualify_frame()

    if enough_qualified_frames():
        trigger_barge_in()
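A runnable version of the logic above might look like the following sketch. The class name, parameter names, and default values are all hypothetical and untuned. One assumption goes beyond the pseudocode: the noise floor is also updated during the cooldown after TTS start, so the threshold can rise to the echo level seen on this particular call before any frame is allowed to qualify. The echo correlation is passed in as a number rather than computed here.

```python
import math
from array import array
from collections import deque


def rms(frame: bytes) -> float:
    """RMS level of signed 16-bit PCM (native byte order; AudioSocket audio
    is little-endian, so byteswap first on a big-endian host)."""
    samples = array("h", frame)
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


class BargeInDetector:
    def __init__(self, cooldown_frames=10, window=15, needed=8,
                 threshold_ratio=3.0, min_threshold=300.0,
                 max_echo_corr=0.5, alpha=0.05):
        self.cooldown_frames = cooldown_frames
        self.needed = needed                    # qualified frames required in the window
        self.threshold_ratio = threshold_ratio
        self.min_threshold = min_threshold
        self.max_echo_corr = max_echo_corr
        self.alpha = alpha                      # smoothing factor for the noise floor
        self.window = deque(maxlen=window)      # sliding window of True/False
        self.noise_floor = min_threshold / threshold_ratio
        self.frames_since_tts_start = 0

    def tts_started(self) -> None:
        """Call when the first TTS frame of a response is sent."""
        self.window.clear()
        self.frames_since_tts_start = 0

    def process(self, frame: bytes, tts_active: bool, echo_corr: float = 0.0) -> bool:
        """Return True when enough recent frames qualify as user speech."""
        if not tts_active:
            self.window.clear()
            return False
        level = rms(frame)
        threshold = max(self.min_threshold, self.noise_floor * self.threshold_ratio)
        self.frames_since_tts_start += 1
        in_cooldown = self.frames_since_tts_start <= self.cooldown_frames
        qualified = (not in_cooldown
                     and level > threshold
                     and echo_corr < self.max_echo_corr)
        # Adapt the floor on quiet frames, and on every frame during cooldown.
        if in_cooldown or level <= threshold:
            self.noise_floor += self.alpha * (level - self.noise_floor)
        self.window.append(qualified)
        return sum(self.window) >= self.needed
```

Because the window counts qualified frames rather than requiring them to be consecutive, a single dropped or quiet frame in the middle of speech does not reset detection.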

The problem is that this works well in some calls, but behaves poorly in others depending on call acoustics, latency, echo, carrier quality, speakerphone usage, etc.
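Since the echo delay differs from call to call, a fixed-lag correlation check is one plausible reason for the inconsistency. As a hedged illustration, and not a description of the current implementation, the sketch below swaps sample-level correlation for a cheaper technique: Pearson correlation between per-frame level envelopes (for example RMS values) of sent and received audio, searched over a range of lags. All names and sizes are hypothetical.

```python
import math
from collections import deque


def pearson(xs, ys) -> float:
    """Pearson correlation of two equal-length sequences (0.0 if either is flat)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


class EnvelopeEchoEstimator:
    """Compares the last `span` inbound frame levels with delayed outbound levels."""

    def __init__(self, span=25, max_lag=25):
        self.span = span          # frames compared (25 frames = 0.5 s at 20 ms)
        self.max_lag = max_lag    # largest echo delay searched, in frames
        self.tx = deque(maxlen=span + max_lag)
        self.rx = deque(maxlen=span)

    def push(self, tx_level: float, rx_level: float) -> None:
        """Record one frame's outbound and inbound level (call once per frame)."""
        self.tx.append(tx_level)
        self.rx.append(rx_level)

    def best_match(self):
        """Return (correlation, lag_in_frames) of the best alignment,
        or (0.0, None) until enough history has been collected."""
        if len(self.tx) < self.span + self.max_lag:
            return 0.0, None
        tx = list(self.tx)
        rx = list(self.rx)
        best = (0.0, None)
        for lag in range(self.max_lag + 1):
            start = len(tx) - self.span - lag   # tx slice delayed by `lag` frames
            corr = pearson(tx[start:start + self.span], rx)
            if corr > best[0]:
                best = (corr, lag)
        return best
```

A high correlation at a stable lag suggests the inbound audio is mostly the bot's own echo. The lag found early in a call could also be reused to align any finer-grained check, though whether envelope correlation separates echo from double-talk well enough would need testing on real calls.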

Looking for practical suggestions from people who have implemented reliable barge-in in real telephony systems:

better echo rejection strategies,
VAD tuning,
handling residual TTS leakage,
timing/cooldown strategies,
or architecture improvements.

Earlier I also asked about implementation of barge-in using asterisk built-in events: Enabling barge in with audiosocket connection between asterisk and python server

nshmyrev · May 28, 2026, 7:21am

You need to run STT in parallel and implement interruption based on content, not just sound level. If user says “ok” you should not stop talking. If he says “connect to operator” you have to stop.
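As a rough illustration of interrupting on content rather than sound level, the sketch below checks a partial STT transcript against a small list of acknowledgement phrases. The phrase list and function names are made up for the example; a real system would need its own list per language and probably intent detection from the agentic layer.

```python
import re

# Hypothetical acknowledgement phrases that should not stop the bot.
BACKCHANNELS = {"ok", "okay", "yes", "yeah", "yep", "right", "sure",
                "uh huh", "mm hmm", "i see", "got it"}


def normalize(text: str) -> str:
    """Lowercase, replace punctuation with spaces, collapse whitespace."""
    return " ".join(re.sub(r"[^a-z' ]+", " ", text.lower()).split())


def should_interrupt(partial_transcript: str) -> bool:
    """True if the transcript contains anything beyond acknowledgements."""
    text = normalize(partial_transcript)
    # Remove longer phrases first so multi-word entries match as a whole.
    for phrase in sorted(BACKCHANNELS, key=len, reverse=True):
        text = re.sub(rf"\b{re.escape(phrase)}\b", " ", text)
    return bool(text.strip())


if __name__ == "__main__":
    for heard in ["OK.", "uh-huh, okay", "Connect to operator"]:
        print(repr(heard), should_interrupt(heard))
```

With this in place, the level-based detector would only decide when to start paying attention to the parallel transcript, and playback stops only once `should_interrupt` returns True.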

ldo · May 31, 2026, 11:59pm

Looks like DTMF is still the simplest and most reliable way to do this.
