ARI: Delay when stopping bridge playbacks for barge-in - frames still draining after DELETE calls

I’m building a voice bot using Asterisk ARI with real-time STT/TTS. I’m experiencing an 8-10 second delay when implementing barge-in (interrupting TTS playback when the user starts speaking).

The Problem: When I detect speech during TTS playback, I immediately:

  1. Stop all active playbacks via DELETE /playbacks/{id}

  2. Clear my audio queue

  3. Start new TTS for the response

However, there’s a noticeable delay (8-10 seconds) before the new TTS actually starts playing. It seems like audio frames are still draining from Asterisk’s buffers even after the DELETE calls succeed.

Current Setup:

  • Using bridge-based playback (playing to bridge, not directly to channel)

  • TTS arrives as GSM chunks that I play sequentially using POST /bridges/{id}/play

  • Each chunk creates a separate playback ID

  • During barge-in, I may have 10-15 active playbacks to stop

Relevant Code:

```python
# Playing audio chunks to bridge
async def play_audio_to_bridge(channel_id, audio_file):
    bridge_id = bridges[f"playback_{channel_id}"]
    # sound: URIs take the file's basename, without directory or extension
    audio_file_basename = os.path.splitext(os.path.basename(audio_file))[0]
    media_uri = f"sound:{audio_file_basename}"
    response = await client.post(
        f"{ARI_URL}/bridges/{bridge_id}/play",
        auth=(ARI_USER, ARI_PASS),
        params={"media": media_uri},
    )
    playback_id = response.json()["id"]
    return playback_id
```

```python
# Current barge-in attempt - has delay issue
async def stop_all_playbacks(channel_id):
    # Get all active playback IDs for this channel (typically 10-15)
    playback_ids = get_active_playback_ids(channel_id)

    # Stop them all in parallel
    tasks = [
        client.delete(f"{ARI_URL}/playbacks/{pid}", auth=(ARI_USER, ARI_PASS))
        for pid in playback_ids
    ]
    await asyncio.gather(*tasks)  # All DELETE calls succeed

    # Problem: audio still plays for 8-10 seconds after this!
```
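One thing the snippet above glosses over is how `get_active_playback_ids` stays accurate. A minimal sketch of one way to keep it honest (the `PlaybackRegistry` class and its method names are hypothetical; `PlaybackStarted`/`PlaybackFinished` are standard ARI WebSocket events, though for bridge playbacks you must map the playback's `target_uri` back to a channel yourself):

```python
from collections import defaultdict


class PlaybackRegistry:
    """Track active playback IDs per channel, pruned by ARI events.

    Hypothetical helper: feed it PlaybackStarted / PlaybackFinished events
    from the ARI WebSocket so that barge-in only DELETEs playbacks that
    are actually still running, instead of a stale 10-15 entry list.
    """

    def __init__(self):
        self._active = defaultdict(set)

    def on_started(self, channel_id, playback_id):
        # Call from your PlaybackStarted event handler
        self._active[channel_id].add(playback_id)

    def on_finished(self, channel_id, playback_id):
        # Call from your PlaybackFinished event handler
        self._active[channel_id].discard(playback_id)

    def active_ids(self, channel_id):
        return list(self._active[channel_id])
```

This doesn't remove the drain delay by itself, but it shrinks the set of DELETE calls to playbacks Asterisk still considers live.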

What I’ve Tried:

  1. Stopping playbacks sequentially vs parallel - no difference

  2. Adding small delays between stop calls - no improvement

  3. Removing channel from bridge temporarily - causes audio glitches

Questions:

  1. Is there a way to immediately flush/clear audio frames queued in a bridge?

  2. Would destroying and recreating the bridge be a better approach for instant cutoff?

  3. Is there a “batch stop all playbacks” capability I’m missing?

  4. Are there bridge or channel settings that would reduce buffering?

I’m considering destroying the entire bridge and creating a new one for each barge-in, but wanted to check if there’s a cleaner approach. Even this approach is not quite working right now.

Environment:

  • Asterisk 20.15.1 with ARI

  • Bridge type: mixing

  • Audio format: GSM files played as sound: resources

  • Python asyncio with httpx for ARI calls

Any guidance on achieving instant audio cutoff for barge-in would be greatly appreciated!

Stopping is asynchronous: even though the HTTP response is received, it still takes some time for the stop to actually execute, and the stops are serialized. I wouldn’t expect 8-10 seconds for 10-15 playbacks, but I don’t know what would be reasonable for that. Normally if you’re doing that, it’s a single playback with multiple sounds, not individual ones. You’re trying to turn it into a real-time streaming mechanism using chunks of audio with individual playbacks. It’s really not designed to be used this way, in such a real-time/responsive manner. It can probably work to a point, but you’re pushing it beyond that, in my opinion.
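The "single playback with multiple sounds" suggestion maps directly onto ARI: `POST /bridges/{bridgeId}/play` accepts the `media` parameter as a list, and Asterisk plays the URIs sequentially under one playback ID, so a single DELETE stops the whole queued sequence. A minimal sketch, reusing the question's variable names (`ARI_URL`, credentials, and the httpx `AsyncClient` passed in as `client` are assumptions from the original post):

```python
def build_media_list(audio_files):
    """Turn a batch of TTS chunk basenames into sound: URIs.

    Passing a list as the 'media' param makes httpx repeat the query key,
    which ARI accepts; Asterisk then plays the URIs back-to-back under a
    SINGLE playback ID.
    """
    return [f"sound:{name}" for name in audio_files]


async def play_chunks_as_one(client, ari_url, auth, bridge_id, audio_files):
    # One playback for all chunks - barge-in becomes one DELETE call.
    resp = await client.post(
        f"{ari_url}/bridges/{bridge_id}/play",
        auth=auth,
        params={"media": build_media_list(audio_files)},
    )
    return resp.json()["id"]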

As for your questions - the documentation is derived from the implementation. If you don’t see the ability to do something in an HTTP request with ARI, then it’s most likely not there. There’s no way to explicitly flush/clear audio frames (which I don’t think would solve this really), no batch stop all playbacks, no settings to reduce any buffering.

I understand now, thank you.

Is there any documentation you can share with me to solve this issue? What would be the best way to achieve real-time streaming in Asterisk?

No, as you’re trying to use it in a way it wasn’t meant to be used - so you may be able to get it working with this approach, or not.

The best way would be actually doing things in a real-time streaming fashion using the methods provided for that: the external media functionality, be it RTP, AudioSocket, or the media-over-WebSocket functionality that will be in release candidates soon.
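For reference, an external media channel is created with `POST /channels/externalMedia`; Asterisk then streams the channel's audio (e.g. as RTP) to your host and accepts audio back, so your bot controls playback frame-by-frame and barge-in is simply "stop sending frames". A hedged sketch (the URL, credentials, app name, and helper names are placeholders; the endpoint and `app`/`external_host`/`format` parameters are from the ARI channels API):

```python
def external_media_params(app, host, port, fmt="slin16"):
    """Build query params for POST /channels/externalMedia.

    'app' is your Stasis application name; Asterisk will send the
    channel's audio to host:port and accept audio back on the same
    socket. Format 'slin16' is one common choice for STT/TTS pipelines.
    """
    return {
        "app": app,
        "external_host": f"{host}:{port}",
        "format": fmt,
    }


async def create_external_media(client, ari_url, auth, app, host, port):
    # client is assumed to be an httpx.AsyncClient, as in the original post
    resp = await client.post(
        f"{ari_url}/channels/externalMedia",
        auth=auth,
        params=external_media_params(app, host, port),
    )
    return resp.json()
```

Once the external media channel is in the same mixing bridge as the caller, playback and cutoff are entirely under your application's control.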

I’d suggest first looking at the existing AI related threads on this topic to see if there is useful information there.

It seems to me, the obvious thing to do in this situation is stop the audio source first, then do the Asterisk cleanup part.

Yes, that’s exactly what we are doing: as soon as a barge-in occurs, we stop the currently playing audio, pass the STT chunks to the LLM, and then feed the LLM output to the TTS. We receive the TTS output promptly, but the cleanup phase still takes around 8-10 seconds because of the draining playbacks.