I’m building a voice bot using Asterisk ARI with real-time STT/TTS. I’m seeing a delay of roughly 8-10 seconds when implementing barge-in (interrupting TTS playback when the user starts speaking).
The Problem: When I detect speech during TTS playback, I immediately:
- Stop all active playbacks via DELETE /playbacks/{id}
- Clear my audio queue
- Start new TTS for the response
However, there’s a noticeable delay (8-10 seconds) before the new TTS actually starts playing. It seems like audio frames are still draining from Asterisk’s buffers even after the DELETE calls succeed.
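One way to test the buffer-draining hypothesis is to timestamp each DELETE and the matching PlaybackFinished event from the ARI WebSocket, which shows how long Asterisk keeps a playback alive after the stop request. The following is a minimal sketch under assumptions: `StopLatencyTracker` and its method names are hypothetical helpers, not part of ARI; only the `PlaybackFinished` event type and its `playback.id` field come from ARI.

```python
import time


class StopLatencyTracker:
    """Hypothetical helper: measures the gap between a DELETE and PlaybackFinished."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._stop_requested = {}  # playback_id -> time the DELETE was sent
        self.latencies = {}        # playback_id -> seconds until PlaybackFinished

    def mark_stop_requested(self, playback_id):
        # Call right before sending DELETE /playbacks/{id}.
        self._stop_requested.setdefault(playback_id, self._clock())

    def on_ari_event(self, event):
        # Feed every decoded ARI WebSocket event (a dict) into this method.
        if event.get("type") != "PlaybackFinished":
            return
        playback_id = event.get("playback", {}).get("id")
        requested_at = self._stop_requested.pop(playback_id, None)
        if requested_at is not None:
            self.latencies[playback_id] = self._clock() - requested_at

    def worst_case(self):
        # Longest observed gap between a DELETE and its PlaybackFinished event.
        return max(self.latencies.values(), default=0.0)
```

If the worst-case latency is small while audio is still audible for seconds, the remaining sound is more likely coming from playbacks that were never stopped (or were submitted after the stop) than from frames buffered in the bridge.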
Current Setup:
- Using bridge-based playback (playing to the bridge, not directly to the channel)
- TTS arrives as GSM chunks that I play sequentially using POST /bridges/{id}/play
- Each chunk creates a separate playback ID
- During barge-in, I may have 10-15 active playbacks to stop
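Since the setup above can leave 10-15 playbacks queued inside Asterisk, one possible alternative is to keep the chunks in a local queue and hand Asterisk only one at a time, submitting the next chunk when the PlaybackFinished event for the previous one arrives. Barge-in then has at most one playback to stop. This is a sketch under assumptions, not a tested fix: `ChunkPlayer`, `play`, and `stop` are hypothetical names, where `play` would wrap POST /bridges/{id}/play and `stop` would wrap DELETE /playbacks/{id}.

```python
import asyncio
from collections import deque


class ChunkPlayer:
    """Hypothetical pacing helper: at most one chunk is handed to Asterisk at a time."""

    def __init__(self, play, stop):
        self._play = play          # async callable: media_uri -> playback_id
        self._stop = stop          # async callable: playback_id -> None
        self._pending = deque()    # chunks still held locally
        self._active_id = None     # playback currently known to Asterisk
        self._busy = False         # True while a chunk is submitted or playing
        self._generation = 0       # incremented on every barge-in

    async def enqueue(self, media_uri):
        self._pending.append(media_uri)
        await self._play_next()

    async def on_playback_finished(self, playback_id):
        # Call from the handler for ARI PlaybackFinished events.
        if playback_id == self._active_id:
            self._active_id = None
            self._busy = False
            await self._play_next()

    async def barge_in(self):
        # Drop local chunks first so nothing new is submitted, then stop one playback.
        self._generation += 1
        self._pending.clear()
        active, self._active_id = self._active_id, None
        self._busy = False
        if active is not None:
            await self._stop(active)

    async def _play_next(self):
        if self._busy or not self._pending:
            return
        self._busy = True
        generation = self._generation
        playback_id = await self._play(self._pending.popleft())
        if generation != self._generation:
            # Barge-in happened while the request was in flight: stop the stale playback.
            await self._stop(playback_id)
        else:
            self._active_id = playback_id
```

The trade-off is an extra round trip between chunks, which may produce audible gaps if chunks are very short; allowing two chunks in flight instead of one is a possible middle ground. The sketch also assumes the POST response is processed before the matching PlaybackFinished event arrives.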
Relevant Code:
# Playing audio chunks to bridge
async def play_audio_to_bridge(channel_id, audio_file):
    bridge_id = bridges[f"playback_{channel_id}"]
    media_uri = f"sound:{audio_file_basename}"
    response = await client.post(
        f"{ARI_URL}/bridges/{bridge_id}/play",
        auth=(ARI_USER, ARI_PASS),
        params={"media": media_uri}
    )
    playback_id = response.json()["id"]
    return playback_id
# Current barge-in attempt - has delay issue
async def stop_all_playbacks(channel_id):
    # Get all active playback IDs for this channel
    playback_ids = get_active_playback_ids(channel_id)  # returns 10-15 IDs typically
    # Stop them all in parallel
    tasks = []
    for pid in playback_ids:
        task = client.delete(f"{ARI_URL}/playbacks/{pid}", auth=(ARI_USER, ARI_PASS))
        tasks.append(task)
    await asyncio.gather(*tasks)  # All DELETE calls succeed
    # Problem: Audio still plays for 8-10 seconds after this!
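Because httpx does not raise on 4xx responses unless raise_for_status() is called, a gather that completes without an exception does not by itself prove that every playback was stopped. A hedged sketch of a stricter check follows; `stop_playbacks` and the injected `delete` callable are hypothetical names, and the expected 204 status is my understanding of what ARI returns for a successful DELETE /playbacks/{id}.

```python
import asyncio


async def stop_playbacks(playback_ids, delete):
    """Stop playbacks concurrently and report which stops did not return 204.

    `delete` is a hypothetical async callable (playback_id -> HTTP status code),
    for example a thin wrapper that awaits client.delete(...) and returns
    response.status_code.
    """
    playback_ids = list(playback_ids)
    results = await asyncio.gather(
        *(delete(pid) for pid in playback_ids),
        return_exceptions=True,  # one failed request must not hide the others
    )
    failed = {}
    for pid, result in zip(playback_ids, results):
        # Anything other than a plain 204 (including exceptions) counts as a failure.
        if isinstance(result, Exception) or result != 204:
            failed[pid] = result
    return failed
```

An empty result means every DELETE returned 204. Any entry with a 404 would suggest that the tracked ID list is out of sync with what Asterisk is actually playing.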
What I’ve Tried:
- Stopping playbacks sequentially vs. in parallel - no difference
- Adding small delays between stop calls - no improvement
- Removing the channel from the bridge temporarily - causes audio glitches
Questions:
- Is there a way to immediately flush/clear audio frames queued in a bridge?
- Would destroying and recreating the bridge be a better approach for instant cutoff?
- Is there a “batch stop all playbacks” capability I’m missing?
- Are there bridge or channel settings that would reduce buffering?
I’m considering destroying the entire bridge and creating a new one for each barge-in, but wanted to check if there’s a cleaner approach. Even this approach is not quite working right now.
Environment:
- Asterisk 20.15.1 with ARI
- Bridge type: mixing
- Audio format: GSM files played as sound: resources
- Python asyncio with httpx for ARI calls
Any guidance on achieving instant audio cutoff for barge-in would be greatly appreciated!