I’m working on an AI voice bot that streams audio dynamically to Asterisk using the WebSocket media interface (chan_websocket).
The bot sends small audio chunks (e.g., 1600 bytes ≈ 200 ms of μ-law audio) continuously, and it needs to know when each chunk has actually been played by Asterisk before generating or sending the next one.
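The chunk arithmetic above follows from the codec rate: G.711 μ-law at 8 kHz is one byte per sample, so 8000 bytes span one second. A quick sketch of that calculation:

```python
# G.711 u-law at 8 kHz: 1 byte per sample, so 8000 bytes per second.
BYTES_PER_SECOND = 8000

def chunk_duration_ms(num_bytes: int) -> float:
    """Playback duration of a u-law chunk in milliseconds."""
    return num_bytes / BYTES_PER_SECOND * 1000

print(chunk_duration_ms(1600))  # a 1600-byte chunk spans 200.0 ms
```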
Right now, Asterisk buffers the incoming media and plays it in real time, but there’s no per-chunk acknowledgment mechanism.
We explored the following options:
REPORT_QUEUE_DRAINED → triggers only when the entire media queue becomes empty. This doesn’t help when the client needs acknowledgment after specific chunks (e.g., after two frames but before more are sent).
FLUSH_MEDIA → can clear buffered audio, but doesn’t provide playback confirmation.
MEDIA_XOFF / MEDIA_XON → handles flow control but not playback progress.
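For context, a client-side handler for these existing control messages could look like the sketch below. Only the message names come from the driver; the assumption that each arrives as a text frame beginning with that name is mine.

```python
# Sketch of a client-side dispatcher for chan_websocket control messages.
# Assumes each control message arrives as a text frame whose first token
# is the message name -- the precise wire format is an assumption here.

class MediaFlowState:
    def __init__(self):
        self.paused = False          # set by MEDIA_XOFF, cleared by MEDIA_XON
        self.queue_drained = False   # set by REPORT_QUEUE_DRAINED

    def handle_control(self, text_frame: str) -> None:
        name = text_frame.split()[0]
        if name == "MEDIA_XOFF":
            self.paused = True       # Asterisk asks us to stop sending
        elif name == "MEDIA_XON":
            self.paused = False      # safe to resume streaming
        elif name == "REPORT_QUEUE_DRAINED":
            self.queue_drained = True  # whole queue empty -- too coarse per chunk
```

Note that none of these states tells the client how far playback has progressed, which is the gap described below.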
Our use case requires fine-grained feedback because the AI engine generates speech in real time and must know exactly when the previous segment has been played to decide when to generate and stream the next part. Without this, we either overfill the buffer (causing latency) or underflow (causing gaps).
Question:
Is there any existing or planned mechanism in Asterisk’s WebSocket media driver to acknowledge when a certain portion of the queued audio has been consumed or played — for example, per-frame or per-byte playback callbacks or progress events?
Such a feature would be extremely helpful for low-latency AI voice streaming applications that require tight synchronization with Asterisk playback.
jcolp
November 4, 2025, 9:57am
There’s a feature requests issue tracker[1] and a normal issue tracker[2]. If you don’t see anything, then chances are the answer is no and you can then file a feature request.
Note that any such event would fire when media is provided to the Asterisk core; there is no indication of when it has actually been sent to, or heard by, any other party.
[1] GitHub · Where software is built
[2] GitHub · Where software is built
Created a feature request
opened 10:05AM - 04 Nov 25 UTC
### Is your feature or improvement request related to a problem? Please describe.
Asterisk’s WebSocket media driver (`chan_websocket`) allows streaming binary audio data from external applications (like AI voice bots) but does not provide any mechanism to know when a specific portion of audio has actually been played.
This causes synchronization issues for AI-driven real-time streaming systems that generate audio dynamically (e.g., Text-to-Speech or conversational AI). Without playback progress acknowledgment, the application has no way to determine when the audio it sent has finished playing.
This leads to two major problems:
1. Over-buffering — increases latency since the application keeps sending new audio before the previous one is played.
2. Under-buffering — causes playback gaps when the application waits too long to send the next chunk.
### Describe the solution you'd like
Introduce a playback progress acknowledgment mechanism for the WebSocket media driver.
Possible designs:
1. **Mark-based acknowledgment**
- Allow clients to send a `MARK id=<uuid>` text control command that sets a logical boundary in the playback queue.
- When Asterisk finishes playing all media queued before that mark, it responds with `MARK_PLAYED id=<uuid>`.
Example:
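A hypothetical exchange (the `MARK`/`MARK_PLAYED` wording is the proposed format, not an existing API) could be:

```
client   → Asterisk:  <binary audio frames>
client   → Asterisk:  MARK id=seg-001
client   → Asterisk:  <more binary audio frames>
Asterisk → client:    MARK_PLAYED id=seg-001   (all audio queued before the mark has played)
```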
### Describe alternatives you've considered
We explored existing Asterisk WebSocket control messages:
- `REPORT_QUEUE_DRAINED`: Notifies only when the *entire* queue is empty — not useful for partial playback acknowledgment.
- `FLUSH_MEDIA`: Clears buffered audio but provides no confirmation of playback.
- `MEDIA_XOFF` / `MEDIA_XON`: Flow control only, unrelated to playback timing.
We also attempted to simulate playback timing locally by estimating real-time audio consumption (1 ms per 8 bytes for μ-law), but this approach is only an approximation and cannot confirm actual playback progress in Asterisk.
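The local-estimation fallback just described can be sketched as a pacing loop. `send_chunk` is a hypothetical transport callable, and the drift this loop accumulates is exactly why it stays an approximation:

```python
import time

ULAW_BYTES_PER_MS = 8  # 8 kHz u-law: 8 one-byte samples per millisecond

def estimated_playback_ms(chunk: bytes) -> float:
    """Estimate how long Asterisk will take to play this chunk."""
    return len(chunk) / ULAW_BYTES_PER_MS

def paced_stream(chunks, send_chunk):
    """Send chunks at the estimated real-time rate.

    send_chunk is a hypothetical transport callable. Clock drift and
    network jitter accumulate over time, since nothing here confirms
    actual playback progress inside Asterisk.
    """
    for chunk in chunks:
        send_chunk(chunk)
        time.sleep(estimated_playback_ms(chunk) / 1000)
```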
### Additional context
Use case: AI-driven voice bots and real-time speech generation systems streaming audio to Asterisk over WebSocket.
Modern AI engines (like OpenAI Realtime API, ElevenLabs, or custom TTS models) generate audio in small, variable-sized chunks. These systems require acknowledgment when certain chunks have been played so they can dynamically:
- Generate the next segment of speech
- Handle interruptions or “barge-in” events
- Avoid excessive buffering and latency
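As one illustration of the barge-in case, a client could pair the proposed mark acknowledgments with the existing `FLUSH_MEDIA` command. The `MARK` wire format below is the proposal's assumption, not a shipped Asterisk API:

```python
# Sketch: barge-in handling that combines the proposed MARK acknowledgments
# with the existing FLUSH_MEDIA command. MARK/MARK_PLAYED are hypothetical.

class SpeechSession:
    def __init__(self, send_text):
        self.send_text = send_text   # sends a WebSocket text frame
        self.pending_marks = set()   # marks sent but not yet acknowledged

    def mark(self, mark_id: str) -> None:
        self.pending_marks.add(mark_id)
        self.send_text(f"MARK id={mark_id}")

    def on_mark_played(self, mark_id: str) -> None:
        self.pending_marks.discard(mark_id)  # that segment finished playing

    def barge_in(self) -> None:
        """Caller started talking: drop queued audio, forget pending marks."""
        self.send_text("FLUSH_MEDIA")
        self.pending_marks.clear()
```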
Adding per-chunk or progress-based acknowledgment would significantly improve synchronization for real-time applications and make Asterisk more compatible with emerging AI voice technologies.
Proposed area: WebSocket Media Driver (`chan_websocket`)
Author: Shrish Gulati
I’ve moved this into the main asterisk repo since it’s an improvement to an existing capability.
Labels: improvement, support-level-core
system
Closed
December 4, 2025, 1:58pm
This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.