I’m working on an AI voice bot that streams audio dynamically to Asterisk using the WebSocket media interface (chan_websocket).
The bot sends small audio chunks (e.g., 1600 bytes ≈ 200 ms of μ-law audio) continuously, and it needs to know when each chunk has actually been played by Asterisk before generating or sending the next one.
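The chunk arithmetic above follows from the codec rate: G.711 μ-law at 8 kHz is one byte per sample, so 8000 bytes span one second. A quick sketch of that calculation:

```python
# G.711 u-law at 8 kHz: 1 byte per sample, so 8000 bytes per second.
BYTES_PER_SECOND = 8000

def chunk_duration_ms(num_bytes: int) -> float:
    """Playback duration of a u-law chunk in milliseconds."""
    return num_bytes / BYTES_PER_SECOND * 1000

print(chunk_duration_ms(1600))  # a 1600-byte chunk spans 200.0 ms
```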
Right now, Asterisk buffers the incoming media and plays it in real time, but there’s no per-chunk acknowledgment mechanism.
We explored the following options:
REPORT_QUEUE_DRAINED → triggers only when the entire media queue becomes empty. This doesn’t help when the client needs acknowledgment after specific chunks (e.g., after two frames but before more are sent).
FLUSH_MEDIA → can clear buffered audio, but doesn’t provide playback confirmation.
MEDIA_XOFF / MEDIA_XON → handles flow control but not playback progress.
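For context, a client-side handler for these existing control messages could look like the sketch below. Only the message names come from the driver; the assumption that each arrives as a text frame beginning with that name is mine.

```python
# Sketch of a client-side dispatcher for chan_websocket control messages.
# Assumes each control message arrives as a text frame whose first token
# is the message name -- the precise wire format is an assumption here.

class MediaFlowState:
    def __init__(self):
        self.paused = False          # set by MEDIA_XOFF, cleared by MEDIA_XON
        self.queue_drained = False   # set by REPORT_QUEUE_DRAINED

    def handle_control(self, text_frame: str) -> None:
        name = text_frame.split()[0]
        if name == "MEDIA_XOFF":
            self.paused = True       # Asterisk asks us to stop sending
        elif name == "MEDIA_XON":
            self.paused = False      # safe to resume streaming
        elif name == "REPORT_QUEUE_DRAINED":
            self.queue_drained = True  # whole queue empty -- too coarse per chunk
```

Note that none of these states tells the client how far playback has progressed, which is the gap described below.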
Our use case requires fine-grained feedback because the AI engine generates speech in real time and must know exactly when the previous segment has been played to decide when to generate and stream the next part. Without this, we either overfill the buffer (causing latency) or underflow (causing gaps).
Question:
Is there any existing or planned mechanism in Asterisk’s WebSocket media driver to acknowledge when a certain portion of the queued audio has been consumed or played — for example, per-frame or per-byte playback callbacks or progress events?
Such a feature would be extremely helpful for low-latency AI voice streaming applications that require tight synchronization with Asterisk playback.
jcolp
November 4, 2025, 9:57am
There’s a feature requests issue tracker[1] and a normal issue tracker[2]. If you don’t see anything, then chances are the answer is no and you can then file a feature request.
Note that any such event would fire when media is provided to the Asterisk core; there is no indication of when it has actually been sent to, or heard by, any other party.
[1] GitHub · Where software is built
[2] GitHub · Where software is built
Created a feature request
opened 10:05AM - 04 Nov 25 UTC
### Is your feature or improvement request related to a problem? Please describe.
Asterisk’s WebSocket media driver (`chan_websocket`) allows streaming binary audio data from external applications (like AI voice bots) but does not provide any mechanism to know when a specific portion of audio has actually been played.
This causes synchronization issues for AI-driven real-time streaming systems that generate audio dynamically (e.g., Text-to-Speech or conversational AI). Without playback progress acknowledgment, the application has no way to determine when the audio it sent has finished playing.
This leads to two major problems:
1. Over-buffering — increases latency since the application keeps sending new audio before the previous one is played.
2. Under-buffering — causes playback gaps when the application waits too long to send the next chunk.
### Describe the solution you'd like
Introduce a playback progress acknowledgment mechanism for the WebSocket media driver.
Possible designs:
1. **Mark-based acknowledgment**
- Allow clients to send a `MARK id=<uuid>` text control command that sets a logical boundary in the playback queue.
- When Asterisk finishes playing all media queued before that mark, it responds with `MARK_PLAYED id=<uuid>`.
Example:
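A hypothetical exchange (the `MARK`/`MARK_PLAYED` wording is the proposed format, not an existing API) could be:

```
client   → Asterisk:  <binary audio frames>
client   → Asterisk:  MARK id=seg-001
client   → Asterisk:  <more binary audio frames>
Asterisk → client:    MARK_PLAYED id=seg-001   (all audio queued before the mark has played)
```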
### Describe alternatives you've considered
We explored existing Asterisk WebSocket control messages:
- `REPORT_QUEUE_DRAINED`: Notifies only when the *entire* queue is empty — not useful for partial playback acknowledgment.
- `FLUSH_MEDIA`: Clears buffered audio but provides no confirmation of playback.
- `MEDIA_XOFF` / `MEDIA_XON`: Flow control only, unrelated to playback timing.
We also attempted to simulate playback timing locally by estimating real-time audio consumption (1 ms per 8 bytes for μ-law), but this approach is only an approximation and cannot confirm actual playback progress in Asterisk.
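The local-estimation fallback just described can be sketched as a pacing loop. `send_chunk` is a hypothetical transport callable, and the drift this loop accumulates is exactly why it stays an approximation:

```python
import time

ULAW_BYTES_PER_MS = 8  # 8 kHz u-law: 8 one-byte samples per millisecond

def estimated_playback_ms(chunk: bytes) -> float:
    """Estimate how long Asterisk will take to play this chunk."""
    return len(chunk) / ULAW_BYTES_PER_MS

def paced_stream(chunks, send_chunk):
    """Send chunks at the estimated real-time rate.

    send_chunk is a hypothetical transport callable. Clock drift and
    network jitter accumulate over time, since nothing here confirms
    actual playback progress inside Asterisk.
    """
    for chunk in chunks:
        send_chunk(chunk)
        time.sleep(estimated_playback_ms(chunk) / 1000)
```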
### Additional context
Use case: AI-driven voice bots and real-time speech generation systems streaming audio to Asterisk over WebSocket.
Modern AI engines (like OpenAI Realtime API, ElevenLabs, or custom TTS models) generate audio in small, variable-sized chunks. These systems require acknowledgment when certain chunks have been played so they can dynamically:
- Generate the next segment of speech
- Handle interruptions or “barge-in” events
- Avoid excessive buffering and latency
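As one illustration of the barge-in case, a client could pair the proposed mark acknowledgments with the existing `FLUSH_MEDIA` command. The `MARK` wire format below is the proposal's assumption, not a shipped Asterisk API:

```python
# Sketch: barge-in handling that combines the proposed MARK acknowledgments
# with the existing FLUSH_MEDIA command. MARK/MARK_PLAYED are hypothetical.

class SpeechSession:
    def __init__(self, send_text):
        self.send_text = send_text   # sends a WebSocket text frame
        self.pending_marks = set()   # marks sent but not yet acknowledged

    def mark(self, mark_id: str) -> None:
        self.pending_marks.add(mark_id)
        self.send_text(f"MARK id={mark_id}")

    def on_mark_played(self, mark_id: str) -> None:
        self.pending_marks.discard(mark_id)  # that segment finished playing

    def barge_in(self) -> None:
        """Caller started talking: drop queued audio, forget pending marks."""
        self.send_text("FLUSH_MEDIA")
        self.pending_marks.clear()
```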
Adding per-chunk or progress-based acknowledgment would significantly improve synchronization for real-time applications and make Asterisk more compatible with emerging AI voice technologies.
Proposed area: WebSocket Media Driver (`chan_websocket`)
Author: Shrish Gulati
I’ve moved this into the main asterisk repo since it’s an improvement to an existing capability.
Labels: improvement, support-level-core
system
Closed
December 4, 2025, 1:58pm
This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.