I’m working on an AI voice bot that streams audio dynamically to Asterisk using the WebSocket media interface (chan_websocket).
The bot sends small audio chunks (e.g., 1600 bytes ≈ 200 ms of μ-law audio) continuously, and it needs to know when each chunk has actually been played by Asterisk before generating or sending the next one.
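For reference, the chunk sizing follows directly from μ-law (G.711) being one byte per sample at 8 kHz. A minimal sketch of slicing a μ-law byte stream into 200 ms chunks (names are illustrative, not part of any Asterisk API):

```python
# μ-law (G.711) at 8 kHz is one byte per sample, so:
SAMPLE_RATE = 8000                              # samples per second
CHUNK_MS = 200                                  # desired chunk duration
CHUNK_BYTES = SAMPLE_RATE * CHUNK_MS // 1000    # 1600 bytes per chunk

def chunk_ulaw(audio: bytes, size: int = CHUNK_BYTES):
    """Yield fixed-size μ-law chunks; the final chunk may be shorter."""
    for off in range(0, len(audio), size):
        yield audio[off:off + size]
```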
Right now, Asterisk buffers the incoming media and plays it in real time, but there’s no per-chunk acknowledgment mechanism.
We explored the following options:
- REPORT_QUEUE_DRAINED → fires only when the entire media queue becomes empty, so it doesn't help when the client needs an acknowledgment after specific chunks (e.g., after two frames but before more are sent).
- FLUSH_MEDIA → clears buffered audio, but provides no playback confirmation.
- MEDIA_XOFF / MEDIA_XON → handle flow control, not playback progress.
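For what it's worth, the XOFF/XON mechanism can at least gate sending. Below is a sketch of the client-side logic, assuming Asterisk delivers these notifications as text frames while audio travels as binary frames; the exact wire format is an assumption here, so check the chan_websocket protocol docs before relying on it. The `ws` object stands in for any connection with an async `send()`:

```python
import asyncio

def handle_control(msg: str, can_send: asyncio.Event) -> None:
    """Update the send gate from a text control frame.
    The frame text matched here is an assumption, not a documented format."""
    if "MEDIA_XOFF" in msg:
        can_send.clear()    # Asterisk's queue is filling: stop sending
    elif "MEDIA_XON" in msg:
        can_send.set()      # queue has drained enough: resume sending

async def stream_chunks(ws, chunks, can_send: asyncio.Event) -> None:
    """Send μ-law chunks as binary frames, waiting out any XOFF period.
    `ws` is any object with an async send() (e.g. a websockets connection);
    a separate reader task would feed text frames into handle_control()."""
    for chunk in chunks:
        await can_send.wait()   # block while XOFF is in effect
        await ws.send(chunk)    # binary frame: raw μ-law audio
```

This gives back-pressure, but as noted above it still says nothing about when a given chunk was actually played.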
Our use case requires fine-grained feedback because the AI engine generates speech in real time and must know exactly when the previous segment has finished playing in order to decide when to generate and stream the next part. Without this, we either overfill the buffer (adding latency) or underflow it (causing gaps).
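In the meantime, the workaround we know of is self-pacing against the wall clock: send each 200 ms chunk roughly 200 ms apart so the Asterisk-side queue stays about one chunk deep. This is only a sketch of that idea (the `send` callable is a placeholder for your WebSocket connection's send method), not a substitute for real playback acknowledgment:

```python
import asyncio
import time

CHUNK_MS = 200  # duration of audio carried by each chunk

async def paced_send(send, chunks, chunk_ms: int = CHUNK_MS) -> None:
    """Send one chunk per chunk-duration of wall-clock time.

    `send` is any async callable (e.g. a WebSocket connection's send).
    Targets are computed from a fixed start time so per-iteration
    jitter does not accumulate into drift.
    """
    start = time.monotonic()
    for i, chunk in enumerate(chunks):
        # Target send time for chunk i, relative to the start of playback.
        target = start + i * (chunk_ms / 1000)
        delay = target - time.monotonic()
        if delay > 0:
            await asyncio.sleep(delay)
        await send(chunk)
```

This keeps buffering bounded, but it tracks the sender's clock rather than actual playback, which is exactly why a per-chunk acknowledgment would be preferable.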
Question:
Is there any existing or planned mechanism in Asterisk’s WebSocket media driver to acknowledge when a certain portion of the queued audio has been consumed or played — for example, per-frame or per-byte playback callbacks or progress events?
Such a feature would be extremely helpful for low-latency AI voice streaming applications that require tight synchronization with Asterisk playback.
There’s a feature requests issue tracker[1] and a normal issue tracker[2]. If you don’t see anything there, then chances are the answer is no, and you can file a feature request.
Note that any such event would fire when media is handed to the Asterisk core; there is no indication of when it has actually been sent to, or heard by, any other party.