ExternalMedia WebSocket: packet drops when sending optimal_frame_size every 10ms (works at 20ms) — how to debug channel driver queue?

Hi everyone,

I’m working with Asterisk ExternalMedia over WebSocket and I’m seeing packet drops under a specific timing pattern. I’d appreciate some guidance on whether this is expected behavior and how to debug it properly.

Setup (simplified)

  • Asterisk ExternalMedia channel (WebSocket transport)

  • A dummy WebSocket server sends raw audio packets

  • A main media service sits in between:

    • Receives audio packets of optimal_frame_size from the dummy server every 10ms

    • Forwards them as-is to Asterisk

  • Audio packets are exactly optimal_frame_size as provided by MEDIA_START

  • Codec: slin or ulaw, depending on the call
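
For context, a rough sanity check of what optimal_frame_size works out to for these codecs (assuming 8 kHz audio and ptime=20; the authoritative value is always the one reported in MEDIA_START):

```python
SAMPLE_RATE = 8000                                    # Hz; typical for ulaw and 8 kHz slin
PTIME_MS = 20                                         # negotiated packetization time

samples_per_frame = SAMPLE_RATE * PTIME_MS // 1000    # 160 samples per 20ms frame
slin_frame_bytes = samples_per_frame * 2              # 16-bit samples -> 320 bytes
ulaw_frame_bytes = samples_per_frame                  # 8-bit samples  -> 160 bytes
print(slin_frame_bytes, ulaw_frame_bytes)             # 320 160
```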

Flow

Dummy WS → Main Service → Asterisk ExternalMedia

Dummy WS ← Main Service ← Asterisk ExternalMedia

Flow control

  • When MEDIA_XOFF is received from Asterisk:

    • Main service stops sending media

  • When MEDIA_XON is received:

    • Media sending resumes

  • No packets are intentionally dropped on the application side (see the sketch after this list)
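
For reference, the flow-control handling in the main service is conceptually just a gate, roughly like this (a minimal Python/asyncio sketch; `send` and the code that recognizes the XOFF/XON events are placeholders for our real WebSocket handling, not the actual protocol parsing):

```python
import asyncio


class MediaGate:
    """Tracks MEDIA_XOFF / MEDIA_XON so the forwarder knows when it may send."""

    def __init__(self):
        self._open = asyncio.Event()
        self._open.set()                  # start in the "sending allowed" state

    def on_xoff(self):                    # call when MEDIA_XOFF arrives from Asterisk
        self._open.clear()

    def on_xon(self):                     # call when MEDIA_XON arrives from Asterisk
        self._open.set()

    async def wait_until_open(self):
        await self._open.wait()


async def forward_frame(frame: bytes, gate: MediaGate, send):
    # Block while XOFF is in effect, then forward the frame unchanged.
    await gate.wait_until_open()
    await send(frame)
```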

The problem

  • Dummy server sends optimal_frame_size packets every 10ms

  • Main service forwards them immediately to Asterisk

  • After ~40–50 seconds of call time:

    • Audio starts breaking

    • Packets appear to be dropped

    • Playback becomes choppy

However:

  • If the same packets are sent every 20ms, everything works perfectly

  • No packet loss

  • No audio issues

This makes me suspect I’m overrunning something internally even though I’m respecting:

  • optimal_frame_size

  • MEDIA_XOFF / MEDIA_XON

My understanding so far

From documentation / discussions, I understand that:

  • The channel driver maintains an internal media queue

  • Roughly:

    • ~1000 frames max

    • XOFF around ~900 frames

    • XON around ~800 frames

  • Even with XOFF/XON handling, media sent before XOFF arrives may still overflow the queue if timing is off

This makes me wonder whether:

  • Sending packets at twice the ptime rate (every 10ms instead of every 20ms) is inherently unsafe

  • Or whether my understanding of XOFF/XON semantics is incomplete

Questions

  • Is it valid to send optimal_frame_size packets faster than the negotiated ptime?

    (e.g., one optimal_frame_size packet every 10ms when ptime=20)

  • Does ExternalMedia assume wall-clock pacing, not just packet size?

    In other words, is respecting ptime timing mandatory even if frame size is correct?

  • Is there any way to introspect or debug the channel driver media queue?

    • Queue depth

    • Frame backlog

    • Drops

    • Debug logs / CLI commands / tracepoints

  • Is MEDIA_XOFF intended as a “hard stop” guarantee, or is it best-effort and timing-sensitive?

For production Voice AI integration, what is the recommended approach?

  • Should the media engine:

    • Always re-clock audio at ptime?

    • Use a jitter buffer / ring buffer?

    • Treat ExternalMedia like RTP in terms of pacing?

  • Architecturally, what’s the recommended way to build a media engine that:

    • Talks to Asterisk ExternalMedia

    • Talks to another media source (Voice AI in production)

    • Handles pacing cleanly without trying to outsmart ptime

What I’m trying to confirm

Whether the correct model is:

“Even if you receive media faster, you must clock audio into Asterisk at ptime, otherwise drops are expected.”

or whether there’s a supported way to safely send faster-than-ptime media using XOFF/XON alone.
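
If the answer turns out to be "yes, re-clock", the pacing loop in the main service would presumably look something like the sketch below (Python/asyncio, written under that assumption; `frames` is a local buffer the upstream source fills at whatever rate it likes, and `send` stands in for the WebSocket write to Asterisk):

```python
import asyncio


async def reclock_at_ptime(frames: asyncio.Queue, send, ptime_ms: int = 20):
    """Drain locally buffered optimal_frame_size frames toward Asterisk at ptime."""
    loop = asyncio.get_running_loop()
    next_tick = loop.time()
    while True:
        frame = await frames.get()        # upstream may fill this queue faster than ptime
        await send(frame)                 # exactly one frame per tick toward Asterisk
        next_tick += ptime_ms / 1000.0
        delay = next_tick - loop.time()
        if delay > 0:
            await asyncio.sleep(delay)    # schedule against absolute time to avoid drift
        else:
            next_tick = loop.time()       # fell behind (e.g., a slow send); resync the clock
```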

I’m investigating. Let me try and reproduce it.

I just want to confirm… You’re sending 20ms of media every 10ms? Can I ask what the use case is for that?

Yes, that’s correct.

The dummy WebSocket server is sending packets of exactly optimal_frame_size every 10ms.
So if optimal_frame_size represents 20ms of audio, then effectively 20ms worth of media is being sent at a 10ms interval.

This is intentional in the test setup to simulate an upstream media source (e.g., Voice AI) that can overproduce or send media in bursts without awareness of ptime.

The intent is to understand whether ExternalMedia flow control (MEDIA_XOFF / MEDIA_XON) can safely handle this, or whether audio must always be strictly re-clocked to ptime regardless of packet size.

Also, to add one more scenario we tested:

Instead of sending exactly one optimal_frame_size packet every 10ms, we also tested the following approach:

  • The dummy WebSocket server sends multiples of optimal_frame_size (e.g., 2×, 3×, etc.) every 10ms to the main media service.

  • The main service then chunks this data into optimal_frame_size frames.

  • These frames are sent to Asterisk one by one until MEDIA_XOFF is received.

  • After MEDIA_XOFF:

    • The remaining data is buffered locally in the main service.

  • When MEDIA_XON is received:

    • The main service resumes chunking and sending the buffered data in optimal_frame_size frames (see the sketch after this list).
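
For clarity, that chunk-and-buffer path is roughly the following (Python sketch; frame_size is the optimal_frame_size from MEDIA_START, and `xon`/`send` stand in for our real flow-control state and WebSocket write):

```python
import asyncio


class Chunker:
    """Slices arbitrarily sized inbound audio blobs into optimal_frame_size frames."""

    def __init__(self, frame_size: int):
        self.frame_size = frame_size
        self.pending = bytearray()        # leftover bytes that don't yet fill a frame

    def push(self, data: bytes):
        self.pending.extend(data)
        while len(self.pending) >= self.frame_size:
            yield bytes(self.pending[:self.frame_size])
            del self.pending[:self.frame_size]


async def forward_blob(data: bytes, chunker: Chunker, xon: asyncio.Event, send):
    # `xon` is set while sending is allowed and cleared when MEDIA_XOFF arrives,
    # so frames produced after XOFF simply wait here (the local buffering above).
    for frame in chunker.push(data):
        await xon.wait()
        await send(frame)
```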

Even in this model, we observe the same behavior:

  • After some time, packets appear to be dropped

  • Audio becomes corrupted / choppy

  • Whereas sending strictly one frame every ptime (20ms) works reliably

To answer some of your questions…

Is it valid to send optimal_frame_size packets faster than the negotiated ptime? (e.g., one optimal_frame_size packet every 10ms when ptime=20)

Yes, it should be.

Does ExternalMedia assume wall-clock pacing, not just packet size? In other words, is respecting ptime timing mandatory even if frame size is correct?

No. The incoming messages are broken up into correctly sized frames and placed on the internal queue as they arrive. Any excess data is saved in a buffer to be prepended to following messages until the buffer contains enough data to make a full frame, which is then queued. There is no timing check at that point.

There’s an internal constant-rate “ptime” timer that pulls frames off the internal queue and sends them to the core. Given that the XOFF level is 900 frames and the XON level is 800 frames, you should be able to do the math to determine when you’d get the first XOFF, then subsequent XON/XOFF messages.
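
To make that math concrete with the numbers above (treating them as nominal levels, not exact guarantees):

```python
# Back-of-the-envelope only, using the levels quoted in this thread.
IN_RATE = 100                             # frames/s arriving (one 20ms frame every 10ms)
OUT_RATE = 50                             # frames/s drained by the internal ptime timer
XOFF_LEVEL = 900                          # queue depth that triggers MEDIA_XOFF
XON_LEVEL = 800                           # queue depth that triggers MEDIA_XON

growth = IN_RATE - OUT_RATE                           # +50 frames/s while sending at 2x rate
first_xoff_s = XOFF_LEVEL / growth                    # ~18 s until the first XOFF
xoff_to_xon_s = (XOFF_LEVEL - XON_LEVEL) / OUT_RATE   # ~2 s drain before XON, sender paused
print(first_xoff_s, xoff_to_xon_s)                    # 18.0 2.0
```

Note that 900 queued 20ms frames is also roughly 18 seconds of audio sitting in the queue, so you would expect steadily growing latency well before anything is actually dropped.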

Is there any way to introspect or debug the channel driver media queue?

There are two ways… Your app can send a GET_STATUS command, which will return a STATUS event with the current queue depth, the XOFF/XON levels, and flags that indicate whether the queue is full, whether a bulk media transfer is in progress, or whether the queue is paused. You can also turn debugging on for chan_websocket with the CLI command core set debug 4 chan_websocket.so. That should dump the events you’re interested in.

To answer some of your other questions…

The whole point of chan_websocket was to remove the need for you to rechunk or retime the audio data, and it was driven by the need to interact with AI agents. There are some caveats to that, however… First, it can’t do it with codecs like Opus, because Opus uses packet headers and variable-length packets. Second, if you send large chunks, you need to use the START_MEDIA_BUFFERING and STOP_MEDIA_BUFFERING commands.

My first troubleshooting recommendation would be to try without the intermediate server and see what happens.