MixMonitor D option produces invalid stereo .raw when bridged channels have different native sample rates

gauravs456 · May 28, 2026, 12:05pm


Hi all,

I’m hitting an issue with the D option in MixMonitor on Asterisk 22.8.0 when the two bridged channels use codecs with different native sample rates. The output .raw file ends up unusable — no single -r value passed to sox produces correct-sounding audio.

Setup

A typical inbound call in my voice-bot architecture:

Leg A (PJSIP) — inbound caller, NativeFormats: (ulaw) → 8 kHz
Leg B (WebSocket) — AI voice bot endpoint, NativeFormats: (slin16) → 16 kHz

The two are joined in a Stasis-bridged call. Channel info:

PJSIP/.../00000199
  NativeFormats: (ulaw)
  WriteFormat:   slin16
  ReadFormat:    slin16
  WriteTranscode: Yes (slin@16000)->(slin@8000)->(ulaw@8000)
  ReadTranscode:  Yes (ulaw@8000)->(slin@8000)->(slin@16000)

WebSocket/vpaas_rtp_engine/...
  NativeFormats: (slin16)
  WriteFormat:   slin16
  ReadFormat:    slin16
  WriteTranscode: No
  ReadTranscode:  No

MixMonitor launched against the PJSIP leg:

MixMonitor(/Recording/<hash>.raw,Db)

What I observe

The resulting .raw file plays incorrectly at every sample rate I try with sox:

sox -t raw -r 8000  -e signed -b 16 -c 2 file.raw out.wav  # too slow
sox -t raw -r 16000 -e signed -b 16 -c 2 file.raw out.wav  # too fast
sox -t raw -r 12000 -e signed -b 16 -c 2 file.raw out.wav  # still too fast

No fixed -r value produces normal-speed playback. Spectral analysis of the raw bytes shows energy mirrored across the spectrum, which is characteristic of a stream with an inconsistent sample rate. It looks as if frames from one direction are at 8 kHz and frames from the other at 16 kHz, but both are written into the same interleaved stereo stream without rate normalization.
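To make that hypothesis concrete, here is a back-of-the-envelope model in Python. It assumes (unconfirmed) that each direction keeps its native rate inside the .raw file: 8 kHz for the ulaw caller and 16 kHz for the slin16 bot. It computes the playback speed of each channel for the -r values tried above. A factor of 1.0 means correct speed, and no single -r value gives 1.0 on both channels.

```python
# Model only: assumes each direction's samples stay at their native rate
# inside the interleaved .raw file. This is not a confirmed diagnosis.
CALLER_RATE = 8000   # ulaw leg (assumed rate of its samples in the file)
BOT_RATE = 16000     # slin16 leg (assumed rate of its samples in the file)


def speed_factor(play_rate: int, true_rate: int) -> float:
    """Playback speed relative to real time when samples captured at
    true_rate are played back at play_rate (1.0 means correct speed)."""
    return play_rate / true_rate


for r in (8000, 12000, 16000):
    caller = speed_factor(r, CALLER_RATE)
    bot = speed_factor(r, BOT_RATE)
    print(f"-r {r:>5}: caller x{caller:.2f}, bot x{bot:.2f}")
```

Under this model, -r 8000 leaves the bot at half speed, while -r 16000 plays the caller at double speed. That would match "too slow" at one end and "too fast" at the other.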

What works

When both channels have matching native sample rates (e.g. ulaw ↔ ulaw, or slin16 ↔ slin16), the D option works perfectly and sox -r <rate> produces clean stereo output with caller on one channel and bot on the other.

The breakage only happens when native rates differ across the bridge.
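A rough pre-flight check along these lines could pick between D and r()+t() per call. This is a hypothetical Python helper. The NATIVE_RATE_HZ table only covers codecs mentioned in this post, and the "|" separator for multi-codec NativeFormats values is an assumption. It encodes only what I observed (matching rates work), not a guarantee from Asterisk.

```python
# Hypothetical lookup table: native sample rates (Hz) for the codecs
# mentioned in this post only. Extend it for other codecs as needed.
NATIVE_RATE_HZ = {
    "ulaw": 8000,
    "slin": 8000,
    "slin16": 16000,
    "slin24": 24000,
    "slin48": 48000,
    "opus": 48000,
}


def native_rate(native_formats: str) -> int:
    """Rate of the first codec in a NativeFormats value such as "(ulaw)".
    Assumes multiple codecs would be "|"-separated; raises KeyError for
    codecs missing from the table."""
    codecs = native_formats.strip().strip("()").split("|")
    return NATIVE_RATE_HZ[codecs[0].strip()]


def interleave_is_safe(leg_a: str, leg_b: str) -> bool:
    """True when both legs share a native rate, i.e. the case observed
    to work with the D option in this setup."""
    return native_rate(leg_a) == native_rate(leg_b)


print(interleave_is_safe("(ulaw)", "(ulaw)"))    # True
print(interleave_is_safe("(ulaw)", "(slin16)"))  # False: fall back to r()+t()
```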

Workaround I’m using

Switching to r(file) + t(file) (two separate .wav files, each with proper headers encoding the correct per-direction rate) and post-merging with sox -M works correctly across all codec combinations, because sox reads the rate from each WAV header and resamples as needed.
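Because each WAV file is self-describing, a post-processing step can read the per-direction rate and duration from the headers instead of guessing. Here is a minimal Python sketch using the standard wave module. The silent test files and their rates are hypothetical stand-ins for real r()/t() output.

```python
import os
import tempfile
import wave


def describe(path: str) -> tuple[int, int, float]:
    """Return (sample rate, channel count, duration in seconds) from a WAV header."""
    with wave.open(path, "rb") as w:
        rate = w.getframerate()
        return rate, w.getnchannels(), w.getnframes() / rate


def write_silence(path: str, rate: int, seconds: float) -> None:
    """Write a mono 16-bit WAV of silence, standing in for an r()/t() file."""
    with wave.open(path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(rate)
        w.writeframes(b"\x00\x00" * int(rate * seconds))


with tempfile.TemporaryDirectory() as d:
    rx, tx = os.path.join(d, "rx.wav"), os.path.join(d, "tx.wav")
    write_silence(rx, 8000, 1.0)   # hypothetical 8 kHz direction
    write_silence(tx, 16000, 1.0)  # hypothetical 16 kHz direction
    print(describe(rx))
    print(describe(tx))
```

Both files report the same duration even though their rates differ. A raw interleaved stream has no header, so it cannot carry that information.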

Suggestion

It would be very helpful if the D option could:

Detect when the two directions have different sample rates and resample one to match the other before interleaving, OR
Emit the raw stream at a defined fixed rate (e.g. the higher of the two, with the lower direction upsampled), OR
At minimum, document this limitation — currently the docs just say “Interleave the audio coming from the channel and the audio going to the channel and output it as a 2 channel (stereo) raw stream”, with no mention that both directions must be at the same native rate.
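The first two options amount to "resample, then interleave". Here is a minimal Python sketch of that idea. It assumes 16-bit signed samples and an integer ratio between the two rates (8 kHz to 16 kHz here). The function names are hypothetical, and linear interpolation is a naive placeholder for a proper resampler. This is not how app_mixmonitor or the audiohook code works today.

```python
from array import array


def upsample_linear(samples: array, factor: int) -> array:
    """Upsample 16-bit samples by an integer factor using linear interpolation.
    Deliberately simple: no anti-imaging filter is applied."""
    out = array("h")
    for i, s in enumerate(samples):
        nxt = samples[i + 1] if i + 1 < len(samples) else s  # hold last sample
        for k in range(factor):
            out.append(s + (nxt - s) * k // factor)
    return out


def interleave(left: array, left_rate: int, right: array, right_rate: int) -> tuple[bytes, int]:
    """Bring both directions to the higher rate, then interleave L/R pairs.
    Returns (16-bit stereo bytes in host byte order, output rate)."""
    rate = max(left_rate, right_rate)
    if rate % left_rate or rate % right_rate:
        raise ValueError("this sketch only handles integer rate ratios")
    left = upsample_linear(left, rate // left_rate)
    right = upsample_linear(right, rate // right_rate)
    stereo = array("h")
    for i in range(min(len(left), len(right))):
        stereo.append(left[i])   # channel 1: audio from the channel
        stereo.append(right[i])  # channel 2: audio to the channel
    return stereo.tobytes(), rate


caller = array("h", [0, 100, 200, 300])      # 4 samples at 8 kHz
bot = array("h", [1, 2, 3, 4, 5, 6, 7, 8])   # 8 samples at 16 kHz
data, rate = interleave(caller, 8000, bot, 16000)
print(rate, len(data))  # 16000 32
```

The output always has exactly one rate (the higher of the two), so a single sox -r value would describe the whole file.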

The ideal behavior would be a guarantee that the resulting .raw file is always playable at a single, deterministic sample rate regardless of codec mismatch on the bridge — that would make D reliable for mixed-codec architectures like AI voice bots, WebRTC<->PSTN bridges, etc.

Environment

Asterisk 22.8.0 (also reproduced on 22.8.2)
Standard app_mixmonitor.so
Mixed PJSIP (ulaw) ↔ WebSocket (slin16/slin24/slin48/opus) bridges
chan_websocket channel driver

Has anyone else run into this? Is there a way to force the audiohook to a specific format before interleaving, or is the r()+t()+sox merge the only path for mixed-codec stereo recording?

Thanks!
Claude helped me write this problem statement. Thanks to Claude.

jcolp · May 28, 2026, 12:07pm

It could very well be the same underlying thing as:

github.com/asterisk/asterisk

[bug]: MixMonitor with flag D produces garbage in a 16KHz bridge

opened 11:09AM - 18 Feb 26 UTC

nappsoft

bug support-level-core

Severity: Minor
Versions: 22.8.2
Components/Modules: apps/app_mixm…onitor.ch
Operating Environment: Linux
Frequency of Occurrence: None
Issue Description: When using MixMonitor with a .raw file and the flags Db, the output is garbage as soon as the bridge is operating at 16 kHz (for example with a G722 codec). I have created a workaround (see the attached .txt, which is in fact a diff; attaching a .diff is not supported), but I won't create a pull request with this code, as my solution is quite naive (downsampling by simply discarding every second sample). Have a look at the changes in the diff, which produce a "usable" output for me. Proper downsampling would be a better solution, so someone who knows better how the audiohook and ast_writefile work should probably create a better solution. [mix_monitor.txt](https://github.com/user-attachments/files/25388795/mix_monitor.txt)

Topic	Category	Replies	Views	Activity
MixMonitor with `D` flag produces distorted/unplayable audio when two channel legs have different codecs (ulaw + slin16)	Asterisk Support	6	55	May 8, 2026
MixMonitor quality audio issue with "D" flag	Asterisk Support	13	223	July 20, 2025
MixMonitor to create two channel stereo in one audio file	Asterisk Support	2	190	April 15, 2025
Mixmonitor r() and t() options swapping audio mid call	Asterisk Support	3	114	December 30, 2024
MixMonitor interleaved audio option not working	Asterisk Support	3	110	July 3, 2025