MOH issue: Failed to write frame to xxx Resource temporarily unavailable

Hello everyone!

Recently we’ve been experiencing intermittent issue with music on hold. We have heavy loaded system with concurrent 100-200 calls. From time to time there is a warning thrown:
res_musiconhold.c: Failed to write frame to 'PJSIP/xxx': Resource temporarily unavailable
It is thrown right after call of Musiconhold like this:

VERBOSE[88179][C-000091b0] pbx.c: Executing [000@context:31] MusicOnHold("PJSIP/sip-trunk-0000945d", "default,10") in new stack
VERBOSE[88179][C-000091b0] res_musiconhold.c: Started music on hold, class 'default', on channel 'PJSIP/sip-trunk-0000945d'
WARNING[88179][C-000091b0] res_musiconhold.c: Failed to write frame to 'PJSIP/sip-trunk-0000945d': Resource temporarily unavailable
VERBOSE[88179][C-000091b0] res_musiconhold.c: Stopped music on hold on PJSIP/sip-trunk-0000945d

The problem (apart from the warning itself) is that MOH is not played and dialplan continues to execute. It doesn’t happen all the time, most of the time moh is played ok, no errors or any issues. It happens with random moh class.
We checked everything:

  • ulimits values, most of them are unlimited and asterisk process doesn’t use much oppose to max values, this has been checked real time.
  • Disk load, io utilization - all within normal. We moved moh files to ramdisk as a test - it changed nothing.
  • Network connection, sockets limitations, transmit buffers - we changed core values for udp buffers net.core.rmem_max/default, changed receive (TX) and transmit (RX) buffers on network interface with ethtool. There is send-q and recv-q filling a bit with 1280 or 2560 bytes in netstat on asterisk udp ports from time to time but it is released quite fast.

Our system settings:
PBX Core settings
-----------------

  • Version: 18.7.1*
  • Build Options: DONT_OPTIMIZE, COMPILE_DOUBLE, BUILD_NATIVE, OPTIONAL_API*
  • System: Linux/5.4.0-91-generic built by root on x86_64 *

Our server is on hardware with quite a lot of resources, from our debug it doesn’t seems like lack of resources issue but more of a system limitation, we just can’t find out which one.

Any help would be much appreciated!
Thanks

What’s your moh config…as in what’s your source(s)?

All MOH files are located at /var/lib/asterisk/moh/ under specified subdirectories. We have several classes all with settings:

[CLASSNAME]
mode=files
directory=/var/lib/asterisk/moh/CLASS_FOLDER
sort=alpha

All sound files are in alaw format.

That error usually pops up when a codec is missing or it otherwise can’t convert. I can’t imagine alaw being missing from asterisk. I feel like this is a file read problem. Are your files raw alaw files (no header) with .alaw extension? How were the files converted over?

I would likely set up a test extension to make sure Asterisk can actually play those files back; move one of them to the default sound directories where Playback() can find it and see what happens.

There is no metadata in .alaw files, so Asterisk will play any file whose name ends in .alaw, without complaint, although the result may be various forms of noise.

I tested - Playback works fine, i can hear music, it is played and no error occurs.
The error is intermittent, so the same files are played just fine most of the time.

I’d assume you have some sort of overload.

You could try increasing he debugging to see if you get a message closer to the original cause.

That’s negligible. However, recovering quickly is not a help, if you do overrun the queues, as the message relates to EAGAIN, which is something you get when using non-blocking writes, so there is no attempt, by the kernel, to wait for space to become available.

For VoIP media, the user level code won’t want to build up a backlog in user space, so will not retry in such cases.

I enabled debug to 4, there is piece of log i got:

[2022-11-24 12:17:10] DEBUG[849313][C-00029f31] pbx.c: Launching ‘MusicOnHold’
[2022-11-24 12:17:10] VERBOSE[849313][C-00029f31] pbx.c: Executing [11111@context:31] MusicOnHold(“PJSIP/sip-trunk-0005cf03”, “MOH_CLASS,10”) in new stack
[2022-11-24 12:17:10] VERBOSE[849313][C-00029f31] res_musiconhold.c: Started music on hold, class ‘MOH_CLASS’, on channel ‘PJSIP/sip-trunk-0005cf03’
[2022-11-24 12:17:10] DEBUG[849313][C-00029f31] channel.c: Scheduling timer at (50 requested / 50 actual) timer ticks per second
[2022-11-24 12:17:10] DEBUG[849313][C-00029f31] channel.c: Channel PJSIP/sip-trunk-0005cf03 setting write format path: alaw → alaw
[2022-11-24 12:17:10] DEBUG[849313][C-00029f31] res_musiconhold.c: PJSIP/sip-trunk-0005cf03 Opened file 0 ‘/var/lib/asterisk/moh/MOH_CLASS/MOH_CLASS_MOH’
[2022-11-24 12:17:10] WARNING[849313][C-00029f31] res_musiconhold.c: Failed to write frame to ‘PJSIP/sip-trunk-0005cf03’: Resource temporarily unavailable
[2022-11-24 12:17:10] DEBUG[849313][C-00029f31] channel.c: Auto-deactivating generator
[2022-11-24 12:17:10] VERBOSE[849313][C-00029f31] res_musiconhold.c: Stopped music on hold on PJSIP/sip-trunk-0005cf03
[2022-11-24 12:17:10] DEBUG[849313][C-00029f31] channel.c: Channel PJSIP/sip-trunk-0005cf03 setting write format path: alaw → alaw
[2022-11-24 12:17:10] DEBUG[849313][C-00029f31] channel.c: Scheduling timer at (0 requested / 0 actual) timer ticks per second
[2022-11-24 12:17:10] DEBUG[849313][C-00029f31] pbx.c: Spawn extension (context2,3,1) exited non-zero on ‘PJSIP/sip-trunk-0005cf03’

I don’t see anything wrong.
I also encoded one of the moh file to wav to see if this changes anything, but error pops up with wav file too.

Is there anything else i can check what could cause the issue?

When I said that 1280 or 2560 bytes were negligible that is in terms of bytes, however, RTP packets are small, around 200 bytes, and that large a backlog on RTP (~120 to 240 ms), especially if variable, would indicate that you had a severe jitter problem as the result of either a severe networking overload, or inadequate access to CPU resources (as might happen with an unsuitable VM configuration). Speach quality is unlikely to be acceptable.

I increased UDP buffers to 25MB, however i don’t think this is a problem because i see no errors on netstat -s

Udp:

0 send buffer errors

And error still comes up. This is definitely not VM configurations as this is hardware server.
Are there any other system(Ubuntu) limitations asterisk can reach apart from open files?