Recently we’ve been experiencing intermittent issue with music on hold. We have heavy loaded system with concurrent 100-200 calls. From time to time there is a warning thrown: res_musiconhold.c: Failed to write frame to 'PJSIP/xxx': Resource temporarily unavailable
It is thrown right after call of Musiconhold like this:
VERBOSE[88179][C-000091b0] pbx.c: Executing [000@context:31] MusicOnHold("PJSIP/sip-trunk-0000945d", "default,10") in new stack
VERBOSE[88179][C-000091b0] res_musiconhold.c: Started music on hold, class 'default', on channel 'PJSIP/sip-trunk-0000945d'
WARNING[88179][C-000091b0] res_musiconhold.c: Failed to write frame to 'PJSIP/sip-trunk-0000945d': Resource temporarily unavailable
VERBOSE[88179][C-000091b0] res_musiconhold.c: Stopped music on hold on PJSIP/sip-trunk-0000945d
The problem (apart from the warning itself) is that MOH is not played and dialplan continues to execute. It doesn’t happen all the time, most of the time moh is played ok, no errors or any issues. It happens with random moh class.
We checked everything:
ulimits values, most of them are unlimited and asterisk process doesn’t use much oppose to max values, this has been checked real time.
Disk load, io utilization - all within normal. We moved moh files to ramdisk as a test - it changed nothing.
Network connection, sockets limitations, transmit buffers - we changed core values for udp buffers net.core.rmem_max/default, changed receive (TX) and transmit (RX) buffers on network interface with ethtool. There is send-q and recv-q filling a bit with 1280 or 2560 bytes in netstat on asterisk udp ports from time to time but it is released quite fast.
Our system settings: PBX Core settings -----------------
System: Linux/5.4.0-91-generic built by root on x86_64 *
Our server is on hardware with quite a lot of resources, from our debug it doesn’t seems like lack of resources issue but more of a system limitation, we just can’t find out which one.
That error usually pops up when a codec is missing or it otherwise can’t convert. I can’t imagine alaw being missing from asterisk. I feel like this is a file read problem. Are your files raw alaw files (no header) with .alaw extension? How were the files converted over?
I would likely set up a test extension to make sure Asterisk can actually play those files back; move one of them to the default sound directories where Playback() can find it and see what happens.
There is no metadata in .alaw files, so Asterisk will play any file whose name ends in .alaw, without complaint, although the result may be various forms of noise.
I tested - Playback works fine, i can hear music, it is played and no error occurs.
The error is intermittent, so the same files are played just fine most of the time.
That’s negligible. However, recovering quickly is not a help, if you do overrun the queues, as the message relates to EAGAIN, which is something you get when using non-blocking writes, so there is no attempt, by the kernel, to wait for space to become available.
For VoIP media, the user level code won’t want to build up a backlog in user space, so will not retry in such cases.
When I said that 1280 or 2560 bytes were negligible that is in terms of bytes, however, RTP packets are small, around 200 bytes, and that large a backlog on RTP (~120 to 240 ms), especially if variable, would indicate that you had a severe jitter problem as the result of either a severe networking overload, or inadequate access to CPU resources (as might happen with an unsuitable VM configuration). Speach quality is unlikely to be acceptable.
I increased UDP buffers to 25MB, however i don’t think this is a problem because i see no errors on netstat -s
Udp:
… 0 send buffer errors
And error still comes up. This is definitely not VM configurations as this is hardware server.
Are there any other system(Ubuntu) limitations asterisk can reach apart from open files?