Heavy UDP errors on Asterisk 1.6.1 in G.729 transcoding app

Hello,

We’re running Asterisk 1.6.1.11 on a 4-core CentOS Linux server with Digium’s codec_g729a-1.6.1_3.1.4-core2_64 codec pack for transcoding between G.729 and G.711 u-law. Here’s the output of uname -a:

Linux sbc401 2.6.18-8.el5 #1 SMP Thu Mar 15 19:46:53 EDT 2007 x86_64 x86_64 x86_64 GNU/Linux

With only 75 active SIP dialogs (50% G.729 and 50% G.711) we see heavy UDP packet loss affecting both RTP and SIP. The Asterisk logs show entries like:

May 14 16:52:22 sbc401 asterisk[32259]: WARNING[32269]: chan_sip.c:3397 in retrans_pkt: Maximum retries exceeded on transmission 800efabf96bdf141154c3201e00 for seqno 102 (Non-critical Request) – See doc/sip-retransmit.txt.
May 14 16:53:38 sbc401 asterisk[32259]: WARNING[32269]: chan_sip.c:3397 in retrans_pkt: Maximum retries exceeded on transmission 0883474fa6bdf11174c3201e00 for seqno 102 (Non-critical Request) – See doc/sip-retransmit.txt.

The network itself does not exhibit packet loss, and individual test calls with no load on the server are fine (no UDP errors). Under load with 75 dialogs, the top output looks similar to this:

top - 16:59:09 up 7 days, 20:34, 2 users, load average: 0.65, 0.92, 0.99
Tasks: 228 total, 2 running, 226 sleeping, 0 stopped, 0 zombie
Cpu(s): 11.4%us, 1.7%sy, 0.0%ni, 85.2%id, 0.0%wa, 0.4%hi, 1.3%si, 0.0%st
Mem: 1026960k total, 1014828k used, 12132k free, 324044k buffers
Swap: 8388600k total, 264k used, 8388336k free, 504144k cached

The load isn’t too high, yet netstat -su shows bursts of UDP receive packet errors, and netstat -nlp shows that Asterisk cannot process received UDP packets fast enough. This causes a buildup in the UDP receive queues, and packets are dropped:

Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
udp 78384 0 0.0.0.0:15272 0.0.0.0:* 32259/asterisk
udp 16192 0 0.0.0.0:15550 0.0.0.0:* 32259/asterisk
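In case it helps anyone reproduce the measurement, here is a minimal Python sketch of how the receive-queue buildup could be tracked over time by reading /proc/net/udp directly instead of eyeballing netstat. It assumes the usual Linux layout of that file (hex local address:port in the second column, hex tx_queue:rx_queue in the fifth). The function names, the 8192-byte threshold, and the sample rows (built from the queue sizes above, plus an idle port 5060 row added for contrast) are illustrative only, not part of our setup.

```python
def parse_udp_table(text):
    """Return a list of (local_port, rx_queue_bytes) from /proc/net/udp text."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows start with a slot number such as "12:"; skip header and blanks.
        if len(fields) < 5 or not fields[0].endswith(":"):
            continue
        port = int(fields[1].split(":")[1], 16)      # local_address is ADDR:PORT in hex
        rx_queue = int(fields[4].split(":")[1], 16)  # column is tx_queue:rx_queue in hex
        rows.append((port, rx_queue))
    return rows


def backlogged_sockets(text, threshold=8192):
    """Return the sockets whose receive queue holds at least `threshold` bytes.

    The threshold is an arbitrary illustrative value, not a recommendation.
    """
    return [(port, q) for port, q in parse_udp_table(text) if q >= threshold]


# Illustrative sample in /proc/net/udp format, using the queue sizes shown above.
SAMPLE = """\
  sl  local_address rem_address   st tx_queue rx_queue tr tm->when retrnsmt   uid  timeout inode
  12: 00000000:3BA8 00000000:0000 07 00000000:00013230 00:00000000 00000000     0        0 11111 2 ffff810000000001
  13: 00000000:3CBE 00000000:0000 07 00000000:00003F40 00:00000000 00000000     0        0 11112 2 ffff810000000002
  14: 00000000:13C4 00000000:0000 07 00000000:00000000 00:00000000 00000000     0        0 11113 2 ffff810000000003
"""
```

On a live system the same functions could be fed the real table, e.g. backlogged_sockets(open("/proc/net/udp").read()), and polled once a second to see which RTP ports back up first.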

We cannot reproduce this phenomenon with a light SIPp load. Under heavy simulated load, it starts to appear at around 180 SIP dialogs. FreeSWITCH can handle about double that load before being crippled by UDP errors.

We have applied ulimits and are running Asterisk at high priority (nice level -11).

Please let us know if you have any idea what could be the cause of the issue and how we could resolve it. Thank you.

Best Regards,

Serge