High CPU load after upgrade from 11 to 13

Yesterday I upgraded form Asterisk 11 to Asterisk 13. Now the asterisk process is consuming more then 100% load, in previous version it was rarely 10%.

Calls are getting interrupted with jitter buffer loss, wrong timestamps and out off seq packets.

What could be the problem?

Noone can really answer this. Everyone uses Asterisk differently which exercises different parts and can have different results. You need to provide more information about your usage, and narrow down what exactly you are doing that is causing it.

We have around 200 extensions on an asterisk/FreePBX server. About 10 concurrent calls max.
Most of the extensions are chan_sip, a few are IAX2 (hylafax on iaxmodem).

This whole setup went perfect on asterisk 11.xx for a few years, however after updating to 13.15.0 CPU load went very high and calls are getting interrupted.

We had a virtual server with 2 cores and 8GB RAM, we added two more cores buth this didn’t solve the result.

I can’t speak for the hylafax/iaxmodem usage but the rest is very common and done by quite a few people. Are you doing anything else? Have you narrowed it down at all? IE: Is there high CPU usage even with no calls?

With no calls the CPU there is no high CPU, when we start calling the CPU gets up.

Now the CPU for the asterisk process is 280%, whilest the load is very low at 0.04.

top - 14:13:50 up 1:48, 4 users, load average: 0.04, 0.20, 0.30
Tasks: 212 total, 1 running, 210 sleeping, 0 stopped, 1 zombie
Cpu0 : 26.3%us, 35.2%sy, 0.0%ni, 38.1%id, 0.0%wa, 0.0%hi, 0.4%si, 0.0%st
Cpu1 : 31.4%us, 34.1%sy, 0.0%ni, 34.1%id, 0.0%wa, 0.0%hi, 0.4%si, 0.0%st
Cpu2 : 28.6%us, 32.8%sy, 0.0%ni, 38.2%id, 0.0%wa, 0.0%hi, 0.4%si, 0.0%st
Cpu3 : 30.4%us, 30.8%sy, 0.0%ni, 38.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 8051600k total, 734436k used, 7317164k free, 19928k buffers
Swap: 1738748k total, 0k used, 1738748k free, 124540k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
6511 root 20 0 2793m 170m 13m S 280.4 2.2 19:44.20 asterisk

Anyone an idea how to debug this?

I now switched back to 11.20.0 where I don’t have this problem.

Yes, you have to isolate things. Take parts of your usage away and better determine precisely what is causing the high CPU usage. That’s the only way really to figure out what is going on.

The only usage is the asterisk process. If I stop this the server behaviour is normal.

What could I isolate in asterisk itself?

Reduce your usage of Asterisk to the most minimal level, and then add on additional things you are using. If you see a sudden spike then you know it is what you just added for usage.

It’s not that simple. If I unload the sip module or block port 5060 on the firewall, everything is fine. Even with chan_sip.so loaded and firewall port 5060 is open the server load is quite when there are no calls (eg. in the evening).
It all starts when there are multiple calls at the same time. CPU use time then hits 400% sometimes.

I didn’t say to unload the SIP module or block port 5060. I said to reduce your Asterisk usage to try to isolate things. For example, if calls are being recorded then don’t. You have to understand what the calls are doing and then simplify that to try to figure it out. Noone else has reported anything like this, so doing that can isolate what it is in your environment exactly that is doing it.

Recording is off.

When I make an outgoing call I see those warnings:
[2017-05-02 15:14:56] WARNING[47914]: func_cdr.c:383 cdr_write_callback: CDR requires a value (CDR(variable)=value)
[2017-05-02 15:14:57] WARNING[51631][C-00000009]: chan_iax2.c:1239 jb_warning_output: Resyncing the jb. last_delay 0, this delay -410728462, threshold 1000, new offset 410728462

With only 1 outgoing call (2 channels) the CPU use of process asterisk goes up to 20%.

Are you using a jitterbuffer in Asterisk? If you disable that what happens?

Disabled jitterbuffer and disabled rtp checksums.

Problem stays the same.

Then you have to try to isolate it even more. That’s the only thing I can really say about this…

Yeah, I don’t know what to isolate even more. I also deleted all the logfile creating.

Reloading also takes quite some more time then before. Is there any possibility to debug the reloading process?

The easiest way is like I’ve been saying, simplify and eliminate things and then add back. That includes for configuration. Outside of that you have to do process profiling to understand where CPU time is being spent, in what functions.

Just happened something similar, this compilation flags were activated in “Compiler” Section of menuconfig:

  • DONT OPTIMIZE
  • DEBUG THREADS
  • DEBUG FD_LEAKS
  • *debug options…

After disabling cpu went down a 60% (was 150% higher than usual).

13 is EOL. You need to be on 20 or 21.