High CPU spikes with PJSIP and Asterisk 13.15

Thank you guys. I understand that this is not a number one priority. First, I will try the same setup in a different virtual environment myself. Then, if I see no change in behavior, I will do what I can to deliver all the data you requested.

Thank you very much for your time and expertise.

Hello everyone!

I have the very same problem with Asterisk version 13.17.2. I have several Asterisk systems with lots of extensions that work perfectly on chan_sip. Recently we decided to migrate to chan_pjsip because it has many features we would like to use that are not available in chan_sip, among them the max_contacts directive and the ability to have static contacts alongside dynamic ones. I set everything up, first with 1-2 extensions, and testing went fine. However, as soon as I add all of the extensions I need, around 2500, the console freezes on pjsip reload and CPU goes to 100%. After that pjsip is unresponsive, all extensions go to the Unreachable state, and phones can’t register.
I can see messages like:

The ‘subm:endpoint_topic_all-cached- task’ - processor queue reached 500 scheduled tasks.

core show taskprocessors

shows the queue depth growing under pjsip/default, like this:

Processor                                      Processed   In Queue  Max Depth  Low water High water
app_voicemail                                          0          0          0        450        500
ast_msg_queue                                          0          0          0        450        500
CCSS_core                                              0          0          0        450        500
iax2_transmit                                          0          0          0        450        500
pjsip/default-0000000b                              1708      22675      22674        450        500
pjsip/default-0000000c                              1834      22673      22653        450        500
pjsip/default-0000000d                              1825      22672      22668        450        500
pjsip/default-0000000e                              1777      22668      22665        450        500
pjsip/default-0000000f                              1821      22674      22655        450        500
pjsip/default-00000010                              1697      22678      22654        450        500
pjsip/default-00000011                              1429      22864      22854        450        500
pjsip/default-00000012                              1825      22664      22639        450        500

I searched Google a lot and found that pjsip now relies on task processors, and if many tasks are queued in them, pjsip slows down until the queues drain. I even added one more core to this test server to see if it helps, but Asterisk now eats 200% of CPU and the situation has not changed.
It is worth mentioning that all of my extensions have a static
contact=sip:${EXTEN}@IP:5060
directive, because I need Asterisk to dial not only the registered IP but also send the call to another server.
As soon as I lower the number of loaded extensions to, say, 450, it loads, a bit slowly, but the extensions work fine.
So the problem seems to be the number of extensions.
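
For illustration, a static contact of this kind could be defined on an AOR roughly as below. This is only a minimal sketch: the section name 1001, the address 192.0.2.10 and the max_contacts value are placeholders, not the actual configuration from this system.

```ini
; pjsip.conf (sketch; section name, address and values are assumed)
[1001]
type=aor
; permanent contact, dialed whenever this AOR is referenced
contact=sip:1001@192.0.2.10:5060
; limit on the contacts phones may register dynamically
max_contacts=1
```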

I tested this configuration and the problem happens on all versions from 13.8 (I didn’t test earlier ones) through 15.
I gathered a backtrace and “core show locks” output during the problem. Should this be added to the Asterisk issue tracker, or is there no chance of it being checked and fixed?

I appreciate your time, and thank you in advance to all who can help me with this.

Any difference if you adjust the thread pool size in pjsip.conf?
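
The thread pool is tuned in a type=system section of pjsip.conf. A sketch of what that might look like is below; the numbers are arbitrary starting points for experimenting, not recommended values.

```ini
; pjsip.conf (sketch; all values are assumptions to experiment with)
[system]
type=system
; threads created when the pool starts
threadpool_initial_size=10
; threads added each time the pool needs to grow
threadpool_auto_increment=5
; seconds an idle thread waits before it is removed
threadpool_idle_timeout=120
; upper bound on the number of threads
threadpool_max_size=100
```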

I tested that and now I don’t see any error messages. However, the CLI still freezes and pjsip is unresponsive. One interesting thing I found is that if I disable qualify on all extensions, reloading the pjsip module takes about 20-25 seconds (it hangs for that long), but after that everything works fine. It still takes longer to reload pjsip and to show all endpoints, but at least it doesn’t die on reload.
Hope that helps someone.
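
In case it is useful, disabling qualify for many AORs at once could be done with a config template along these lines. This is a sketch under assumptions: the template name aor-noqualify and the extension 1001 are made up for the example.

```ini
; pjsip.conf (sketch; template and section names are hypothetical)
[aor-noqualify](!)
type=aor
max_contacts=1
; 0 means the AOR is never qualified
qualify_frequency=0

; each AOR inherits the settings from the template
[1001](aor-noqualify)
```

The trade-off is that without qualify Asterisk no longer probes whether the contacts are reachable.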

Hello devox, do you know if this problem was solved?

I have this problem too: after a few reloads in the CLI, pjsip stops processing calls.

I use version 15.4.1

Hi,

No. I reduced the number of endpoints on the servers, and if I need more I disable qualify. It seems the issue is with many qualify requests being sent at the same time.

Qualify support was rewritten and should be better in the next set of releases.

Thanks for your answer.

How many endpoints?

It was tested against 3000 and CPU usage was minimal.

What processor and memory are in this server?

It was my dev instance, so 4GB of RAM and an Intel i7-4790.

OK, thanks for sharing the information.

Hello Jcolp,

Is there any information on which versions that would be, and how soon?

Thank you.

It has been merged in and will be available in the next release, probably 2-4 weeks.

Great news! Thanks for sharing.

We are using the FreePBX Distro 7 with Asterisk 13.19.1.

Some of the times when we get high CPU spikes are:

  1. We perform a reload. The new dialplan is fed into Asterisk and it processes it. Then the FreePBX portion is done and the CPU stabilizes for a few seconds. After that Asterisk seems to rebuild the subscriptions/BLFs, and that causes the CPU to spike like crazy and stay there. Optimizing subscription processing or splitting it onto another core would really help to stabilize our Asterisk. The main Asterisk thread gets up to 200%+ usage during reloads now.
    a) Our system has a little under 1000 active extensions on one of our servers. Many of our phones are Yealinks, and some of them are Yealink T29s with 27 BLF keys, all 27 of them in use. That is 20 phones with 27 subscriptions each. We probably have over 300 phones with 20 or more BLF keys each. Subscription processing gets massive, so reloads are a major problem. During a reload the BLF lights on the Yealinks shut off for a few seconds on both chan_sip and PJSIP.
  2. With 1000 active extensions we have a massive CDR/CEL table, so perhaps that is causing part of this. When we have 30+ active calls being made, ended, transferred, etc., I notice that CPU use jumps quite a bit with even 2-3 new calls. If 10 calls come in at once, it can push the Asterisk thread from 50% to 100% use. It seems like new calls cause significant jumps in usage. However, this might be a mix of Asterisk and FreePBX. I thought I should at least let the Asterisk devs know, since this affects a system that depends on Asterisk.

Your Yealink phones support Resource List Subscriptions. You could define some or all of your BLF groups as RLS entries and save on updates that way.

That’s assuming FreePBX supports doing stuff with RLS and you are using chan_pjsip.
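
As a rough idea of what an RLS entry looks like in pjsip.conf, here is a sketch. The list name sales_blf, the extensions and the batch interval are made up for the example, and each list item is assumed to be an extension that already has a hint in the dialplan.

```ini
; pjsip.conf (sketch; list name, extensions and values are assumed)
[sales_blf]
type=resource_list
event=presence
; one entry per monitored extension
list_item=1001
list_item=1002
list_item=1003
; send only changed resources rather than the full list on each update
full_state=no
; wait this many milliseconds so several changes go out in one NOTIFY
notification_batch_interval=2000
```

The phone would then subscribe once to the list name instead of once per BLF key.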

This is the first time I have heard of Resource List Subscriptions. I am using PJSIP on certain extensions and planning to convert all of them. Not sure if it’s in FreePBX yet.

Link I found to more info: https://wiki.asterisk.org/wiki/pages/viewpage.action?pageId=30278158

Hello @jcolp,

we have been running into exactly the same issues as @devox since we migrated a bunch of devices to pjsip (Asterisk 13.13-cert6, FreePBX 14.0.13.40).
You mentioned that Qualify support was rewritten. In which version have the fixes been applied?

Such changes are not present in 13.13-cert6. The versions the change went into are listed on the JIRA issue[1].

[1] https://issues.asterisk.org/jira/browse/ASTERISK-26806