Asterisk High CPU Usage for 100+ Calls


The images above show the current resource consumption of my server. The server is running Asterisk 18.2.0 with FastAGI and MySQL (all on the same server). There are currently 100 calls running (200 channels). Only the g729 and g723 codecs are allowed (in pass-through mode), yet it consumes more than 30% CPU.
I need to handle about 600 concurrent calls (without codec translation) on this server. Where is the problem? Why is it consuming so much CPU?

Also, I don’t need to run more than 160 concurrent calls per VM. I only need to run 600 calls in total across multiple VMs (e.g. 4 VMs, 150 concurrent calls each).

But a single VM is already consuming a lot: 100 calls → 35% CPU (7.9 GHz out of 22.8 GHz).

How can I handle 600 concurrent calls on a 22.8 GHz (6 x 3.8 GHz) server in g729 pass-through mode?
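To put the quoted figures side by side, here is a rough back-of-the-envelope sketch in Python. It assumes CPU demand grows linearly with the number of concurrent calls, which is only an assumption; the function names are made up for illustration.

```python
# Linear extrapolation of the figures quoted in the post.
# Assumption: CPU demand scales linearly with concurrent calls, which real
# systems rarely do exactly.
TOTAL_GHZ = 6 * 3.8        # 22.8 GHz, as stated in the post
USED_GHZ_AT_100 = 7.9      # observed at 100 concurrent calls
CALLS_OBSERVED = 100


def projected_ghz(calls: int) -> float:
    """Estimate CPU demand in GHz for a given number of concurrent calls."""
    return USED_GHZ_AT_100 / CALLS_OBSERVED * calls


def projected_share(calls: int) -> float:
    """Estimate the fraction of the host's total capacity that would be used."""
    return projected_ghz(calls) / TOTAL_GHZ


if __name__ == "__main__":
    for calls in (100, 150, 600):
        print(f"{calls} calls: {projected_ghz(calls):.1f} GHz "
              f"({projected_share(calls):.0%} of host)")
```

Under that linear assumption, 600 calls would need roughly 47 GHz, about twice what the host offers, which is why the per-call cost matters more than how the calls are split across VMs.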

You haven’t provided the sort of detail that would be needed to judge where your bottlenecks are, or even to verify that you are not frustrating codec pass through.

My involvement was developing for function, not operation and sizing, so I probably can’t provide a good answer, even with that detail, but I can’t see anyone doing so without details of your dialplan, AGI, and database usage.

pjsip show channelstats shows that all channels are using the g729 codec, yet Asterisk still consumes a lot of CPU.

Dialplan:

[general]
static=yes
writeprotect=no
autofallthrough=yes
clearglobalvars=no


[globals]

[dialer]
exten => _X.,1,Goto(f1context,${EXTEN},1)

[f1context]
exten => _X.,1,NoOp(${CALLER_USERNAME} from ${CHANNEL(pjsip,remote_addr)})
 same => n,AGI(agi://127.0.0.1/auth.php)
 same => n,Set(IAXVAR(route)=${ROUTE})
 same => n,NoOp(${IAXVAR(route)})
 same => n,GotoIf($["${DIALSTR}" = ""]?noroute)
 same => n(route),Dial(${DIALSTR},,U(answer^${CALLID}))
 same => n(noroute),NoOP("No Dial String")
exten => h,1,AGI(agi://127.0.0.1/cdr.php,${CDR(uniqueid)})

[answer]
exten => s,1,Set(theCallID=${ARG1})
 same => n,AGI(agi://127.0.0.1/activecall.php)
 same => n,Return()

Your htop doesn’t look all that bad to me. I think it’s difficult to extrapolate performance reliably from a snapshot. If you scale up your call volume to 200 or 300 how does it look? The difference may be insightful.

From the Asterisk CLI, if you enter core show channel <x> does the section that looks like:

  NativeFormats: (ulaw)
    WriteFormat: ulaw
     ReadFormat: ulaw
 WriteTranscode: No 
  ReadTranscode: No 

contain the values you expect?

Beyond that wild guess, more details like what you are actually trying to do, why you are running Asterisk in a VM, why are you running multiple VMs, what do your FastAGIs do, etc. would be helpful.

The channel details show:

    State: Up (6)
  NativeFormats: (g729)
    WriteFormat: g729
     ReadFormat: g729
 WriteTranscode: No 
  ReadTranscode: No 
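Checking one channel by hand does not say much about the other 199, so the same check could be scripted. The sketch below is Python and only parses text in the format shown above; how the text is collected (for example by running `asterisk -rx "core show channel <name>"` for each channel) is left out, and the function names are hypothetical.

```python
# Fields of interest in the `core show channel` output shown in this thread.
WANTED = {"NativeFormats", "WriteFormat", "ReadFormat",
          "WriteTranscode", "ReadTranscode"}


def transcode_fields(channel_dump: str) -> dict:
    """Extract the format and transcode fields from one channel dump."""
    fields = {}
    for line in channel_dump.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() in WANTED:
            fields[key.strip()] = value.strip()
    return fields


def is_passthrough(channel_dump: str) -> bool:
    """True when neither direction of the channel reports transcoding."""
    fields = transcode_fields(channel_dump)
    return (fields.get("WriteTranscode") == "No"
            and fields.get("ReadTranscode") == "No")


SAMPLE = """\
    State: Up (6)
  NativeFormats: (g729)
    WriteFormat: g729
     ReadFormat: g729
 WriteTranscode: No
  ReadTranscode: No
"""

if __name__ == "__main__":
    print(transcode_fields(SAMPLE))
    print("pass-through:", is_passthrough(SAMPLE))
```

Any channel for which `is_passthrough` returns False would be worth a closer look before blaming the CPU usage on something else.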

My PHP-AGI (FastAGI with xinetd) checks the balance, call limit, etc. Then it looks for the appropriate outgoing endpoint to dial (out of a maximum of 30 endpoints) based on the prefix. All of this is retrieved from a MariaDB server (on the same host). Finally it sets the DIALSTR variable and returns to the dialplan, where the call is dialed. Also, all outgoing endpoints are behind OpenVPN (UDP) running on the same host.

Here is my pjsip.conf

[global]
type = global
user_agent = Asterisk PBX

[transport-udp]
type = transport
protocol = udp
bind = 0.0.0.0:5060

#include "pjsip_terminations.conf"
#include "pjsip_originations.conf"
#include "pjsip_dialers.conf"

pjsip_terminations.conf

[gw-EJ_21]
type = aor
contact = sip:192.168.101.3:5060

[gw-EJ_21]
type = identify
endpoint = gw-EJ_21
match = 192.168.101.3

[gw-EJ_21]
type = endpoint
timers = no
;context = EJ_21
context = f1context
dtmf_mode = rfc4733
disallow = all
allow = g729
allow = g723
;allow = alaw
;allow = ulaw
rtp_symmetric = yes
rewrite_contact = yes
direct_media = no
trust_id_inbound = yes
send_rpid = yes
call_group = 1
pickup_group = 1
sdp_owner = root
sdp_session = Asterisk PBX
aors = gw-EJ_21
;devicestate_busy_at = 32



[gw-EJ_22]
type = aor
contact = sip:192.168.101.4:5060

[gw-EJ_22]
type = identify
endpoint = gw-EJ_22
match = 192.168.101.4

[gw-EJ_22]
type = endpoint
timers = no
;context = EJ_22
context = f1context
dtmf_mode = rfc4733
disallow = all
allow = g729
allow = g723
;allow = alaw
;allow = ulaw
rtp_symmetric = yes
rewrite_contact = yes
direct_media = no
trust_id_inbound = yes
send_rpid = yes
call_group = 1
pickup_group = 1
sdp_owner = root
sdp_session = Asterisk PBX
aors = gw-EJ_22
;devicestate_busy_at = 32



[gw-ET_23]
type = aor
contact = sip:192.168.101.5:5060

[gw-ET_23]
type = identify
endpoint = gw-ET_23
match = 192.168.101.5

[gw-ET_23]
type = endpoint
timers = no
;context = ET_23
context = f1context
dtmf_mode = rfc4733
disallow = all
allow = g729
allow = g723
;allow = alaw
;allow = ulaw
rtp_symmetric = yes
rewrite_contact = yes
direct_media = no
trust_id_inbound = yes
send_rpid = yes
call_group = 1
pickup_group = 1
sdp_owner = root
sdp_session = Asterisk PBX
aors = gw-ET_23
;devicestate_busy_at = 32



[gw-ET_24]
type = aor
contact = sip:192.168.101.6:5060

[gw-ET_24]
type = identify
endpoint = gw-ET_24
match = 192.168.101.6

[gw-ET_24]
type = endpoint
timers = no
;context = ET_24
context = f1context
dtmf_mode = rfc4733
disallow = all
allow = g729
allow = g723
;allow = alaw
;allow = ulaw
rtp_symmetric = yes
rewrite_contact = yes
direct_media = no
trust_id_inbound = yes
send_rpid = yes
call_group = 1
pickup_group = 1
sdp_owner = root
sdp_session = Asterisk PBX
aors = gw-ET_24
;devicestate_busy_at = 32



[gw-ET_25]
type = aor
contact = sip:192.168.101.8:5060

[gw-ET_25]
type = identify
endpoint = gw-ET_25
match = 192.168.101.8

[gw-ET_25]
type = endpoint
timers = no
;context = ET_25
context = f1context
dtmf_mode = rfc4733
disallow = all
allow = g729
allow = g723
;allow = alaw
;allow = ulaw
rtp_symmetric = yes
rewrite_contact = yes
direct_media = no
trust_id_inbound = yes
send_rpid = yes
call_group = 1
pickup_group = 1
sdp_owner = root
sdp_session = Asterisk PBX
aors = gw-ET_25
;devicestate_busy_at = 32



[gw-ET_26]
type = aor
contact = sip:192.168.101.9:5060

[gw-ET_26]
type = identify
endpoint = gw-ET_26
match = 192.168.101.9

[gw-ET_26]
type = endpoint
timers = no
;context = ET_26
context = f1context
dtmf_mode = rfc4733
disallow = all
allow = g729
allow = g723
;allow = alaw
;allow = ulaw
rtp_symmetric = yes
rewrite_contact = yes
direct_media = no
trust_id_inbound = yes
send_rpid = yes
call_group = 1
pickup_group = 1
sdp_owner = root
sdp_session = Asterisk PBX
aors = gw-ET_26
;devicestate_busy_at = 32



[gw-EJ_27]
type = aor
contact = sip:192.168.101.10:5060

[gw-EJ_27]
type = identify
endpoint = gw-EJ_27
match = 192.168.101.10

[gw-EJ_27]
type = endpoint
timers = no
;context = EJ_27
context = f1context
dtmf_mode = rfc4733
disallow = all
allow = g729
allow = g723
;allow = alaw
;allow = ulaw
rtp_symmetric = yes
rewrite_contact = yes
direct_media = no
trust_id_inbound = yes
send_rpid = yes
call_group = 1
pickup_group = 1
sdp_owner = root
sdp_session = Asterisk PBX
aors = gw-EJ_27
;devicestate_busy_at = 32



[gw-EJ_28]
type = aor
contact = sip:192.168.101.11:5060

[gw-EJ_28]
type = identify
endpoint = gw-EJ_28
match = 192.168.101.11

[gw-EJ_28]
type = endpoint
timers = no
;context = EJ_28
context = f1context
dtmf_mode = rfc4733
disallow = all
allow = g729
allow = g723
;allow = alaw
;allow = ulaw
rtp_symmetric = yes
rewrite_contact = yes
direct_media = no
trust_id_inbound = yes
send_rpid = yes
call_group = 1
pickup_group = 1
sdp_owner = root
sdp_session = Asterisk PBX
aors = gw-EJ_28
;devicestate_busy_at = 32



[gw-EJ_29]
type = aor
contact = sip:192.168.101.16:5060

[gw-EJ_29]
type = identify
endpoint = gw-EJ_29
match = 192.168.101.16

[gw-EJ_29]
type = endpoint
timers = no
;context = EJ_29
context = f1context
dtmf_mode = rfc4733
disallow = all
allow = g729
allow = g723
;allow = alaw
;allow = ulaw
rtp_symmetric = yes
rewrite_contact = yes
direct_media = no
trust_id_inbound = yes
send_rpid = yes
call_group = 1
pickup_group = 1
sdp_owner = root
sdp_session = Asterisk PBX
aors = gw-EJ_29
;devicestate_busy_at = 32



[gw-EJ_30]
type = aor
contact = sip:192.168.101.17:5060

[gw-EJ_30]
type = identify
endpoint = gw-EJ_30
match = 192.168.101.17

[gw-EJ_30]
type = endpoint
timers = no
;context = EJ_30
context = f1context
dtmf_mode = rfc4733
disallow = all
allow = g729
allow = g723
;allow = alaw
;allow = ulaw
rtp_symmetric = yes
rewrite_contact = yes
direct_media = no
trust_id_inbound = yes
send_rpid = yes
call_group = 1
pickup_group = 1
sdp_owner = root
sdp_session = Asterisk PBX
aors = gw-EJ_30
;devicestate_busy_at = 32

pjsip_originations.conf

[ori-E-sysTem]
type = identify
endpoint = ori-E-sysTem
match = 209.x.y.z

[ori-E-sysTem]
type = endpoint
timers = no
set_var=CALLER_USERNAME=E-sysTem
;context = E-sysTem
context = f1context
dtmf_mode = rfc4733
disallow = all
allow = g729
allow = g723
;allow = alaw
;allow = ulaw
direct_media = no
trust_id_inbound = yes
send_rpid = yes
sdp_owner = root
sdp_session = Asterisk PBX

pjsip_dialers.conf is empty.

I don’t think FastAGI/phpagi.php does what you want.

It does not avoid the ‘startup cost’ of parsing your PHP script nor the process creation cost for each invocation nor the MySQL connection overhead.
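For illustration, this is roughly what a long-running FastAGI process looks like, sketched in Python rather than the poster's PHP. It is a minimal sketch under assumptions: `lookup_dialstr` is a hypothetical stand-in for the logic in auth.php, and port 4573 (the conventional FastAGI port) is assumed to be free, which it would not be while xinetd is still listening there.

```python
import socketserver


def parse_agi_headers(lines):
    """Parse the 'agi_key: value' lines Asterisk sends when a session starts."""
    env = {}
    for raw in lines:
        line = raw.strip()
        if not line:              # a blank line ends the header block
            break
        key, _, value = line.partition(":")
        env[key.strip()] = value.strip()
    return env


def lookup_dialstr(extension):
    """Hypothetical placeholder for the balance and route lookup in auth.php.

    Because the process stays alive between calls, a database connection
    (or pool) could be opened once and reused here.
    """
    return ""  # empty string means "no route", matching the dialplan check


class AGIHandler(socketserver.StreamRequestHandler):
    def handle(self):
        env = parse_agi_headers(line.decode() for line in self.rfile)
        dialstr = lookup_dialstr(env.get("agi_extension", ""))
        self.wfile.write(f'SET VARIABLE DIALSTR "{dialstr}"\n'.encode())
        self.rfile.readline()     # Asterisk answers with e.g. "200 result=1"


if __name__ == "__main__":
    # Bind to localhost to mirror the agi://127.0.0.1/ URLs in the dialplan.
    with socketserver.ThreadingTCPServer(("127.0.0.1", 4573), AGIHandler) as server:
        server.serve_forever()
```

The point of the design is that the interpreter start-up, script parsing and database connection happen once per process instead of once per AGI invocation.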

What you describe should be an insignificant load for MySQL.

How many calls per second (meaning AGI invocations / MySQL lookups) are you executing?

My AGI execution is as shown below:

  1. During incoming call (before Dial() function): auth.php
    • 7 SELECT Queries (each table size not more than 50 rows).
    • 1 SELECT COUNT() query (table size == number of active calls).
    • 1 INSERT
  2. When a dialed call is answered: activecall.php
    • 1 SQL UPDATE query with unique key (table size == number of active calls).
  3. When a call hangs up: cdr.php
    • 3 SELECT queries (table size <= 30 rows)
    • 5 UPDATE (with unique key).
    • 1 INSERT
    • 1 DELETE (with unique key)
    • 1 DELETE (with datetime comparison, table size == num of currently active calls).

Your MySQL doesn’t sound like it is too demanding. I’m kind of surprised the MySQL server is consuming 5.9% CPU.

How many calls per second are you processing?

When active calls go above 100 and 12-15 calls per second come in, Asterisk consumes too much CPU. MySQL CPU usage spikes sometimes, but not consistently.
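Combining that call rate with the per-script query counts listed earlier in the thread gives a rough upper bound on the AGI and SQL load. The Python sketch below assumes every call attempt is answered and therefore runs all three scripts, which overstates the load for unanswered calls.

```python
# Query counts per script, taken from the list earlier in the thread.
QUERIES_PER_SCRIPT = {
    "auth.php": 7 + 1 + 1,          # 7 SELECT, 1 SELECT COUNT(), 1 INSERT
    "activecall.php": 1,            # 1 UPDATE
    "cdr.php": 3 + 5 + 1 + 1 + 1,   # 3 SELECT, 5 UPDATE, 1 INSERT, 2 DELETE
}


def load_per_second(calls_per_second: int) -> dict:
    """Upper-bound AGI invocations and SQL queries per second.

    Assumption: every call is answered, so all three scripts run once per call.
    """
    return {
        "agi_invocations": calls_per_second * len(QUERIES_PER_SCRIPT),
        "sql_queries": calls_per_second * sum(QUERIES_PER_SCRIPT.values()),
    }


if __name__ == "__main__":
    for cps in (12, 15):
        print(cps, "calls/s ->", load_per_second(cps))
```

At 12 to 15 calls per second that works out to at most 36 to 45 AGI invocations and 252 to 315 queries per second. The query rate is modest for a database, but with xinetd each AGI invocation is also a new PHP process and a new database connection.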

Are you experiencing problems (audio quality or processing delays) or are you concerned because you see 40% with 100 calls and want to scale to 6x that level?

Isn’t that 40% of 800% available (8 cores * 100%)?

For comparison, right now I’m encoding a video. htop shows HandBrake is using 435% of my 8 core host. I also have 400 tabs open on Chromium and the host is still responsive and ready for more :)

I’d scale the call volume up and see how your load distributes across cores. You may be able to handle a lot more than you think.


The main problem is that, with many active calls and more than 12 calls per second, SIP qualify packets are lost or delayed so much that Asterisk treats most of the outgoing endpoints as offline (or at least lagged). To work around this I have disabled qualify and instead check endpoint status periodically from another system service using nmap, which uses ICMP. But even the nmap pings come back late (e.g. 300-700 ms RTT), although the normal ping time is under 70 ms RTT. It also disrupts audio packets, resulting in a low Average Call Duration (ACD < 3 minutes), where the normal ACD is > 4 minutes.

Your htop picture shows fairly high amounts of time spent in kernel threads. That may have many causes. Since you are using a virtual machine, the device emulation could play a role.
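One way to put numbers on the kernel-time observation is to sample /proc/stat twice and compare per-core counters. The sketch below is Python and assumes a Linux guest with the standard field order in /proc/stat; the 5 second interval and the function names are arbitrary choices for illustration.

```python
import time

# Order of the first eight counters on each cpuN line of /proc/stat.
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_proc_stat(text: str) -> dict:
    """Return {core_name: {field: ticks}} for every per-core 'cpuN' line."""
    cores = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip the aggregate "cpu" line and all non-CPU lines.
        if parts and parts[0].startswith("cpu") and parts[0] != "cpu":
            values = [int(v) for v in parts[1:1 + len(FIELDS)]]
            cores[parts[0]] = dict(zip(FIELDS, values))
    return cores


def kernel_share(before: dict, after: dict) -> dict:
    """Fraction of elapsed time each core spent in system + irq + softirq."""
    shares = {}
    for core, end in after.items():
        start = before[core]
        delta = {f: end[f] - start[f] for f in FIELDS}
        total = sum(delta.values())
        kernel = delta["system"] + delta["irq"] + delta["softirq"]
        shares[core] = kernel / total if total else 0.0
    return shares


if __name__ == "__main__":
    with open("/proc/stat") as f:
        first = parse_proc_stat(f.read())
    time.sleep(5)
    with open("/proc/stat") as f:
        second = parse_proc_stat(f.read())
    for core, share in sorted(kernel_share(first, second).items()):
        print(f"{core}: {share:.0%} kernel time")
```

A high share concentrated on one core may point at interrupt or softirq handling for the network path. In a VM the `steal` counter is also worth printing, since it shows time taken by the hypervisor.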

One of the things to look at is disk usage. I’ve never needed to change things for Asterisk, but for video servers I need to enable buffering as much as possible, i.e. I need to use the “unsafe” buffer mode for disks (qemu/kvm). I am not suggesting doing that in a production environment, but you could test whether it has a significant influence.

WAG…

Any chance you have a munched Ethernet cable causing retransmissions?