The image above shows the current resource consumption of my server. The server is running Asterisk 18.2.0 with FastAGI and MySQL (all on the same server). Currently 100 calls are running (200 channels). Only the g729 and g723 codecs are allowed (in pass-through mode), yet it consumes more than 30% CPU.
I need to handle about 600 concurrent calls (without codec translation) on this server. Where is the problem? Why is it consuming so much CPU?
Also, I don’t need to run more than 160 concurrent calls per VM. I only need 600 calls total across multiple VMs (e.g. 4 VMs with 150 concurrent calls each).
But it is consuming a lot in a single VM: 100 calls → 35% CPU (7.9 GHz out of 22.8 GHz).
How can I handle 600 concurrent calls on a 22.8 GHz (6 × 3.8 GHz) server with g729 in pass-through mode?
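For context, here is a back-of-envelope check using the figures above. It assumes CPU usage scales linearly with call count, which is a simplification, but it shows why the current per-call cost cannot simply be multiplied by six:

```python
# Back-of-envelope sizing check using the figures quoted in the post.
# Assumes CPU usage scales linearly with call count (a simplification).
used_ghz = 7.9       # observed usage at 100 calls
calls = 100
total_ghz = 22.8     # 6 cores x 3.8 GHz

per_call = used_ghz / calls          # GHz consumed per call
needed = per_call * 600              # projected need for 600 calls

print(f"per call: {per_call:.3f} GHz, 600 calls: {needed:.1f} GHz")
# Under linear scaling, 600 calls would need ~47 GHz, roughly twice the
# 22.8 GHz available, so the per-call cost has to come down for this to fit.
```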
You haven’t provided the sort of detail that would be needed to judge where your bottlenecks are, or even to verify that you are not frustrating codec pass-through.
My involvement was in developing for function, not operation and sizing, so I probably can’t provide a good answer even with that detail, but I can’t see anyone doing so without details of your dialplan, AGI, and database usage.
Your htop doesn’t look all that bad to me. I think it’s difficult to extrapolate performance reliably from a snapshot. If you scale up your call volume to 200 or 300 how does it look? The difference may be insightful.
From the Asterisk CLI, if you enter core show channel <x>, does the section that looks like:
NativeFormats: (ulaw)
WriteFormat: ulaw
ReadFormat: ulaw
WriteTranscode: No
ReadTranscode: No
contain the values you expect?
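If you want to spot-check many channels at once, the Transcode lines are easy to match programmatically. A minimal sketch (the sample text below is illustrative, not taken from your system):

```python
import re

# Sample `core show channel` fragment (illustrative only).
SAMPLE = """\
NativeFormats: (ulaw)
WriteFormat: ulaw
ReadFormat: ulaw
WriteTranscode: No
ReadTranscode: No
"""

def is_transcoding(channel_dump: str) -> bool:
    """True if either direction of the channel reports active transcoding."""
    return bool(re.search(r"(Write|Read)Transcode:\s*Yes", channel_dump))

print(is_transcoding(SAMPLE))  # False: no transcoding on this channel
```

Running this over the output of `core show channel` for each active channel would quickly confirm whether pass-through is actually holding under load.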
Beyond that wild guess, more details would be helpful: what you are actually trying to do, why you are running Asterisk in a VM, why you are running multiple VMs, what your FastAGIs do, etc.
State: Up (6)
NativeFormats: (g729)
WriteFormat: g729
ReadFormat: g729
WriteTranscode: No
ReadTranscode: No
My PHP-AGI (FastAGI with xinetd) checks balance, call limits, etc. It then looks up the appropriate outgoing endpoint to dial (out of a maximum of 30 endpoints) based on the prefix. All of this is retrieved from a MariaDB server (on the same host). Finally it sets the DIALSTR variable and returns to the dialplan, where the call is dialed. All outgoing endpoints are behind OpenVPN (UDP) running on the same host.
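The prefix-routing step can be thought of as a longest-prefix match. Here is a hypothetical, self-contained Python illustration of that logic (the real lookup queries MariaDB, and the route table below is invented):

```python
from typing import Optional

# Hypothetical sketch of the AGI's routing step: a longest-prefix match
# against an in-memory table (the real lookup hits MariaDB; data invented).
ROUTES = {
    "44":   "PJSIP/uk-gw",      # country-level route
    "4420": "PJSIP/london-gw",  # more specific prefix wins
    "1":    "PJSIP/us-gw",
}

def pick_endpoint(dialed: str) -> Optional[str]:
    """Return the endpoint for the longest matching prefix, or None."""
    for length in range(len(dialed), 0, -1):
        endpoint = ROUTES.get(dialed[:length])
        if endpoint is not None:
            return endpoint
    return None

print(pick_endpoint("442071234567"))  # PJSIP/london-gw
```

If the route table is small (30 endpoints) and changes rarely, caching it in memory like this rather than querying the database per call would cut the per-call work noticeably.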
Here is my pjsip.conf:
[global]
type = global
user_agent = Asterisk PBX
[transport-udp]
type = transport
protocol = udp
bind = 0.0.0.0:5060
#include "pjsip_terminations.conf"
#include "pjsip_originations.conf"
#include "pjsip_dialers.conf"
When active calls go above 100 and 12–15 calls per second hit, Asterisk consumes too much CPU. MySQL sometimes spikes as well, but not consistently.
Are you experiencing problems (audio quality or processing delays) or are you concerned because you see 40% with 100 calls and want to scale to 6x that level?
Isn’t that 40% of 800% available (8 cores * 100%)?
For comparison, right now I’m encoding a video: htop shows HandBrake using 435% of my 8-core host. I also have 400 tabs open in Chromium, and the host is still responsive with capacity to spare.
I’d scale the call volume up and see how your load distributes across cores. You may be able to handle a lot more than you think.
The main problem is that under high active-call counts and > 12 calls per second, SIP qualify packets are lost or delayed so much that Asterisk treats most of the outgoing endpoints as offline (or at least lagged). To work around this I have disabled qualify and now check endpoint status periodically from another system service using nmap (ICMP-based host discovery). But even the nmap probes come back too slowly (e.g. 300–700 ms RTT), although the normal ping time is < 70 ms RTT. The congestion also hampers audio packets, resulting in a low Average Call Duration (ACD < 3 minutes), where the normal ACD is > 4 minutes.
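As an alternative to disabling qualify entirely, PJSIP lets you relax the OPTIONS probes per AOR so they generate less signalling load and tolerate higher RTTs. A hypothetical fragment (the section name and values are illustrative; tune them to your observed RTTs):

```
[gateway-aor](!)
type = aor
qualify_frequency = 60   ; send OPTIONS less often to cut signalling load
qualify_timeout = 6.0    ; tolerate slow replies before marking unreachable
```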
Your htop picture shows fairly high amounts of time spent in kernel threads. That may have many causes. Since you are using a virtual machine, the device emulation could play a role.
One thing to look at is disk usage. I’ve never needed to change anything for Asterisk, but for video servers I need to enable as much buffering as possible, i.e. the “unsafe” disk cache mode (qemu/kvm). I am not suggesting you do that in a production environment, but you could test whether it has a significant influence.
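For qemu/kvm the cache mode is set per drive. A minimal illustration of what I mean (again, not for production; the file path and other options are hypothetical):

```
-drive file=/var/lib/libvirt/images/asterisk.qcow2,if=virtio,cache=unsafe
```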