I am running Asterisk 15.1.3 on CentOS 7. I have a dedicated server with an 8 Core Xeon Processor, 24GB of RAM (15.5GB Memory / 8.5GB Swap), 1 TB HDD.
I am running into issues where the Asterisk service is stopping multiple times per day, particularly when the system is under load. The system will get as high as 400 channels or 200 simultaneous calls. It’s happening so often that I have had to install Monit to automatically restart Asterisk when this happens.
I have also increased the soft and hard limits for the Asterisk process to 524280 as we were running into a file limits issue previously.
Does anyone have any pointers as to what could be causing this? I’m not sure if Asterisk just isn’t stable when handling this many calls at once, or if I need to up my system resources or what. The logs from /var/log/asterisk/messages don’t really give an indication as to what’s happening when the service stops and starts back up.
Perhaps I need to turn logging/debugging on at a higher verbosity?
Any help with this would be greatly appreciated as I’m not really sure what would cause Asterisk to stop so many times a day, mainly when the system is under high call volumes.
You appear to be using Asterisk 15.1.3, and not Asterisk 13 as your post states. Second, I’d suggest always trying the latest version of Asterisk, as we do fix problems. Third, by “stopping” do you mean it is crashing? If so, the wiki has a guide[1] on how to extract a backtrace to see where.
I have edited my Asterisk version in my original post. Thanks for pointing that out.
I can look into performing an Asterisk upgrade. I’m honestly not sure if the Asterisk service is crashing or stopping, I will look at the wiki article and see if I can gather more information.
I believe Asterisk is crashing rather than deadlocking, based on what I read here: https://www.voip-info.org/asterisk-deadlock/ - mainly because Asterisk exits, and the service stops altogether and has to be restarted.
I am running as root in the command line so permissions wouldn’t be the issue.
My soft and hard ulimit for Asterisk is configured for 524280 – is this large enough or should I increase it?
[root@68-168-108-26 asterisk]# cat /proc/25020/limits
Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max file size             unlimited            unlimited            bytes
Max data size             unlimited            unlimited            bytes
Max stack size            8388608              unlimited            bytes
Max core file size        unlimited            unlimited            bytes
Max resident set          unlimited            unlimited            bytes
Max processes             63266                63266                processes
Max open files            524280               524280               files
Max locked memory         65536                65536                bytes
Max address space         unlimited            unlimited            bytes
Max file locks            unlimited            unlimited            locks
Max pending signals       63266                63266                signals
Max msgqueue size         819200               819200               bytes
Max nice priority         0                    0
Max realtime priority     0                    0
Max realtime timeout      unlimited            unlimited            us
Thank you for your help in getting a backtrace to figure out why Asterisk is crashing under a high call load.
You’ll have difficulty breaking that limit. It was the core file size I was thinking about.
Are you running Asterisk as root? Otherwise the system might consider taking a core dump a security risk. Is the working directory on a large enough filesystem?
Are you sure that something isn’t requesting a normal stop (will show in the logs)?
Do you get a core dump if you deliberately kill it with kill -3?
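For reference, the core file size limit mentioned above can be read for a running process without scanning the whole limits listing. This is only a sketch: it assumes a Linux /proc filesystem, and the function name core_limit is made up for illustration.

```shell
# Print the soft and hard "Max core file size" limits of a running process.
# Assumes Linux, where /proc/PID/limits exists; core_limit is a hypothetical name.
# Usage: core_limit PID
core_limit() {
    # Fields on that line: Max core file size <soft> <hard> <units>
    awk '/^Max core file size/ { print "soft=" $5, "hard=" $6 }' "/proc/$1/limits"
}

# Example with the PID used elsewhere in this thread:
#   core_limit 25020
```

A soft limit of 0 would stop the kernel from writing a core file at all, so that is the value to rule out first.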
Asterisk is running as root. I don’t believe anything is requesting a normal stop, the stop tends to happen when the system is under a high call volume (200+ concurrent calls). I posted samples of log files from /var/log/asterisk/messages in my initial post if you wouldn’t mind reviewing those… nothing sticks out as to why Asterisk would be stopping.
I’m not sure how to deliberately kill it with -3, if you could give me an example that would be great.
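A minimal sketch of what "kill -3" means here: signal 3 is SIGQUIT, whose default action is to terminate the process and write a core dump. This assumes Asterisk leaves SIGQUIT at its default action, which I have not verified; the function name quit_with_core is made up for illustration.

```shell
# Send SIGQUIT (signal 3) to one process. Unless the program catches or
# ignores the signal, the kernel terminates it and writes a core dump
# (subject to the core file size limit).
# quit_with_core is a hypothetical helper name.
# Usage: quit_with_core PID
quit_with_core() {
    kill -3 "$1"
}

# Example, assuming a single Asterisk process and that pidof is installed:
#   quit_with_core "$(pidof asterisk)"
```

This stops the running Asterisk and drops every active call, so it is best tried outside of busy hours.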
Does look like a crash, in which case I think you need to look at what directory was used to start Asterisk, and how much space is on the filesystem. It could be a small temp filesystem.
Alternatively actually start Asterisk with the -d (I think) option, under gdb. It will not restart on its own, if you do this. You will have to run the backtraces, or at least write the core dump, before fully terminating it and restarting it.
Also check for resource leaks (memory and file descriptors, in particular, e.g. top and lsof).
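As a rough sketch of that leak check, the two numbers worth tracking over time can be read straight from /proc instead of eyeballing top and lsof. This assumes Linux; the function names fd_count and rss_kb are made up for illustration.

```shell
# Count the open file descriptors of a process (one entry per descriptor).
# Usage: fd_count PID
fd_count() {
    ls "/proc/$1/fd" | wc -l
}

# Print the resident memory (VmRSS) of a process in kB.
# Usage: rss_kb PID
rss_kb() {
    awk '/^VmRSS:/ { print $2 }' "/proc/$1/status"
}

# Example: log both values once a minute for PID 25020 (adjust the PID):
#   while sleep 60; do
#       echo "$(date) fds=$(fd_count 25020) rss_kb=$(rss_kb 25020)"
#   done
```

Values that keep climbing while the call count stays flat would point towards a leak; values that simply follow the call volume would not.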
Incidentally, frame type 10 is comfort noise. You will reduce the log noise if you can turn that feature off at the sending end.
Asterisk is starting using the following command: /usr/sbin/asterisk -gd
The filesystem is running on a 1TB HDD which is only 1% utilized, so I don’t believe filesystem space would be an issue.
I don’t believe we have a resource leak either, although I haven’t been monitoring top/htop during high loads to confirm this. I will do this tomorrow to be completely sure.
I have turned off WARNINGS in the Asterisk logger to get rid of some log noise. I’ve also turned on debugging mode to see if I can get some more output when a crash occurs.
Thanks for any other suggestions I could use to understand why Asterisk is crashing multiple times a day.
Most Linux systems have multiple file systems. Some are purely in RAM. Are you sure that the initial directory for Asterisk was in the large filesystem?
(My big problem here is that, whilst, with various versions, I’ve had many crashes, I’ve never had problems in finding a crash dump. As such, I’ve no experience as to why one might not get one.)
“Most Linux systems have multiple file systems. Some are purely in RAM. Are you sure that the initial directory for Asterisk was in the large filesystem?”
I’m not sure, how would I find out?
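One way to find out, sketched under the assumption of a Linux /proc filesystem: the kernel exposes each process's current working directory as a symlink, and df then shows which filesystem that directory lives on. The function names proc_cwd and cwd_filesystem are made up for illustration.

```shell
# Print the current working directory of a running process.
# Usage: proc_cwd PID
proc_cwd() {
    readlink "/proc/$1/cwd"
}

# Show the filesystem (device, size, free space, mount point) that holds
# that directory.
# Usage: cwd_filesystem PID
cwd_filesystem() {
    df -hP "$(proc_cwd "$1")"
}

# Example with the PID used elsewhere in this thread:
#   proc_cwd 25020
#   cwd_filesystem 25020
```

If the mount shown is a tmpfs or another small filesystem rather than the 1TB disk, a large core file may not fit there.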
I’m pretty new to in-depth Asterisk troubleshooting of this nature (obviously)… My issue here is that Asterisk is crashing multiple times a day when the system has a high call volume of 200+ concurrent calls. Per the recommendation of JColp, I am attempting to pull a backtrace so that someone can shed some light as to why this is happening.
Per the wiki article, Asterisk is running with a -g flag, so it should produce a core when a crash occurs, but that apparently isn’t happening:
[root@68-168-108-26 asterisk]# /var/lib/asterisk/scripts/ast_coredumper core
No coredumps found
I’m just not sure what to do at this point, as I am apparently unable to pull a backtrace to help troubleshoot this issue.
I believe core files when created are saved to the /tmp directory? That directory on my server is empty… so it appears cores aren’t being created when Asterisk crashes?
EDIT: I have found where the core files are being created… Thank you for your help, David. Please stand by and I will post a relevant core file once a real crash happens again.
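For anyone else hunting for missing core files: on Linux the destination is controlled by /proc/sys/kernel/core_pattern. A pattern starting with "|" is piped to a helper program, an absolute path is used as given, and anything else is resolved against the crashing process's working directory. The sketch below only classifies a pattern string; the function name describe_core_pattern is made up for illustration.

```shell
# Describe where the kernel will send core dumps for a given core_pattern.
# describe_core_pattern is a hypothetical helper name.
# Usage: describe_core_pattern PATTERN
describe_core_pattern() {
    case "$1" in
        "|"*) echo "piped to helper: ${1#|}" ;;
        /*)   echo "absolute path: $1" ;;
        *)    echo "relative to the working directory of the crashing process: $1" ;;
    esac
}

# Usage on a live system:
#   describe_core_pattern "$(cat /proc/sys/kernel/core_pattern)"
```

When the pattern is piped to a helper, the core files end up wherever that helper stores them rather than in the directory Asterisk was started from.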