[SOLVED] Asterisk service stopping when system under load

I am running Asterisk 15.1.3 on CentOS 7. I have a dedicated server with an 8 Core Xeon Processor, 24GB of RAM (15.5GB Memory / 8.5GB Swap), 1 TB HDD.

I am running into issues where the Asterisk service is stopping multiple times per day, particularly when the system is under load. The system will get as high as 400 channels or 200 simultaneous calls. It’s happening so often that I have had to install Monit to automatically restart Asterisk when this happens.

I have also increased the soft and hard limits for the Asterisk process to 524280 as we were running into a file limits issue previously.

Does anyone have any pointers as to what could be causing this? I’m not sure if Asterisk just isn’t stable when handling this many calls at once, or if I need to up my system resources or what. The logs from /var/log/asterisk/messages don’t really give an indication as to what’s happening when the service stops and starts back up.

Some log examples:
https://pastebin.com/mamg3Uj3
https://pastebin.com/eawXBHQq

Perhaps I need to turn logging/debugging on at a higher verbosity?

Any help with this would be greatly appreciated as I’m not really sure what would cause Asterisk to stop so many times a day, mainly when the system is under high call volumes.

Thank you!

You appear to be using Asterisk 15.1.3, and not Asterisk 13 as your post states. Second, I’d suggest always trying the latest version of Asterisk, as we do fix problems. Third, by “stopping” do you mean it is crashing? If so, the wiki has a guide[1] on how to extract a backtrace to see where.

[1] https://wiki.asterisk.org/wiki/display/AST/Getting+a+Backtrace

Hi Jcolp, thank you for your reply.

I have edited my Asterisk version in my original post. Thanks for pointing that out.

I can look into performing an Asterisk upgrade. I’m honestly not sure if the Asterisk service is crashing or stopping. I will look at the wiki article and see if I can gather more information.

Thank you.

I am unable to get a back trace… I’m sure I’m doing something wrong… assistance would be appreciated.

I have verified that Asterisk is running with a -g flag.
GDB is installed on my CentOS7 server.

However, when I try to run ast_coredumper, it says no cores found. If I run sysctl -n kernel.core_pattern, the output is: core

Can someone please point me in the right direction of how I can successfully obtain a back trace?

Thank you!

Bump… Can someone please assist me in obtaining a backtrace?

There are a number of reasons why ast_coredumper may have produced nothing:

It could be a deadlock, rather than a crash, which is what you were asked before. See the deadlock section of https://wiki.asterisk.org/wiki/display/AST/Getting+a+Backtrace for how to proceed.

You might not have an adequate ulimit for your core file.

You might be running in a directory that is not writeable by the Asterisk user if you are running non-root.
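The ulimit and working-directory points above can be checked directly from /proc on Linux. The following is a minimal sketch, not something posted in the thread: the function name check_core_prereqs is made up for illustration, and the writability test reflects the user running the script, which may differ from the user Asterisk runs as.

```shell
# check_core_prereqs PID: report the core size limit and working directory
# of a running process (Linux /proc assumed).
check_core_prereqs() {
    pid="$1"
    # The soft limit is the 5th field of the "Max core file size" row.
    core_limit="$(awk '/^Max core file size/ {print $5}' "/proc/$pid/limits")"
    # Working directory of the process.
    cwd="$(readlink "/proc/$pid/cwd")"
    echo "core soft limit: $core_limit"
    echo "working directory: $cwd"
    if [ "$core_limit" = "0" ]; then
        echo "core dumps are disabled for this process"
    fi
    # Note: -w tests the user running this script, not the process owner.
    if [ ! -w "$cwd" ]; then
        echo "working directory is not writable by the current user"
    fi
}

# Demonstrated on the current shell; substitute the Asterisk PID in practice.
check_core_prereqs "$$"
```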

David,

Thank you for your reply!

I believe Asterisk is crashing rather than deadlocking, based on what I read here: https://www.voip-info.org/asterisk-deadlock/ - Mainly because Asterisk is exiting (the service stops altogether and has to be restarted).

I am running as root in the command line so permissions wouldn’t be the issue.

My soft and hard ulimit for Asterisk is configured for 524280 – is this large enough or should I increase it?

[root@68-168-108-26 asterisk]# cat /proc/25020/limits
Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max file size             unlimited            unlimited            bytes
Max data size             unlimited            unlimited            bytes
Max stack size            8388608              unlimited            bytes
Max core file size        unlimited            unlimited            bytes
Max resident set          unlimited            unlimited            bytes
Max processes             63266                63266                processes
Max open files            524280               524280               files
Max locked memory         65536                65536                bytes
Max address space         unlimited            unlimited            bytes
Max file locks            unlimited            unlimited            locks
Max pending signals       63266                63266                signals
Max msgqueue size         819200               819200               bytes
Max nice priority         0                    0
Max realtime priority     0                    0
Max realtime timeout      unlimited            unlimited            us

Thank you for helping me obtain a backtrace to figure out why Asterisk is crashing under a high call load.

You’ll have difficulty breaking that limit. It was the core file size I was thinking about.

Are you running Asterisk as root? Otherwise the system might consider taking a core dump a security risk. Is the working directory on a large enough filesystem?

Are you sure that something isn’t requesting a normal stop (will show in the logs)?

Do you get a core dump if you deliberately kill it with kill -3?

Hi David-

Asterisk is running as root. I don’t believe anything is requesting a normal stop, the stop tends to happen when the system is under a high call volume (200+ concurrent calls). I posted samples of log files from /var/log/asterisk/messages in my initial post if you wouldn’t mind reviewing those… nothing sticks out as to why Asterisk would be stopping.

I’m not sure how to deliberately kill it with -3, if you could give me an example that would be great.
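A hedged example of what is being asked for here: signal 3 is SIGQUIT, whose default action is to terminate the process and produce a core dump, so sending it is a way to test whether core files get written at all. The helper name send_sigquit and the PID file path in the comment are assumptions for illustration. Note that this stops the target process, so on a live system active calls would be dropped.

```shell
# send_sigquit PID: send signal 3 (SIGQUIT) to a process, after checking
# that it can be signalled.
send_sigquit() {
    pid="$1"
    # kill -0 sends no signal; it only tests that the PID exists and is signalable.
    if ! kill -0 "$pid" 2>/dev/null; then
        echo "cannot signal process $pid (not running, or not permitted)" >&2
        return 1
    fi
    kill -3 "$pid"
}

# Hypothetical usage; the PID file location is an assumption and varies by install:
# send_sigquit "$(cat /var/run/asterisk/asterisk.pid)"
```

If a core file appears after this, core dumps work and the remaining question is only where real crashes write theirs.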

Thank you.

First log:

[Aug  8 06:01:32] WARNING[6757][C-00000c82] chan_sip.c: Can't send 10 type frames with SIP write
[Aug  8 06:01:32] WARNING[6757][C-00000c82] chan_sip.c: Can't send 10 type frames with SIP write
[Aug  8 06:01:33] WARNING[6757][C-00000c82] chan_sip.c: Can't send 10 type frames with SIP write
[Aug  8 06:01:33] WARNING[6757][C-00000c82] chan_sip.c: Can't send 10 type frames with SIP write
[Aug  8 06:01:34] WARNING[6757][C-00000c82] chan_sip.c: Can't send 10 type frames with SIP write
[Aug  8 06:01:34] WARNING[6757][C-00000c82] chan_sip.c: Can't send 10 type frames with SIP write
[Aug  8 06:01:35] WARNING[6757][C-00000c82] chan_sip.c: Can't send 10 type frames with SIP write
[Aug  8 06:01:35] WARNING[6757][C-00000c82] chan_sip.c: Can't send 10 type frames with SIP write
[Aug  8 06:01:36] WARNING[6757][C-00000c82] chan_sip.c: Can't send 10 type frames with SIP write
[Aug  8 06:01:36] WARNING[6757][C-00000c82] chan_sip.c: Can't send 10 type frames with SIP write
[Aug  8 06:01:37] WARNING[6757][C-00000c82] chan_sip.c: Can't send 10 type frames with SIP write
[Aug  8 06:01:37] WARNING[6757][C-00000c82] chan_sip.c: Can't send 10 type frames with SIP write
[Aug  8 06:01:38] WARNING[6757][C-00000c82] chan_sip.c: Can't send 10 type frames with SIP write
[Aug  8 06:01:38] WARNING[6757][C-00000c82] chan_sip.c: Can't send 10 type frames with SIP write
[Aug  8 06:01:39] WARNING[6757][C-00000c82] chan_sip.c: Can't send 10 type frames with SIP write
[Aug  8 06:01:39] WARNING[6757][C-00000c82] chan_sip.c: Can't send 10 type frames with SIP write
[Aug  8 06:01:40] WARNING[6757][C-00000c82] chan_sip.c: Can't send 10 type frames with SIP write
[Aug  8 06:01:40] WARNING[6757][C-00000c82] chan_sip.c: Can't send 10 type frames with SIP write
[Aug  8 06:01:41] WARNING[6757][C-00000c82] chan_sip.c: Can't send 10 type frames with SIP write
[Aug  8 06:01:41] WARNING[6757][C-00000c82] chan_sip.c: Can't send 10 type frames with SIP write
[Aug  8 06:01:42] WARNING[6757][C-00000c82] chan_sip.c: Can't send 10 type frames with SIP write
[Aug  8 06:01:42] WARNING[6757][C-00000c82] chan_sip.c: Can't send 10 type frames with SIP write
[Aug  8 06:01:43] WARNING[6757][C-00000c82] chan_sip.c: Can't send 10 type frames with SIP write
[Aug  8 06:01:43] WARNING[6757][C-00000c82] chan_sip.c: Can't send 10 type frames with SIP write
[Aug  8 06:01:57] WARNING[6757][C-00000c82] chan_sip.c: Can't send 10 type frames with SIP write
[Aug  8 06:01:58] WARNING[6757][C-00000c82] chan_sip.c: Can't send 10 type frames with SIP write
[Aug  8 06:01:58] WARNING[6757][C-00000c82] chan_sip.c: Can't send 10 type frames with SIP write
[Aug  8 06:01:59] WARNING[6757][C-00000c82] chan_sip.c: Can't send 10 type frames with SIP write
[Aug  8 06:01:59] WARNING[6757][C-00000c82] chan_sip.c: Can't send 10 type frames with SIP write
[Aug  8 06:02:00] WARNING[6757][C-00000c82] chan_sip.c: Can't send 10 type frames with SIP write
[Aug  8 06:02:00] WARNING[6757][C-00000c82] chan_sip.c: Can't send 10 type frames with SIP write
[Aug  8 06:02:01] WARNING[6757][C-00000c82] chan_sip.c: Can't send 10 type frames with SIP write
[Aug  8 06:02:01] WARNING[6757][C-00000c82] chan_sip.c: Can't send 10 type frames with SIP write
[Aug  8 06:02:02] WARNING[6757][C-00000c82] chan_sip.c: Can't send 10 type frames with SIP write
[Aug  8 06:02:02] WARNING[6757][C-00000c82] chan_sip.c: Can't send 10 type frames with SIP write
[Aug  8 06:02:03] WARNING[6757][C-00000c82] chan_sip.c: Can't send 10 type frames with SIP write
[Aug  8 06:02:03] WARNING[6757][C-00000c82] chan_sip.c: Can't send 10 type frames with SIP write
[Aug  8 06:02:04] WARNING[6757][C-00000c82] chan_sip.c: Can't send 10 type frames with SIP write
[Aug  8 06:02:04] WARNING[6757][C-00000c82] chan_sip.c: Can't send 10 type frames with SIP write
[Aug  8 06:02:04] WARNING[15165][C-000016cc] chan_sip.c: Can't send 10 type frames with SIP write
[Aug  8 06:02:04] WARNING[15165][C-000016cc] chan_sip.c: Can't send 10 type frames with SIP write
[Aug  8 06:02:04] WARNING[15165][C-000016cc] chan_sip.c: Can't send 10 type frames with SIP write
[Aug  8 06:02:05] WARNING[15165][C-000016cc] chan_sip.c: Can't send 10 type frames with SIP write
[Aug  8 06:02:05] WARNING[15165][C-000016cc] chan_sip.c: Can't send 10 type frames with SIP write
[Aug  8 06:02:05] WARNING[15165][C-000016cc] chan_sip.c: Can't send 10 type frames with SIP write
[Aug  8 06:02:05] WARNING[6757][C-00000c82] chan_sip.c: Can't send 10 type frames with SIP write
[Aug  8 06:02:05] WARNING[6757][C-00000c82] chan_sip.c: Can't send 10 type frames with SIP write
[Aug  8 06:02:17] Asterisk 15.1.3 built by root @ 68-168-108-26.dedicated.codero.net on a x86_64 running Linux on 2017-12-04 05:38:33 UTC
[Aug  8 06:02:17] NOTICE[15180] cdr.c: CDR simple logging enabled.
[Aug  8 06:02:17] NOTICE[15180] loader.c: 281 modules will be loaded.
[Aug  8 06:02:17] WARNING[15180] res_phoneprov.c: Unable to find a valid server address or name.
[Aug  8 06:02:17] ERROR[15180] ari/config.c: No configured users for ARI
[Aug  8 06:02:17] WARNING[15180] loader.c: Error loading module 'func_pjsip_aor.so': /usr/lib/asterisk/modules/func_pjsip_aor.so: undefined symbol: ast_sip_location_retrieve_aor_contacts
[Aug  8 06:02:17] WARNING[15180] loader.c: Module 'func_pjsip_aor.so' could not be loaded.
[Aug  8 06:02:17] WARNING[15180] loader.c: Error loading module 'func_pjsip_endpoint.so': /usr/lib/asterisk/modules/func_pjsip_endpoint.so: undefined symbol: ast_sip_get_sorcery
[Aug  8 06:02:17]

Second is similar but longer.

Does look like a crash, in which case I think you need to look at what directory was used to start Asterisk, and how much space is on the filesystem. It could be a small temp filesystem.

Alternatively actually start Asterisk with the -d (I think) option, under gdb. It will not restart on its own, if you do this. You will have to run the backtraces, or at least write the core dump, before fully terminating it and restarting it.

Also check for resource leaks (memory and file descriptors, in particular, e.g. top and lsof).
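A rough way to watch for the leaks mentioned above is to sample the file descriptor count and resident memory of the process over time and look for steady growth. This is a sketch assuming a Linux /proc filesystem; resource_snapshot is a hypothetical helper name, and it is run here against the current shell only so that the example is self-contained.

```shell
# resource_snapshot PID: print one line with the open file descriptor count
# and resident memory (in kB) of a process.
resource_snapshot() {
    pid="$1"
    # Each entry in /proc/PID/fd is one open descriptor.
    fd_count="$(ls "/proc/$pid/fd" | wc -l | tr -d ' ')"
    # VmRSS in /proc/PID/status is the resident set size in kB.
    rss_kb="$(awk '/^VmRSS:/ {print $2}' "/proc/$pid/status")"
    echo "$(date '+%Y-%m-%d %H:%M:%S') pid=$pid fds=$fd_count rss_kb=$rss_kb"
}

# Demonstrated on the current shell; substitute the Asterisk PID in practice.
resource_snapshot "$$"
```

Run against the Asterisk PID at intervals, numbers that keep climbing while the call count stays flat would point toward a leak.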

Incidentally, frame type 10 is comfort noise. You will reduce the log noise if you can turn that feature off at the sending end.
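To see how much of a log is this kind of repeated noise, the messages can be grouped with the timestamp stripped. This is a small sketch using standard tools; summarize_log is a hypothetical name, and the sample lines are copied from the first log posted in this thread.

```shell
# summarize_log: read Asterisk log lines on stdin, drop the leading
# "[timestamp] " field, and count identical messages, most frequent first.
summarize_log() {
    sed 's/^\[[^]]*\] //' | sort | uniq -c | sort -rn
}

cat <<'EOF' | summarize_log
[Aug  8 06:01:32] WARNING[6757][C-00000c82] chan_sip.c: Can't send 10 type frames with SIP write
[Aug  8 06:01:33] WARNING[6757][C-00000c82] chan_sip.c: Can't send 10 type frames with SIP write
[Aug  8 06:02:04] WARNING[15165][C-000016cc] chan_sip.c: Can't send 10 type frames with SIP write
[Aug  8 06:02:17] NOTICE[15180] cdr.c: CDR simple logging enabled.
EOF
```

For a real file the same function can be fed with input redirection from /var/log/asterisk/messages.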

Hi David-

Asterisk is starting using the following command: /usr/sbin/asterisk -gd

The filesystem is running on a 1TB HDD which is only 1% utilized, so I don’t believe filesystem space would be an issue.

I don’t believe we have a resource leak either, although, I haven’t been monitoring top/htop during high loads to confirm this. I will do this tomorrow to be completely sure.

I have turned off WARNINGS in the Asterisk logger to get rid of some log noise. I’ve also turned on debugging mode to see if I can get some more output when a crash occurs.

Thanks for any other suggestions I could use to understand why Asterisk is crashing multiple times a day.

Most Linux systems have multiple file systems. Some are purely in RAM. Are you sure that the initial directory for Asterisk was in the large filesystem?

(My big problem here is that, whilst, with various versions, I’ve had many crashes, I’ve never had problems in finding a crash dump. As such, I’ve no experience as to why one might not get one.)
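One way to answer the question about the initial directory, sketched under the assumption of a Linux /proc filesystem: read the working directory of the running process and ask df which filesystem backs it. show_cwd_filesystem is a hypothetical helper name, and it is demonstrated on the current shell rather than on Asterisk.

```shell
# show_cwd_filesystem PID: print the working directory of a process and the
# filesystem that directory lives on.
show_cwd_filesystem() {
    pid="$1"
    cwd="$(readlink "/proc/$pid/cwd")"
    echo "cwd: $cwd"
    # -P selects the portable one-line-per-filesystem output, -k reports in kB.
    df -P -k "$cwd"
}

# Demonstrated on the current shell; substitute the Asterisk PID in practice.
show_cwd_filesystem "$$"
```

A RAM-backed filesystem typically appears as tmpfs in the first column of the df output, and the Available column shows whether a large core file would fit.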

Most Linux systems have multiple file systems. Some are purely in RAM. Are you sure that the initial directory for Asterisk was in the large filesystem?

I’m not sure, how would I find out?

I’m pretty new to in depth Asterisk troubleshooting of this nature (obviously)… My issue here is that Asterisk is crashing multiple times a day when the system has a high call volume of 200+ concurrent calls. Per the recommendation of JColp, I am attempting to pull a backtrace so that someone can shed some light as to why this is happening.

Per the wiki article, Asterisk is running with a -g flag, so it should produce a core when a crash occurs, but that apparently isn’t happening:

[root@68-168-108-26 asterisk]# /var/lib/asterisk/scripts/ast_coredumper core
No coredumps found

I’m just not sure what to do at this point, as I am apparently unable to pull a backtrace to help troubleshoot this issue.

I believe core files when created are saved to the /tmp directory? That directory on my server is empty… so it appears cores aren’t being created when Asterisk crashes?


[root@68-168-108-26 asterisk]# sysctl -n kernel.core_pattern
core
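A core_pattern of plain core, as shown above, is a relative name, so the kernel writes the file into the crashing process's current working directory rather than into a fixed location. The sketch below encodes the cases the kernel distinguishes; core_dump_dir is a hypothetical helper, not part of Asterisk or ast_coredumper.

```shell
# core_dump_dir PATTERN CWD: print where a core file would be written for a
# given kernel.core_pattern value and process working directory.
core_dump_dir() {
    pattern="$1"
    cwd="$2"
    case "$pattern" in
        # A leading pipe means the dump is piped to a helper program.
        "|"*) echo "piped to helper: ${pattern#?}" ;;
        # An absolute pattern names a fixed directory.
        /*)   dirname "$pattern" ;;
        # A relative pattern with a directory part is resolved against the cwd.
        */*)  echo "$cwd/$(dirname "$pattern")" ;;
        # A bare name lands in the working directory of the process.
        *)    echo "$cwd" ;;
    esac
}

# The pattern from this thread, resolved against the current directory.
core_dump_dir core "$PWD"
```

In practice the second argument would be the working directory of the Asterisk process, which on Linux can be read from /proc/PID/cwd.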

EDIT: I have found where the core files are being created… Thank you for your help, David. Please stand by and I will post a relevant core file once a real crash happens again.

Ok, so we had a crash just a few minutes after my last post and the system wasn’t under heavy load.

The crash produced a core and I used ast_coredumper to produce the text files. The full text file is located here: https://www.dropbox.com/s/beiz330g1r0ei35/core.14585-full.txt?dl=0

Is there anything else I need to post? Can someone please help me examine this file to figure out what’s happening?

Thank you!

Your problem is one that was already fixed in a later version[1].

[1] https://issues.asterisk.org/jira/browse/ASTERISK-27488

Thank you – I will update to the latest stable version and see if our problem goes away.

Upgrading to 15.5.0 appears to have solved our issue. Thank you Jcolp!