Program terminated with signal SIGSEGV, Segmentation fault

We are having a weird issue with Asterisk (run under safe_asterisk) restarting itself and need a little help being pointed in the right direction to troubleshoot it. We are running Asterisk 11.21.2. During what appears to be some kind of process overload, the system sent 504 Server Timeout responses to SIP requests. Any ideas on how to go about finding the actual cause would be helpful, as we don’t see much in the messages log that stands out. We are assuming some call is getting stuck in a loop, and I have added additional logging to catch it next time, but we are not exactly sure that is what is causing this.

During this time the CPU spikes to 100% along with the memory, which I’m assuming is what eventually causes the core dump below, after which Asterisk restarts itself.

Below is the output we found in the messages file.

[2016-06-02 09:35:25] Asterisk 11.21.2 built by root @ sfsw25 on a x86_64 running Linux on 2016-02-24 07:00:06 UTC
[2016-06-02 09:35:25] NOTICE[846] loader.c: 3 modules will be loaded.
[2016-06-02 09:35:25] NOTICE[846] res_odbc.c: Connecting asteriskcdrdb-odbc
[2016-06-02 09:35:25] NOTICE[846] res_odbc.c: res_odbc: Connected to asteriskcdrdb-odbc [asteriskcdrdb]
[2016-06-02 09:35:25] NOTICE[846] res_odbc.c: Registered ODBC class ‘asteriskcdrdb-odbc’ dsn->[asteriskcdrdb]
[2016-06-02 09:35:25] NOTICE[846] res_odbc.c: Connecting sfswcore-write-odbc
[2016-06-02 09:35:25] NOTICE[846] res_odbc.c: res_odbc: Connected to sfswcore-write-odbc [sfswcorewrite]
[2016-06-02 09:35:25] NOTICE[846] res_odbc.c: Registered ODBC class ‘sfswcore-write-odbc’ dsn->[sfswcorewrite]
[2016-06-02 09:35:25] NOTICE[846] res_odbc.c: Connecting sfswcore-read-odbc
[2016-06-02 09:35:25] NOTICE[846] res_odbc.c: res_odbc: Connected to sfswcore-read-odbc [sfswcoreread]
[2016-06-02 09:35:25] NOTICE[846] res_odbc.c: Registered ODBC class ‘sfswcore-read-odbc’ dsn->[sfswcoreread]
[2016-06-02 09:35:25] NOTICE[846] res_odbc.c: Connecting qstats-odbc
[2016-06-02 09:35:25] NOTICE[846] res_odbc.c: res_odbc: Connected to qstats-odbc [qstats]
[2016-06-02 09:35:25] NOTICE[846] res_odbc.c: Registered ODBC class ‘qstats-odbc’ dsn->[qstats]
[2016-06-02 09:35:25] NOTICE[846] res_odbc.c: res_odbc loaded.
[2016-06-02 09:35:25] NOTICE[846] config.c: Registered Config Engine odbc
[2016-06-02 09:35:25] NOTICE[846] config.c: Registered Config Engine mysql
[2016-06-02 09:35:25] NOTICE[846] cdr.c: CDR simple logging enabled.
[2016-06-02 09:35:25] NOTICE[846] loader.c: 217 modules will be loaded.
[2016-06-02 09:35:25] NOTICE[846] res_smdi.c: No SMDI interfaces are available to listen on, not starting SMDI listener.
[2016-06-02 09:35:25] NOTICE[846] config.c: Registered Config Engine curl
[2016-06-02 09:35:25] NOTICE[846] res_config_ldap.c: No directory user found, anonymous binding as default.
[2016-06-02 09:35:25] ERROR[846] res_config_ldap.c: No directory URL or host found.
[2016-06-02 09:35:25] ERROR[846] res_config_ldap.c: Cannot load LDAP RealTime driver.
[2016-06-02 09:35:25] NOTICE[846] config.c: Registered Config Engine sqlite3
[2016-06-02 09:35:25] WARNING[846] res_rtp_asterisk.c: Invalid STUN server address: stun.1.google.com:19302
[2016-06-02 09:35:25] WARNING[846] res_musiconhold.c: Cannot open dir /var/lib/asterisk/moh/goo01 or dir does not exist
[2016-06-02 09:35:25] NOTICE[846] chan_sip.c: The ‘username’ field for sip peers has been deprecated in favor of the term ‘defaultuser’

Core dump:

BFD: Warning: /tmp/core is truncated: expected core file size >= 2176118784, found: 2147479552.
[New LWP 32125]
[New LWP 3483]
[New LWP 28928]
[New LWP 3557]
[New LWP 3567]
[New LWP 3714]
[New LWP 3476]
[New LWP 3477]
[New LWP 32197]
[New LWP 3481]
[New LWP 3472]
[New LWP 32195]
[New LWP 32177]
[New LWP 30732]
[New LWP 32196]
[New LWP 3482]
[New LWP 27843]
[New LWP 3566]
[New LWP 3558]
[New LWP 30724]
[New LWP 3486]
[New LWP 3719]
[New LWP 31467]
[New LWP 3485]
[New LWP 28525]
[New LWP 31964]
[New LWP 29719]
[New LWP 3573]
[New LWP 31945]
[New LWP 3569]
[New LWP 31719]
[New LWP 31875]
[New LWP 3565]
[New LWP 3559]
[New LWP 27842]
[New LWP 3488]
[New LWP 3715]
[New LWP 28643]
[New LWP 31944]
[New LWP 3713]
[New LWP 10840]
[New LWP 29176]
[New LWP 31872]
[New LWP 3489]
[New LWP 18490]
[New LWP 31172]
[New LWP 31399]
[New LWP 3484]
[New LWP 31097]
[New LWP 31024]
[New LWP 19124]
[New LWP 3717]
[New LWP 3570]
[New LWP 22751]
[New LWP 25467]
[New LWP 3577]
[New LWP 31722]
[New LWP 31980]
[New LWP 31987]
[New LWP 29331]
[New LWP 31871]
[New LWP 3564]
[New LWP 31496]
[New LWP 3574]
[New LWP 30639]
[New LWP 3568]
[New LWP 3487]
[New LWP 3571]
[New LWP 3716]
[New LWP 31169]
[New LWP 30252]
[New LWP 9931]
[New LWP 3480]
[New LWP 3474]
[New LWP 32124]
[New LWP 3563]
[New LWP 9513]
[New LWP 31387]
[New LWP 31780]
[New LWP 31246]
[New LWP 3475]
[New LWP 3556]
[New LWP 3578]
[New LWP 3572]
[New LWP 31941]
[New LWP 31949]
[New LWP 29726]
[New LWP 31643]
[New LWP 32121]
[New LWP 31173]
[New LWP 3562]
[New LWP 19943]
[New LWP 30249]
[New LWP 32194]
[New LWP 31428]
[New LWP 30036]
[New LWP 3718]
[New LWP 3579]
[New LWP 30555]
[New LWP 31231]
[New LWP 20108]
[New LWP 31950]
[New LWP 30178]
[New LWP 31946]
[New LWP 3561]
[New LWP 31096]
[New LWP 31972]
[New LWP 31392]
[New LWP 29481]
[New LWP 25819]
[New LWP 31164]
[New LWP 3720]
[New LWP 3721]
[New LWP 3473]
[New LWP 31961]
[New LWP 31942]
[New LWP 19533]
[New LWP 3560]
[New LWP 32123]
[New LWP 30812]
[New LWP 30247]
[New LWP 19834]
[New LWP 27527]
[New LWP 31391]
[New LWP 31791]
[New LWP 31981]
[New LWP 31570]
[New LWP 7706]
[New LWP 31876]
[New LWP 31968]
[New LWP 31249]
[New LWP 30106]
[New LWP 31408]
[New LWP 21081]
[New LWP 29254]
[New LWP 31708]
[New LWP 27170]
[New LWP 32192]
[New LWP 30552]
[New LWP 30033]
[New LWP 31970]
[New LWP 19607]
[New LWP 30037]
[New LWP 30807]
[New LWP 23504]
[New LWP 6877]
[New LWP 20452]
[New LWP 15808]
[New LWP 31404]
[New LWP 19435]
[New LWP 26172]
[New LWP 31711]
[New LWP 32120]
[New LWP 28230]
[New LWP 28529]
[New LWP 21851]
[New LWP 31954]
[New LWP 29704]
[New LWP 3479]
[New LWP 31966]
[New LWP 32082]
[New LWP 31413]
[New LWP 23409]
[New LWP 31960]
[New LWP 31642]
[New LWP 30719]
[New LWP 31178]
[New LWP 31417]
[New LWP 31497]
[New LWP 32053]
[New LWP 31447]
[New LWP 31965]
Failed to read a valid object file image from memory.
Core was generated by `/usr/sbin/asterisk -f -t -vvvg -c’.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x00007f5d0d7dde60 in ?? ()

(gdb) where
#0 0x00007f5d0d7dde60 in ?? ()
#1 0x00007f5d0dce7642 in ?? ()
#2 0x00007f5bbaddc4e8 in ?? ()
#3 0x00007f5bbc016920 in ?? ()
#4 0x00007f5bbaddc4e8 in ?? ()
#5 0x00007f5d0edc1520 in ?? ()
#6 0x0000000000000000 in ?? ()

With an incomplete core file and no debug symbols, I’m not sure that one can get very far with this. I note that the memory image is a little over 2GB and the core file a little under 2GB. I wonder if you have a memory leak and a filesystem that doesn’t support large files.
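
As a rough check on the two sizes in the BFD warning above, the arithmetic can be sketched as follows. The function name is made up for illustration. The file on disk ends exactly 4096 bytes short of 2 GiB, which looks more like a size cap than a disk that filled up. That same number, 0x7ffff000, is also the most that a single write() call transfers on Linux, though whether that is what happened here is only a guess.

```python
# Sizes copied from the "BFD: Warning: /tmp/core is truncated" line.
EXPECTED = 2176118784   # minimum size gdb expected
FOUND = 2147479552      # size actually found on disk

GIB = 1024 ** 3
MIB = 1024 ** 2


def describe_truncation(expected: int, found: int) -> dict:
    """Summarise how much of the core is missing (hypothetical helper)."""
    missing = expected - found
    return {
        "missing_bytes": missing,
        "missing_mib": missing / MIB,
        # Negative value: the file stops this many bytes before 2 GiB.
        "found_minus_2gib": found - 2 * GIB,
    }


info = describe_truncation(EXPECTED, FOUND)
print(info)
```

At least about 27 MiB of the image is missing, which is enough to lose the stack and shared-library mappings gdb needs.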

I would monitor memory usage and use gcore before it gets unmanageable. You will still need debug symbols, and, in practice, a non-optimised build to get very far.
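
One way to do that monitoring, as a minimal sketch only: poll the resident memory of the Asterisk process from /proc and call gcore once it passes a limit. The threshold, polling interval, and output prefix are assumptions to adjust, and gcore must be installed (it ships with gdb). Note that gcore pauses the process while it writes the dump.

```python
import subprocess
import time


def rss_kib(pid: int) -> int:
    """Return the resident set size in KiB from /proc/<pid>/status."""
    with open(f"/proc/{pid}/status") as status:
        for line in status:
            if line.startswith("VmRSS:"):
                return int(line.split()[1])
    raise RuntimeError(f"no VmRSS line for pid {pid}")


def watch(pid: int, limit_kib: int, interval_s: int = 60,
          prefix: str = "/var/tmp/asterisk-core") -> None:
    """Log memory use and take one core dump when the limit is reached.

    prefix is a hypothetical location; gcore writes <prefix>.<pid>.
    """
    while True:
        rss = rss_kib(pid)
        print(time.strftime("%Y-%m-%d %H:%M:%S"), f"rss={rss} KiB")
        if rss >= limit_kib:
            # Asterisk is paused while gcore writes the dump.
            subprocess.run(["gcore", "-o", prefix, str(pid)], check=True)
            return
        time.sleep(interval_s)


# Example (assumed values): dump once Asterisk passes roughly 1.5 GiB.
# watch(pid=32125, limit_kib=1_500_000)
```

Run it as root or as the Asterisk user, and pick a limit well below the point where the system becomes unresponsive.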

Thanks for the reply and help.

I did confirm the system does support LFS.

root@sfsw25:/etc/asterisk# df -T
Filesystem Type 1K-blocks Used Available Use% Mounted on
/dev/sda2 ext4 2845439720 299696312 2401180260 12% /
none tmpfs 4 0 4 0% /sys/fs/cgroup
udev devtmpfs 18520176 4 18520172 1% /dev
tmpfs tmpfs 3706192 1036 3705156 1% /run
none tmpfs 5120 0 5120 0% /run/lock
none tmpfs 18530952 0 18530952 0% /run/shm
none tmpfs 102400 0 102400 0% /run/user
10.5.30.20:/var/spool/asterisk/monitor nfs 264352768 234880000 16020480 94% /var/spool/asterisk/monitor

root@sfsw25:/etc/asterisk# file -sL /dev/sda2
/dev/sda2: Linux rev 1.0 ext4 filesystem data, UUID=0d45bfc5-3960-46eb-ad76-d4a91f684846 (needs journal recovery) (extents) (large files) (huge files)
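
The df and file output shows what the filesystem is capable of. A more direct test, sketched here under the assumption that the core is written to /tmp as in the gdb warning, is to create a sparse file just over 2 GiB in that directory and see whether it succeeds. The function name and default size are illustrative only. A sparse file uses almost no real disk space.

```python
import os
import tempfile


def supports_large_files(directory: str, size: int = 2**31 + 4096) -> bool:
    """Try to create a sparse file of `size` bytes in `directory`."""
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        # Seek past the 2 GiB mark and write a single byte there.
        os.lseek(fd, size - 1, os.SEEK_SET)
        os.write(fd, b"\0")
        return os.fstat(fd).st_size == size
    except OSError:
        # For example EFBIG if a file size limit applies to this process.
        return False
    finally:
        os.close(fd)
        os.unlink(path)


# Example: check the directory the truncated core was found in.
# print(supports_large_files("/tmp"))
```

Running it as the same user that Asterisk runs as also picks up any per-process file size limit, which the filesystem flags alone do not show.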

Could you help me with the gcore part? I know some Linux, but I am nowhere near a Linux system administrator. I don’t want to mess anything up or cause Asterisk to be killed or restarted in the middle of the day. If gcore is run on the Asterisk process, does that kill the process or anything? I have looked at the man pages, but I am purposely being very careful not to cause another outage, at least not right now. Thanks for your time.

gcore will temporarily attach the debugger, request a dump and then detach. It will cause a short period in which Asterisk execution is paused, so it is fairly disruptive.

I did just find this in the syslogs

Jun 2 09:32:45 sfsw25 kernel: [33754.980200] show_signal_msg: 108 callbacks suppressed
Jun 2 09:32:45 sfsw25 kernel: [33754.980207] asterisk[32125]: segfault at 4ec767 ip 00007f5d0d7dde60 sp 00007f5bbaddc388 error 7 in libmysqlclient.so.18.0.0[7f5d0d791000+2ae000]

The actual crash is in the database handling library, so outside of Asterisk itself.
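
The kernel line carries a bit more detail than the library name. As a sketch, the snippet below parses it, subtracts the library's load address from the instruction pointer, and decodes the x86 page-fault error bits. The regular expression and function name are my own. The instruction pointer is the same address as frame #0 in the gdb backtrace.

```python
import re

# Copied from the syslog entry above (timestamp prefix dropped).
LINE = ("asterisk[32125]: segfault at 4ec767 ip 00007f5d0d7dde60 "
        "sp 00007f5bbaddc388 error 7 in "
        "libmysqlclient.so.18.0.0[7f5d0d791000+2ae000]")

PATTERN = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) "
    r"sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+) in "
    r"(?P<lib>[^\[\s]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def decode_segfault(line: str) -> dict:
    """Extract the library offset and fault type (hypothetical helper)."""
    match = PATTERN.search(line)
    if match is None:
        raise ValueError("not a kernel segfault line")
    ip = int(match["ip"], 16)
    base = int(match["base"], 16)
    err = int(match["err"], 16)
    return {
        "library": match["lib"],
        "fault_address": hex(int(match["addr"], 16)),
        "offset_in_library": hex(ip - base),
        "page_present": bool(err & 1),   # bit 0: protection fault
        "write_access": bool(err & 2),   # bit 1: write rather than read
        "user_mode": bool(err & 4),      # bit 2: fault came from user space
    }


result = decode_segfault(LINE)
print(result)
```

Error 7 therefore means a user-mode write to a page that is mapped but not writable. If debug symbols for libmysqlclient are installed, the offset can be passed to addr2line with -e pointing at the library file to get a function name.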