Asterisk auto restart

vijo.jose · October 29, 2021, 9:02am

Hi all,

I have Asterisk 16.20.0 installed on a CentOS 7 server running chan_pjsip.

Users connect to PJSIP extensions over TLS, and I also have SIP connectivity configured on PJSIP over UDP. I noticed that Asterisk restarted, and I found the errors below in the Linux messages log.

Oct 29 12:45:14 ip-10-10-5-23 kernel: asterisk[1872]: segfault at 2 ip 0000000000000002 sp 00007f62e4f0cc88 error 14 in asterisk[400000+2d1000]
Oct 29 12:45:14 ip-10-10-5-23 abrt-hook-ccpp: Process 1701 (asterisk) of user 0 killed by SIGSEGV - dumping core

I also have a core dump file, attached here: Google Drive

david551 · October 29, 2021, 9:34am

Raw core dumps are useless because they depend on the exact builds of the software, including system libraries. Please analyse it as described in Getting a Backtrace - Asterisk Project - Asterisk Project Wiki.

Also, many crashes are delayed from the underlying fault, so look for anything unusual being logged leading up to the crash.

Also, most crashes, in the field, happen when people are doing something unusual, so please say what was happening at the time, and any way in which your usage of Asterisk is unusual.

Finally note that 16.22.1 has two fixes for crashes associated with AEL reloads and one fix for crashes associated with Read().

vijo.jose · October 29, 2021, 9:55am

@david551 - we did not do any AEL reload around that time.

Should the steps in Getting a Backtrace - Asterisk Project - Asterisk Project Wiki be run now, or do they have to be run at the moment of the crash? The crash happened 3 hours ago. Will we still be able to get the required data from the core dump and find the root cause?

We did not observe anything unusual in Asterisk at the time of the crash. We didn’t have this issue with chan_sip. Is there anything specific to chan_pjsip?

david551 · October 29, 2021, 10:06am

If you have a core file, ast_coredumper can be run later, as long as you haven’t changed versions of the code or libraries in the meantime.

Building with the right options has to be done before the crash, otherwise the information available will be limited.

Nothing in the debugging process is specific to chan_sip.

Full details of what has been fixed can be found in the summary files for the release.

vijo.jose · October 29, 2021, 12:41pm

@david551 - Thanks for the reply.

I have taken the core dump; the link is below.

ps -C asterisk u
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 11857 87.5 1.0 3413012 161636 ? Sl 12:45 285:34 /usr/sbin/asterisk -f -vvvg -c
root 12198 0.0 0.0 86144 4248 pts/0 S+ 17:42 0:00 rasterisk vvvvvvvvr

https://drive.google.com/drive/folders/1wwzlmY0BESALVPDt4KVS3Rbh4lLcAqXa?usp=sharing

david551 · October 29, 2021, 2:25pm

Google Drive: You need access

vijo.jose · October 29, 2021, 5:01pm

Please check now @david551

https://drive.google.com/drive/folders/1wwzlmY0BESALVPDt4KVS3Rbh4lLcAqXa?usp=sharing

david551 · October 29, 2021, 5:39pm

Thread 1 (Thread 0x7f62e4f0d700 (LWP 1872)):
#0  0x0000000000000002 in  ()
#1  0x000000000059347e in ast_taskprocessor_execute (tps=tps@entry=0x1fa5260) at taskprocessor.c:1235
        local = {local_data = 0x1fa29c0, data = 0x7f634c01c568}
        t = 0x7f634c04a270
        __PRETTY_FUNCTION__ = "ast_taskprocessor_execute"
#2  0x0000000000593520 in default_tps_processing_function (data=data@entry=0x1fa21d0) at taskprocessor.c:209
        listener = 0x1fa21d0
        tps = 0x1fa5260
        pvt = 0x1f98030
        sem_value = 0
        __PRETTY_FUNCTION__ = "default_tps_processing_function"
#3  0x00000000005a2638 in dummy_start (data=<optimized out>) at utils.c:1428
        __cancel_buf = {__cancel_jmp_buf = {{__cancel_jmp_buf = {33157920, -8714024614472384669, 0, 512000, 0, 140062724511488, 8778739609105137507, -8714024939614568605}, __mask_was_saved = 0}}, __pad = {0x7f62e4f0cdb0, 0x0, 0x0, 0x0}}
        __cancel_arg = 0x7f62e4f0d700
        __not_first_call = <optimized out>
        ret = <optimized out>
        a = {start_routine = 0x5934e0 <default_tps_processing_function>, data = 0x1fa21d0, name = 0x1f9f320 "default_tps_processing_function started at [  226] taskprocessor.c default_listener_start()"}
        __PRETTY_FUNCTION__ = "dummy_start"
#4  0x00007f636a6d6ea5 in start_thread () at /usr/lib64/libpthread.so.0
#5  0x00007f6369a769fd in clone () at /usr/lib64/libc.so.6

Asterisk wasn’t built for debugging, and it looks like a task processor request has been overwritten, which will make it very difficult to debug if you don’t know what was likely to be happening at the time.
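The kernel line in the first post points the same way. On x86 the "error" field of a segfault message is the page-fault error code, and the kernel prints it in hexadecimal, so "error 14" is 0x14. Below is a minimal sketch of decoding those bits; decode_pf_error and print_pf_error are hypothetical helper names for illustration, not part of Asterisk or the kernel.

```c
#include <stdbool.h>
#include <stdio.h>

/* Decoded view of an x86 page-fault error code, as printed (in hex) by the
 * kernel in a "segfault at ... error N" line. */
struct pf_error {
    bool page_present;  /* bit 0: 0 = page not mapped, 1 = protection violation */
    bool write_access;  /* bit 1: 0 = read, 1 = write */
    bool user_mode;     /* bit 2: fault happened while in user mode */
    bool reserved_bit;  /* bit 3: reserved bit set in a page-table entry */
    bool instr_fetch;   /* bit 4: fault was caused by an instruction fetch */
};

/* Hypothetical helper: split the error code into its individual flags. */
struct pf_error decode_pf_error(unsigned long code)
{
    struct pf_error e;

    e.page_present = (code & 0x01UL) != 0;
    e.write_access = (code & 0x02UL) != 0;
    e.user_mode    = (code & 0x04UL) != 0;
    e.reserved_bit = (code & 0x08UL) != 0;
    e.instr_fetch  = (code & 0x10UL) != 0;
    return e;
}

/* Hypothetical helper: print a one-line summary of the error code. */
void print_pf_error(unsigned long code)
{
    struct pf_error e = decode_pf_error(code);

    printf("error 0x%lx: %s, %s, %s\n", code,
           e.page_present ? "protection violation" : "page not present",
           e.instr_fetch ? "instruction fetch"
                         : (e.write_access ? "write" : "read"),
           e.user_mode ? "user mode" : "kernel mode");
}
```

Calling print_pf_error(0x14) reports a page that is not present, an instruction fetch, and user mode. In other words, the process tried to execute code at address 2, which matches frame #0 of the backtrace.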

It’s trying to do a callback, but the callback subroutine address is invalid (0x2, as shown in frame #0):

github.com

asterisk/asterisk/blob/16.20/main/taskprocessor.c#L1235

    tps->thread = pthread_self();
    tps->executing = 1;

    if (t->wants_local) {
        local.local_data = tps->local_data;
        local.data = t->datap;
    }
    ao2_unlock(tps);

    if (t->wants_local) {
        t->callback.execute_local(&local);
    } else {
        t->callback.execute(t->datap);
    }
    tps_task_free(t);

    ao2_lock(tps);
    tps->thread = AST_PTHREADT_NULL;
    /* We need to check size in the same critical section where we reset the
     * executing bit. Avoids a race condition where a task is pushed right
     * after we pop an empty stack.

In principle, you might get some more information by running:

frame 1
print *t

in gdb, but I suspect that the whole of *t is zeroes, in which case you won’t be able to find out what task it was trying to do. Actually, I’m not sure that anything except the routine with the corrupted address knows what is being done.
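To make the failure mode concrete, here is a minimal, self-contained sketch of the kind of dispatch the quoted snippet performs. The struct layouts are simplified stand-ins built only from the fields visible in that snippet (callback, datap, wants_local); they are not the real Asterisk definitions. The names callback_looks_valid, run_task and bump_counter are hypothetical, and the 4096-byte threshold is an assumption about the unmapped first page on typical Linux systems.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for the "local" argument passed to execute_local. */
struct local_args {
    void *local_data;
    void *data;
};

/* Simplified stand-in for a queued task: a callback plus its data. */
struct task {
    union {
        int (*execute)(void *datap);
        int (*execute_local)(struct local_args *local);
    } callback;
    void *datap;
    unsigned int wants_local:1;
};

/* Hypothetical sanity check: an address inside the first page (0x0, 0x2, ...)
 * cannot be valid code on a typical Linux process. */
int callback_looks_valid(const struct task *t)
{
    return (uintptr_t)t->callback.execute >= 4096;
}

/* Dispatch the task the same way the quoted snippet does, but refuse to
 * jump through an obviously corrupted pointer. Returns -1 in that case. */
int run_task(struct task *t, void *local_data)
{
    struct local_args local;

    if (!callback_looks_valid(t)) {
        return -1;
    }
    if (t->wants_local) {
        local.local_data = local_data;
        local.data = t->datap;
        return t->callback.execute_local(&local);
    }
    return t->callback.execute(t->datap);
}

/* Demo callback: increments the int that datap points to. */
int bump_counter(void *datap)
{
    int *n = datap;

    *n += 1;
    return 0;
}
```

A task whose callback has been overwritten with 0x2 makes run_task return -1 here, whereas the unguarded call jumps to address 2 and segfaults as in frame #0. A guard like this only shows where the crash surfaces; it would not fix whatever overwrote the task in the first place.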

vijo.jose · November 1, 2021, 8:18am

@david551 Thanks for the valuable response.

I checked the Asterisk log and couldn’t find any errors around that time. The only thing logged just before the restart was the warning below:

[Oct 29 12:45:14] WARNING[1778] core_local.c: Someone used Local/XXXXXXXX:XXXX somewhere without a @context. This is bad.

Logs_BeforeRestart.txt (5.1 KB)

system · December 1, 2021, 8:18am

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.
