Asterisk hangs up and shows locks

Hi,

I had installed asterisk on my server but it hangs up occassionaly. When I tried to debug the isssue, I found that something is locking up asterisk. Following is the output of ‘core show locks’ on it:-

=======================================================================
=== 11.10.2
=== Currently Held Locks

===
=== <lock#> (): (times locked)

=== Thread ID: 0x7f668f1e6700 (logger_thread started at [ 1251] logger.c init_logger())
=== —> Lock #0 (logger.c): RDLOCK 1068 logger_print_normal &(&logchannels)->lock 0x92e190 (1)
main/logger.c:1610 ast_bt_get_addresses() (0x518670+1D)
main/lock.c:872 __ast_rwlock_rdlock() (0x5122b6+C9)
main/logger.c:1070 logger_print_normal()
main/logger.c:1196 logger_thread()
main/utils.c:1169 dummy_start()
pthread_create.c:0 start_thread()
:0 __clone() (0x7f668ee2a8d0+6D)
=== -------------------------------------------------------------------

=== Thread ID: 0x7f663a62f700 (do_monitor started at [29178] chan_sip.c restart_monitor())
=== —> Lock #0 (chan_sip.c): MUTEX 28586 handle_request_do &netlock 0x7f6646df6800 (1)
main/logger.c:1610 ast_bt_get_addresses() (0x518670+1D)
main/lock.c:218 __ast_pthread_mutex_lock() (0x5102ef+C9)
channels/chan_sip.c:28589 handle_request_do()
channels/chan_sip.c:28548 sipsock_read()
main/io.c:292 ast_io_wait() (0x509bcc+19C)
channels/chan_sip.c:29145 do_monitor()
main/utils.c:1169 dummy_start()
pthread_create.c:0 start_thread()
:0 __clone() (0x7f668ee2a8d0+6D)
=== —> Lock #1 (chan_sip.c): MUTEX 9097 sip_pvt_lock_full pvt 0xf252f00 (1)
main/logger.c:1610 ast_bt_get_addresses() (0x518670+1D)
main/lock.c:218 __ast_pthread_mutex_lock() (0x5102ef+C9)
main/astobj2.c:195 __ao2_lock() (0x452c4e+96)
channels/chan_sip.c:9098 sip_pvt_lock_full()
channels/chan_sip.c:28602 handle_request_do()
channels/chan_sip.c:28548 sipsock_read()
main/io.c:292 ast_io_wait() (0x509bcc+19C)
channels/chan_sip.c:29145 do_monitor()
main/utils.c:1169 dummy_start()
pthread_create.c:0 start_thread()
:0 __clone() (0x7f668ee2a8d0+6D)
=== -------------------------------------------------------------------

=======================================================================

After I restart asterisk, it starts working fine for a day or two and then again goes down. I tried to study about what’s happening but could not really find a particular reason. Could someone please help me find the reason behind this issue?

There is no deadlock evident there, but deadlocks can involve CPU as well as locks. Does this pattern of locks stay constant? Is Asterisk running 100% on at least one CPU?

The only unusual lock is the one in the logging subsystem. I’ve never seen one of those before, but they are possible in normal operation.

Do you have a debug build? If so, you need to obtain backtraces.

Yes mostly we get these locks or the one mentioned below:-

=== Thread ID: 0x7f15255de700 (logger_thread started at [ 1251] logger.c init_logger())
=== —> Lock #0 (logger.c): RDLOCK 1061 logger_print_normal &(&verbosers)->lock 0x92e390 (1)
main/logger.c:1610 ast_bt_get_addresses() (0x518670+1D)
main/lock.c:872 __ast_rwlock_rdlock() (0x5122b6+C9)
main/logger.c:1062 logger_print_normal()
main/logger.c:1196 logger_thread()
main/utils.c:1169 dummy_start()
pthread_create.c:0 start_thread()
:0 __clone() (0x7f15252228d0+6D)

The machine it is installed on is a 2 core machine and this is the CPU utilization for asterisk-

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
5816 asterisk 20 0 1726m 258m 11m S 1.0 6.8 1:32.49 asterisk

If you could explain, what do these locks actually mean? what excatly hangs up in the system?

Yes I have debug enable but no core files were created in these cases!

I assume you understand what a lock is. You really need to look at the source code to understand a particular lock.

The one you mention suggests that something has stalled whilst trying to write to one of the asterisk logs. That may mean that you have a very large amount of logging going on and nothing is actually wrong, or you are using some form of logging that has limited throughput.

You don’t seem to be short of memory or CPU.

You can get a core file by using the gcore command, or by manually attaching gdb and either debugging the process in RAM, or forcing a dump. (gcore just automates the latter.)

The one you mention suggests that something has stalled whilst trying to write to one of the asterisk logs. That may mean that you have a very large amount of logging going on and nothing is actually wrong, or you are using some form of logging that has limited throughput.

I had rotated the asterisk logs yesterday to handle the lock related to logger. But again asterisk is showing locks today and my number is coming busy.

=======================================================================
=== 11.10.2
=== Currently Held Locks

===
=== <lock#> (): (times locked)

=== Thread ID: 0x7faa4d1fd700 (do_monitor started at [29178] chan_sip.c restart_monitor())
=== —> Lock #0 (chan_sip.c): MUTEX 28586 handle_request_do &netlock 0x7faa599c4800 (1)
main/logger.c:1610 ast_bt_get_addresses() (0x518670+1D)
main/lock.c:218 __ast_pthread_mutex_lock() (0x5102ef+C9)
channels/chan_sip.c:28589 handle_request_do()
channels/chan_sip.c:28548 sipsock_read()
main/io.c:292 ast_io_wait() (0x509bcc+19C)
channels/chan_sip.c:29145 do_monitor()
main/utils.c:1169 dummy_start()
pthread_create.c:0 start_thread()
:0 __clone() (0x7faaa19898d0+6D)
=== —> Lock #1 (chan_sip.c): MUTEX 9097 sip_pvt_lock_full pvt 0x7faa40203bd0 (1)
main/logger.c:1610 ast_bt_get_addresses() (0x518670+1D)
main/lock.c:218 __ast_pthread_mutex_lock() (0x5102ef+C9)
main/astobj2.c:195 __ao2_lock() (0x452c4e+96)
channels/chan_sip.c:9098 sip_pvt_lock_full()
channels/chan_sip.c:28602 handle_request_do()
channels/chan_sip.c:28548 sipsock_read()
main/io.c:292 ast_io_wait() (0x509bcc+19C)
channels/chan_sip.c:29145 do_monitor()
main/utils.c:1169 dummy_start()
pthread_create.c:0 start_thread()
:0 __clone() (0x7faaa19898d0+6D)


The problem is that, since yesterday no calls have been placed on the number mapped to this server. The only calls that are going on are the one’s generated via qualify settings. But still something locks up asterisk and the number starts showing busy.

You haven’t got the logger lock.

The problem with diagnosing these locks is that nothing is waiting for these locks. Locks are only interesting when there is something shown as waiting. It could be that something is busy waiting by doing conditional locks, but you still don’t have the makings of a deadlock based purely on locks. For a start that requires more than one thread.

There has to be something else that is causing the code to stall. That’s not a tight CPU loop, although it could be one with waits in it. You really need the core dump and backtraces to find the problem.

Perhaps if you can provide more information about your environment. Is this hardware or a VPS? If VPS what kind? Are you compiling Asterisk or installing RPM’s? 32bit or 64bit OS?

If you are compiling are you just running ./configure or adding any config options to that. Did you check your config.log after running ./configure to see if you have any missing dependencies?

This is a hardware and we had compiled asterisk on our 64 bit machine.

We had not added any config option while compiling and all dependencies had been installed then.

Also, I have a copy of this server with same configuration, on which I have mapped a different number with a different service provider but I do not encounter such issues on that server. Can I somehow detect if this is being caused by the provider I am using for this server.

What do you mean by “copy”? Did you recompile on the new hardware? You need to recompile Asterisk if you move/copy/restore the software to different hardware.

Other than that I suppose it could be a hardware issue but most likely software.

Packages like AsteriskNOW would not work if you had to recompile.

Of course, what the packages do is compile for a machine which only supports common processor options, whereas the default build process tries to compile for maximum performance on the processor on which it is being compiled.

However in this case there was no suggestion that the hardware was different, and hardware incompatibilities generally show as illegal instruction exceptions.

If you compile from source with default settings you should recompile if you move to different hardware. AsteriskNOW uses RPM’s and the payload was probably compile with the “Build Native” flag disabled.