Hanging Asterisk 1.6.2.20 on KVM-QEMU and VMWARE

Hi,

for several months now I’m noticing blocks of virtual environments, asterisk seems working but in reality it is locked and I have to kill it with kill -9.
To find the problem I have recompiled the unbundled version 1.6.2.20 with the option DEBUG_THREADS DONT_OPTIMIZE and in the Compiler Flags section of menuSelect.
During the last block I got this information:
asterisk -rx “core show locks” | tee /tmp/core-show-locks.txt
cat /tmp/core-show-locks.txt

=======================================================================
=== Currently Held Locks ==============================================

===
=== <lock#> (): (times locked)

=== Thread ID: 0xb70a1b70 (tps_processing_function started at [ 451] taskprocessor.c ast_taskprocessor_get())
=== —> Lock #0 (pbx.c): MUTEX 3910 handle_statechange hints 0x8acb390 (1)
/usr/sbin/asterisk(ast_bt_get_addresses+0x19) [0x81115ca]
/usr/sbin/asterisk() [0x8087e88]
/usr/sbin/asterisk(_ao2_lock+0x48) [0x8088a78]
/usr/sbin/asterisk() [0x812d5ec]
/usr/sbin/asterisk() [0x817ad98]
/usr/sbin/asterisk() [0x818caf1]
/lib/i686/cmov/libpthread.so.0(+0x5955) [0xb7329955]
/lib/i686/cmov/libc.so.6(clone+0x5e) [0xb7536e7e]
=== —> Lock #1 (pbx.c): MUTEX 3911 handle_statechange hint 0xcae4948 (1)
/usr/sbin/asterisk(ast_bt_get_addresses+0x19) [0x81115ca]
/usr/sbin/asterisk() [0x8087e88]
/usr/sbin/asterisk(_ao2_lock+0x48) [0x8088a78]
/usr/sbin/asterisk() [0x812d617]
/usr/sbin/asterisk() [0x817ad98]
/usr/sbin/asterisk() [0x818caf1]
/lib/i686/cmov/libpthread.so.0(+0x5955) [0xb7329955]
/lib/i686/cmov/libc.so.6(clone+0x5e) [0xb7536e7e]
=== —> Lock #2 (chan_sip.c): MUTEX 13398 cb_extensionstate p 0xcb6ddf0 (1)
/usr/sbin/asterisk(ast_bt_get_addresses+0x19) [0x81115ca]
/usr/sbin/asterisk() [0x8087e88]
/usr/sbin/asterisk(_ao2_lock+0x48) [0x8088a78]
/usr/lib/asterisk/modules/chan_sip.so(+0x381d8) [0xb2c1c1d8]
/usr/sbin/asterisk() [0x812d7c1]
/usr/sbin/asterisk() [0x817ad98]
/usr/sbin/asterisk() [0x818caf1]
/lib/i686/cmov/libpthread.so.0(+0x5955) [0xb7329955]
/lib/i686/cmov/libc.so.6(clone+0x5e) [0xb7536e7e]
=== —> Lock #3 (channel.c): RDLOCK 1355 ast_channel_search_locked &(&channels)->lock 0x81fec88 (1)
/usr/sbin/asterisk(ast_bt_get_addresses+0x19) [0x81115ca]
/usr/sbin/asterisk() [0x80a3853]
/usr/sbin/asterisk(ast_channel_search_locked+0x39) [0x80a6f87]
/usr/lib/asterisk/modules/chan_sip.so(+0x2fe8a) [0xb2c13e8a]
/usr/lib/asterisk/modules/chan_sip.so(+0x38379) [0xb2c1c379]
/usr/sbin/asterisk() [0x812d7c1]
/usr/sbin/asterisk() [0x817ad98]
/usr/sbin/asterisk() [0x818caf1]
/lib/i686/cmov/libpthread.so.0(+0x5955) [0xb7329955]
/lib/i686/cmov/libc.so.6(clone+0x5e) [0xb7536e7e]
=== —> Waiting for Lock #4 (channel.c): MUTEX 1357 ast_channel_search_locked &c->lock_dont_use 0xcbbe1ec (1)
/usr/sbin/asterisk(ast_bt_get_addresses+0x19) [0x81115ca]
/usr/sbin/asterisk() [0x80a2aec]
/usr/sbin/asterisk(ast_channel_search_locked+0x76) [0x80a6fc4]
/usr/lib/asterisk/modules/chan_sip.so(+0x2fe8a) [0xb2c13e8a]
/usr/lib/asterisk/modules/chan_sip.so(+0x38379) [0xb2c1c379]
/usr/sbin/asterisk() [0x812d7c1]
/usr/sbin/asterisk() [0x817ad98]
/usr/sbin/asterisk() [0x818caf1]
/lib/i686/cmov/libpthread.so.0(+0x5955) [0xb7329955]
/lib/i686/cmov/libc.so.6(clone+0x5e) [0xb7536e7e]
=== — ---> Locked Here: channel.c line 2769 (__ast_read)
=== -------------------------------------------------------------------

=== Thread ID: 0xb26b8b70 (do_monitor started at [23309] chan_sip.c restart_monitor())
=== —> Lock #0 (chan_sip.c): MUTEX 22754 handle_request_do &netlock 0xb2c75cc0 (1)
/usr/sbin/asterisk(ast_bt_get_addresses+0x19) [0x81115ca]
/usr/lib/asterisk/modules/chan_sip.so(+0x7e66) [0xb2bebe66]
/usr/lib/asterisk/modules/chan_sip.so(+0x64025) [0xb2c48025]
/usr/lib/asterisk/modules/chan_sip.so(+0x63ded) [0xb2c47ded]
/usr/sbin/asterisk(ast_io_wait+0x144) [0x8106eba]
/usr/lib/asterisk/modules/chan_sip.so(+0x65bcd) [0xb2c49bcd]
/usr/sbin/asterisk() [0x818caf1]
/lib/i686/cmov/libpthread.so.0(+0x5955) [0xb7329955]
/lib/i686/cmov/libc.so.6(clone+0x5e) [0xb7536e7e]
=== —> Waiting for Lock #1 (channel.c): WRLOCK 949 __ast_channel_alloc_ap &(&channels)->lock 0x81fec88 (1)
/usr/sbin/asterisk(ast_bt_get_addresses+0x19) [0x81115ca]
/usr/sbin/asterisk() [0x80a3b04]
/usr/sbin/asterisk() [0x80a5e53]
/usr/sbin/asterisk(__ast_channel_alloc+0x78) [0x80a60f1]
/usr/lib/asterisk/modules/chan_sip.so(+0x1ab31) [0xb2bfeb31]
/usr/lib/asterisk/modules/chan_sip.so(+0x5bd11) [0xb2c3fd11]
/usr/lib/asterisk/modules/chan_sip.so(+0x633af) [0xb2c473af]
/usr/lib/asterisk/modules/chan_sip.so(+0x64621) [0xb2c48621]
/usr/lib/asterisk/modules/chan_sip.so(+0x63ded) [0xb2c47ded]
/usr/sbin/asterisk(ast_io_wait+0x144) [0x8106eba]
/usr/lib/asterisk/modules/chan_sip.so(+0x65bcd) [0xb2c49bcd]
/usr/sbin/asterisk() [0x818caf1]
/lib/i686/cmov/libpthread.so.0(+0x5955) [0xb7329955]
/lib/i686/cmov/libc.so.6(clone+0x5e) [0xb7536e7e]
=== -------------------------------------------------------------------

=== Thread ID: 0xb2640b70 (pbx_thread started at [ 4630] pbx.c ast_pbx_start())
=== —> Lock #0 (channel.c): MUTEX 2769 __ast_read &chan->lock_dont_use 0xcbbe1ec (1)
/usr/sbin/asterisk(ast_bt_get_addresses+0x19) [0x81115ca]
/usr/sbin/asterisk() [0x80a2e24]
/usr/sbin/asterisk() [0x80aa99d]
/usr/sbin/asterisk(ast_read+0x19) [0x80ac697]
/usr/sbin/asterisk(ast_safe_sleep_conditional+0xc0) [0x80a715a]
/usr/sbin/asterisk(ast_safe_sleep+0x28) [0x80a72f4]
/usr/sbin/asterisk(ast_senddigit+0x47) [0x80acfb9]
/usr/sbin/asterisk(ast_dtmf_stream+0x13a) [0x806b233]
/usr/sbin/asterisk(ast_bridge_call+0xdb3) [0x80e7fd6]
/usr/lib/asterisk/modules/app_dial.so(+0xd4d9) [0xb32d84d9]
/usr/lib/asterisk/modules/app_dial.so(+0xddec) [0xb32d8dec]
/usr/sbin/asterisk(pbx_exec+0x1a5) [0x81254a3]
/usr/sbin/asterisk() [0x812cdae]
/usr/sbin/asterisk(ast_spawn_extension+0x53) [0x812e65f]
/usr/sbin/asterisk() [0x812ed73]
/usr/sbin/asterisk() [0x813023d]
/usr/sbin/asterisk() [0x818caf1]
/lib/i686/cmov/libpthread.so.0(+0x5955) [0xb7329955]
/lib/i686/cmov/libc.so.6(clone+0x5e) [0xb7536e7e]
=== -------------------------------------------------------------------

=== Thread ID: 0xb2604b70 (pbx_thread started at [ 4630] pbx.c ast_pbx_start())
=== —> Waiting for Lock #0 (channel.c): WRLOCK 1844 ast_hangup &(&channels)->lock 0x81fec88 (1)
/usr/sbin/asterisk(ast_bt_get_addresses+0x19) [0x81115ca]
/usr/sbin/asterisk() [0x80a3b04]
/usr/sbin/asterisk() [0x80a5e53]
/usr/sbin/asterisk(__ast_channel_alloc+0x78) [0x80a60f1]
/usr/lib/asterisk/modules/chan_sip.so(+0x1ab31) [0xb2bfeb31]
/usr/lib/asterisk/modules/chan_sip.so(+0x5bd11) [0xb2c3fd11]
/usr/lib/asterisk/modules/chan_sip.so(+0x633af) [0xb2c473af]
/usr/lib/asterisk/modules/chan_sip.so(+0x64621) [0xb2c48621]
/usr/lib/asterisk/modules/chan_sip.so(+0x63ded) [0xb2c47ded]
/usr/sbin/asterisk(ast_io_wait+0x144) [0x8106eba]
/usr/lib/asterisk/modules/chan_sip.so(+0x65bcd) [0xb2c49bcd]
/usr/sbin/asterisk() [0x818caf1]
/lib/i686/cmov/libpthread.so.0(+0x5955) [0xb7329955]
/lib/i686/cmov/libc.so.6(clone+0x5e) [0xb7536e7e]
=== -------------------------------------------------------------------

=== Thread ID: 0xb258cb70 (pbx_thread started at [ 4630] pbx.c ast_pbx_start())
=== —> Waiting for Lock #0 (channel.c): WRLOCK 1844 ast_hangup &(&channels)->lock 0x81fec88 (1)
/usr/sbin/asterisk(ast_bt_get_addresses+0x19) [0x81115ca]
/usr/sbin/asterisk() [0x80a3b04]
/usr/sbin/asterisk() [0x80a5e53]
/usr/sbin/asterisk(__ast_channel_alloc+0x78) [0x80a60f1]
/usr/lib/asterisk/modules/chan_sip.so(+0x1ab31) [0xb2bfeb31]
/usr/lib/asterisk/modules/chan_sip.so(+0x5bd11) [0xb2c3fd11]
/usr/lib/asterisk/modules/chan_sip.so(+0x633af) [0xb2c473af]
/usr/lib/asterisk/modules/chan_sip.so(+0x64621) [0xb2c48621]
/usr/lib/asterisk/modules/chan_sip.so(+0x63ded) [0xb2c47ded]
/usr/sbin/asterisk(ast_io_wait+0x144) [0x8106eba]
/usr/lib/asterisk/modules/chan_sip.so(+0x65bcd) [0xb2c49bcd]
/usr/sbin/asterisk() [0x818caf1]
/lib/i686/cmov/libpthread.so.0(+0x5955) [0xb7329955]
/lib/i686/cmov/libc.so.6(clone+0x5e) [0xb7536e7e]
=== -------------------------------------------------------------------

=======================================================================

UPDATE

cat /tmp/refs

0x90f6bc0 call destructor astobj2.c:948:cd_cb_debug (deref object via container destroy)
0x90f5988 -1 astobj2.c:948:cd_cb_debug (deref object via container destroy) [@1]
0x90f5988 call destructor astobj2.c:948:cd_cb_debug (deref object via container destroy)
0x90f4880 -1 astobj2.c:948:cd_cb_debug (deref object via container destroy) [@1]
0x90f4880 call destructor astobj2.c:948:cd_cb_debug (deref object via container destroy)
0x90f6288 -1 astobj2.c:948:cd_cb_debug (deref object via container destroy) [@1]
0x90f6288 call destructor astobj2.c:948:cd_cb_debug (deref object via container destroy)
0x90f5070 -1 astobj2.c:948:cd_cb_debug (deref object via container destroy) [@1]
0x90f5070 call destructor astobj2.c:948:cd_cb_debug (deref object via container destroy)
0xcda9298 -1 astobj2.c:948:cd_cb_debug (deref object via container destroy) [@1]

I add that I also installed version 1.8 and has the same problem …

I couldn’t find a loop in the locks, so it doesn’t look like a lock/lock deadlock, but it could be a loop or could involve a non-lock resource.

If you can reproduce it on 1.8.6, you should force the production of a backtrace and submit both it and the core show locks to issues.asterisk.org/jira/