We are using Asterisk 1.4.44.
For some reason, since 12.10 this month (no changes have been made),
Asterisk has been crashing every 2-3 hours.
I have a core file, analysed with gdb, which shows this error:
"
Core was generated by `/usr/sbin/asterisk -f -U asterisk -G asterisk -vvvg -c'.
Program terminated with signal 11, Segmentation fault.
#0 0x0807cd8b in ?? ()
"
Can someone tell me where the issue is? I know the answer lies in this "#0 0x0807cd8b in ?? ()" line, but I don't have a clue how to read it.
We have no other errors in messages, full, or dmesg, and there is no overload on the CPU.
What more can I do?
Whilst it is true that 1.4.44 is too old for anyone to be interested, he didn’t actually obtain a backtrace. A backtrace may give a clue as to what to avoid. All he’s given is the message that is printed when gdb is started.
Core was generated by `/usr/sbin/asterisk -f -U asterisk -G asterisk -vvvg -c'.
Program terminated with signal 11, Segmentation fault.
#0 0x009c8dc3 in strcasecmp () from /lib/libc.so.6
(gdb) where
#0 0x009c8dc3 in strcasecmp () from /lib/libc.so.6
#1 0x001ceec1 in ast_bridge_call (chan=0x8a68708, peer=, config=0x158acd4) at res_features.c:2659
#2 0x008d3a53 in dial_exec_full (chan=0x8a68708, data=, peerflags=0x158ae64, continue_exec=0x0) at app_dial.c:1894
#3 0x008d4b92 in dial_exec (chan=0x8a68708, data=0x158ced8) at app_dial.c:1942
#4 0x080cf42b in pbx_exec (c=0x8a68708, con=0x0, context=0x8a68888 "macro-dialout-trunk", exten=0x8a688d8 "s", priority=28, label=0x0, callerid=0xb3a01528 "442036080253", action=E_SPAWN) at pbx.c:550
#5 pbx_extension_helper (c=0x8a68708, con=0x0, context=0x8a68888 "macro-dialout-trunk", exten=0x8a688d8 "s", priority=28, label=0x0, callerid=0xb3a01528 "442036080253", action=E_SPAWN) at pbx.c:1893
#6 0x004f22d9 in _macro_exec (chan=0x8a68708, data=0x1591f38, exclusive=0) at app_macro.c:352
#7 0x080cf42b in pbx_exec (c=0x8a68708, con=0x0, context=0x8a68888 "macro-dialout-trunk", exten=0x8a688d8 "s", priority=5, label=0x0, callerid=0xb33650a0 "\220O6\263", action=E_SPAWN) at pbx.c:550
#8 pbx_extension_helper (c=0x8a68708, con=0x0, context=0x8a68888 "macro-dialout-trunk", exten=0x8a688d8 "s", priority=5, label=0x0, callerid=0xb33650a0 "\220O6\263", action=E_SPAWN) at pbx.c:1893
#9 0x080d1ceb in ast_spawn_extension (c=0x8a68708) at pbx.c:2367
#10 __ast_pbx_run (c=0x8a68708) at pbx.c:2461
#11 0x080d2dde in pbx_thread (data=0x8a68708) at pbx.c:2688
#12 0x08103bfb in dummy_start (data=0x83cf840) at utils.c:856
#13 0x00ac1912 in start_thread () from /lib/libpthread.so.0
#14 0x00a2c4ae in clone () from /lib/libc.so.6
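A reading note on frame #0: a segmentation fault inside strcasecmp generally means that one of the two string pointers handed to it was NULL or pointed at memory that is no longer valid. The trace alone does not say which. Below is a minimal C sketch of a defensive comparison, assuming the NULL case. `safe_strcasecmp` is a hypothetical helper for illustration, not something taken from res_features.c.

```c
#include <stddef.h>
#include <strings.h>

/* Hypothetical helper: case-insensitive compare that tolerates NULL.
 * Two NULLs compare equal, and a NULL sorts before any real string.
 * This guards only against NULL, not against dangling pointers. */
int safe_strcasecmp(const char *a, const char *b)
{
    if (a == NULL || b == NULL) {
        if (a == b)
            return 0;
        return (a == NULL) ? -1 : 1;
    }
    return strcasecmp(a, b);
}
```

A guard like this would only hide the symptom. If the pointer is NULL or stale because another thread changed the channel underneath this one, the underlying race is still there.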
The channel is NULL when it is not expected to be. Why - I have no idea. Things have changed quite a lot since then, and it’s likely we’ve fixed it in later versions.
I don’t think you can say that the channel is NULL from the information provided.
On the other hand where this has failed suggests that the primary fault isn’t on the thread that crashed. As such it doesn’t give much clue as to what was the trigger, although it is likely to be something like a channel redirection or forwarding.
I’d probably approach it by trying to work out what happens every 2 hours. There is nothing in Asterisk itself that naturally happens at that interval.
The initial tiny backtrace showed a NULL channel where one was not expected, and calling ast_set_flag with it will cause a crash. How it got there, dunno.
That sounds like sloppy locking and synchronisation. Disabling optimisation forces the compiler to load late and store early, so, if there is an inadequate memory barrier, it may make operation more reliable. This may be a compiler memory barrier issue, but the other question is what hardware you are using. Intel hardware is very forgiving of cross-thread accesses without locks, but ARM is a completely different matter.
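To make the memory barrier point concrete, here is a small C11 sketch of one thread publishing a value for another to read. It is not Asterisk code; `payload`, `ready`, `publish` and `try_consume` are hypothetical names chosen for the example.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical shared state: one thread publishes, another consumes. */
static int payload;        /* plain data, written before publishing */
static atomic_bool ready;  /* static storage, so it starts out false */

/* Writer side: the release store keeps the payload write from being
 * reordered after the point where the flag becomes visible. */
void publish(int value)
{
    payload = value;
    atomic_store_explicit(&ready, true, memory_order_release);
}

/* Reader side: the acquire load pairs with the release store, so once
 * the flag reads true the payload written before it is visible too.
 * Returns true and fills *out only if a value has been published. */
bool try_consume(int *out)
{
    if (!atomic_load_explicit(&ready, memory_order_acquire))
        return false;
    *out = payload;
    return true;
}
```

Without the release/acquire pair, an optimising compiler is free to reorder the two writes. On x86 the pair mostly just constrains the compiler, while on ARM it also generates the ordering instructions the hardware needs, which fits the observation that Intel tends to hide this class of bug.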
In any case, that Asterisk is so old that you are unlikely to be able to find and avoid the problem part of the code.