Asterisk is getting restarted automatically in 13.20

Hi, we just upgraded Asterisk to 13.20, but since then Asterisk keeps disconnecting or restarting automatically without logging any error.

Sometimes it works well for 2 or 3 days, then the problem suddenly starts and Asterisk disconnects every few minutes.

If anyone has faced this kind of issue or has a solution, please let me know.

Thanks in advance.

Thanks David for sharing this, I will try it. In the meantime, can you please suggest a few things to drill into that could cause this problem?

Bugs in Asterisk.
Bugs in third-party libraries, e.g. database access libraries.
Failing memory.

After working smoothly for 3 days, we hit the same problem today: Asterisk disconnected automatically, and it happened on all of our servers (about 5 servers).

There was one error common to all the servers before they crashed:

iostream.c:507 ast_iostream_close: SSL_shutdown() failed: error:00000005:lib(0):func(0):DH lib, Underlying BIO error: Bad file descriptor

Can you please suggest a solution for this error?

That error usually happens when a TCP socket disconnects unexpectedly. As @david551 mentioned earlier, we need the backtraces.
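For context, "Bad file descriptor" is the standard text for the EBADF errno, which the kernel returns when code uses a descriptor that has already been closed, for example after a connection was torn down and the TLS layer later tries to send its shutdown alert on it. The small Python sketch below is not Asterisk code and does not reproduce the bug; it only shows how that errno arises from writing to a socket descriptor that another code path has already closed.

```python
import errno
import os
import socket


def write_after_close():
    """Write to a socket descriptor after it was closed; return the errno."""
    a, b = socket.socketpair()
    fd = a.detach()      # take ownership of the raw descriptor
    os.close(fd)         # simulate another code path closing it first
    try:
        os.write(fd, b"x")  # a late write, like a TLS shutdown alert
    except OSError as exc:
        return exc.errno
    finally:
        b.close()
    return 0  # only reached if the descriptor number was reused


if __name__ == "__main__":
    code = write_after_close()
    print(code == errno.EBADF, os.strerror(code))
```

On Linux this reports EBADF with the message "Bad file descriptor", the same wording that appears at the end of the SSL_shutdown() log line.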

Thanks Joseph, we configured the backtraces and tested it by killing the Asterisk process manually, and it dumped the core file along with an error log.

But when Asterisk crashed again, it did not dump anything, so we could not get backtraces.
We have been facing this problem for a long time and have tried multiple things without finding a solution.
It basically started after we switched to SIPML5 (WebRTC) and Asterisk 13.20.

We are currently using:

Asterisk 13.20 / 15.30
SIPML5 + WebRTC
CentOS / Ubuntu

Please suggest how we can resolve this.

When you tested killing the process manually, were you starting Asterisk the same way as you do when you run it normally? Can you confirm that Asterisk is started with the “-g” option? When Asterisk crashes, do you get an entry in the kernel log (dmesg)? Could the core dumps be going someplace you don’t expect? What’s the output of sudo sysctl -n kernel.core_pattern? What’s the process’s home directory?

Thanks Joseph,

We started Asterisk with the -G option only.
We are not getting an entry in the kernel log either.
The output of sudo sysctl -n kernel.core_pattern is “/var/crash/core.%u.%e.%p”.
The home directory of the process is /usr/src.
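To see where a core file from that pattern should land, the specifiers can be expanded by hand. The sketch below handles only the common core(5) specifiers (%p PID, %u UID, %g GID, %s signal, %t time, %e command name, %h hostname, %%); the PID, UID and signal values in the usage example are hypothetical.

```python
import socket
import time


def expand_core_pattern(pattern, pid, uid, gid, exe, signo, when=None):
    """Expand common kernel.core_pattern specifiers (see core(5)).

    Specifiers not handled here are copied through unchanged.
    """
    if pattern.startswith("|"):
        # A leading pipe means the kernel hands the core to a helper program.
        raise ValueError("core dumps are piped to: " + pattern[1:])
    values = {
        "p": str(pid),
        "u": str(uid),
        "g": str(gid),
        "s": str(signo),
        "t": str(int(when if when is not None else time.time())),
        "e": exe[:15],  # the kernel uses the command name, max 15 chars
        "h": socket.gethostname(),
        "%": "%",
    }
    out, i = [], 0
    while i < len(pattern):
        if pattern[i] == "%" and i + 1 < len(pattern) and pattern[i + 1] in values:
            out.append(values[pattern[i + 1]])
            i += 2
        else:
            out.append(pattern[i])
            i += 1
    return "".join(out)
```

For example, expand_core_pattern("/var/crash/core.%u.%e.%p", 12345, 0, 0, "asterisk", 11) returns "/var/crash/core.0.asterisk.12345". Because this pattern is an absolute path, the home directory (/usr/src) does not matter for core files, but /var/crash must exist and be writable by the user Asterisk runs as, otherwise the kernel silently skips the dump.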

Below is the log from my browser; if something is wrong here, please point it out.

o=- 1196201691894680800 2 IN IP4
s=Doubango Telecom - chrome
t=0 0
a=group:BUNDLE audio
a=msid-semantic: WMS fUjrewNl4J6CyHB47hMLrfp5P9SxRgkTZjV1
m=audio 49306 UDP/TLS/RTP/SAVPF 111 103 104 9 0 8 106 105 13 110 112 113 126
c=IN IP4
a=rtcp:9 IN IP4
a=candidate:3279868371 1 udp 2122260223 49306 typ host generation 0 network-id 1
a=candidate:2007919378 1 udp 2122194687 49307 typ host generation 0 network-id 2 network-cost 10
a=candidate:2382179619 1 tcp 1518280447 9 typ host tcptype active generation 0 network-id 1
a=candidate:959289314 1 tcp 1518214911 9 typ host tcptype active generation 0 network-id 2 network-cost 10
a=fingerprint:sha-256 AD:07:E7:AF:B5:74:C8:6D:09:47:23:BC:07:DA:13:9B:F8:E3:8A:8F:F9:58:0B:49:8D:7D:B1:E9:74:15:E0:FE
a=extmap:1 urn:ietf:params:rtp-hdrext:ssrc-audio-level
a=rtpmap:111 opus/48000/2
a=rtcp-fb:111 transport-cc
a=fmtp:111 minptime=10;useinbandfec=1
a=rtpmap:103 ISAC/16000
a=rtpmap:104 ISAC/32000
a=rtpmap:9 G722/8000
a=rtpmap:0 PCMU/8000
a=rtpmap:8 PCMA/8000
a=rtpmap:106 CN/32000
a=rtpmap:105 CN/16000
a=rtpmap:13 CN/8000
a=rtpmap:110 telephone-event/48000
a=rtpmap:112 telephone-event/32000
a=rtpmap:113 telephone-event/16000
a=rtpmap:126 telephone-event/8000
a=ssrc:3846400613 cname:r4de2FR2wXvuEyHX
a=ssrc:3846400613 msid:fUjrewNl4J6CyHB47hMLrfp5P9SxRgkTZjV1 e912a9f2-86a9-4775-ada3-f47932f12d69
a=ssrc:3846400613 mslabel:fUjrewNl4J6CyHB47hMLrfp5P9SxRgkTZjV1
a=ssrc:3846400613 label:e912a9f2-86a9-4775-ada3-f47932f12d69

Options are case sensitive, at least that is normally the case for Unix and Linux: -G is not the same option as -g.
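One way to confirm what the running process was actually started with, rather than what the init script is believed to pass, is to read /proc on Linux. The sketch below checks the argument list for a lowercase -g (alone or in a cluster such as -vvvg) and reads the soft core-size limit, where "0" means no core file is written at all. The set of options assumed to take a value is a hypothetical subset; check asterisk -h on your build.

```python
import os
import sys

# Assumed subset of Asterisk short options that take a value
# (-C config, -G group, -U user, -x command, -s socket).
TAKES_ARG = {"C", "G", "U", "x", "s"}


def has_core_flag(argv):
    """Return True if argv contains -g, alone or in a cluster like -vvvg."""
    skip_next = False
    for arg in argv[1:]:
        if skip_next:
            skip_next = False
            continue
        if not arg.startswith("-") or arg.startswith("--") or arg == "-":
            continue
        flags = arg[1:]
        for i, ch in enumerate(flags):
            if ch == "g":
                return True
            if ch in TAKES_ARG:
                # The value is the rest of this cluster, or the next
                # argument if the option letter ends the cluster.
                skip_next = i == len(flags) - 1
                break
    return False


def read_cmdline(pid):
    """Return the argv of a running process from /proc/<pid>/cmdline."""
    with open(f"/proc/{pid}/cmdline", "rb") as f:
        raw = f.read()
    return [a.decode(errors="replace") for a in raw.split(b"\0") if a]


def core_soft_limit(pid):
    """Return the soft 'Max core file size' value from /proc/<pid>/limits."""
    with open(f"/proc/{pid}/limits") as f:
        for line in f:
            if line.startswith("Max core file size"):
                return line[len("Max core file size"):].split()[0]
    return None


if __name__ == "__main__":
    # Pass the PID of the running asterisk process; defaults to this script.
    pid = int(sys.argv[1]) if len(sys.argv) > 1 else os.getpid()
    print("started with -g:", has_core_flag(read_cmdline(pid)))
    print("core size soft limit:", core_soft_limit(pid))
```

With "-G" only, has_core_flag returns False, because -G consumes the following argument as a group name and never enables core dumps.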

Please let me know if you see any issue with the above option.

Below are the errors we are getting frequently:

chan_sip.c:4274 __sip_reliable_xmit: Serious Network Trouble; __sip_xmit returns error for pkt data

iostream.c:507 ast_iostream_close: SSL_shutdown() failed: error:00000005:lib(0):func(0):DH lib, Underlying BIO error: Bad file descriptor
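To check whether these two messages cluster just before each crash, it can help to count them per hour in the Asterisk log. The sketch below assumes the default timestamp prefix seen in the log lines in this thread (e.g. "[May  3 18:18:41]"); the labels are hypothetical, and the match strings are the two messages quoted above.

```python
import re
from collections import Counter

# Message fragments quoted in this thread; extend as needed.
PATTERNS = {
    "sip_xmit": "Serious Network Trouble",
    "ssl_shutdown": "SSL_shutdown() failed",
}
# Assumes the default "[Mon DD HH:MM:SS]" prefix on each log line.
STAMP = re.compile(r"^\[(\w{3})\s+(\d{1,2}) (\d{2}):\d{2}:\d{2}\]")


def count_by_hour(lines):
    """Count matching messages per (label, 'Mon D HH') bucket."""
    counts = Counter()
    for line in lines:
        m = STAMP.match(line)
        if not m:
            continue  # skip lines without a recognisable timestamp
        hour = f"{m.group(1)} {m.group(2)} {m.group(3)}"
        for label, needle in PATTERNS.items():
            if needle in line:
                counts[(label, hour)] += 1
    return counts
```

It accepts any iterable of lines, so it can be fed an open log file (the default messages log location depends on logger.conf) and the buckets compared against the times Asterisk restarted.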

Guys, please suggest something, as we are facing this issue every day. It happens once or twice a day, and all connected calls get dropped at that moment. We don't get anything in the backtrace.

Do you have a file descriptor leak? (Check whether ls /proc/<pid>/fd for the Asterisk process shows a number of file descriptors close to the corresponding ulimit. You can also use lsof.)
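A minimal sketch of that check on Linux: count the entries in /proc/<pid>/fd and compare them with the "Max open files" soft limit from /proc/<pid>/limits. Reading another process's entries normally requires root or the same user. A single reading says little; a count that keeps climbing toward the limit between crashes is what would point to a leak.

```python
import os
import sys


def fd_usage(pid):
    """Return (open_fds, soft_limit) for a process (Linux only).

    open_fds counts the entries in /proc/<pid>/fd; soft_limit is the
    'Max open files' soft limit from /proc/<pid>/limits (None if unlimited).
    """
    open_fds = len(os.listdir(f"/proc/{pid}/fd"))
    soft = None
    with open(f"/proc/{pid}/limits") as f:
        for line in f:
            if line.startswith("Max open files"):
                soft = line[len("Max open files"):].split()[0]
                break
    return open_fds, (None if soft in (None, "unlimited") else int(soft))


if __name__ == "__main__":
    # Pass the PID of the running asterisk process; defaults to this script.
    pid = int(sys.argv[1]) if len(sys.argv) > 1 else os.getpid()
    used, limit = fd_usage(pid)
    print(f"pid {pid}: {used} open descriptors, soft limit {limit}")
```

Running it periodically (for example from cron) and logging the output gives a trend to compare with the crash times.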

If you don’t have a file descriptor leak, you need to find out why you are not getting a dump, or you need to run asterisk under the debugger. There is an option to stop it detaching.

Hi David, we finally got some logs in the backtrace.
Can you please suggest why this is happening?

#0: [0x56325cc75fa1] main/utils.c:2446 __ast_assert_failed() (0x56325cc75f11+90)
#1: [0x56325ca85db3] main/astobj2.c:190 log_bad_ao2()
#2: [0x56325ca86021] main/astobj2.c:256 __ao2_unlock() (0x56325ca85fc4+5D)
#3: [0x7fc4e2e07105] channels/chan_sip.c:6807 update_call_counter()
#4: [0x7fc4e2e0618f] channels/chan_sip.c:6606 sip_pvt_dtor()
#5: [0x56325ca86c70] main/astobj2.c:585 __ao2_ref() (0x56325ca865d7+699)
#6: [0x56325ca86ecd] main/astobj2.c:629 __ao2_cleanup_debug() (0x56325ca86e85+48)
#7: [0x7fc4e2df5dbd] channels/chan_sip.c:3305 __dialog_unlink_sched_items()
#8: [0x56325cc2733d] main/sched.c:781 ast_sched_runq() (0x56325cc27238+105)
#9: [0x7fc4e2e7e234] channels/chan_sip.c:29763 do_monitor()
#10: [0x56325cc72d3e] main/utils.c:1257 dummy_start()
[May 3 18:18:41] ERROR[29410]: chan_sip.c:6813 update_call_counter: FRACK!, Failed assertion bad magic number 0x7fc1 for object 0x7fc1f4027f18 (0)

backtrace.txt (604.6 KB)

For the short one, you are missing debug symbols, so we can’t see the error message.

For the second one, you are using an optimised binary, which is pretty well incompatible with debugging crashes. However the nature of both crashes suggests that memory has been corrupted.

Unfortunately, memory corruption tends to happen some time before the actual crash, and in a different thread. Building with thread debugging enabled may show a locking error prior to the crash, but it slows execution down considerably.

Thanks David, what should the action point be here? Where exactly should I look to solve this issue?

If you have thread debugging you would be looking for messages saying that locks are being removed when not set, and similar. You’d then need to work out where they should have been set.

This sort of problem is very hard to debug. You need good programming skills and a fair knowledge of the internals of Asterisk to succeed. Most people have to wait until someone with those skills finds the same problem on their system and submits a patch, or at least explicit details of what is causing the problem.

I should add that it could also be a hardware fault, although none of the Asterisk memory corruptions I’ve had have been hardware related.

This problem also appeared in Asterisk 17; I solved it by switching to PJSIP. Asterisk should display a warning for, or refuse, RTC + SIP.

chan_sip is not maintained by the core Asterisk developers, so there is already a general warning not to use it, and it also means that the core developers are unlikely to add further warnings to it.

Unfortunately, a lot of people seem to start with cook book solutions for chan_sip, many of which are around a decade old, rather than reading the current documentation.