Asterisk is getting restarted automatically in 13.20

jitendra_singhal · April 21, 2018, 5:01pm

Hi, we just upgraded our Asterisk to 13.20, but since then we are facing a problem where Asterisk gets disconnected or restarts automatically without throwing any error.

Sometimes it works well for 2 or 3 days, and then the problem suddenly starts and Asterisk gets disconnected every few minutes.

If anyone has faced this kind of issue or has a solution for it, please suggest.

Thanks in advance.

david551 · April 21, 2018, 5:11pm

https://wiki.asterisk.org/wiki/display/AST/Getting+a+Backtrace

jitendra_singhal · April 21, 2018, 5:28pm

Thanks David for sharing this, I will try it. In the meantime, can you please suggest a few things to drill down into that could cause this problem?

david551 · April 21, 2018, 6:40pm

Bugs in Asterisk. Bugs in third-party libraries, e.g. database access libraries. Failing memory.

jitendra_singhal · April 23, 2018, 12:16pm

After working smoothly for 3 days, we encountered the same problem today: Asterisk got disconnected automatically, and it happened on all our servers (about 5 servers).

There was one common error on all the servers before they crashed:

iostream.c:507 ast_iostream_close: SSL_shutdown() failed: error:00000005:lib(0):func(0):DH lib, Underlying BIO error: Bad file descriptor

Can you please suggest a solution for this error?

gjoseph · April 24, 2018, 2:30pm

That error usually happens when a tcp socket disconnects unexpectedly. As @david551 mentioned earlier, we need the backtraces.

jitendra_singhal · April 28, 2018, 1:18pm

Thanks Joseph, we configured the backtraces and tested by killing the Asterisk process manually, and it dumped the core file with the error log.

But now, when Asterisk crashes again, it does not dump anything, so we are not able to get backtraces.
We have been facing this problem for a long time without finding a solution, and we have tried multiple things.
Basically, we started getting it after switching to SIPML5 (WebRTC) and Asterisk 13.20.

We are currently using:

Asterisk 13.20 /15.30
SIPML5 + webrtc
centos / ubuntu

Please suggest how we can resolve this.

gjoseph · April 30, 2018, 2:18pm

When you tested killing the process manually, were you starting Asterisk the same way as you do when you run it normally? Can you confirm that Asterisk is started with the “-g” option? When Asterisk crashes, do you get an entry in the kernel log (dmesg)? Could the core dumps be going someplace you don’t expect? What’s the output of sudo sysctl -n kernel.core_pattern? What’s the process’s home directory?
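
A rough sketch of the "where would the core file go" check, in Python, for anyone who wants to script it. It only handles the %p, %u, %e and %% specifiers of kernel.core_pattern and leaves the others untouched. The function names are made up for illustration, and it assumes a Linux /proc layout and that you pass the PID of the running asterisk process yourself.

```python
import os


def expand_core_pattern(pattern, pid, uid, exe):
    """Expand the %p, %u, %e and %% specifiers of a core_pattern value."""
    values = {"p": str(pid), "u": str(uid), "e": exe, "%": "%"}
    out = []
    i = 0
    while i < len(pattern):
        if pattern[i] == "%" and i + 1 < len(pattern):
            spec = pattern[i + 1]
            # Specifiers this sketch does not handle are left untouched.
            out.append(values.get(spec, "%" + spec))
            i += 2
        else:
            out.append(pattern[i])
            i += 1
    return "".join(out)


def core_dump_target(pattern, cwd, pid, uid, exe="asterisk"):
    """Describe where the kernel would write a core file for this process."""
    if pattern.startswith("|"):
        # A leading pipe hands the dump to a helper program instead of a file.
        return "piped to helper: " + pattern[1:].strip()
    path = expand_core_pattern(pattern, pid, uid, exe)
    if not os.path.isabs(path):
        # Relative patterns are resolved against the process's working directory.
        path = os.path.join(cwd, path)
    return path


if __name__ == "__main__":
    import sys

    pid = int(sys.argv[1])  # PID of the running asterisk process
    with open("/proc/sys/kernel/core_pattern") as f:
        pattern = f.read().strip()
    cwd = os.readlink(f"/proc/{pid}/cwd")
    # Approximation: owner of /proc/<pid>, which may differ from the real UID.
    uid = os.stat(f"/proc/{pid}").st_uid
    print(core_dump_target(pattern, cwd, pid, uid))
```

Run it as root with the Asterisk PID as the only argument. If the result points at a directory the Asterisk user cannot write to, or at a helper program, that may explain why no core file shows up where you expect it.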

jitendra_singhal · May 1, 2018, 8:31am

Thanks Joseph,

We started Asterisk with the -G option only.
We are not getting an entry in the kernel log either.
The output of sudo sysctl -n kernel.core_pattern is “/var/crash/core.%u.%e.%p”.
The home directory of the process is /usr/src.

Below is the log from my browser; if something is wrong here, please highlight it.

v=0
o=- 1196201691894680800 2 IN IP4 127.0.0.1
s=Doubango Telecom - chrome
t=0 0
a=group:BUNDLE audio
a=msid-semantic: WMS fUjrewNl4J6CyHB47hMLrfp5P9SxRgkTZjV1
m=audio 49306 UDP/TLS/RTP/SAVPF 111 103 104 9 0 8 106 105 13 110 112 113 126
c=IN IP4 172.21.167.97
a=rtcp:9 IN IP4 0.0.0.0
a=candidate:3279868371 1 udp 2122260223 172.21.167.97 49306 typ host generation 0 network-id 1
a=candidate:2007919378 1 udp 2122194687 10.0.9.42 49307 typ host generation 0 network-id 2 network-cost 10
a=candidate:2382179619 1 tcp 1518280447 172.21.167.97 9 typ host tcptype active generation 0 network-id 1
a=candidate:959289314 1 tcp 1518214911 10.0.9.42 9 typ host tcptype active generation 0 network-id 2 network-cost 10
a=ice-ufrag:IdNE
a=ice-pwd:Km21ObgPt/oxnlCWv4+nkVs2
a=ice-options:trickle
a=fingerprint:sha-256 AD:07:E7:AF:B5:74:C8:6D:09:47:23:BC:07:DA:13:9B:F8:E3:8A:8F:F9:58:0B:49:8D:7D:B1:E9:74:15:E0:FE
a=setup:actpass
a=mid:audio
a=extmap:1 urn:ietf:params:rtp-hdrext:ssrc-audio-level
a=sendrecv
a=rtcp-mux
a=rtpmap:111 opus/48000/2
a=rtcp-fb:111 transport-cc
a=fmtp:111 minptime=10;useinbandfec=1
a=rtpmap:103 ISAC/16000
a=rtpmap:104 ISAC/32000
a=rtpmap:9 G722/8000
a=rtpmap:0 PCMU/8000
a=rtpmap:8 PCMA/8000
a=rtpmap:106 CN/32000
a=rtpmap:105 CN/16000
a=rtpmap:13 CN/8000
a=rtpmap:110 telephone-event/48000
a=rtpmap:112 telephone-event/32000
a=rtpmap:113 telephone-event/16000
a=rtpmap:126 telephone-event/8000
a=ssrc:3846400613 cname:r4de2FR2wXvuEyHX
a=ssrc:3846400613 msid:fUjrewNl4J6CyHB47hMLrfp5P9SxRgkTZjV1 e912a9f2-86a9-4775-ada3-f47932f12d69
a=ssrc:3846400613 mslabel:fUjrewNl4J6CyHB47hMLrfp5P9SxRgkTZjV1
a=ssrc:3846400613 label:e912a9f2-86a9-4775-ada3-f47932f12d69

david551 · May 1, 2018, 9:52am

Options are case sensitive, at least that is normally the case for Unix and Linux.

jitendra_singhal · May 1, 2018, 1:51pm

Please suggest if you see any issue with the above option.

Below are the errors we are getting frequently:

chan_sip.c:4274 __sip_reliable_xmit: Serious Network Trouble; __sip_xmit returns error for pkt data

iostream.c:507 ast_iostream_close: SSL_shutdown() failed: error:00000005:lib(0):func(0):DH lib, Underlying BIO error: Bad file descriptor

jitendra_singhal · May 2, 2018, 4:40pm

Guys, please suggest something, as we are facing this issue every day. It happens once or twice a day, but all connected calls get dropped at that moment. We do not get anything in the backtrace.

david551 · May 2, 2018, 8:25pm

Do you have a file descriptor leak (ls /proc/<pid>/fd shows a number of file descriptors close to the corresponding ulimit; you can also use lsof)?

If you don’t have a file descriptor leak, you need to find out why you are not getting a dump, or you need to run asterisk under the debugger. There is an option to stop it detaching.
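
The file descriptor check can be scripted roughly like this in Python. It is only a sketch: the function names and the 90% threshold are arbitrary choices for illustration, and it assumes a Linux /proc layout and enough privileges (root or the Asterisk user) to read the process's fd directory.

```python
import os


def parse_max_open_files(limits_text):
    """Return the soft 'Max open files' limit from /proc/<pid>/limits text."""
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            # Layout: "Max open files  <soft>  <hard>  files"
            soft = line.split()[3]
            return None if soft == "unlimited" else int(soft)
    return None


def near_limit(open_fds, soft_limit, threshold=0.9):
    """True when the open descriptor count is close to the soft limit."""
    if soft_limit is None:
        return False
    return open_fds >= threshold * soft_limit


def fd_usage(pid):
    """Count open descriptors of a process and read its soft limit."""
    open_fds = len(os.listdir(f"/proc/{pid}/fd"))
    with open(f"/proc/{pid}/limits") as f:
        soft = parse_max_open_files(f.read())
    return open_fds, soft


if __name__ == "__main__":
    import sys

    pid = int(sys.argv[1])  # PID of the running asterisk process
    used, limit = fd_usage(pid)
    print(f"open fds: {used}, soft limit: {limit}, near limit: {near_limit(used, limit)}")
```

Running it periodically (for example from cron) and watching whether the count keeps climbing between restarts would show a leak more clearly than a single reading.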

jitendra_singhal · May 3, 2018, 1:01pm

Hi David, we finally got some logs in the backtrace.
Can you please suggest why this is happening?

#0: [0x56325cc75fa1] main/utils.c:2446 __ast_assert_failed() (0x56325cc75f11+90)
#1: [0x56325ca85db3] main/astobj2.c:190 log_bad_ao2()
#2: [0x56325ca86021] main/astobj2.c:256 __ao2_unlock() (0x56325ca85fc4+5D)
#3: [0x7fc4e2e07105] channels/chan_sip.c:6807 update_call_counter()
#4: [0x7fc4e2e0618f] channels/chan_sip.c:6606 sip_pvt_dtor()
#5: [0x56325ca86c70] main/astobj2.c:585 __ao2_ref() (0x56325ca865d7+699)
#6: [0x56325ca86ecd] main/astobj2.c:629 __ao2_cleanup_debug() (0x56325ca86e85+48)
#7: [0x7fc4e2df5dbd] channels/chan_sip.c:3305 __dialog_unlink_sched_items()
#8: [0x56325cc2733d] main/sched.c:781 ast_sched_runq() (0x56325cc27238+105)
#9: [0x7fc4e2e7e234] channels/chan_sip.c:29763 do_monitor()
#10: [0x56325cc72d3e] main/utils.c:1257 dummy_start()
[May 3 18:18:41] ERROR[29410]: chan_sip.c:6813 update_call_counter: FRACK!, Failed assertion bad magic number 0x7fc1 for object 0x7fc1f4027f18 (0)

jitendra_singhal · May 3, 2018, 1:16pm

backtrace.txt (604.6 KB)

david551 · May 3, 2018, 2:09pm

For the short one, you are missing debug symbols, so we can’t see the error message.

For the second one, you are using an optimised binary, which is pretty well incompatible with debugging crashes. However the nature of both crashes suggests that memory has been corrupted.

Unfortunately, memory corruption tends to happen some time before the actual crash, and in a different thread. Building with thread debugging enabled may show a locking error prior to the crash, but it slows execution down considerably.

jitendra_singhal · May 3, 2018, 6:04pm

Thanks David, what should the action point be here? Where exactly should I look to solve this issue? Can you please suggest?

david551 · May 3, 2018, 8:50pm

If you have thread debugging you would be looking for messages saying that locks are being removed when not set, and similar. You’d then need to work out where they should have been set.

This sort of problem is very hard to debug. You need to have good programming skills and a fair knowledge of the internals of Asterisk to succeed. Most people have to wait till someone with those skills finds the same problem on their system and submits a patch, or at least explicit details of what is causing the problem.

I should add that it could also be a hardware fault, although none of the Asterisk memory corruptions I’ve had have been hardware related.

cruzjoel · October 7, 2020, 10:27pm

This problem also appeared in Asterisk 17. I solved it by working with PJSIP. Asterisk should issue a warning or deny RTC + SIP.

david551 · October 8, 2020, 12:16am

chan_sip is not maintained by the core Asterisk developers, so there is already a general warning not to use it, and it also means that the core developers are unlikely to add further warnings to it.

Unfortunately, a lot of people seem to start with cook book solutions for chan_sip, many of which are around a decade old, rather than reading the current documentation.
