Random RTP Deadlock

We are seeing a random RTP-related deadlock in our production environment.
We run Asterisk 13.21-cert6 with chan_sip (we know it’s deprecated and are working on the update) and WebRTC. Almost every day the box stops processing new SIP dialogs, and a large Recv-Q builds up on port 5060.
As far as we could debug it, many of the threads are blocked waiting on one common thread, which looks like this:

#0 futex_wait_cancelable (private=0, expected=0, futex_word=0x7fa8500cc610) at …/sysdeps/unix/sysv/linux/futex-internal.h:88
#1 __pthread_cond_wait_common (abstime=0x0, mutex=0x5596c7504780, cond=0x7fa8500cc5e8) at pthread_cond_wait.c:502
#2 __pthread_cond_wait (cond=0x7fa8500cc5e8, mutex=0x5596c7504780) at pthread_cond_wait.c:655
#3 0x00005596c62093f5 in __ast_cond_wait (filename=0x5596c632cc6b "sched.c", lineno=643, func=0x5596c632d228 <__PRETTY_FUNCTION__.13586> "ast_sched_del", cond_name=0x5596c632ceaf "&s->cond",
mutex_name=0x5596c632cd5b "&con->lock", cond=0x7fa8500cc5e8, t=0x5596c7504780) at lock.c:600
#4 0x00005596c6294031 in ast_sched_del (con=0x5596c7504780, id=335) at sched.c:643
#5 0x00007fa86b5db832 in dtls_srtp_stop_timeout_timer (instance=0x7fa678021270, rtp=0x7fa678032ac0, rtcp=0) at res_rtp_asterisk.c:2269
#6 0x00007fa86b5da913 in ast_rtp_dtls_stop (instance=0x7fa678021270) at res_rtp_asterisk.c:1798
#7 0x00007fa86b5de47a in ast_rtp_destroy (instance=0x7fa678021270) at res_rtp_asterisk.c:3242
#8 0x00005596c62689fc in instance_destructor (obj=0x7fa678021270) at rtp_engine.c:385
#9 0x00005596c611dd05 in internal_ao2_ref (user_data=0x7fa678021270, delta=-1, file=0x5596c62edfdb "astobj2.c", line=518, func=0x5596c62ee288 <__FUNCTION__.8907> "__ao2_ref") at astobj2.c:451
#10 0x00005596c611dfe5 in __ao2_ref (user_data=0x7fa678021270, delta=-1) at astobj2.c:518
#11 0x00007fa86b5db5d6 in dtls_srtp_handle_rtp_timeout (data=0x7fa678021270) at res_rtp_asterisk.c:2222
#12 0x00005596c629478c in ast_sched_runq (con=0x5596c7504780) at sched.c:786
#13 0x00007fa86bc07366 in do_monitor (data=0x0) at chan_sip.c:29752
#14 0x00005596c62d8631 in dummy_start (data=0x5596c769b710) at utils.c:1239
#15 0x00007fa8d7cd9fa3 in start_thread (arg=) at pthread_create.c:486
#16 0x00007fa8d776deff in clone () at …/sysdeps/unix/sysv/linux/x86_64/clone.S:95
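
Read bottom-up, the trace suggests a self-wait: frame #12 (ast_sched_runq) is running the callback in frame #11 (dtls_srtp_handle_rtp_timeout), which drops a reference in frames #10 and #9. The destructor chain in frames #8 to #5 then ends up in ast_sched_del (frame #4) on the same scheduler context (con=0x5596c7504780), and frame #3 waits on "&s->cond" with no timeout (abstime=0x0). If id 335 is the task that is currently executing, that wait can only be ended by the thread that is itself waiting. This is an interpretation of the trace, not a confirmed diagnosis. The Python sketch below is not Asterisk code: TinyScheduler and all of its names are hypothetical, and it uses a timeout only so that the demonstration terminates.

```python
import threading


class TinyScheduler:
    """Hypothetical stand-in for a scheduler context (not Asterisk's API)."""

    def __init__(self):
        self.lock = threading.Lock()
        self.cond = threading.Condition(self.lock)
        self.tasks = {}            # task id -> callback
        self.executing_id = None   # id of the task whose callback is running

    def add(self, task_id, callback):
        with self.lock:
            self.tasks[task_id] = callback

    def delete(self, task_id, timeout=0.2):
        """Remove a task; if it is running, wait until it has finished.

        Returns False when the wait times out. The wait in the trace has
        no timeout, so the equivalent call there would block forever.
        """
        with self.cond:
            self.tasks.pop(task_id, None)
            return self.cond.wait_for(
                lambda: self.executing_id != task_id, timeout=timeout
            )

    def run_queue(self):
        """Run every queued task once on the calling thread."""
        with self.lock:
            pending = list(self.tasks.items())
        for task_id, callback in pending:
            with self.cond:
                if task_id not in self.tasks:
                    continue  # deleted before it got a chance to run
                del self.tasks[task_id]
                self.executing_id = task_id
            callback()  # runs without the lock held
            with self.cond:
                self.executing_id = None
                self.cond.notify_all()  # wake anyone waiting in delete()


sched = TinyScheduler()
results = {}


def timeout_callback():
    # Mimics the callback releasing the last reference: the cleanup path
    # tries to cancel the very timer that is running right now.
    results["self_delete_finished"] = sched.delete(335)


sched.add(335, timeout_callback)
sched.run_queue()
print(results)  # {'self_delete_finished': False}
```

With the timeout removed, the call inside timeout_callback would never return, and any other thread that later deletes a task while 335 still counts as executing would queue up behind it, which would match many threads waiting on one.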

What is strange is that this call had been hung up almost 20 minutes before the first thread deadlocked.

During the problem, I ran ast_coredumper --running twice within 3 minutes. The full ast_coredumper output is available at the Google Drive link.

Can someone give us a hint about how to deal with it?

The only reason to run a certified version is that you have a support contract.

If it is impossible to upgrade to a supported version, at least upgrade to the last release of Asterisk 13.
