WEBRTC: 1 to 4 second delay when establishing outgoing call

We have Asterisk 16.2.2 hooked up to a SIP.js WebRTC client.

We are experiencing a 1 to 4 second delay to get a fully bi-directional call in about 30% of calls.
Here’s what usually happens:

  1. The respondent will receive the call; they will answer and start speaking… “hello Hello”
  2. On our end, we cannot hear this.
  3. It typically takes between 0 and 4 seconds for “full bidirectional” functionality to work.
  4. The rest of the call proceeds as usual.

Here is what we know:

  • This is intermittent - we cannot control when it happens. We have seen it on both high-ping and low-ping connections.
  • The call proceeds all the way and there are no further issues.
  • When doing calls through a regular SIP client, like Counterpath Bria, we do not experience this problem.
  • All clients and trunks are configured to use G711.

We are not sure if the problem is coming from Asterisk or from SIPjs.
What kind of information can I provide you to investigate this?

I have packet captures, and I can send them via private channels if needed.

If you’re deploying WebRTC, you’re going to need to learn how to investigate these kinds of issues yourself. In this case, WebRTC requires two things to be set up before media can flow - ICE and DTLS-SRTP. ICE finds the path that can be used to exchange packets, and DTLS-SRTP provides the keying material for encryption. Looking at a packet capture, you need to examine both of these to put together a timeline and determine what is at fault. If it’s ICE then you need to determine why - is it taking a long time to find candidates? Is it taking a long time to find a working path?
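As a starting point for that timeline: STUN (used by ICE), DTLS, and RTP all share the same UDP port in WebRTC, and they can be told apart by the first byte of each datagram (the ranges below come from RFC 7983). The following is a minimal Python sketch, assuming the UDP payloads and their timestamps have already been exported from the capture. The names `classify` and `first_seen` and the sample packets are illustrative only and do not come from any tool.

```python
def classify(payload: bytes) -> str:
    """Classify a UDP payload by its first byte (RFC 7983 demultiplexing ranges)."""
    if not payload:
        return "empty"
    first = payload[0]
    if first <= 3:
        return "stun"
    if 20 <= first <= 63:
        return "dtls"
    if 128 <= first <= 191:
        return "rtp"  # RTP or RTCP (SRTP/SRTCP once keys are in place)
    return "other"


def first_seen(packets):
    """Return {kind: timestamp of the first packet of that kind}.

    `packets` is an iterable of (timestamp_seconds, payload_bytes) tuples,
    assumed to be in capture order.
    """
    seen = {}
    for timestamp, payload in packets:
        seen.setdefault(classify(payload), timestamp)
    return seen


# Hypothetical hand-made sample, not taken from a real capture.
sample = [
    (0.00, bytes([0x00, 0x01])),        # STUN Binding Request (message type 0x0001)
    (0.12, bytes([0x16, 0xFE, 0xFD])),  # DTLS 1.2 handshake record (type 22, version 0xFEFD)
    (0.50, bytes([0x80, 0x00])),        # RTP version 2, payload type 0 (PCMU)
]
timeline = first_seen(sample)
print(timeline)
```

Comparing the first STUN, first DTLS, and first RTP timestamps in each direction shows which phase is consuming the time before audio becomes bidirectional.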

OK will do. I will look into ICE or DTLS-SRTP.
If ICE turns out not to be the culprit and it is DTLS-SRTP instead, what do you suggest we look into?

Also - I was browsing through the settings and I saw that I could choose either DTLS-SRTP or SRTP via in-SDP for media encryption.
I could not find much information regarding these. Do you know anything about it?

More info.

I was looking at debug info and the ICE negotiation does complete before the call starts.
So then, the culprit would be DTLS-SRTP?

Tue Apr 02 2019 20:36:42 GMT+0800 (Singapore Standard Time) | sip.invitecontext.sessionDescriptionHandler | ICE candidate received: candidate:3 2 UDP 1685856254 56464 typ srflx raddr rport 56464 sip.min.js:38:10674
Tue Apr 02 2019 20:36:53 GMT+0800 (Singapore Standard Time) | sip.invitecontext.sessionDescriptionHandler | RTCIceGatheringState changed: complete sip.min.js:38:10674
Tue Apr 02 2019 20:36:53 GMT+0800 (Singapore Standard Time) | sip.transport | sending WebSocket message:
Tue Apr 02 2019 20:36:53 GMT+0800 (Singapore Standard Time) | sip.transport | sending WebSocket message: INVITE sip:*43@xxxxxxxxxx SIP/2.0 Via: SIP/2.0/WSS qd1u36hsi0ga.invalid;branch=z9hG4bK9647369 Max-Forwards: 70 To: <sip:*43@xxxxxxx> From: "ZOOP" <sip:1906@xxxxxxxxxxxxx>;tag=8prf1o60sa Call-ID: fmsbc4n9bjkrn3ol9i0h CSeq: 1673 INVITE 0: ZOOP-CID: undefined 1: ZOOP-ATTEMPT_ID: 1234 Contact: <sip:1ps3kbhs@qd1u36hsi0ga.invalid;transport=ws;ob> Allow: ACK,CANCEL,INVITE,MESSAGE,BYE,OPTIONS,INFO,NOTIFY,REFER Supported: outbound User-Agent: SIP.js/0.10.0 Content-Type: application/sdp Content-Length: 2302

That log does potentially show a delay of around 10 seconds before ICE gathering completed (the two quoted lines are 11 seconds apart). You’d need to look into the actual packet capture and put together the timeline in coordination with the logs to see where it’s coming from.
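One way to line the client log up against a capture is to turn the log timestamps into numbers. The following Python sketch assumes the `timestamp | category | message` layout seen in the SIP.js console output quoted above. `parse_line` and `gaps` are made-up helper names, and the two sample lines are shortened copies of the quoted ones. The gap it reports is only between the lines it is given, so it need not match an estimate made from a fuller log.

```python
from datetime import datetime


def parse_line(line: str):
    """Split a 'timestamp | category | message' line into (datetime, category, message)."""
    stamp, category, message = (part.strip() for part in line.split("|", 2))
    # Drop the trailing zone name, e.g. " (Singapore Standard Time)".
    stamp = stamp.split(" (")[0]
    when = datetime.strptime(stamp, "%a %b %d %Y %H:%M:%S GMT%z")
    return when, category, message


def gaps(lines):
    """Return [(seconds since previous line, message)] for consecutive log lines."""
    events = [parse_line(line) for line in lines]
    return [
        ((later[0] - earlier[0]).total_seconds(), later[2])
        for earlier, later in zip(events, events[1:])
    ]


log = [
    "Tue Apr 02 2019 20:36:42 GMT+0800 (Singapore Standard Time) | "
    "sip.invitecontext.sessionDescriptionHandler | ICE candidate received",
    "Tue Apr 02 2019 20:36:53 GMT+0800 (Singapore Standard Time) | "
    "sip.invitecontext.sessionDescriptionHandler | RTCIceGatheringState changed: complete",
]
for seconds, message in gaps(log):
    print(f"+{seconds:.0f}s  {message}")
```

Large gaps in this output are the places to look at in the packet capture.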

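As for DTLS-SRTP or SRTP via in-SDP: the first is DTLS-SRTP, where the keys are negotiated in a DTLS handshake on the media path, and the second is SDES-SRTP, where the keys are carried in the SDP itself. WebRTC mandates DTLS-SRTP usage.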

It seems that the delay in the STUN negotiation is related to network interfaces that have an IP address but no internet access.
According to this, the solution is to reduce the timeout. Indeed, setting a timeout of, say, 1500 ms improves it considerably.
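The idea behind such a timeout is to stop waiting for candidates once a deadline passes and proceed with whatever has been gathered so far. The sketch below illustrates that pattern in Python with `asyncio`. It is not the client library's implementation, `slow_interface_source` is a hypothetical stand-in for an interface whose STUN request never gets answered, and the 0.05 second deadline is only there to keep the demo quick (the value tried in this thread was 1500 ms).

```python
import asyncio


async def gather_candidates(source, timeout_s: float):
    """Collect candidates from `source` until it finishes or `timeout_s` elapses.

    Returns (candidates, completed); `completed` is False when the deadline
    cut gathering short.
    """
    candidates = []

    async def collect():
        async for candidate in source:
            candidates.append(candidate)

    try:
        await asyncio.wait_for(collect(), timeout=timeout_s)
        return candidates, True
    except asyncio.TimeoutError:
        # Deadline hit: keep what was gathered instead of waiting any longer.
        return candidates, False


async def slow_interface_source():
    """Hypothetical source: one quick host candidate, then a probe that stalls."""
    yield "host"
    await asyncio.sleep(10)  # stands in for a STUN request that is never answered
    yield "srflx"


result = asyncio.run(gather_candidates(slow_interface_source(), timeout_s=0.05))
print(result)
```

The trade-off is that a deadline which is too short can drop candidates that would have arrived slightly later, so the value is worth testing on the networks you actually serve.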

However, there is still a 1 to 4 second delay after call setup and the ICE/STUN negotiations are done, though sometimes there is none.
If I place another call right after the first one, it is fine and there is no delay, so a ‘priming’ call seems to help.

In Wireshark running on the client, we can clearly see that the Asterisk server starts sending audio packets and the client receives them, but does not handle them until the second channel is open.
Using the Wireshark RTP player, we can see that the waveform in grey is the one sent by Asterisk. The blue one is sent by the client. We start hearing the “grey” (Asterisk) stream once the blue (Client) one is active.

The difference between “Having Delay” and “No Delay” seems to be the arrival of the “proper” DTLSv1.2 packet.
When there is no delay, this packet below seems to arrive right when Asterisk sends the first RTP packets.

When there is delay, I see a lot of these dud DTLSv1.2 packets with 1 direction RTP.

Any suggestions on why the DTLSv1.2 exchange is taking so long?
Why is Asterisk starting to send audio packets when the exchange looks like it’s not over?

Asterisk does not block transmission of RTP despite ICE or DTLS-SRTP not being completed. I do not know why the DTLS exchange takes so long.

hi @jcolp - your suggestions have been very helpful, thank you.
I found this article written by you here: https://blogs.asterisk.org/2018/02/21/woes-tls-certificates-webrtc/

We are experimenting with the “dtls_auto_generate_cert” setting you mention there and currently we cannot reproduce the problem anymore. We will run wider scale tests, but it seems this could be a step in the right direction.

Your DTLS certificate may have been too large, then, causing fragmentation and problems.
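To gauge whether that is plausible for a given certificate, its encoded size can be compared with what fits in a single unfragmented datagram. The Python sketch below is a rough estimate only. It assumes IPv4, a 1500-byte MTU, and a single certificate sent alone in one DTLS record, whereas a real handshake flight carries other messages too and the actual path MTU may be lower. The function names are made up for illustration, and the demo uses 2000 placeholder bytes rather than a real certificate.

```python
import ssl

IPV4_HEADER = 20
UDP_HEADER = 8
DTLS_RECORD_HEADER = 13     # type, version, epoch, sequence number, length
DTLS_HANDSHAKE_HEADER = 12  # type, length, message_seq, fragment offset/length
CERT_LIST_OVERHEAD = 6      # 3-byte list length + 3-byte length of one certificate


def certificate_der_size(pem_text: str) -> int:
    """Return the size in bytes of a PEM certificate once decoded to DER."""
    return len(ssl.PEM_cert_to_DER_cert(pem_text))


def fits_in_one_datagram(der_size: int, mtu: int = 1500) -> bool:
    """Rough check: does a lone certificate fit in one unfragmented datagram?"""
    budget = (mtu - IPV4_HEADER - UDP_HEADER - DTLS_RECORD_HEADER
              - DTLS_HANDSHAKE_HEADER - CERT_LIST_OVERHEAD)
    return der_size <= budget


# Placeholder input: 2000 zero bytes wrapped as PEM, NOT a real certificate.
placeholder_pem = ssl.DER_cert_to_PEM_cert(bytes(2000))
size = certificate_der_size(placeholder_pem)
print(size, fits_in_one_datagram(size))
```

To check a real certificate, read its PEM file into a string and pass that to `certificate_der_size`. A file containing a whole chain would need to be split into individual certificates first.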