Sometimes no ACK for incoming calls

In the recent past I’ve observed some incoming SIP calls from outside without incoming audio and it turned out that my 200 OKs do not get answered with ACKs in these cases.

My settings allow early audio, which explains why RTP packets are flowing to the other end. More than 99% of all calls are unaffected, and when a call is affected, the other side is always a cellular phone.

Has anyone observed something like this before? Could it be that these phones are merely in a dead spot temporarily? That is, they can trigger a call but are then unable to answer properly.

The caller should have no impact on the OK or ACK.

Well, there is no ACK. I’ll collect some data from the HOMER system I’ve hooked up and show such a sequence later.

This is a flow diagram of one of these calls (incoming leg). There’s no ACK after the initial 200 which leads to the repetitions. When the call gets terminated locally, the other side replies properly with a 200.
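For reference, the repetitions follow the 2xx retransmission rule in RFC 3261 (section 13.3.1.4), which applies regardless of transport, so it also happens over TCP. Below is a minimal sketch of that schedule, assuming the RFC default timers T1 = 0.5 s and T2 = 4 s; the function name is made up for illustration, and the timers actually used by a given Asterisk build may differ.

```python
def ok_retransmit_times(t1=0.5, t2=4.0):
    """Seconds after the first 200 OK at which it is retransmitted.

    Assumes RFC 3261 defaults: the interval starts at T1, doubles up to T2,
    and the sender gives up once 64*T1 seconds have passed without an ACK.
    """
    times = []
    elapsed = 0.0
    interval = t1
    limit = 64 * t1
    while elapsed + interval < limit:
        elapsed += interval
        times.append(elapsed)
        interval = min(interval * 2, t2)  # double, but never exceed T2
    return times


schedule = ok_retransmit_times()
print(schedule)
```

With the default timers this gives ten retransmissions within 32 seconds, after which the RFC says the session should be terminated.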

I am basically talking to my service provider, so anything behind that is almost a black box for me. As said before, this happens very rarely, so it is difficult to get a SIP trace from Asterisk. Some of my systems are reporting to HOMER servers, so I can go back a couple of days if necessary.

You didn’t say you were using TCP. I think the provider has to be broken, but I would like to see a 200 OK in its entirety, as I suppose it is just possible that the Contact header was telling it to do something silly.

Yes, the content of the 200 OK would be interesting. The destination IP seems to be Deutsche Telekom, correct? They are pretty picky about content, so maybe there is a problem? But usually, if there’s anything wrong, they immediately drop the call completely. Strange.
But if you see the problem only with cellphones, I wouldn’t invest any time, as cellphones are usually pretty unreliable in Germany (bad networks). The response to the BYE is probably generated by the SIP server itself, whereas the ACK comes from the cellphone?

As far as I am concerned, there’s nothing special here:

SIP/2.0 200 OK
Via:  SIP/2.0/TCP 217.0.XXX.XXX:5060;rport=5060;received=217.0.XXX.XXX;branch=z9hG4bK5e7871c4ebb931882b506f22d7b1afcc.3cff30f4
Record-Route:  <sip:217.0.XXX.XXX:5060;transport=TCP;lr>
Call-ID:  fb46651ecba67d5d@87.WWW.WWW.WWW
From:  <;user=phone>;tag=80996758
To:  <;user=phone>;tag=af3beb0f-2344-485e-9659-e7297ecf266f
CSeq:  66974835 INVITE
Server:  Asterisk PBX ...
Contact:  <sip:80.150.YYY.YYY:5060;transport=TCP>
Supported:  100rel, timer, replaces, norefersub
Content-Type:  application/sdp
Content-Length:    287

o=- 606911977 606911979 IN IP4 80.150.YYY.YYY
c=IN IP4 80.150.YYY.YYY
t=0 0
m=audio 13046 RTP/AVP 8 0 9 100
a=rtpmap:8 PCMA/8000
a=rtpmap:0 PCMU/8000
a=rtpmap:9 G722/8000
a=rtpmap:100 telephone-event/8000
a=fmtp:100 0-16

Asterisk sets the proper values for external_media_address and external_signaling_address and again, the problem occurs very rarely, but I can identify the pattern with the missing ACK.
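A quick way to double-check this on captured responses is to pull the addresses out of the Contact header and the SDP `c=` lines and flag any that are private. This is only a sketch: the helper names are hypothetical, and the sample message uses placeholder addresses, not values from a real trace.

```python
import ipaddress
import re

IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def advertised_addresses(sip_message):
    """Collect IPv4 literals from the Contact header and SDP c= lines."""
    found = []
    for line in sip_message.splitlines():
        if line.lower().startswith("contact:") or line.startswith("c="):
            found += IPV4.findall(line)
    return found


def private_addresses(sip_message):
    """Return advertised addresses the far end could not route to."""
    return [a for a in advertised_addresses(sip_message)
            if ipaddress.ip_address(a).is_private]


# Placeholder message: a private Contact address and a public media address.
sample = "\n".join([
    "SIP/2.0 200 OK",
    "Contact:  <sip:192.168.1.10:5060;transport=TCP>",
    "",
    "c=IN IP4 80.150.1.2",
])
print(private_addresses(sample))
```

An empty list means both the signalling and the media address look publicly routable.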

Meanwhile I can exclude a weak signal on the other end.

But that looks completely different in the advertising of Telekom, 1 & 1 and Vodafone. They all talk about the best network in the world, :wink:

Then the question would be why not all calls are affected.

On the other hand, calls may get treated differently depending on whatever. I have now made sure that there is no extra stuff in the Server header, as some strings from my own git branches had sneaked in, which could possibly confuse upstream servers in case they really evaluate this header. Nothing to test yet, though.

You made my day :slight_smile:

We’re handling and providing support for roughly 100,000 calls a day (not with Asterisk), but we never support sporadic problems with cellphones. It’s a waste of time. In my experience, nobody cares about temporarily broken calls to cellphones.

I compared your 200 OK with mine and couldn’t see any relevant difference either. It looks like a pretty normal 200 OK to me, with seemingly correct IP addresses (no unreachable local IP address):

2021/08/31 10:22:22.627325 -> 217.0.x.y:5061
SIP/2.0 200 OK
Via: SIP/2.0/TLS 217.0.x.y:5061;received=217.0.x.y;branch=z9hG4bKg3Zqkv7in4cy2cll2hvxjhv3o6tszd4kr
Record-Route: <sip:217.0.x.y:5061;transport=tls;lr>
Call-ID: p65562t1630398137m730462c73159s2
From: <;user=phone>;tag=h7g4Esbg_p65562t1630398137m730462c73159s1_2782385691-759808874
To: <;user=phone>;tag=35af02c9-6f3f-470b-9688-43e8a0795ec8
Server: FPBX-
Contact: <sip:;transport=TLS>
Supported: 100rel, replaces, norefersub
Content-Type: application/sdp
Content-Length:   324

o=- 1391192655 2855218910 IN IP4
c=IN IP4
t=0 0
m=audio 10006 RTP/SAVP 8 101
a=crypto:1 AES_CM_128_HMAC_SHA1_80 inline:JJy/95HVdii9443gksW7S8JqXLt4pR/M4nIMDVjy
a=rtpmap:8 PCMA/8000
a=rtpmap:101 telephone-event/8000
a=fmtp:101 0-16

If the proper external address were not there, Telekom would handle the media by means of connected media tricks. I checked both scenarios a while ago and found that it doesn’t really matter which one you use.

I could see different behavior regarding AllIP - you’re using the SIP trunk. Good to know.

The AllIP INVITE contains the provider name of the caller. I don’t know if you get that as a SIP trunk customer, too. If yes, can you say whether the sporadic problems occur with any provider or always with the same one?

The AllIP product is generally a bit more critical, but I service mainly trunks. I don’t see the caller’s provider, but I happen to know that one of the “critical” callers has a Telekom contract, too.

Meanwhile I can exclude a couple of causes. It looks as if the problem occurs only with mobile phones and it’s always the same phones. So far, I don’t know what triggers the missing ACK, but it is not random (and, again, it happens very rarely).

The next thing would be for me to have a closer look at the phones that cause the problem. That might take some time.

If you can break it down to a dedicated device and if you are able to reproduce it, you should trace it even on the cellphone itself - is there VoLTE in use? Are there sporadic client crashes? I’m pretty sure it’s mostly impossible to figure out the problem without tracing the Caller’s side - honestly, I don’t expect Telekom to do anything (unless you have very good arguments) - but maybe, I’m wrong.

I can exclude that, as I heard road noise for one device. My approach is to collect the signalling using HOMER and evaluate it when things don’t work as expected.
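The evaluation step can be automated roughly like this. It is a minimal sketch that assumes the signalling has already been exported into a list of dicts; the field names (`call_id`, `cseq_num`, `cseq_method`, `method`, `status`) are my own and not HOMER's export format.

```python
def calls_missing_ack(messages):
    """Return Call-IDs where a 200 OK to an INVITE never got an ACK.

    An ACK for a 2xx carries the same Call-ID and CSeq number as the INVITE,
    so the two sets can be compared on that pair.
    """
    answered = set()  # (call_id, cseq_num) of INVITEs answered with 200
    acked = set()     # (call_id, cseq_num) seen in an ACK request
    for msg in messages:
        key = (msg["call_id"], msg["cseq_num"])
        if msg.get("status") == 200 and msg["cseq_method"] == "INVITE":
            answered.add(key)
        elif msg.get("method") == "ACK":
            acked.add(key)
    return sorted({call_id for call_id, _ in answered - acked})


# Hypothetical sample: call "a" is healthy, call "b" shows the pattern.
trace = [
    {"call_id": "a", "cseq_num": 1, "cseq_method": "INVITE", "method": "INVITE"},
    {"call_id": "a", "cseq_num": 1, "cseq_method": "INVITE", "status": 200},
    {"call_id": "a", "cseq_num": 1, "cseq_method": "ACK", "method": "ACK"},
    {"call_id": "b", "cseq_num": 7, "cseq_method": "INVITE", "method": "INVITE"},
    {"call_id": "b", "cseq_num": 7, "cseq_method": "INVITE", "status": 200},
    {"call_id": "b", "cseq_num": 7, "cseq_method": "INVITE", "status": 200},
    {"call_id": "b", "cseq_num": 8, "cseq_method": "BYE", "method": "BYE"},
    {"call_id": "b", "cseq_num": 8, "cseq_method": "BYE", "status": 200},
]
print(calls_missing_ack(trace))
```

Joining the resulting Call-IDs with the caller numbers would show whether it is really always the same phones.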

The problem occurs only for incoming calls, not for outgoing calls to these devices.