Troubleshooting advice for dropped calls

Hi,

I’m facing an issue where calls to ConfBridge on an Asterisk server are dropped with the error “no reply to our critical packet”. Several calls work fine, then (seemingly once a certain number of calls is exceeded) all calls are dropped almost simultaneously.

As far as I was able to trace, the UCS responds to a session-timer refresh re-INVITE, but Asterisk does not send an ACK for the response and instead retransmits the re-INVITE. After a timeout the call is dropped. The same then happens to all the other active calls.

Any help in finding out what causes the problem would be very much appreciated.

Please enable SIP debugging and post the logs. If Asterisk is not receiving an ACK to its response, try insecure=invite in your peer configurations.

insecure=invite should have no effect. All it does is stop Asterisk from sending a 401 and make it accept the INVITE without checking passwords. Even with INVITEs secured, the peer will still ACK all final responses, including 401.

If Asterisk is not sending an ACK to a final response, it almost certainly means that either the Call-ID or one or both of the From and To tags are garbled. In particular, look for transformed domain parts in the Call-ID.
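
To make that check less error-prone than reading a trace by eye, here is a minimal Python sketch that compares the dialog identifiers of a re-INVITE with those of the 200 OK. The sample messages, addresses and tags are invented for illustration and are not taken from the capture in this thread. The parser is deliberately simple: it ignores compact header names (i, f, t) and folded header lines.

```python
import re


def parse_sip_headers(message):
    """Return {lower-cased header name: value} for a single SIP message."""
    headers = {}
    lines = message.replace("\r\n", "\n").split("\n")
    for line in lines[1:]:              # skip the request or status line
        if not line.strip():
            break                       # a blank line ends the header section
        name, sep, value = line.partition(":")
        if sep:
            # keep the first occurrence of each header
            headers.setdefault(name.strip().lower(), value.strip())
    return headers


def tag_of(header_value):
    """Extract the ;tag= parameter from a From or To header value."""
    match = re.search(r";\s*tag=([^;>\s]+)", header_value)
    return match.group(1) if match else None


def dialog_ids(message):
    """Collect the fields a response must echo back to match the request."""
    h = parse_sip_headers(message)
    return {
        "call_id": h.get("call-id"),
        "from_tag": tag_of(h.get("from", "")),
        "to_tag": tag_of(h.get("to", "")),
        "cseq": h.get("cseq"),
    }


def mismatches(request, response):
    """Return the names of the identifiers that differ between the two messages."""
    req, resp = dialog_ids(request), dialog_ids(response)
    return [key for key in req if req[key] != resp[key]]


# Invented sample data (documentation addresses, made-up tags).
DIALOG_HEADERS = (
    "From: <sip:conf@192.0.2.20>;tag=as1a2b3c\r\n"
    "To: <sip:1000@192.0.2.10>;tag=ucs9f8e7d\r\n"
    "Call-ID: 5d41402a@192.0.2.20:5060\r\n"
    "CSeq: 102 INVITE\r\n\r\n"
)
REINVITE = "INVITE sip:1000@192.0.2.10 SIP/2.0\r\n" + DIALOG_HEADERS
OK_GOOD = "SIP/2.0 200 OK\r\n" + DIALOG_HEADERS
# Same response, but with the domain part of the Call-ID rewritten in transit.
OK_BAD = OK_GOOD.replace("@192.0.2.20:5060", "@198.51.100.7:5060")
```

Calling mismatches(REINVITE, OK_GOOD) returns an empty list, while mismatches(REINVITE, OK_BAD) returns ["call_id"]. To use it on a real trace, paste the raw text of the re-INVITE and of the 200 OK from the SIP debug output in place of the sample strings.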

Thank you for your replies, and apologies for taking so long to answer. The To and From fields are good as far as I can see. The attached flow shows a failed call in which Asterisk (right side) sends a re-INVITE with CSeq 102 (first packet). The UCS (left side) answers, but instead of sending an ACK, Asterisk repeats the re-INVITE with CSeq 102 (12th packet).
sipflow
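
The pattern described above (an INVITE repeated after a 2xx answer, with no ACK in between) can be searched for mechanically once a trace is reduced to a list of events. The following is only a sketch under stated assumptions: the SipEvent structure and its field names are hypothetical, the events are assumed to be pre-filtered to INVITE transactions (so a 200 for a BYE with the same CSeq number would not appear), and the sample values are invented.

```python
from typing import NamedTuple


class SipEvent(NamedTuple):
    """One message from a trace (hypothetical structure for this sketch)."""
    sender: str    # free-form label such as "asterisk" or "ucs"
    kind: str      # request method ("INVITE", "ACK") or status code ("200")
    call_id: str
    cseq: int      # CSeq number; the ACK for a 2xx reuses the INVITE's number


def unacked_retransmissions(events):
    """Return (call_id, cseq) keys where an INVITE reappears after a 2xx
    final response without an ACK having been seen in between."""
    seen_invite = set()   # INVITE transactions observed so far
    answered = set()      # transactions with a 2xx that still await an ACK
    flagged = []
    for ev in events:
        key = (ev.call_id, ev.cseq)
        if ev.kind == "INVITE":
            if key in answered and key not in flagged:
                flagged.append(key)
            seen_invite.add(key)
        elif ev.kind.startswith("2") and key in seen_invite:
            answered.add(key)
        elif ev.kind == "ACK":
            answered.discard(key)
    return flagged


# Invented example mirroring the described flow.
FAILED_CALL = [
    SipEvent("asterisk", "INVITE", "abc@192.0.2.20", 102),
    SipEvent("ucs", "100", "abc@192.0.2.20", 102),
    SipEvent("ucs", "200", "abc@192.0.2.20", 102),
    SipEvent("asterisk", "INVITE", "abc@192.0.2.20", 102),  # retransmission
    SipEvent("ucs", "200", "abc@192.0.2.20", 102),
]
HEALTHY_CALL = [
    SipEvent("asterisk", "INVITE", "def@192.0.2.20", 102),
    SipEvent("ucs", "200", "def@192.0.2.20", 102),
    SipEvent("asterisk", "ACK", "def@192.0.2.20", 102),
]
```

With this data, unacked_retransmissions(FAILED_CALL) flags ("abc@192.0.2.20", 102), and the healthy call yields an empty list. Running such a check over all calls in a capture may help show whether every call starts failing at the same moment.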

So far I have not found out what triggers the situation. We are currently using Asterisk 13; could it be a bug?

Thanks a lot.

Classic NAT misconfiguration symptoms. The Contact header from the UAS almost certainly contains a private address, whereas the UAC is not on the private LAN.

Actually, neither of the components uses NAT. Both are in the same subnet. In fact, it was working before, when the UAS and Asterisk were in different subnets (also without NAT), but most likely the network topology does not matter.

What does the Contact header in the 200 OK actually say?

It is certainly behaving as though that Contact header address is unreachable from the UAC.
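
For anyone wanting to test the private-address theory quickly, this small Python sketch pulls the host out of a Contact header value and reports whether it is an RFC 1918 address. It is an illustration only: the function names are made up for this sketch, the regular expression covers common sip: and sips: URI shapes rather than the full grammar, and the sample Contact values are invented.

```python
import ipaddress
import re

# RFC 1918 private IPv4 ranges.
RFC1918_NETWORKS = [
    ipaddress.ip_network(net)
    for net in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]

# Optional userinfo, then either a bracketed IPv6 literal or a host/IPv4.
CONTACT_HOST_RE = re.compile(
    r"sips?:(?:[^@>;\s]*@)?(\[[0-9A-Fa-f:.]+\]|[^:;>\s]+)"
)


def contact_host(contact_value):
    """Return the host part of the first SIP URI in a Contact value, or None."""
    match = CONTACT_HOST_RE.search(contact_value)
    return match.group(1) if match else None


def contact_is_rfc1918(contact_value):
    """True/False for literal IP addresses, None for hostnames or no URI."""
    host = contact_host(contact_value)
    if host is None:
        return None
    try:
        address = ipaddress.ip_address(host.strip("[]"))
    except ValueError:
        return None    # a hostname, which would need a DNS lookup to judge
    return any(address in network for network in RFC1918_NETWORKS)
```

For example, contact_is_rfc1918("<sip:1000@10.0.0.5:5060;transport=udp>") returns True, while a Contact of "<sip:1000@192.0.2.10>" returns False. A True result matters only if the side that has to send the ACK cannot route to that private address.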

The Contact header shows the correct phone number and address of the UAC. It does indeed look as though Asterisk somehow stops processing calls in this phase: new calls are not answered for a while, and the receive counters in sip show channelstats stop increasing (the send counters still seem to count).
In rarer cases the capture shows ICMP unreachable messages sent by Asterisk for individual RTP ports. So far I have not found any trigger for the situation.