UDP resends seem to be causing phones to go unreachable

This issue sprung up in the middle of last week and has been kicking my @$$ ever since. I have over a dozen Polycom Soundpoint 301/320/330 phones in my office on a Core 2 Quad server running Asterisk 1.6.2.7. Last week the just started showing up unreachable and then reachable again after 10 seconds. I have checked and reset every switch. I finally got around to doing a packet capture today and it seems I might have found the cause of the problem.

From the messages log:
[Oct 24 16:12:54] NOTICE[1070] chan_sip.c: Peer ‘1403’ is now UNREACHABLE! Last qualify: 17

The packet capture from near the time of the peer becoming unreachable:

[code]2012-10-24 16:12:49.448677
Frame No. 1406

OPTIONS sip:1403@192.168.1.6:5060 SIP/2.0
Via: SIP/2.0/UDP 192.168.1.218:5060;branch=z9hG4bK3d1bacd1;rport
Max-Forwards: 70
From: “asterisk” sip:asterisk@192.168.1.218;tag=as695ac428
To: sip:1403@192.168.1.6:5060
Contact: sip:asterisk@192.168.1.218
Call-ID: 74ca63f07eace5465c2d00425218dc13@192.168.1.218
CSeq: 102 OPTIONS
User-Agent: Asterisk PBX 1.6.2.7
Date: Wed, 24 Oct 2012 20:12:49 GMT
Allow: INVITE, ACK, CANCEL, OPTIONS, BYE, REFER, SUBSCRIBE, NOTIFY, INFO
Supported: replaces, timer
Content-Length: 0

2012-10-24 16:12:49.466369
Frame No. 1414 Response time 17ms

@asfQEA@|XSIP/2.0 200 OK
Via: SIP/2.0/UDP 192.168.1.218:5060;branch=z9hG4bK3d1bacd1;rport
From: “asterisk” sip:asterisk@192.168.1.218;tag=as695ac428
To: “1403” sip:1403@192.168.1.6:5060;tag=E6EE5A70-63A9715B
CSeq: 102 OPTIONS
Call-ID: 74ca63f07eace5465c2d00425218dc13@192.168.1.218
Contact: sip:1403@192.168.1.6:5060
Allow: INVITE, ACK, BYE, CANCEL, OPTIONS, INFO, MESSAGE, SUBSCRIBE, NOTIFY, PRACK, UPDATE, REFER
Supported: 100rel,replaces,100rel,timer,replaces
User-Agent: PolycomSoundPointIP-SPIP_330-UA/3.2.1.0054
Accept-Language: en
Accept: application/sdp,text/plain,message/sipfrag,application/dialog-info+xml
Accept-Encoding: identity
Content-Length: 0

2012-10-24 16:12:54.696275
Frame No. 1424 (Suspected Resend of frame: 1406)

fQ@asE/Z@3<OPTIONS sip:1403@192.168.1.6:5060 SIP/2.0
Via: SIP/2.0/UDP 192.168.1.218:5060;branch=z9hG4bK3d1bacd1;rport
Max-Forwards: 70
From: “asterisk” sip:asterisk@192.168.1.218;tag=as695ac428
To: sip:1403@192.168.1.6:5060
Contact: sip:asterisk@192.168.1.218
Call-ID: 74ca63f07eace5465c2d00425218dc13@192.168.1.218
CSeq: 102 OPTIONS
User-Agent: Asterisk PBX 1.6.2.7
Date: Wed, 24 Oct 2012 20:12:49 GMT
Allow: INVITE, ACK, CANCEL, OPTIONS, BYE, REFER, SUBSCRIBE, NOTIFY, INFO
Supported: replaces, timer
Content-Length: 0

2012-10-24 16:12:54.710379
Frame No. 1431 (Suspected resend of frame 1414) (Request frame: 1406) Response time(ms) 5261

@asfQEA@{XSIP/2.0 200 OK
Via: SIP/2.0/UDP 192.168.1.218:5060;branch=z9hG4bK3d1bacd1;rport
From: “asterisk” sip:asterisk@192.168.1.218;tag=as695ac428
To: “1403” sip:1403@192.168.1.6:5060;tag=E6EE5A70-63A9715B
CSeq: 102 OPTIONS
Call-ID: 74ca63f07eace5465c2d00425218dc13@192.168.1.218
Contact: sip:1403@192.168.1.6:5060
Allow: INVITE, ACK, BYE, CANCEL, OPTIONS, INFO, MESSAGE, SUBSCRIBE, NOTIFY, PRACK, UPDATE, REFER
Supported: 100rel,replaces,100rel,timer,replaces
User-Agent: PolycomSoundPointIP-SPIP_330-UA/3.2.1.0054
Accept-Language: en
Accept: application/sdp,text/plain,message/sipfrag,application/dialog-info+xml
Accept-Encoding: identity
Content-Length: 0[/code]

The UDP packets are occasionally getting resent and the resent Status 200 message is over 5000ms from the original Options message from the server, which seems to be causing the server to think that the phone has timed out, when the network has no issues (the status 200 resend was received less than 15ms from the resent Options message).

Can anyone tell me why packets are getting resent? I thought UDP was designed to drop packets that didn’t show up in time (which in this case didn’t happen anyway, as the original Options message was replied to in 17ms). What can I do to stop the retransmissions?

Are you sure you havent got a loop in teh network, We have seen similar when someone plugged both ports on a aatra handset into the lan.

Thanks for the reply, Ian. I just got into the office and checked, but I could not find any loops in the network.

Here’s the latest. About two hours after my original post, I got desperate and upgraded to Asterisk 1.8.17.0. The problem persists after the upgrade and I have looked around for any loops in the network as per Ian’s suggestion. A packet capture from this morning demonstrates the same behavior as the previous packet capture. A resend of the Status 200 message is sent back to the Asterisk server slightly over 5 seconds after the original Options message, causing the system to think that the phones have timed out.

is it from all phones or just specific ones? , I would mirror the port on of the sets is on then do a wireshark capture on that phone to see if its resending. then do the same on the port the server is plugged into.

I won’t be able to try that out until about 12 hours from now. It is happening on multiple phones that are connected to different secondary switches in the office. None of the secondary switches are smart switches, so I will have to move one of the phones to the server room. Could you please give me the procedure for capturing the mirrored port once I have it set up that way? Once I have captured data for both mirrored ports, what am I looking for to see which one is causing the resend?

Ian,

I still need to help with the procedure on capturing a mirrored port on the smartswitch. Could you please assist?.

Thanks,

John

Was the root cause of this issue ever discovered?

There was an issue with the registrations for the VoIP provider. I still have no idea why packets were being resent, but as soon as we went to IP based authentication, the issue went away.