Some Grandstream phones keep going into unreachable state

We have Asterisk 14.4.1 and 10 Grandstream GXP1628 phones running the latest firmware (1.0.4.100). All phones are behind NAT and connect to the server over the public internet.

Five phones, all connected via provider X, work without problems.
The other phones use other (different) providers, and they go into the unreachable state about 20 times per day. Each outage lasts anywhere from 5 minutes to 5 hours before the phone becomes reachable again.

All phones have NAT set to Keep-alive in their settings and send OPTIONS every 15 seconds. Asterisk also has qualify set to 15 seconds.

How can I debug the reason for going offline, and how can I solve the issue?

Thanks

Unreachable means that Asterisk sent a SIP OPTIONS and did not receive a response. Capturing traffic for a period of time and then examining it could narrow things down. Specifically, if the capture shows that Asterisk sent the OPTIONS, then the problem is the remote NAT or the Grandstream itself.
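Once a capture has been exported as text, that check comes down to pairing each outgoing OPTIONS with a reply. Here is a minimal Python sketch of the idea. It assumes the capture has already been reduced to a list of (timestamp, direction, raw SIP message) tuples. That tuple layout and the names `parse_sip` and `unanswered_options` are hypothetical and are not something Asterisk or Wireshark produces directly.

```python
def parse_sip(raw):
    """Return (start_line, headers) for a raw SIP message.

    Header names are lower-cased. Compact header forms (e.g. "i" for
    Call-ID) are not handled in this sketch.
    """
    head = raw.replace("\r\n", "\n").split("\n\n", 1)[0]
    lines = head.split("\n")
    headers = {}
    for line in lines[1:]:
        if ":" in line:
            name, value = line.split(":", 1)
            headers[name.strip().lower()] = value.strip()
    return lines[0].strip(), headers


def unanswered_options(packets):
    """Return timestamps of OPTIONS requests that never got a response.

    packets: iterable of (timestamp, direction, raw), where direction is
    "out" for packets sent by Asterisk and "in" for packets it received.
    """
    pending = {}  # (Call-ID, CSeq) -> timestamp of the first transmission
    for ts, direction, raw in packets:
        start, headers = parse_sip(raw)
        key = (headers.get("call-id"), headers.get("cseq"))
        if direction == "out" and start.startswith("OPTIONS "):
            pending.setdefault(key, ts)  # ignore retransmissions
        elif direction == "in" and start.startswith("SIP/2.0 "):
            pending.pop(key, None)       # any response counts as an answer
    return sorted(pending.values())
```

If the capture was taken on the Asterisk host and this returns a non-empty list, the OPTIONS did leave the server, which points at the remote NAT or the phone.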

That’s correct. I haven’t used Wireshark, but from sip debug peer I can see there is no response from the phone. I also stop receiving keep-alive packets from the phone.

This same phone, when moved to the other NAT network, works OK. Swapping the Grandstream phones for Digium phones doesn’t resolve the problem either. It looks like the provider’s router clears its NAT table and the phone doesn’t, or cannot, try to re-register when it gets no keep-alive responses.

These are the settings on the phones I am using:
OPTIONS Keep Alive Interval - Specifies the frequency (in seconds) at which the phone will send the Keep Alive message to the server.
OPTIONS Keep Alive Max Lost - Specifies the maximum number of lost packets allowed before the phone will refresh its registration.

But it doesn’t do so during the unreachable period, even though internet access on the phone’s network works OK.
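To make the expectation concrete, here is a rough Python model of how those two settings would interact if the phone behaved as the descriptions say. The class name `KeepAliveMonitor`, the example value of 3 for Max Lost, and the choice to reset the counter on any reply are assumptions. The phone's real firmware logic is not documented in this thread.

```python
class KeepAliveMonitor:
    """Hypothetical model of 'Keep Alive Interval' plus 'Keep Alive Max Lost'."""

    def __init__(self, interval_s, max_lost):
        self.interval_s = interval_s  # seconds between OPTIONS keep-alives
        self.max_lost = max_lost      # consecutive losses that trigger a re-REGISTER
        self.lost = 0

    def on_result(self, answered):
        """Record one keep-alive outcome; return True when a re-REGISTER is due."""
        if answered:
            self.lost = 0             # assumption: any reply resets the counter
            return False
        self.lost += 1
        if self.lost >= self.max_lost:
            self.lost = 0
            return True
        return False

    def worst_case_detection_s(self):
        """Seconds of silence before the model would refresh the registration."""
        return self.interval_s * self.max_lost


monitor = KeepAliveMonitor(interval_s=15, max_lost=3)
print(monitor.worst_case_detection_s())
```

Under these assumptions the phone should refresh its registration in well under a minute, so outages of 5 minutes to 5 hours suggest either that the refresh is not being attempted or that it is not getting through.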

If you’ve got Digium phones, you’ll want to turn on the udp_ka_interval option. By default it’s set to 0, which means off. Set it to the interval, in seconds, at which you’d like the phone to send a CRLF SIP packet to its registered server. That should help keep the NAT hole open.

e.g.

<?xml version="1.0" ?>
<config>
    <setting id="udp_ka_interval" value="60"/>
</config>

Note that there are several devices on the market that close the hole earlier than 60 seconds, e.g. Adtran. You might need to bring it down to 30 seconds or less. You’ll have to experiment.
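One way to run that experiment without guessing is to measure how long the NAT keeps a UDP mapping open. The Python sketch below does this with a delayed echo: a client behind the NAT sends one datagram, and a helper on a publicly reachable host replies only after the requested idle time. The function names, the 64-byte buffer, and the grace period are hypothetical. It also assumes you control a host with an open UDP port, and the server handles one request at a time.

```python
import socket
import time


def delayed_echo_server(sock):
    """Run on the public side: answer each datagram after the delay it asks for."""
    while True:
        data, addr = sock.recvfrom(64)
        time.sleep(float(data.decode()))  # payload is the idle time in seconds
        sock.sendto(data, addr)


def mapping_survives(server, idle_s, grace_s=2.0):
    """Run behind the NAT: True if a reply sent after idle_s of silence arrives.

    A fresh socket is used for each probe so every test starts a new mapping.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.settimeout(idle_s + grace_s)
        s.sendto(str(idle_s).encode(), server)
        try:
            s.recvfrom(64)
            return True
        except socket.timeout:
            return False
```

Calling `mapping_survives` with increasing idle times (for example 20, 30, 45, 60 seconds) shows roughly where the router drops the mapping. A keep-alive interval comfortably below the first failing value is a reasonable starting point.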