Endpoints being called are intermittently not responding to INVITE

Hey there, I’ve been banging away at this server for a few days now, and this is the one issue I just can’t seem to make any headway on. I’m no expert on Asterisk or SIP, so I’m hoping either it’s something silly that I’m overlooking, or that someone can offer me new directions to look in. Thanks so much for any help you can offer. I’ve reached the current limits of my expertise on this one :slight_smile:

Problem

Endpoints intermittently don’t respond to INVITE when called. When this happens, the server just keeps sending INVITE to the receiving endpoint until the caller is eventually directed to voicemail. Everything works as expected on the initiating device. I don’t see any error messages in the logs or console.

Reproducing

It seems that after a contact is added or expires and is re-added, the endpoint will usually respond when called for a while. After a while, the endpoint will stop responding to INVITE again.

Setup

  • Asterisk 18 running on an Ubuntu DigitalOcean droplet
  • 2 physical VoIP phones
  • 1 soft phone connected to a mobile network

Characteristics

  • I have two physical devices and a soft phone running on a mobile phone. All are encountering this issue.
  • The issue still affects the soft phone when connected to the mobile network, so I don’t believe NAT is the issue.
  • 3 endpoints are registered on a single physical phone, and each behaves independently. (Sometimes 1 endpoint will respond and 2 will not.)
  • pcaps of the INVITE requests to the receiving endpoint on failing calls look good. I see nothing wrong with them. Their contents appear identical to INVITE requests that were correctly responded to.

What’s working

  • All interactions with the server seem to work perfectly. Interacting with the IVR, leaving voicemails, etc.
  • Outbound calling via the sip trunk works perfectly every time.

Configuration

pjsip_wizard.conf

[user_defaults](!)
type = wizard
accepts_registrations = yes
sends_registrations = no
accepts_auth = yes
sends_auth = no
endpoint/context = from-internal
endpoint/tos_audio=ef
endpoint/tos_video=af41
endpoint/cos_audio=5
endpoint/cos_video=4
endpoint/allow = !all,ulaw
endpoint/dtmf_mode = rfc4733
endpoint/aggregate_mwi = yes
endpoint/use_avpf = no
endpoint/rtcp_mux = no
endpoint/bundle = no
endpoint/ice_support = no
endpoint/media_use_received_transport = no
endpoint/trust_id_inbound = yes
endpoint/media_encryption = no
endpoint/timers = yes
endpoint/media_encryption_optimistic = no
endpoint/send_pai = yes
endpoint/rtp_symmetric = yes
endpoint/rewrite_contact = yes
endpoint/force_rport = yes
endpoint/language = en

[3000](user_defaults)
aor/max_contacts = 2
aor/remove_existing=yes
endpoint/callerid = Name <3000>
inbound_auth/username = 3000
inbound_auth/password = password

[3001](user_defaults)
aor/max_contacts = 2
aor/remove_existing=yes
endpoint/callerid = Name <3001>
inbound_auth/username = 3001
inbound_auth/password = password

modules.conf

[modules]
autoload=yes

;required
require = chan_pjsip.so

noload => chan_alsa.so
;noload => chan_oss.so
noload => chan_console.so

noload => res_hep.so
noload => res_hep_pjsip.so
noload => res_hep_rtcp.so
noload => chan_sip.so
noload => app_voicemail_odbc.so
noload => app_voicemail_imap.so

noload => cdr_csv.so
noload => cdr_custom.so
noload => cdr_manager.so
noload => cdr_odbc.so

noload => res_config_sqlite.so
noload => cdr_sqlite.so
noload => cdr_sqlite3_custom.so
noload => cdr_pgsql.so
noload => cdr_tds.so
noload => cdr_radius.so
noload => cel_radius.so
noload => cel_sqlite3_custom.so
noload => cel_tds.so

noload => chan_skinny.so
noload => chan_mgcp.so
noload => pbx_dundi.so
noload => chan_iax2.so
noload => chan_ooh323.so
noload => chan_unistim.so

One reason would be that the app has been put to sleep. You can prevent this on Android, but not on iPhones.

If it is not that, it could be a dynamic firewall rule timing out, or the mobile phone’s IP address being reallocated.

OK, I’ve made some progress here. It seems that if I drop my aor/maximum_expiration very low, the calls all go through as expected. So I’m thinking it’s an issue with connections closing.

Currently I’m playing with aor/maximum_expiration, trying to find a sweet spot where the calls work correctly but the phones aren’t re-registering every 120 seconds.
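
For anyone following along, here is a minimal sketch of where that setting would sit in pjsip_wizard.conf, assuming it is added to the existing [user_defaults](!) template shown above. The value 120 is only the interval mentioned in this post, not a recommendation.

```ini
; pjsip_wizard.conf: lines added inside the existing [user_defaults](!) template
; Upper bound, in seconds, on the registration lifetime the server will grant.
; A low value makes phones re-register often, which refreshes their contact
; address at the cost of extra REGISTER traffic.
aor/maximum_expiration = 120
```

After a `pjsip reload` from the Asterisk CLI, `pjsip show aor 3000` should list the value that is actually in effect.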

What other directions should I look into to improve the situation? Other Asterisk settings? Can different firewall configurations help?

Setting a qualify_frequency on the aor is a common thing to do to periodically send a request to an endpoint to keep the NAT mapping open.
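
As a rough sketch, assuming the pjsip_wizard.conf template posted above, this could look as follows. The 30-second interval is an illustrative guess and should be shorter than whatever timeout the NAT or firewall in the path applies.

```ini
; pjsip_wizard.conf: lines added inside the existing [user_defaults](!) template
; Send a SIP OPTIONS request to each registered contact every 30 seconds.
; The periodic traffic keeps the NAT/firewall mapping open between calls.
aor/qualify_frequency = 30
; Seconds to wait for a reply before the contact is marked unreachable.
aor/qualify_timeout = 3.0
```

With this in place, `pjsip show contacts` reports each contact as reachable or unreachable along with its round-trip time, which also helps confirm whether a silent phone has dropped off the network.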


Using TCP instead of UDP for SIP might circumvent this problem. TCP connections consume less battery, and NAT gateways keep them alive longer by default.
Also, using IPv6 where available has its advantages, since you simply route around IPv4 NAT.
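
A minimal sketch of what the extra transports could look like in pjsip.conf, assuming the default SIP port 5060 and a server that already has an IPv6 address. The section names transport-tcp and transport-udp6 are illustrative and do not come from the configuration posted above.

```ini
; pjsip.conf: hypothetical additional transports

[transport-tcp]
type = transport
protocol = tcp
bind = 0.0.0.0:5060      ; listen for SIP over TCP on all IPv4 addresses

[transport-udp6]
type = transport
protocol = udp
bind = [::]:5060         ; listen for SIP over UDP on all IPv6 addresses
```

The phones also have to be set to register over TCP (or to the server's IPv6 address) for these transports to be used, and changes to transports generally need an Asterisk restart rather than a reload.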


Thanks so much for your reply, I saw qualify_frequency in the documentation while I was struggling and wondered if it might be useful. The server’s working beautifully right now, but I’ll dig deeper into qualify_frequency and see if I can improve my configuration further. Thanks again for the advice!

Hmm that’s interesting, I’ll dig deeper into using IPv6. I’ve never used it before, but I guess we’re all going to have to dig into it some day anyway. :slight_smile: What benefit does routing around the IPv4 NAT have? Does IPv6 have longer lifetimes?

I’ll also look into making use of TCP, thanks a lot for the advice!

NAT gateways often tend to limit the lifetime of UDP sessions (because they would eventually run out of memory if too many sessions are open). Also, the external IP address may change with every new UDP session.
In case of IPv6: There is no NAT, just plain basic routing :slight_smile:


That’s cool, I’m going to try switching my Asterisk box over to IPv6. Maybe it’ll improve the situation further, and if not, it’ll still be a learning experience. :slight_smile: Thanks again!