Qualify fails until restarted

Today two customers seized to work beacuse the SipTrunk becale Unreachable
However the OPTIONS request was not sent - I got this error instead:

ERROR[63] res_pjsip.c: Error 320047 ‘No answer record in the DNS response (PJLIB_UTIL_EDNSNOANSWERREC)’ sending OPTIONS request to endpoint 51_vk102136zylXXX

Tried disabling Qualify, but then the INVITES were not sent
Inbound traffic worked fine, but outbound requests never worked.

Restarting Asterisk fixed the issue, but this naturally also dropped all calls which is not a solution that our customers likes.
Do you have any clue to what could be the root cause?

I am using Asterisk v. 18.12.1, but will soon upgrade to version 20.6.0
Is this an issue that has already been resolved?

The root cause would be either an upstream DNS server failure for what it was trying to contact or reach, or that server returning no records. Every time I’ve seen this come up it’s been something upstream in some way. Turning on debug logging will also tell you what the DNS resolver is doing and what it got back.

Is Debug 1 enough to get this information, I should I set it higher?

“Unfortunately” everything works fine after a restart, so I can’t see how this can be a problem with the external DNS server, since restarting Asterisk fixes the issue.
Is there a way that I can restart or reload only the DNS part without dropping ongoing calls?

Debug level 2 would be needed I believe. And restarting Asterisk will clear any state including DNS cache. There is no functionality to “restart” the DNS portion.

What is the output of “module show like resolver”?

If it happens again you could also do a packet capture of DNS traffic at the time to see what is really happening on the network with it.

“module show like resolver” returns “0 modules loaded”
Could this be a part of the problem?

If a resolver module isn’t loaded then it falls back to the underlying system primitives. It will still work regardless, but it just nudges my analysis even further into something outside of Asterisk in some way.

I just found a third customer with same problem. This customer isn’t as busy as the others, so a lot easier to debug.
The debug log shows this (with debug set to 5):

[Apr 29 09:46:49] DEBUG[1168]: res_pjsip/pjsip_options.c:927 sip_options_qualify_aor: Qualifying all contacts on AOR ‘4_TDCNuudaySIPTrunk’
[Apr 29 09:46:49] DEBUG[1168]: res_pjsip/pjsip_options.c:856 sip_options_qualify_contact: Qualifying contact ‘4_TDCNuudaySIPTrunk@@4ddd0a01390170f2f735d1e9d2740f2b’ on AOR ‘4_TDCNuudaySIPTrunk’
[Apr 29 09:46:49] DEBUG[1168]: res_pjsip.c:1566 endpt_send_request: 0x7facb4018ce0: Wrapper created
[Apr 29 09:46:49] DEBUG[1168]: res_pjsip.c:1581 endpt_send_request: 0x7facb4018ce0: Set timer to 3000 msec
[Apr 29 09:46:49] DEBUG[1168]: res_pjsip/pjsip_resolver.c:475 sip_resolve: Performing SIP DNS resolution of target ‘vk179970.zylinccloud.supertrunk.net
[Apr 29 09:46:49] DEBUG[1168]: res_pjsip/pjsip_resolver.c:502 sip_resolve: Transport type for target ‘vk179970.zylinccloud.supertrunk.net’ is ‘TLS transport’
[Apr 29 09:46:49] DEBUG[1168]: res_pjsip/pjsip_resolver.c:545 sip_resolve: [0x7facb4007d88] Created resolution tracking for target ‘vk179970.zylinccloud.supertrunk.net
[Apr 29 09:46:49] DEBUG[1168]: res_pjsip/pjsip_resolver.c:608 sip_resolve: [0x7facb4007d88] No resolution queries for target ‘vk179970.zylinccloud.supertrunk.net
[Apr 29 09:46:49] DEBUG[1168]: res_pjsip.c:1444 endpt_send_request_cb: 0x7facb4018ce0: PJSIP tsx response received
[Apr 29 09:46:49] DEBUG[1168]: res_pjsip.c:1457 endpt_send_request_cb: 0x7facb4018ce0: Cancelling timer
[Apr 29 09:46:49] DEBUG[1168]: res_pjsip.c:1466 endpt_send_request_cb: 0x7facb4018ce0: Timer cancelled
[Apr 29 09:46:49] DEBUG[1168]: res_pjsip.c:1487 endpt_send_request_cb: 0x7facb4018ce0: Callbacks executed
[Apr 29 09:46:49] ERROR[1168]: res_pjsip.c:1619 endpt_send_request: Error 320047 ‘No answer record in the DNS response (PJLIB_UTIL_EDNSNOANSWERREC)’ sending OPTIONS request to endpoint 4_TDCNuudaySIPTrunk
[Apr 29 09:46:49] DEBUG[1168]: res_pjsip.c:1543 send_request_wrapper_destructor: 0x7facb4018ce0: wrapper destroyed
[Apr 29 09:46:49] DEBUG[1168]: res_pjsip/pjsip_options.c:755 sip_options_contact_status_notify_task: Contact 4_TDCNuudaySIPTrunk/sip:vk179970.zylinccloud.supertrunk.net:5061 status didn’t change: Unreachable, RTT: 0.000 msec
[Apr 29 09:46:49] DEBUG[1168]: res_pjsip/pjsip_options.c:775 sip_options_contact_status_notify_task: AOR ‘4_TDCNuudaySIPTrunk’ now has 0 available contacts

I haven’t restarted Asterisk yet, so if you have any other suggestions, then I would love to hear them.

What is the configuration of the transports in pjsip.conf? And what is the AOR configuration? And did you make changes to transports (such as adding a new one) without restarting?

No, I haven’t changed anything in the transport section.
It has allow_reload set to true, which in my experience works fine as long as you don’t change the certificate or changing the port, if you have mulitple trunks using the same port (which I don’t have for TLS).

Tls transport :
[transport-tls]
type=transport
protocol=tls
bind=0.0.0.0:5099
tos=cs3
cos=3
allow_reload=true
method=tlsv1_2
external_media_address=20.229.231.55
external_signaling_address=20.229.231.55
local_net=10.0.0.0/8
local_net=172.16.0.0/12
local_net=172.18.0.0/12
local_net=172.19.0.0/12
local_net=192.168.0.0/16
external_signaling_port=5099
cert_file=/var/lib/asterisk/certs/novus.crt
priv_key_file=/var/lib/asterisk/certs/novus.key

The AOR for the Siptrunk:
[19_TDCnuudaySIPtrunk]
type=aor
contact=sip:VK143439.zylinccloud.supertrunk.net:5061
qualify_frequency=60

It works fine, until it doesn’t. The only time it is needed anymore is if you change the bind option basically. It causes more problems than it helps generally.

Do you have a log of this failing case on this system I can compare? Do you have a log of a working case? Previously it said a transport type of TLS, which I’m surprised by since in this configuration you provided the URI does not specify TLS.

The example in my original post was from a customer that did not use TLS.
The debug log and the configuration was for a customer that does use TLS.

How much log do you need for the curently failing tenant?
The entire full log?

Below is a debug log for a working tenant that uses TLS:

[Apr 29 10:18:39] DEBUG[1055]: res_rtp_asterisk.c:6317 update_local_mes_stats: clientprod1-1714385794.60: rtt: 0.041567909 j: 0.000000000 sjh: 0.000168836 lost: 0.080000000 mes: 84.6
[Apr 29 10:18:40] DEBUG[1056]: manager.c:7347 process_message: Running action ‘Ping’
[Apr 29 10:18:41] DEBUG[1055]: res_rtp_asterisk.c:6317 update_local_mes_stats: clientprod1-1714385890.62: rtt: 0.039218500 j: 0.000145833 sjh: 0.000125389 lost: 0.000000000 mes: 84.7
[Apr 29 10:18:43] DEBUG[1593][C-0000001b]: res_rtp_asterisk.c:6583 ast_rtcp_interpret: (clientprod1-1714385890.62) RTCP got report of 32 bytes from 78.153.187.98:9314
[Apr 29 10:18:43] DEBUG[1593][C-0000001b]: res_rtp_asterisk.c:6283 update_reported_mes_stats: clientprod1-1714385890.62: rtt: 0.040108800 j: 0.000229167 sjh: 0.000114109 lost: 0.000000000 mes: 84.7
[Apr 29 10:18:44] DEBUG[1576][C-0000001b]: res_rtp_asterisk.c:6583 ast_rtcp_interpret: (clientprod1-1714385794.60) RTCP got report of 96 bytes from 80.251.192.49:63333
[Apr 29 10:18:44] DEBUG[1576][C-0000001b]: res_rtp_asterisk.c:6283 update_reported_mes_stats: clientprod1-1714385794.60: rtt: 0.041435087 j: 0.000250000 sjh: 0.000201718 lost: 0.000000000 mes: 84.7
[Apr 29 10:18:44] DEBUG[1576][C-0000001b]: res_rtp_asterisk.c:6705 ast_rtcp_interpret: (0x7f741c007950) RTCP 0x7f741c02d740 – from 80.251.192.49:63333: 207(Unknown) skipping record
[Apr 29 10:18:44] DEBUG[1055]: res_rtp_asterisk.c:6317 update_local_mes_stats: clientprod1-1714385794.60: rtt: 0.041435087 j: 0.000000000 sjh: 0.000167146 lost: 0.076923077 mes: 84.6
[Apr 29 10:18:45] DEBUG[1593][C-0000001b]: res_rtp_asterisk.c:6583 ast_rtcp_interpret: (clientprod1-1714385890.62) RTCP got report of 56 bytes from 78.153.187.98:9314
[Apr 29 10:18:46] DEBUG[1055]: res_rtp_asterisk.c:6317 update_local_mes_stats: clientprod1-1714385890.62: rtt: 0.040108800 j: 0.000375000 sjh: 0.000133335 lost: 0.000000000 mes: 84.7
[Apr 29 10:18:47] DEBUG[1593][C-0000001b]: res_rtp_asterisk.c:6583 ast_rtcp_interpret: (clientprod1-1714385890.62) RTCP got report of 32 bytes from 78.153.187.98:9314
[Apr 29 10:18:47] DEBUG[1593][C-0000001b]: res_rtp_asterisk.c:6283 update_reported_mes_stats: clientprod1-1714385890.62: rtt: 0.040743000 j: 0.000437500 sjh: 0.000105644 lost: 0.000000000 mes: 84.7
[Apr 29 10:18:49] DEBUG[1576][C-0000001b]: res_rtp_asterisk.c:6583 ast_rtcp_interpret: (clientprod1-1714385794.60) RTCP got report of 96 bytes from 80.251.192.49:63333
[Apr 29 10:18:49] DEBUG[1576][C-0000001b]: res_rtp_asterisk.c:6283 update_reported_mes_stats: clientprod1-1714385794.60: rtt: 0.041327292 j: 0.000250000 sjh: 0.000197737 lost: 0.000000000 mes: 84.7
[Apr 29 10:18:49] DEBUG[1576][C-0000001b]: res_rtp_asterisk.c:6705 ast_rtcp_interpret: (0x7f741c007950) RTCP 0x7f741c02d740 – from 80.251.192.49:63333: 207(Unknown) skipping record
[Apr 29 10:18:49] DEBUG[1362]: res_pjsip.c:1716 endpt_send_request: 0x7f741c09a390: Wrapper created
[Apr 29 10:18:49] DEBUG[1362]: res_pjsip.c:1731 endpt_send_request: 0x7f741c09a390: Set timer to 3000 msec
[Apr 29 10:18:49] DEBUG[1362]: res_pjsip/pjsip_resolver.c:475 sip_resolve: Performing SIP DNS resolution of target ‘3kontakt.dk’
[Apr 29 10:18:49] DEBUG[1362]: res_pjsip/pjsip_resolver.c:502 sip_resolve: Transport type for target ‘3kontakt.dk’ is ‘TLS transport’
[Apr 29 10:18:49] DEBUG[1362]: res_pjsip/pjsip_resolver.c:545 sip_resolve: [0x7f741c040838] Created resolution tracking for target ‘3kontakt.dk’
[Apr 29 10:18:49] DEBUG[1362]: res_pjsip/pjsip_resolver.c:174 sip_resolve_add: [0x7f741c040838] Added target ‘3kontakt.dk’ with record type ‘1’, transport ‘TLS transport’, and port ‘5061’
[Apr 29 10:18:49] DEBUG[1362]: res_pjsip/pjsip_resolver.c:616 sip_resolve: [0x7f741c040838] Starting initial resolution using parallel queries for target ‘3kontakt.dk’
[Apr 29 10:18:49] DEBUG[1012]: res_pjsip/pjsip_resolver.c:273 sip_resolve_callback: [0x7f741c040838] All parallel queries completed
[Apr 29 10:18:49] DEBUG[1012]: res_pjsip/pjsip_resolver.c:322 sip_resolve_callback: [0x7f741c040838] A record received on target ‘3kontakt.dk’
[Apr 29 10:18:49] DEBUG[1012]: res_pjsip/pjsip_resolver.c:322 sip_resolve_callback: [0x7f741c040838] A record received on target ‘3kontakt.dk’
[Apr 29 10:18:49] DEBUG[1012]: res_pjsip/pjsip_resolver.c:322 sip_resolve_callback: [0x7f741c040838] A record received on target ‘3kontakt.dk’
[Apr 29 10:18:49] DEBUG[1012]: res_pjsip/pjsip_resolver.c:322 sip_resolve_callback: [0x7f741c040838] A record received on target ‘3kontakt.dk’
[Apr 29 10:18:49] DEBUG[1012]: res_pjsip/pjsip_resolver.c:322 sip_resolve_callback: [0x7f741c040838] A record received on target ‘3kontakt.dk’
[Apr 29 10:18:49] DEBUG[1012]: res_pjsip/pjsip_resolver.c:322 sip_resolve_callback: [0x7f741c040838] A record received on target ‘3kontakt.dk’
[Apr 29 10:18:49] DEBUG[1012]: res_pjsip/pjsip_resolver.c:417 sip_resolve_callback: [0x7f741c040838] Resolution completed - 6 viable targets
[Apr 29 10:18:49] DEBUG[1253]: res_pjsip/pjsip_resolver.c:199 sip_resolve_invoke_user_callback: [0x7f741c040838] Address ‘0’ is 80.251.192.41:5061 with transport ‘TLS transport’
[Apr 29 10:18:49] DEBUG[1253]: res_pjsip/pjsip_resolver.c:199 sip_resolve_invoke_user_callback: [0x7f741c040838] Address ‘1’ is 80.251.192.55:5061 with transport ‘TLS transport’
[Apr 29 10:18:49] DEBUG[1253]: res_pjsip/pjsip_resolver.c:199 sip_resolve_invoke_user_callback: [0x7f741c040838] Address ‘2’ is 80.251.192.43:5061 with transport ‘TLS transport’
[Apr 29 10:18:49] DEBUG[1253]: res_pjsip/pjsip_resolver.c:199 sip_resolve_invoke_user_callback: [0x7f741c040838] Address ‘3’ is 80.251.192.53:5061 with transport ‘TLS transport’
[Apr 29 10:18:49] DEBUG[1253]: res_pjsip/pjsip_resolver.c:199 sip_resolve_invoke_user_callback: [0x7f741c040838] Address ‘4’ is 80.251.192.51:5061 with transport ‘TLS transport’
[Apr 29 10:18:49] DEBUG[1253]: res_pjsip/pjsip_resolver.c:199 sip_resolve_invoke_user_callback: [0x7f741c040838] Address ‘5’ is 80.251.192.45:5061 with transport ‘TLS transport’
[Apr 29 10:18:49] DEBUG[1253]: res_pjsip/pjsip_resolver.c:205 sip_resolve_invoke_user_callback: [0x7f741c040838] Invoking user callback with ‘6’ addresses
<— Transmitting SIP request (486 bytes) to TLS:80.251.192.41:5061 —>
OPTIONS sip:3kontakt.dk:5061 SIP/2.0
Via: SIP/2.0/TLS 20.229.231.55:5077;rport;branch=z9hG4bKPj66d45bc4-fed8-4fe5-9fb1-857a573b773f;alias

What I need is consistency of information and usage. Bits and pieces from different things doesn’t help me, so:

  1. A configuration
  2. A failing log
  3. A working log

From the same setup.

For non-TLS, what transport was used?

The code is acting as if the transport wasn’t available at startup so it’s not doing a DNS lookup.

The non-TLS was using TCP

Sorry about the bits and pieces.
I sadly do not have a log with debug enabled from before the error started to occur.
Instead I have attached configuration and logs for 2 tenants - one that is working and one that is failing.both tenants are using TLS.
I hope that this is good enough to work with.

working8_full.txt (557.1 KB)
working8_configuration.txt (2.4 KB)
failing4_log.txt (27.2 KB)
failing4_configuration.txt (1.8 KB)

I really really don’t trust allow_reload being set to yes.

Do you have logging from the startup of Asterisk on the failure case? Specifically these:

[Apr 29 10:54:27] VERBOSE[1694] res_pjsip/pjsip_resolver.c: 'UDP+IPv4' is an available SIP transport
[Apr 29 10:54:27] VERBOSE[1694] res_pjsip/pjsip_resolver.c: 'TCP+IPv4' is an available SIP transport
[Apr 29 10:54:27] VERBOSE[1694] res_pjsip/pjsip_resolver.c: 'TLS+IPv4' is an available SIP transport
[Apr 29 10:54:27] VERBOSE[1694] res_pjsip/pjsip_resolver.c: 'UDP+IPv6' is not an available SIP transport, disabling resolver support for it
[Apr 29 10:54:27] VERBOSE[1694] res_pjsip/pjsip_resolver.c: 'TCP+IPv6' is not an available SIP transport, disabling resolver support for it
[Apr 29 10:54:27] VERBOSE[1694] res_pjsip/pjsip_resolver.c: 'TLS+IPv6' is not an available SIP transport, disabling resolver support for it

No, sadly I have only enabled debug from the console
I have enabled it in the config for the 2 tenants that I sent you logs for but if I restart the failing one then the problem goes away, and can’t get debug logs for a failing scenario.

Is there any useful information that I should gather before restarting?

Nothing comes to mind.