DNS lookup failures

Hi, I sent this earlier but realized that I didn’t send it from the mail address associated with my user account.

Sorry about that. Trying again :blush:

For 3 days in a row, I have now experienced a SIP trunk being deemed unreachable due to a DNS lookup failure.

It happens every time Asterisk tries to send an OPTIONS request. Qualify frequency is set to 60 seconds.

The customer is using TLS.

The log outputs this:

res_pjsip.c: Error 320047 ‘No answer record in the DNS response (PJLIB_UTIL_EDNSNOANSWERREC)’ sending OPTIONS request to endpoint [name of endpoint]

Restarting Asterisk makes it work again (I use core restart for now). Nothing has changed on the machine running Asterisk.

Do you know of any possible reasons for this?

Asterisk is running in Azure and network glitches can happen, but it seems that once a DNS lookup fails, all subsequent lookups fail as well, and it never recovers until Asterisk is restarted.

I have for now set outbound_proxy for the endpoint to the IP behind the AOR contact.

Would that make Asterisk survive another DNS lookup failure, so the OPTIONS requests can be sent?

Is there anything else I can do to reset DNS resolution without having to restart all of Asterisk?

A simple core reload does not change anything.

Here is the transport and SIP trunk configuration from pjsip.conf:

[transport-tls]
type=transport
protocol=tls
bind=0.0.0.0:5143
tos=cs3
cos=3
allow_reload=false
method=tlsv1_2
external_media_address=20.8.xxx.xx
external_signaling_address=20.8.xxx.xx
local_net=10.0.0.0/8
local_net=172.16.0.0/12
local_net=172.18.0.0/12
local_net=172.19.0.0/12
local_net=192.168.0.0/16
external_signaling_port=5143
cert_file=/var/lib/asterisk/certs/novus.crt
priv_key_file=/var/lib/asterisk/certs/novus.key

[trunk1]
type=aor
contact=sip:[hostname]:5061
qualify_frequency=60

[trunk1]
type=endpoint
transport=transport-tls
media_encryption=sdes
outbound_proxy=sip:77.234.xxx.xx:5061;transport=tls;lr
force_rport=yes
rewrite_contact=no
disallow=all
allow=alaw
allow=ulaw
user_eq_phone=yes
dtmf_mode=rfc4733
connected_line_method=invite
direct_media_method=invite
direct_media=no
trust_id_inbound=yes
trust_id_outbound=yes
100rel=no
context=from_trunk1
tos_audio=ef
cos_audio=5
timers=yes
timers_min_se=90
timers_sess_expires=1800
auth=
aors=trunk1

[trunk1]
type=identify
endpoint=trunk1
match=77.234.xxx.xx

Kind regards

Morten Sølvberg

+4524240113

morten@soelvberg.com

There are two DNS resolvers that can be used.

If the res_resolver_unbound module is loaded, the unbound library is used for DNS, and it can have its own caching and behavior. Nothing in Asterisk explicitly resets or touches it.

If the system resolver is used, the DNS lookup goes through the system primitives and Asterisk does not cache the result. Something outside Asterisk could still cache it.

Is a local caching server installed and being used? If you do a packet capture for DNS, does it show the requests occurring?
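A quick way to see whether the operating system’s resolver returns addresses for the trunk host, independent of Asterisk, is a small Python check built on socket.getaddrinfo. This is a sketch under stated assumptions: getaddrinfo follows the system resolver configuration (/etc/hosts, /etc/resolv.conf) and returns only address results, whereas the PJSIP resolver may also look up NAPTR/SRV records and does not call this function itself. The function name system_lookup is hypothetical.

```python
import socket


def system_lookup(host: str, port: int) -> list[str]:
    """Resolve host through the OS resolver via getaddrinfo.

    Returns the unique addresses in the order returned, or an empty list
    if the lookup fails. Diagnostic sketch only; not what Asterisk calls.
    """
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        print(f"lookup of {host!r} failed: {exc}")
        return []
    addrs: list[str] = []
    for _family, _type, _proto, _canon, sockaddr in infos:
        ip = sockaddr[0]  # first element is the address for IPv4 and IPv6
        if ip not in addrs:
            addrs.append(ip)
    return addrs
```

For example, system_lookup("sip.example.com", 5061), where sip.example.com is a placeholder for the [hostname] in the AOR contact. An empty list would point at a problem outside Asterisk; a non-empty list while Asterisk still logs PJLIB_UTIL_EDNSNOANSWERREC would point back at Asterisk.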

Thank you for your answer.

Right now we do not use the res_resolver_unbound module, but I will definitely try it. Sounds promising.

Do you know whether setting outbound_proxy to the IP, as we do now, will help in this situation, so we can use it as a temporary solution? Or will the IP itself cause a DNS lookup?

The IP won’t cause a DNS lookup.
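Whether an outbound_proxy value needs DNS at all comes down to whether its host part is an IP literal. The Python sketch below, using the hypothetical helper proxy_host_needs_dns, makes that check for simple SIP URIs of the form used in the configuration above. It is a simplified parser (it ignores corner cases of the SIP URI grammar, such as user parts containing parameters), not how Asterisk parses the URI. The addresses in the usage are documentation examples (192.0.2.0/24, 2001:db8::/32) standing in for the masked ones.

```python
import ipaddress


def proxy_host_needs_dns(uri: str) -> bool:
    """Return True if the host in a simple SIP URI is a hostname (needs DNS),
    False if it is an IP literal. Simplified sketch, not a full SIP parser."""
    # Drop the "sip:" / "sips:" scheme if present.
    if uri.lower().startswith(("sip:", "sips:")):
        uri = uri.split(":", 1)[1]
    # Drop URI parameters such as ";transport=tls;lr".
    hostport = uri.split(";", 1)[0]
    # Drop an optional user part ("user@host").
    if "@" in hostport:
        hostport = hostport.rsplit("@", 1)[1]
    if hostport.startswith("["):
        # IPv6 reference, e.g. "[2001:db8::1]:5061"
        host = hostport[1:hostport.index("]")]
    else:
        host = hostport.split(":", 1)[0]
    try:
        ipaddress.ip_address(host)
        return False  # IP literal: usable as-is, no lookup
    except ValueError:
        return True  # hostname: must be resolved
```

With these inputs, "sip:192.0.2.10:5061;transport=tls;lr" yields False, while "sip:sip.example.com:5061;transport=tls" yields True.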

How do I get the res_resolver_unbound module loaded?
I just added load = res_resolver_unbound.so to modules.conf and added a resolver_unbound.conf file to the configuration files with the following content:

[general]
nameserver = 127.0.0.53

However the module is not shown in the modules list.

Running module show like resolve returns "0 modules loaded".

Does the res_resolver_unbound module have any additional dependencies that have to be loaded, or are further configuration changes needed?

As expected, I still see these lines in the full log when an OPTIONS request is attempted, so Asterisk is still using the built-in resolver.

res_pjsip.c:1716 endpt_send_request: 0x7faed800d980: Wrapper created
res_pjsip.c:1731 endpt_send_request: 0x7faed800d980: Set timer to 3000 msec
res_pjsip/pjsip_resolver.c:475 sip_resolve: Performing SIP DNS resolution of target ‘[MyTarget]’
res_pjsip/pjsip_resolver.c:502 sip_resolve: Transport type for target ‘[MyTarget]’ is ‘TCP transport’
res_pjsip/pjsip_resolver.c:545 sip_resolve: [0x7faed80ad298] Created resolution tracking for target ‘[MyTarget]’
res_pjsip/pjsip_resolver.c:608 sip_resolve: [0x7faed80ad298] No resolution queries for target ‘[MyTarget]’
res_pjsip.c:1594 endpt_send_request_cb: 0x7faed800d980: PJSIP tsx response received
res_pjsip.c:1637 endpt_send_request_cb: 0x7faed800d980: Callbacks executed
res_pjsip.c:1769 endpt_send_request: Error 320047 ‘No answer record in the DNS response (PJLIB_UTIL_EDNSNOANSWERREC)’ sending OPTIONS request to endpoint [MyEndpoint]
res_pjsip.c:1693 send_request_wrapper_destructor: 0x7faed800d980: wrapper destroyed

I also ran tcpdump, and I can’t see any requests to port 53.

From what I can see in pjsip_resolver.c, the log line "No resolution queries for target" basically means that no lookup is performed and that the callback is called directly.

Any kind of help would be extremely appreciated.

It requires the libunbound library and developer package to build. You can see what dependencies exist for each module in “make menuselect”.
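If load = res_resolver_unbound.so is present in modules.conf but module show like resolve lists nothing, a likely cause is that the module was never compiled, so there is no shared object to load. A minimal Python check for that, assuming the common default module directory /usr/lib/asterisk/modules (the real path is set by astmoddir in asterisk.conf and may differ) and using the hypothetical helper name module_was_built:

```python
from pathlib import Path

# Assumption: common default module directory. Check astmoddir in
# asterisk.conf for the path actually used on your system.
DEFAULT_MODDIR = "/usr/lib/asterisk/modules"


def module_was_built(name: str, moddir: str = DEFAULT_MODDIR) -> bool:
    """Return True if the shared object for an Asterisk module exists.

    Accepts the module name with or without the ".so" suffix. A missing
    file means "load =" in modules.conf has nothing to load.
    """
    if not name.endswith(".so"):
        name += ".so"
    return (Path(moddir) / name).is_file()
```

If module_was_built("res_resolver_unbound") returns False, the module would need to be enabled in make menuselect and Asterisk rebuilt after installing the libunbound development package.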

This says TCP, but your configuration previously showed TLS. Are you using TCP for connecting to other things? The resolver won’t try to resolve things if no transport is available for them, so if you request TCP but no TCP transport is actually available, it won’t resolve.

Sorry, I should have mentioned that this log is taken from a different server that uses TCP for the hostname we are trying to resolve. This server is not used by customers, and the error is easily reproduced there.

All customers (including this one) have all transports defined, and the error happens for both TCP and TLS. I haven’t tried UDP, but I assume it would produce the same result.

To make sure the TCP transport is in fact working, I added a second endpoint for the same hostname with the IP specified in outbound_proxy, so DNS is bypassed, and I am able to send OPTIONS requests and receive the responses.

The TCP transport is configured like this:

[transport-tcp]
type=transport
protocol=tcp
bind=0.0.0.0:[port]
tos=cs3
cos=3
allow_reload=false
external_media_address=20.16.xxx.xxx
external_signaling_address=20.16.xxx.xxx
local_net=10.0.0.0/8
local_net=172.16.0.0/12
local_net=172.18.0.0/12
local_net=172.19.0.0/12
local_net=192.168.0.0/16
external_signaling_port=[port]

The aor, endpoint and identify sections are almost the same as what I sent before, except, naturally, for the transport parameter.

Was the TCP transport added after Asterisk was started?

Oh, and it looks like an IP address doesn’t go through the transport check.

The TCP transport is actually added after Asterisk has started. Can this be the source of the problem?

If so, will using the unbound module make a change?

Please note that in most cases I do not see this problem, and the transports seem to work perfectly.

I don’t know the code well enough to determine why the DNS resolve object does not contain any queries. I really hope you can spot something.

It can certainly be the source. Adding/removing transports after startup is a bit off-nominal. I don’t think the code handles that scenario for the resolver, so it’s unaware the transport was added. It would need to be changed to support that.
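To make the explanation concrete, here is a deliberately toy Python model of the behaviour described in this thread: the resolver takes a snapshot of the available transport types at load time, a hostname whose transport is missing from that snapshot produces no queries, and an IP literal skips both the lookup and the transport check. ToyResolver and its return strings are hypothetical; this is not the code in res_pjsip/pjsip_resolver.c, only an illustration of why a transport added after startup leads to "No resolution queries for target" followed by PJLIB_UTIL_EDNSNOANSWERREC.

```python
import ipaddress
from dataclasses import dataclass


def _is_ip_literal(host: str) -> bool:
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


@dataclass(frozen=True)
class ToyResolver:
    """Hypothetical model: transport types are captured once, at creation
    (standing in for module load); transports added later are never seen."""
    transports_at_load: frozenset

    def resolve(self, target: str, transport: str) -> str:
        if _is_ip_literal(target):
            # IP used as-is: no DNS lookup and no transport check.
            return "no lookup needed"
        if transport not in self.transports_at_load:
            # No queries are created, nothing goes to port 53, and the
            # callback runs with no records, surfacing as this error.
            return "PJLIB_UTIL_EDNSNOANSWERREC"
        return "DNS queries sent"
```

In the model, ToyResolver(frozenset({"udp", "tls"})) represents an instance that loaded before a TCP transport was added: resolving a hostname over "tcp" returns the error string, while the same hostname over "tls", or an IP literal over "tcp", does not. This is consistent with the outbound_proxy IP workaround bypassing the problem.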

You can file an issue[1], but I can’t give a time frame or say whether it would get looked into.

[1] GitHub

Do you think that using the unbound library would make a difference?

Is it only pjsip_resolver that checks the transport configuration only at startup?

I guess I will find out when I try it, but I would love to hear it if you either know or have a “gut feeling” about it.

It would not, and yes, pjsip_resolver only looks at the transports at module load time during startup.

OK, thanks. Then I need to make a rather radical change to our deployment procedure, but thank you for pointing me to the problem.
