We had an outage of our upstream internet connection yesterday. Unfortunately, Asterisk stopped working locally as well and started answering REGISTER requests with a 503 error. I looked into it and found that the queue of the dns_system_resolver_tp taskprocessor was constantly growing. That caused PJSIP to answer with 503. To replicate this issue, Asterisk needs to query a non-answering DNS server (so no response at all, no “SERVFAIL” and no “ICMP unreachable”); you should then see the dns_system_resolver_tp counts start to increase. Is there any way to prevent this? I believe this can be considered a bug?
I'm not 100% sure (because the incident was a stressful situation), but I think I can confirm this with 18.15 as well. I had the same issue: Asterisk stopped working because of unreachable DNS servers. The interesting thing is that, in theory, my Asterisk would not need to do any DNS queries. I do not use any hostnames; my VoIP provider requires me to send registrations and calls to an IP address.
So I am really not sure why unreachable DNS servers bothered my setup, since Asterisk has nothing to resolve in my case. But once the DNS servers were reachable again, everything started working again.
For PJSIP, Asterisk uses its core DNS support, which is pluggable. The recommended resolver is unbound, which relies on an external library. It is truly asynchronous and doesn’t suffer from such issues. If it is unavailable, Asterisk falls back to the system primitives, which have no such functionality and block on DNS queries. When implementing this, I believe we looked at what the system ones were capable of and how to handle this case, and didn’t find anything that would let us be more tolerant.
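To illustrate why a blocking resolver makes the queue grow, here is a rough toy model in Python. It is not Asterisk code: the function name `queue_depth_over_time`, the single serial worker, and the 5-second block per lookup are all assumptions chosen for illustration. It only shows that when each lookup blocks longer than the interval between new requests, the backlog keeps climbing.

```python
def queue_depth_over_time(arrival_interval_s, lookup_block_s, duration_s):
    """Toy model: one serial worker, each lookup blocks for lookup_block_s.

    Returns the number of queued or in-progress lookups observed at each
    arrival time. All names and numbers here are hypothetical.
    """
    depths = []
    finish_times = []      # finish time of every lookup submitted so far
    worker_free_at = 0.0   # when the single worker can take the next lookup
    arrivals = int(duration_s / arrival_interval_s)

    for i in range(arrivals):
        now = i * arrival_interval_s
        # The lookup starts once the worker is free, then blocks until done.
        start = max(now, worker_free_at)
        worker_free_at = start + lookup_block_s
        finish_times.append(worker_free_at)
        # Count lookups that have not finished yet at this moment.
        depths.append(sum(1 for f in finish_times if f > now))
    return depths


if __name__ == "__main__":
    # Assumed: one request per second, each lookup blocks 5 s (no DNS answer).
    print("blocking:", queue_depth_over_time(1, 5, 10))
    # Assumed: lookups complete faster than requests arrive.
    print("healthy: ", queue_depth_over_time(2, 1, 10))
```

In the first case the depth never recovers for as long as the DNS server stays silent, which matches the growing dns_system_resolver_tp count described above. An asynchronous resolver avoids this because a pending query does not hold up the ones behind it.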
@TheNextDay There are cases where it will do a DNS lookup on its local hostname to find its own IP address.
Alright, I didn’t have unbound installed/in use, I fixed that now. Thanks for the fast help!