Possible DNS resolution problem

Hi!

Some time ago I described the problems I have with the German Telekom ALL-IP product, where the proxy servers apparently change abruptly in the background, leading to all kinds of connection problems for a couple of minutes until the next (re)registration. Please note that I no longer say “does change”, but “apparently changes”.

I’d like to put my current observations up for discussion, as they could point to a different root cause. They are based on an analysis of pcap traces.

The given registrar/SIP proxy, tel.t-online.de, can be resolved with a NAPTR query, followed by SRV and then A/AAAA queries.

At the top level, the following DNS servers are offered:

ns1.edns.t-ipnet.de 212.185.255.209
ns2.edns.t-ipnet.de 212.185.255.217
ns3.edns.t-ipnet.de 212.185.255.225
ns4.edns.t-ipnet.de 212.185.255.233
ns5.edns.t-ipnet.de 212.185.255.241
ns6.edns.t-ipnet.de 212.185.255.249

One of them is then used to resolve the next level and retrieve the offered services, such as _sip._tcp.tel.t-online.de.

The query for the A record of, say, the currently preferred server m-epp-110.edns.t-ipnet.de is then answered by servers like dns02.dns.t-ipnet.de (194.25.2.172). It seems as if the nsX servers and the more general dnsXX servers are not synchronized.
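To make the chain described above concrete, here is a minimal sketch in Python of the NAPTR -> SRV -> A order. It does not talk to any real DNS server: the lookup function is injected, and the table FAKE_ZONE (priorities, port, backup.example.net and the 192.0.2.x addresses) is invented purely for illustration and is not what Telekom actually returns. Within one SRV priority it simply prefers the higher weight, which is a simplification of the weighted random selection in RFC 2782.

```python
from typing import Callable, List, Tuple

# lookup(name, rtype) returns a list of record tuples; a stand-in for a real resolver.
Lookup = Callable[[str, str], List[tuple]]


def resolve_sip_chain(domain: str, lookup: Lookup,
                      service: str = "SIP+D2T") -> List[Tuple[str, int, str]]:
    """Follow NAPTR -> SRV -> A and return (target, port, address) in SRV order."""
    # NAPTR tuples: (order, preference, flags, service, replacement)
    naptr = [r for r in lookup(domain, "NAPTR") if r[3].upper() == service]
    if not naptr:
        return []
    naptr.sort(key=lambda r: (r[0], r[1]))  # lowest order, then lowest preference
    srv_name = naptr[0][4]

    # SRV tuples: (priority, weight, port, target)
    # Simplification: lowest priority first, higher weight first within a priority.
    srv = sorted(lookup(srv_name, "SRV"), key=lambda r: (r[0], -r[1]))

    result = []
    for _priority, _weight, port, target in srv:
        for (address,) in lookup(target, "A"):  # A tuples: (address,)
            result.append((target, port, address))
    return result


# Invented answers, for illustration only (NOT observed from the real servers).
FAKE_ZONE = {
    ("tel.t-online.de", "NAPTR"): [
        (20, 0, "s", "SIP+D2U", "_sip._udp.tel.t-online.de"),
        (10, 0, "s", "SIP+D2T", "_sip._tcp.tel.t-online.de"),
    ],
    ("_sip._tcp.tel.t-online.de", "SRV"): [
        (20, 0, 5060, "backup.example.net"),
        (10, 0, 5060, "m-epp-110.edns.t-ipnet.de"),
    ],
    ("m-epp-110.edns.t-ipnet.de", "A"): [("192.0.2.10",)],
    ("backup.example.net", "A"): [("192.0.2.20",)],
}


def fake_lookup(name: str, rtype: str) -> List[tuple]:
    """Hypothetical resolver backed by FAKE_ZONE."""
    return FAKE_ZONE.get((name, rtype), [])


chain = resolve_sip_chain("tel.t-online.de", fake_lookup)
```

With a real resolver behind lookup, each of the three steps could be sent to a different server (nsX for NAPTR/SRV, dnsXX for A), which is exactly where unsynchronized answers would show up.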

I evaluated the chain manually and never found any difference among the nsX servers.

I picked ns1.edns.t-ipnet.de and queried the chain continuously for more than a day, monitoring the servers returned for the service _sip._tcp.tel.t-online.de. There was not a single change.
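The kind of monitoring described above boils down to comparing successive answer sets. A rough sketch in Python, assuming the SRV targets of each poll have already been collected into a list (how they are collected, e.g. from dig output, is left out; the function name and the sample target names are my own):

```python
from typing import Iterable, List, Tuple


def srv_changes(snapshots: Iterable[Iterable[str]]) -> List[Tuple[int, List[str], List[str]]]:
    """Return (poll index, removed targets, added targets) for every poll
    whose answer set differs from the previous poll. Record order is ignored."""
    changes = []
    previous = None
    for index, snapshot in enumerate(snapshots):
        current = frozenset(snapshot)
        if previous is not None and current != previous:
            changes.append((index,
                            sorted(previous - current),
                            sorted(current - previous)))
        previous = current
    return changes


# Hypothetical polls: only the third one really differs (the second is reordered).
polls = [
    ["srv-a.example", "srv-b.example"],
    ["srv-b.example", "srv-a.example"],
    ["srv-a.example", "srv-c.example"],
]
detected = srv_changes(polls)
```

An empty result over a full day of polls corresponds to "not a single change".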

My current conclusion is that the way the addresses are resolved may play a role.

German Telekom has a different product, “German Telekom SIP-Trunk”, that does not show these problems. Seen from the outside, one difference is that the proxy reg.sip-trunk.telekom.de can only be resolved using NAPTR and SRV requests, whereas tel.t-online.de additionally returns an A record, which might or might not confuse the DNS resolution.

Could stricter specifications for which DNS servers to use solve the problem?

I did not find a problem with the DNS resolution. The router in this case is running unbound; I repeatedly checked its cache (Asterisk uses the system resolver) and evaluated the DNS hierarchy with unbound-host. The fluctuations come from the upstream servers, and some of them make it to Asterisk depending on the TTL values. As a result, outgoing calls sometimes use a different server than the one Asterisk is registered to (and fail).
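To illustrate how the TTL decides whether a fluctuation reaches Asterisk, here is a toy model in Python. Everything in it is assumed for the sake of the example: the names proxy-a/proxy-b, the TTL of 60 seconds, and the upstream switching at t=100. It only models a cache that re-asks upstream once an entry has expired.

```python
from typing import Callable, Optional, Tuple


class TtlCache:
    """Minimal single-entry cache: re-queries upstream once the TTL has expired."""

    def __init__(self, upstream: Callable[[int], Tuple[str, int]]):
        self.upstream = upstream            # callable: now -> (target, ttl)
        self.entry: Optional[Tuple[str, int]] = None  # (target, expires_at)

    def get(self, now: int) -> str:
        if self.entry is None or now >= self.entry[1]:
            target, ttl = self.upstream(now)
            self.entry = (target, now + ttl)
        return self.entry[0]


def fake_upstream(now: int) -> Tuple[str, int]:
    """Hypothetical upstream: answers proxy-a before t=100, proxy-b afterwards."""
    return ("proxy-a" if now < 100 else "proxy-b", 60)


def stale_registration(cache: TtlCache, registered_at: int, call_at: int):
    """Return (mismatch, server used for REGISTER, server used for the call)."""
    registered = cache.get(registered_at)
    used = cache.get(call_at)
    return registered != used, registered, used


# Call while the cached entry is still valid: same server as the registration.
within_ttl = stale_registration(TtlCache(fake_upstream), 0, 30)
# Call after expiry and after the upstream change: a different server is used.
after_change = stale_registration(TtlCache(fake_upstream), 0, 120)
```

In this model the call at t=120 goes to a server the client never registered with, which matches the failing outgoing calls until the next (re)registration.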

There are two ways around this. Either one does not let Asterisk evaluate the SRV records and instead uses a fixed server such as m-epp-110.edns.t-ipnet.de, because these servers do not simply vanish, or one registers a fake address within unbound(-control local_data …), so that one knows when to update the registrations within Asterisk.