Multiple REGISTRATIONs for a single AUTH

I am currently working (again) through some SIP-related interface descriptions published by Deutsche Telekom, in particular 1TR114 (https://www.telekom.de/hilfe/downloads/1tr114.pdf) and 1TR118 (https://www.telekom.de/hilfe/downloads/1tr118.pdf).

Asterisk with PJSIP handles DNS NAPTR and SRV lookups very well and prefers the SRV record with the highest priority (i.e. the lowest priority value). The prioritization of the returned SRV records does change, and DTAG writes that this is due to maintenance and failure situations (1TR114, page 24), which is reasonable. The document also describes how the user equipment has to react.

When the old registration expires, Asterisk registers with the new IP, which is what one would expect. Does that mean that the old registration is terminated, or does it remain in the background until it times out?

According to the Asterisk wiki page “PJSIP Configuration Sections and Relationships”, there can be at most one REGISTRATION for a given AUTH, which probably implies that the old registration is terminated before the system registers with the new IP. Without going into the details, the DTAG servers only accept calls via the server one is registered with, which requires either changing the source code a bit or handling this outside of Asterisk. According to 1TR114, the DTAG servers allow more than one active registration per account in order to handle their priorities correctly.

I am not sure whether this is intended, but one can register independently with all available proxy servers returned by the SRV lookup. This ensures maximum reachability.

All registered servers signal an incoming call and this behaves roughly like an Asterisk dial group. So far I have not received a call from Telekom to check my system configuration…

These forked INVITEs should share the same Call-ID but carry different branch IDs; this differs from Asterisk multi-destination calls, where each leg gets its own Call-ID.

I am currently studying how I can tell which registration an incoming call arrived through. I am using a HOMER system for that.

So far I can tell that the branch parameters in the Via headers differ, as does the line parameter in the INVITEs, while the Call-IDs and the tags in the From header are the same. Of course, the Contact (and Record-Route) headers differ as well.

I guess the dialplan runs in two (or more) different threads, so I cannot easily store and compare the values once I retrieve them. Maybe with a short wait and some fiddling I can route one of the calls while the others stay passive.
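How the forked legs could be told apart is easiest to see outside the dialplan. Below is a minimal Python sketch, assuming each leg can read its Call-ID and Via branch and that all legs share one lock-protected table; `ForkedCallFilter`, `claim` and `release` are hypothetical names, not an Asterisk API. The first leg to claim a Call-ID wins, the others stay passive.

```python
import threading


class ForkedCallFilter:
    """First-wins filter for one incoming call forked by several registrars.

    Hypothetical helper: the forked INVITEs share a Call-ID (and From tag)
    but differ in their Via branch, so the Call-ID is the key and the
    branch identifies the individual leg.
    """

    def __init__(self):
        self._lock = threading.Lock()  # the legs may arrive on different threads
        self._owner = {}  # Call-ID -> Via branch of the leg that won

    def claim(self, call_id, branch):
        """Return True if this leg should be routed, False if it should stay passive."""
        with self._lock:
            return self._owner.setdefault(call_id, branch) == branch

    def release(self, call_id):
        """Forget the Call-ID once the call has ended."""
        with self._lock:
            self._owner.pop(call_id, None)
```

In the dialplan, the same idea would need some shared storage with an atomic check-and-set; which Asterisk mechanism could provide that is not verified here.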

Still no calls from German Telekom to check my phone system…

The three servers you get in the SRV response are completely independent server infrastructures, and they are prioritized:

As described within RFC 2782 [47] a client MUST attempt to contact the target host (P-CSCF) with the lowest-numbered priority it can reach; target hosts with the same priority SHOULD be tried in an order defined by the weight field. Within Deutsche Telekom network normally only the priority field is used.
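For reference, the RFC 2782 ordering can be sketched in Python as follows (a minimal, unofficial sketch; `Srv` and `order_srv` are illustrative names). With all weights 0, as in the Telekom records, it reduces to a plain sort by priority.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Srv:
    priority: int
    weight: int
    port: int
    target: str


def order_srv(records, rng=random):
    """Order SRV records for contact attempts following RFC 2782.

    Lowest priority value first; within one priority, a weighted random
    selection in which weight-0 entries are placed first and only have a
    small chance of being chosen ahead of weighted entries.
    """
    ordered = []
    for prio in sorted({r.priority for r in records}):
        # Candidates of this priority, weight-0 entries at the front.
        group = sorted((r for r in records if r.priority == prio),
                       key=lambda r: r.weight != 0)
        while group:
            pick = rng.randint(0, sum(r.weight for r in group))
            running = 0
            for i, r in enumerate(group):
                running += r.weight
                if running >= pick:
                    ordered.append(group.pop(i))
                    break
    return ordered
```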

It is neither intended nor allowed to register with more than one of the returned servers. All subsequent SIP requests must go to this server; the other servers don’t know you. Asterisk doesn’t support this. You have to take your own measures to ensure that Asterisk always uses the same destination server (e.g. something like BIND’s RPZ).
As a SIP / All-IP customer for more than six years, I have never had any problem with Deutsche Telekom’s SIP servers. They have been rock stable so far, and I think that behind each returned IP there is a complete HA cluster architecture.

Edit 15.05.2021:
Here you can find some more information about transports in pjsip / asterisk, about asterisk and mediasec (needed for SRTP with Deutsche Telekom), about how to disable a server transport in asterisk / pjsip, and a complete example of how to configure Asterisk / pjsip for Deutsche Telekom (based on TLS).

When you quote the 1TR114 document, you should know that it also states somewhere that the old registration times out when one uses the most recently advertised SRV record with priority 10. This is exactly the problem with the ALL-IP accounts, and it does not apply to the SIP-TRUNK products.

The problem is how they rotate their priorities: there are not 3 but at least 5 proxy servers involved, although only 3 are returned by the RR type 33 (SRV) queries. Say we have four servers A to D; the pattern is as follows:
Lookup #1: A B C
Lookup #2: D B C
Lookup #3: A B C
Lookup #4: D B C

If you obey the TTL value of 7200, the usual algorithms perform the next lookup every 3600 seconds, i.e. there will be a different registration server about every hour, and this leads to two problems.

(1) If you start a call while registered with server A and the account is re-registered with server D in the meantime, the call may be terminated after a while, because the account is no longer registered with A and this is checked. I have some pcap traces of this scenario, and there is a unique cause code for it in the records.

(2) If you are registered with A and start a new call, Asterisk performs a new DNS lookup, and any different proxy address it returns makes the call fail. In essence, the Telekom servers expect you to use only the outbound proxy you are registered with. Asterisk simply does not work this way.

Even if you only place calls via the registered server, problem (1) remains.
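To put a number on “about every hour”: a small Python sketch that counts how often the top-priority target changes, assuming the alternating A/D pattern above, a lookup every 3600 s, and that Asterisk always registers with the first target of the latest lookup. The function name is hypothetical.

```python
def registrar_switches(pattern, lookup_interval_s, horizon_s):
    """Count how often the top-priority SRV target changes within a time horizon.

    pattern: top-priority targets returned by successive lookups (repeating)
    lookup_interval_s: seconds between lookups (e.g. half of a 7200 s TTL)
    """
    lookups = horizon_s // lookup_interval_s + 1  # at t = 0, interval, 2*interval, ...
    tops = [pattern[i % len(pattern)] for i in range(lookups)]
    return sum(1 for prev, cur in zip(tops, tops[1:]) if prev != cur)


print(registrar_switches(["A", "D"], 3600, 24 * 3600))  # 24
```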

In a private environment none of these problems really matter, but I have customers where they do. Think of a small medical office that treats about 100 patients a day. You end up with 100-200 incoming calls per day and about the same number of outgoing calls. Now you can do the math and check how often calls fail statistically. None of these problems occur with the SIP-TRUNK, but the ALL-IP product is the successor of the “ISDN Mehrgeräteanschluss” (ISDN point-to-multipoint line), which is what you’d find in small businesses, and it is difficult to persuade them to switch to a TRUNK product (with different external phone numbers).
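“Do the math” could look like this, as a rough Python estimate under stated assumptions: the registrar changes about once per hour, call start times are spread uniformly, and no call is longer than an hour. It counts the calls still in progress when the registrar changes, i.e. an upper bound for problem (1), since not every such call is necessarily dropped. The traffic figures and the function name are hypothetical.

```python
def calls_crossing_switch(calls_per_day, mean_duration_s, switch_interval_s):
    """Expected number of calls per day that are in progress when the registrar changes.

    Assumes call start times are uniformly spread relative to the switch
    instants and that no call is longer than switch_interval_s; then each
    call overlaps a switch with probability duration / interval, which is
    linear in the duration, so the mean duration is sufficient.
    """
    if mean_duration_s > switch_interval_s:
        # Basic guard: the model breaks down for calls longer than the interval.
        raise ValueError("model only valid for calls shorter than the switch interval")
    return calls_per_day * mean_duration_s / switch_interval_s


# Hypothetical figures: 150 incoming + 150 outgoing calls, 3 minutes on average,
# registrar change about every hour.
print(calls_crossing_switch(300, 180, 3600))  # 15.0
```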

You are right that their servers are rock stable, so one does not need the failover capabilities of their system. One can do the DNS lookup outside of Asterisk as part of the startup script and use a static server such as h2-epp-100.edns.t-ipnet.de as the outbound_proxy within Asterisk. This might change with the next restart.

I would say: “Nice try, but no cigar.” With the exception of the VoIP products offered with symmetrical fiber optic connections, in my opinion all other German VoIP offerings are worse and have more quirks :).

BTW, your choice of words is a bit strange for a customer response and I did not mention the (older) setup requirements for TLS support.

That’s interesting: for years now I have been running a script that feeds a BIND RPZ. It does the NAPTR and SRV lookups and feeds the result into BIND’s RPZ. Asterisk does its lookups via the locally configured BIND (nameserver in /etc/resolv.conf). The script also takes care of changes in the DNS resolution: if the answer changes, it checks for running outbound calls in Asterisk. If there is no call, an unregister is issued, the RPZ is changed and a new register is issued. But in all those years I haven’t seen a single change; it’s always the same answer (Telekom All-IP).
Which DNS servers do you use to resolve the SIP servers? The one you get via the PPPoE login, or another one? I’m using the one provided by Telekom during the PPPoE login.

My script uses this dig call to get the SRV entries for SIPS (the name obtained from the preceding NAPTR lookup):

dig +noall +answer _sips._tcp.tel.t-online.de SRV @2003:180:2:a000::53 | sort -u

The sort puts the entries in order of their priority:
_sips._tcp.tel.t-online.de. 3592 IN SRV 10 0 5061 s-eps-110.edns.t-ipnet.de.
_sips._tcp.tel.t-online.de. 3592 IN SRV 20 0 5061 h2-eps-100.edns.t-ipnet.de.
_sips._tcp.tel.t-online.de. 3592 IN SRV 30 0 5061 d-eps-100.edns.t-ipnet.de.
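Note that `sort -u` sorts the lines as text, which only yields priority order because all priorities here have two digits. A Python sketch that parses the same dig output and sorts numerically instead (the function name is hypothetical; the text could come from e.g. `subprocess.run([...], capture_output=True, text=True).stdout`):

```python
def parse_srv_answer(text):
    """Parse `dig +noall +answer <name> SRV` output into sorted tuples.

    Returns (priority, weight, port, target) tuples sorted numerically, so
    priority 5 correctly sorts before priority 10 (a textual sort would not).
    """
    records = []
    for line in text.splitlines():
        fields = line.split()
        # owner TTL class type priority weight port target
        if len(fields) == 8 and fields[2:4] == ["IN", "SRV"]:
            priority, weight, port = (int(x) for x in fields[4:7])
            records.append((priority, weight, port, fields[7].rstrip(".")))
    return sorted(records)
```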

Only the first entry is given to the RPZ. The complete file feeding the RPZ looks like this:

server your_bind_server_ip
zone rpz-tonline
update delete tel.t-online.de.rpz-tonline.
update delete _sips._tcp.tel.t-online.de.rpz-tonline.
update add tel.t-online.de.rpz-tonline. 60      NAPTR   10 0 "s" "SIPS+D2T" "" _sips._tcp.tel.t-online.de.
update add _sips._tcp.tel.t-online.de.rpz-tonline.      60 SRV  10 0 5061 s-eps-110.edns.t-ipnet.de.

This file can be fed to BIND with nsupdate.
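The file above could also be generated from the chosen SRV target. A minimal Python sketch, assuming the same zone name, owner names and TTL as in the example; it appends an explicit `send` line (a standard nsupdate command), and the function name is hypothetical.

```python
def rpz_nsupdate(target, port=5061, server="your_bind_server_ip",
                 zone="rpz-tonline", ttl=60):
    """Build nsupdate input that pins tel.t-online.de to one SIPS SRV target via an RPZ."""
    naptr_owner = f"tel.t-online.de.{zone}."
    srv_owner = f"_sips._tcp.tel.t-online.de.{zone}."
    lines = [
        f"server {server}",
        f"zone {zone}",
        # Remove whatever the RPZ returned before ...
        f"update delete {naptr_owner}",
        f"update delete {srv_owner}",
        # ... and publish a single NAPTR -> SRV chain with only one target.
        f'update add {naptr_owner} {ttl} NAPTR 10 0 "s" "SIPS+D2T" "" _sips._tcp.tel.t-online.de.',
        f"update add {srv_owner} {ttl} SRV 10 0 {port} {target.rstrip('.')}.",
        "send",  # explicitly send the queued update
    ]
    return "\n".join(lines) + "\n"
```

The result could then be piped to nsupdate, e.g. with `subprocess.run(["nsupdate"], input=text, text=True, check=True)`; any key options depend on how your BIND instance is secured.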

Your point (1) is correct, but I think Deutsche Telekom is smart enough to know about this problem. So they certainly won’t shut down published, running servers within a few minutes; they will plan it over hours or even days. They may drop a server from the DNS response, but not the running system itself, and they will wait until it’s empty and nobody uses it any more. For the client this means: keep using the current server as long as you have running calls, and move to the next server once no call is active. It’s pretty easy.

Your point (2) is correct, too; that’s why you have to feed Asterisk from outside (via a BIND RPZ) with just one correct DNS answer. Asterisk must see only one server of the SRV list.

You can’t say anything about my environment, because you don’t know it :-). We currently work for three companies here at the same time and register 4 different numbers (we could use up to 10) without any problem. There is, and has been, no outage or any other problem if you do it the way I described. The only, extremely short outage (about 1 second) could occur if the DNS record changes, but I haven’t seen that in years.

Almost no calls will fail if you do it as described. For example, I have been working from home for a large company for more than a year, and it would be a big problem if this approach didn’t work reliably. We have meetings lasting hours or even a whole day (up to 10 h), and I haven’t had a single dropped call so far (while many other participants usually have to call back)!

I admit it would be much better or easier if Asterisk handled this operational concept itself, but so far they haven’t wanted to (they have known about this problem for a long time).

In my setups I link Asterisk against the unbound libraries so that I have more control over name resolution (if necessary). Normally, I let the library read /etc/resolv.conf, which basically yields whatever the system provides. I’ve seen setups on ip-phoneforum.de where a local DNS server with a fake domain was used for this. I don’t think that this is necessary; it is more straightforward to use the server names as provided by the SRV records. One only needs to set the outbound proxy explicitly.

That said, I cannot confirm your observations, but as I mentioned, the DNS pattern for ALL-IP is different from that for the SIP-TRUNK. There seem to be some regional differences, though. I have a customer in Prussian Siberia who has never complained and who has an “unpatched” configuration. Either observe more systematically, or add an identify section to your SIP configuration with something like:

[tcom_<whatever>_identify]
match=tel.t-online.de
endpoint=tcom_<whatever>_endpoint

It’s probably not really needed, as the line parameter is more important. The side effect is that after about a day you can show the endpoint and see what Asterisk has resolved for tel.t-online.de. In my case I currently have:

> pjsip show endpoint ...
...
   Identify:  tcom_..._identify/tcom_..._endpoint
        Match: 217.0.27.161/32
        Match: 217.0.28.32/32
        Match: 217.0.29.32/32
        Match: 217.0.20.194/32
        Match: 217.0.28.34/32
        Match: 217.0.29.36/32
...

The starting point for this discussion was a few years ago, when PJSIP was introduced and some other things changed. Looking at the Asterisk code, I would say that DNS resolution is transparent to the various modules, regardless of whether an A record is resolved directly or via recursive NAPTR/SRV requests. Ultimately, the SIP code has no idea what the structures in between look like: you enter a symbolic address at the top, and the registration process gets a single IP at the bottom.
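To illustrate the layering described above, here is a toy Python resolver over hard-coded, hypothetical records (not Asterisk’s DNS code): whichever chain of NAPTR and SRV records is followed, the caller only gets back one (ip, port) pair and no longer knows which server it belongs to.

```python
# Hypothetical, hard-coded DNS data for illustration only.
# 192.0.2.0/24 is a documentation range, not Telekom's real addresses.
NAPTR = {"tel.t-online.de": [(10, 0, "SIPS+D2T", "_sips._tcp.tel.t-online.de")]}
SRV = {"_sips._tcp.tel.t-online.de": [(20, 0, 5061, "h2-eps-100.edns.t-ipnet.de"),
                                      (10, 0, 5061, "s-eps-110.edns.t-ipnet.de")]}
A = {"s-eps-110.edns.t-ipnet.de": "192.0.2.10",
     "h2-eps-100.edns.t-ipnet.de": "192.0.2.20"}


def resolve(name, default_port=5060):
    """Collapse NAPTR -> SRV -> A into a single (ip, port), as the SIP layer sees it."""
    port = default_port
    if name in NAPTR:
        # Lowest (order, preference) wins; its replacement is the SRV name.
        name = min(NAPTR[name])[3]
    if name in SRV:
        # Lowest priority wins (weights ignored, as in the Telekom records).
        _priority, _weight, port, name = min(SRV[name])
    return A[name], port
```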

If you don’t do it that way, you have to carry the registration IP along with every registration and signal whether you want to use it. Much more code then needs to know about DNS internals. If a change is pending, you also have to check whether the associated endpoint is currently active. Basically, you have to check before every INVITE and every OPTIONS request whether something has just changed (although you can do without qualifying the accounts with the Telekom servers). That would be more than ugly.

There are some people at Telekom who know their way around. Years ago I complained to their business customer hotline, but initially nobody knew what I was talking about. Later I received a call from Hamburg; this technician understood everything after a bit of back and forth, but also said, generally speaking, that nothing could be changed quickly in a mass-market product.

Again, it works, but as long as the Telekom servers do not check the authentication data centrally, the system, which is actually designed for redundancy and failure safety, is not really used to its full extent, in my opinion. Some parts of 1TR114 seem to forget that SIP is a stateful protocol.

That’s true, they behave differently in some respects. One example is how they handle the registration of one or more numbers / ranges. As I read some time ago, it’s not possible to do this on the same transport (because Asterisk uses the same connection to the same destination when on the same transport). If you want to register more than one number / range, you should try it with different transports.

Well, I don’t use identify at all, or more precisely, I’ve intentionally set it to 127.0.0.10 (it will never match). I always use the line option, because it’s the only way to reliably bind an incoming call to the corresponding trunk configuration. This is necessary to get the correct SDP configuration (my trunks have different SDP configurations, which would be impossible based on the match configuration, because they all come from the same IP).
Interesting: I don’t see any additional match entry here (Asterisk 18.4, but there would be just one anyway).

IP address or hostname?
If you are using TLS, you must use the correct hostname (at least with All-IP); otherwise you can’t connect because of an obvious TLS error during the handshake. All my statements always refer to TLS; I don’t use any other protocol any more.

“Floating” servers:
Well, normally I’m the one making things complicated :). This time, I think you’re making it even more complicated than I do.
From my point of view, it’s enough to periodically observe the actual DNS answers and pass them to Asterisk if appropriate. This works because I assume that Telekom doesn’t switch off a running server carrying thousands of calls, which would then break.
My implementation in Asterisk would be pretty straightforward: on the first REGISTER, remember the server name and reuse it for all subsequent requests (with TCP/TLS you don’t even need that, because there is nothing to do at the TCP/TLS level: you have a static TCP/TLS connection). A parallel thread continuously checks the DNS. If it detects that a new server should be used, it queues an unregister (processed at a moment when the trunk is idle; Asterisk knows whether a trunk is idle), provides the new server name for the next register and registers with the new server. Now you’re ready to go with the new server. The whole process shouldn’t take more than a second. That’s really pretty easy!
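The procedure described above can be sketched as a small state holder in Python. This is not Asterisk code, and nothing in this thread suggests Asterisk offers such a hook; `StickyRegistrar` and the returned action tuples are hypothetical. A lock is used because the DNS check runs in a parallel thread.

```python
import threading


class StickyRegistrar:
    """Keep one trunk on the server it registered with; switch only when idle.

    Hypothetical sketch: the returned action tuples stand in for whatever
    would actually trigger the unregister/register.
    """

    def __init__(self, server):
        self._lock = threading.Lock()
        self._current = server
        self._pending = None  # newly preferred server, not yet adopted
        self._active_calls = 0

    def target(self):
        """Server to use for every request of this trunk (REGISTER, INVITE, ...)."""
        with self._lock:
            return self._current

    def dns_update(self, preferred):
        """Called by the DNS-watching thread with the currently preferred server."""
        with self._lock:
            self._pending = None if preferred == self._current else preferred
            return self._switch_if_idle()

    def call_started(self):
        with self._lock:
            self._active_calls += 1

    def call_ended(self):
        with self._lock:
            self._active_calls -= 1
            return self._switch_if_idle()

    def _switch_if_idle(self):
        # Caller holds the lock. Apply a queued change only when no call is active.
        if self._pending is None or self._active_calls > 0:
            return []
        old, self._current, self._pending = self._current, self._pending, None
        return [("unregister", old), ("register", self._current)]
```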

If you want to register more than one number / range, you should try it with different transports.

Hmm, I am running 2 different Telekom SIP trunks plus 5 individually registered ALL-IP lines on a single Asterisk box, all with TCP transport, and I can nicely check the states of these connections in the router. I think I read about your concerns before myself, but never checked why these configurations supposedly don’t work. Asterisk configuration recipes are probably like cooking recipes on the Internet: not everything is a revelation.

this is necessary to get the correct SDP configuration

That’s why I use the line parameter (The PJSIP Outbound Registration 'line' Option ⋆ Asterisk).

A parallel thread continuously checks the DNS.

You’d need to check before every outgoing request. The Telekom servers don’t go away, even if they are no longer advertised; the only thing that changes is the prioritization, and there are more than 3 servers. What you get when you query an address depends on the DNS server you ask, simply because caching may differ. Typically you get different results from a Telekom server than from a Google DNS server. Actually, tel.t-online comes with its own set of name servers. AFAIK, Asterisk does not look for NS records when it checks the NAPTR record, though this would be possible. With the unbound library this can be configured, but I haven’t checked that yet. The 1TR114 document says somewhere that one should only use the Telekom DNS servers, but the internal servers seem to be visible from everywhere nowadays.

Well, normally I’m the one making things complicated…

This is also the case here. I have a separate cron job that updates small text files (every couple of hours) with a single line “outbound_proxy=…”, which are included in the config files at a higher level. Basically, every restart potentially pulls a different proxy. This does not require any source code changes. On the other hand, unless there is a restart or reload, the same server could be used for months, and that does not seem to be a problem.
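A cron job like this might be sketched in Python as follows. Assumptions: the SRV records have already been looked up as (priority, weight, port, target) tuples, the lowest priority value wins, and the `sip:host:port\;lr` URI form (with `;` escaped, since it starts a comment in Asterisk config files) suits your setup; the function and file names are hypothetical.

```python
import os
import tempfile


def write_proxy_include(path, records):
    """Write a one-line 'outbound_proxy=...' include file for the best SRV target.

    records: (priority, weight, port, target) tuples, e.g. from an SRV lookup.
    The URI format (sip:host:port with an escaped ';lr') is an assumption.
    """
    _priority, _weight, port, target = min(records)  # lowest priority value wins
    line = f"outbound_proxy=sip:{target.rstrip('.')}:{port}\\;lr\n"
    # Write to a temp file in the same directory, then rename atomically,
    # so a reload never sees a half-written file.
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(os.path.abspath(path)))
    with os.fdopen(fd, "w") as fh:
        fh.write(line)
    os.chmod(tmp, 0o644)  # mkstemp creates 0600; Asterisk may run as another user
    os.replace(tmp, path)
    return line
```

The generated file would then be pulled into the relevant pjsip.conf section with an `#include` line, so a new proxy only takes effect on the next reload or restart.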

The other approach is to hook up to virtually everything Telekom offers, which confused me a bit initially.

Regarding All-IP:
They will “work”. But there is a slight difference in handling depending on whether you use one transport for all numbers or a separate transport for each number. You will see the difference if you issue an unregister for one of the trunks. If you’re using one transport for all trunks (= all numbers are handled through the same TCP connection), all registered numbers will be dropped (not only the one you unregistered). Why does this happen? Because Telekom drops the TCP connection after a timeout once the unregister has been performed.
If you use a different transport for each number, each trunk gets its own connection, so each number can be handled individually. That means unregistering one number doesn’t drop the unaffected numbers. You can even manage them individually when it comes to changing the destination server.
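A sketch of the per-number layout in Python, generating pjsip.conf text with one TLS transport and one registration per number. The section names, port numbers, phone numbers and URIs are hypothetical; TLS certificate options and the auth, aor and endpoint sections are omitted, and whether a `sips:` URI or a `;transport=tls` parameter is needed depends on your setup.

```python
def per_number_sections(numbers, first_port=5062, server="tel.t-online.de"):
    """Emit pjsip.conf text giving every number its own TLS transport.

    Hypothetical naming scheme; TLS certificate settings and the auth,
    aor and endpoint sections are deliberately omitted.
    """
    out = []
    for i, number in enumerate(numbers):
        name = f"tcom_{number}"
        out += [
            f"[{name}_transport]",
            "type=transport",
            "protocol=tls",
            f"bind=0.0.0.0:{first_port + i}",  # one transport (own port) per number
            "",
            f"[{name}_registration]",
            "type=registration",
            f"transport={name}_transport",
            f"outbound_auth={name}_auth",
            f"server_uri=sip:{server}",
            f"client_uri=sip:{number}@{server}",
            "line=yes",
            f"endpoint={name}_endpoint",
            "",
        ]
    return "\n".join(out)
```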

Regarding SIP Trunk
I can’t speak from my own experience, but if I remember correctly, somebody wrote about problems registering more than one number / range to the SIP trunk using the same connection (= same transport). Maybe he made other mistakes, I don’t know. If you say you are able to register more than one trunk to their SIP trunk servers through the same transport (as is possible with All-IP), I am really happy to learn that!

No - definitely not. Why?

Correct! That’s why it’s unnecessary to check before every request. You have a static connection up and running, and this connection can always be used.
If the parallel thread checking for changes detects a change, the change should be rated. Is it just a priority change? Then I most probably wouldn’t do anything. Did the server disappear completely? Then queue a re-register, executed per trunk when the trunk is idle (one more reason why you want a separate transport for each trunk).
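The rating step could be sketched like this in Python, assuming the DNS watcher hands over the previous and the latest SRV targets ordered by priority; the function name and return labels are hypothetical.

```python
def rate_change(current, old, new):
    """Rate a change between two SRV answers for a trunk registered with `current`.

    old, new: target host names ordered by priority (best first).
    """
    if old == new:
        return "unchanged"
    if current not in new:
        # Our server is no longer advertised: re-register, but only when the trunk is idle.
        return "reregister_when_idle"
    if set(old) == set(new):
        return "priority_change_only"  # same servers, new order: keep the registration
    return "other_servers_changed"  # our server is still listed: keep it as well
```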

I don’t use outbound proxies at all. I can’t see the use case at the moment. Which problem exactly does it solve (for All-IP)?

I need to check whether the connections get dropped, but I don’t think they will with my configuration. In the past I’ve played with manually registering and unregistering accounts, but I didn’t notice any side effects.

In my case the outbound_proxy is already set outside (in the included files), and it turned out that this was the only parameter I needed to control to switch between resolving an A record and evaluating the entire DNS structure.

You need to check unless you control the DNS resolution yourself. Asterisk queries the DNS servers for every outgoing connection, and if there is a change, your call goes nowhere, since you are registered with a different server. In my setup I don’t do anything specific about DNS resolution; the only thing I (usually) do is restrict which server I talk to using the outbound_proxy parameter. So far this is the most stable setup.

Asterisk logs DNS queries at debug level 2 or higher.

I should add that I usually have a safeguard in my routers: I set up some specific “outbound NAT” rules for the Telekom network so that the router doesn’t need to keep state about the connections. The idea is that the registrations alone should suffice, without OPTIONS requests or other keepalive tricks to keep the connections open (for IPv4).

Another reason for using dedicated transports was that re-registering one number or another sometimes timed out (or took several packet resends, all of which were answered seconds later, meaning none of the packets was lost) when they re-registered at exactly the same time. This means an outage for the duration of the retries. Since I’ve been using dedicated transports I’ve never seen this problem (even when they re-register at exactly the same time). How do I know? Because I monitor certain errors, and this is one of them.

BTW: dig +noall +answer _sips._tcp.reg.sip-trunk.telekom.de SRV gives exactly 3 entries here, no matter whether I ask Google or Telekom DNS. Exactly the same servers for SIP.

Not for me. I usually get different answers:

$ dig +noall +answer _sips._tcp.reg.sip-trunk.telekom.de SRV
_sips._tcp.reg.sip-trunk.telekom.de. 3600 IN SRV 30 0 5061 d-ipr-a02.edns.t-ipnet.de.
_sips._tcp.reg.sip-trunk.telekom.de. 3600 IN SRV 10 0 5061 n-ipr-a02.edns.t-ipnet.de.
_sips._tcp.reg.sip-trunk.telekom.de. 3600 IN SRV 20 0 5061 n-ipr-a01.edns.t-ipnet.de.
$ dig @8.8.8.8 +noall +answer _sips._tcp.reg.sip-trunk.telekom.de SRV
_sips._tcp.reg.sip-trunk.telekom.de. 3599 IN SRV 20 0 5061 s-ipr-a02.edns.t-ipnet.de.
_sips._tcp.reg.sip-trunk.telekom.de. 3599 IN SRV 10 0 5061 s-ipr-a01.edns.t-ipnet.de.
_sips._tcp.reg.sip-trunk.telekom.de. 3599 IN SRV 30 0 5061 d-ipr-a01.edns.t-ipnet.de.

So far I’ve not seen any problems when registering multiple accounts at the same time.

Today they differ, too. But I’m getting completely different servers from yours, even via Google. That’s surprising.

No, not at all O:-). Actually, the Telekom document urges you to obey the TTL values. With “direct” contact you almost always get a different answer, i.e. your PBX would switch registrars faster than you can make phone calls. At a given location they have about 5-6 servers, maybe sometimes more. Overall, I guess they have many more, so your reply means that you are not close to my location.

It’s basically an IP Multimedia Subsystem (IMS), and we are dealing here with its Proxy-CSCF. Kamailio seems to have special registration procedures for such systems in a dedicated module. Maybe in a couple of weeks I can say more about that…

So far I’ve received no angry calls because of my multiple registrations, and I am close to handling the additional incoming calls within the dialplan so that the local phones see only a single call.

My surprise referred only to Google’s answers, not to Telekom’s.

What do you mean by “direct” contact? Querying Telekom’s own DNS servers?
I did the same lookups again just now and got the very same answer as yesterday. It doesn’t seem that “unstable” here, or it’s just chance. Let’s wait and see.

Which module are you referring to regarding Kamailio? There are several IMS modules. The idea would probably be to put a Kamailio server in front of Asterisk that handles the connection to the ISP’s trunks and presents just one single connection to Asterisk. It would therefore handle the “grouping”.

In your first posts, you asked what happens if a registration ends while a call is in progress. I would expect that call to break. Therefore Kamailio’s module should do something like this: detect new servers, register with them, and drop removed servers when appropriate (i.e. when there are no more active calls; as long as there are calls, the registration is refreshed as long as necessary). At the same time, the module no longer routes any new calls to or from a server that has been removed but is still kept registered until previously started calls end. Or are there special methods to migrate a running call from one server to another? That would be cool.
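The behavior described for such a module can be reduced to a set comparison. This is a hypothetical Python sketch of that idea, not what Kamailio’s IMS modules actually do; the function name and parameters are illustrative.

```python
def reconcile(registered, advertised, active_calls):
    """Plan registration changes when the advertised server set changes.

    registered: set of servers we currently hold registrations with
    advertised: set of servers in the latest SRV answer
    active_calls: dict server -> number of calls currently using it
    Returns (to_register, to_drop, keep_for_calls).
    """
    to_register = advertised - registered
    removed = registered - advertised
    # Drop a removed server only once no call uses it any more.
    to_drop = {s for s in removed if active_calls.get(s, 0) == 0}
    # Keep refreshing these registrations, but route no new calls via them.
    keep_for_calls = removed - to_drop
    return to_register, to_drop, keep_for_calls
```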

Yes, that’s what I meant. It’s not a lack of stability.

Before I look into the details of Kamailio, I need to buy some good red wine for this kind of weekend project. Currently, I cannot tell to what degree I’ll be able to control things and what the Kamailio modules actually do.

Anyway, Asterisk/PJSIP has no problem registering with several Telekom servers for a single account, and this is indirectly described in the 1TR114 document. The only minor problem is how to route these multiple INVITEs inside Asterisk, which can be done within the dialplan.

If you don’t do anything (e.g. a re-INVITE after the registration proxy has changed), the call will ultimately break, but that only affects longer calls. I have some PCAP traces of this scenario. Keeping the registration with an old server doesn’t make much sense, I think, since the server might have been taken out of the SRV list for maintenance reasons and will go away some time later anyway. As I said, at the moment I don’t know what is possible. If I find something interesting that shows some unusual and spectacular configurations, I think I’ll propose it for a conference. It’s also possible that I’ll get an angry call from the Tele-Comedians telling me to stop all of this…