It’s not a good idea, I totally agree with that. But if I request something the provider doesn’t want because it’s too low, they should send the appropriate response defined in the RFC (423 Interval Too Brief) instead of just extending it; it’s not as if there were no way to reject a registration interval that is too short. Extending it is undefined in the RFC, so if they send us something that has no defined response or action, we can basically do whatever we want, which means the outcome is random or endpoint-specific. If we react to their broken implementation with something the provider doesn’t want, they will start following the RFC, because they will notice that it’s the only way to make all clients do what they want, and that is positive for everyone.
What implementations do and what they should do is vastly different. In the case of Asterisk we try to be as forgiving as possible within reason. In this case as we can handle such a thing, we do. It appears as though even chan_sip has behaved this way for its lifetime.
Is there a possibility to add a weird, non-standards-compliant option like “talk_only_to_registration_host_despite_the_service_provider_offers_a_bunch_of_otherwise_valid_ips”?
A patch would be to detect this specific failure of an outgoing call and force a fresh registration. That is easily done from the CLI, but there seems to be no associated PJSIP function that would let one do it from the dialplan as well.
I have a box where the Telekom DNS failure occurs regularly (and others that don’t show the problem, which I do not understand), and I’ll check this weekend whether the new registration always works.
We review any patches that are put up for review, so if someone posted such a thing we would review it and consider it for inclusion. It ultimately depends on the code and implementation though.
Implementation would be difficult though. Such functionality is counter to how things work.
Maybe it is something for PJSIP. I am thinking more into the direction of a quick and dirty patch along these lines:
Dial(…)
; evaluate DIALSTATUS and HANGUPCAUSE
System(asterisk -rx "pjsip send unregister damned_line")
Wait(4)
System(asterisk -rx "pjsip send register damned_line")
Dial(…)
; and give up
This is not runnable code. If it works, the only thing missing would be the ability to register/unregister from the dialplan. I think that the problem will be gone sooner or later.
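For anyone who wants to try the same sequence from a script instead of the dialplan, here is a minimal Python sketch. It only wraps the two CLI commands quoted above. The function names, the injectable `run` and `sleep` parameters, and the 4-second default are illustrative assumptions, and whether a forced re-registration actually clears the failed call is still untested.

```python
import subprocess
import time


def reregister_commands(registration):
    """Return the unregister and register CLI invocations, in that order."""
    return [
        ["asterisk", "-rx", f"pjsip send unregister {registration}"],
        ["asterisk", "-rx", f"pjsip send register {registration}"],
    ]


def force_reregister(registration, wait_seconds=4,
                     run=subprocess.run, sleep=time.sleep):
    """Unregister, wait, then register again.

    `run` and `sleep` are injectable (hypothetical hooks) so the sequence
    can be exercised without a running Asterisk instance.
    """
    unregister, register = reregister_commands(registration)
    run(unregister, check=True)   # drop the current registration
    sleep(wait_seconds)           # give the provider time to process it
    run(register, check=True)     # request a fresh registration
```

Calling `force_reregister("damned_line")` after a failed Dial and before the retry would mirror the steps in the pseudocode.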
Being forgiving in this case means, in my opinion, that things still work, and the only way to guarantee that is to ignore the response if it’s unexpected. Obviously the current behaviour is not working. It might be nice to the provider, but the provider is presenting me with something not nice, so why should we be nice in return? We are nice enough not to re-register every second and to use our 10-second minimum interval; that’s enough to say that we don’t want to harm the provider or their infrastructure. They expect us to do something that is not covered by the RFC. We can decide to do that (which is how it is now), and this is fine as long as everything works. But when things are broken, they obviously don’t want us to do that, otherwise it would work.
Having a register/unregister function in the dialplan could solve many issues. I had another provider before that also forced long times, which were way too long for my firewall, so outgoing calls did not work unless I did a sip reload first (causing re-registration).
Ignoring the response doesn’t guarantee it would work for everyone. It may help your scenario, and then hinder others. Despite the intentions of not wanting to harm the infrastructure that doesn’t mean the infrastructure itself will not see such traffic as disruptive or problematic.
Ultimately you’re trying to work around a provider that is outright lying to you. That’s not normal and is a bug on their side. Other providers who may violate the RFC and return back a higher expiration don’t lie. What they say is what it is.
If you do come up with an optional patch to cover it then it’ll be reviewed like I mentioned, not just by me but by others as well. It’ll be a collective opinion on whether it makes sense to include or not.
Another question might be - how do other implementations/devices now behave with this provider? What do they do in the scenario?
They should not return a higher expiration though; the RFC doesn’t allow that. If they return something invalid (higher than requested), I would simply drop the invalid parts and make the best of what else is there, meaning the Expires value from our own request, which was apparently not too low, since otherwise the RFC would have required an error response. This behaviour is just like in HTML when you add a tag that doesn’t exist: the browser still does its best to render the page, dropping the part it can’t understand. It doesn’t try to somehow interpret the bad information.
Also, I do not think that this causes issues for anyone, at least nothing serious. If there really is another provider violating the RFC by returning something higher, all the user has to do is set the expiry to what the provider returns, while I have no option to fix my issue with the current implementation. We could even log a warning like “Your provider returned an invalid expiry-duration, we will use the duration from your settings. If you do not want this, please change it to the provider’s value which is xxx”.
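The proposed handling can be sketched as follows. This is not Asterisk code: the function name, logger name, and message wording are hypothetical, and it only illustrates the policy described in this post, which is to accept a granted expiry that is equal to or shorter than the requested one and to fall back to the configured value with a warning when the provider returns something higher.

```python
import logging

log = logging.getLogger("expiry_sketch")  # hypothetical logger name


def effective_expiry(requested, returned):
    """Pick the registration lifetime to use, in seconds.

    requested: the Expires value sent in our REGISTER (from the settings).
    returned:  the expiry in the provider's 200 OK, or None if absent.
    """
    if returned is None:
        # Nothing usable came back, so keep what we asked for.
        return requested
    if returned <= requested:
        # A registrar may grant a shorter lifetime than requested.
        return returned
    # Higher than requested: treat it as invalid, as proposed in the post.
    log.warning(
        "Provider returned an invalid expiry of %d s (requested %d s); "
        "using the configured value. Set the expiry to %d to follow the "
        "provider instead.",
        returned, requested, returned,
    )
    return requested
```

With a configured expiry of 600 seconds and a provider answering 3600, this keeps 600 and logs the warning; a user who prefers the provider's value would set 3600 in the settings.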
I do not want to put effort into writing a patch that will never make it in because it goes completely against the “philosophy” of the software; that’s why I am trying to make sure before I start working on it. What if I just add a warning for now (or an error, so everyone sees it), saying that the provider is currently violating the protocol, that handling of this will be different in future versions, and that users should consider changing their expiry value to xxx if everything works right now? That’s enough to start a debate on whether this is the way to go, and it’s about 3 lines of code. I’m not trying to find a fix for myself here. The provider will probably fix this soon, also to avoid legal trouble when their protocol description (which they have to provide by law) is incomplete or incorrect, and for me things currently work. I’m trying to make everyone benefit from this though.
It seems like the major German Router Manufacturer AVM simply ignores the response in their devices, that’s why they still work. I will have to check that again though. Could also be that they simply request 3600 seconds (they have a “profile” for every provider and have the settings for them defined in there) and that’s why they aren’t affected.
Oh and by the way: I can totally imagine that they just “by accident” returned the higher value and that this is not done on purpose, especially since this causes issues now if people actually respect that.
That’s the problem: “all the user has to do is set the expire to what they return”. If your behavior change were optional and off by default, it would have a better chance of going in. The tolerant behavior as it exists now has been there for almost 20 years. While the RFC states one thing, in practice it hasn’t been a problem for everything else. This may in fact be extremely common. It’s just that it works for people, so no one has ever noticed or cared. I’ve even asked other individuals who don’t use Asterisk but do work with SIP, and they’re of the opinion that while the RFC says one thing, in practice everyone seems to ignore that part.
That’s pretty much my thoughts on this. I’m not sure I have anything else to add so I’ll leave this be.