Asterisk suddenly fails to register SIP


#1

Ive had asterisk running on Ubuntu for a few years. It’s been set and forget. But in late December all my registration attempts suddenly started failing:

[Jan 11 12:52:10] NOTICE[2829] chan_sip.c:    -- Registration for '09xxxxxx@sip01.mynetfone.com.au' timed out, trying again (Attempt #108)

This was about the same time my cable broadband provider gave me a new gateway modem. It’s easy to conclude the gateway is blocking registration/Sip requests or replies.

However, using wifi from my phone and a SIP application (csipsimple) I can both register and make calls. To me this means it cannot possibly be the gateway.

Its also tempting to conclude its the VOIP provider that is not responding to my requests. But I use two completely independent VOIP providers. Both (mynetphone and clickncall) now fail with registration timeout.

I therefore conclude it must be something to do with asterisk. However the last change I made to asterisk was a minor extension.conf change on 7th December. The system worked fine up to the 20th December.

I did finally realise the host IP address had changed with the installation of the new gateway. So I updated EXTERNADDRESS and HOST in sip.conf. This made no difference.

So in trying to diagnose the problem:

  1. I can ping the sip urls of both providers from the machine I host asterisk on.

  2. traceroute shows the requests leaving the gateway, reaching my ISP and for one provider reaching their servers, for the other the last 15 hops or so are all * * * (presumably showing the request is being sent.

  3. tcpdump agrees the requests are going out. Whether they are valid or not is a different matter.

12:58:38.794201 IP (tos 0x0, ttl 64, id 53587, offset 0, flags [none], proto UDP (17), length 461)
    192.168.4.80.5060 > 125.213.xxx.xx.5060: [bad udp cksum 0xe4e9 -> 0xc347!] SIP, length: 433
	REGISTER sip:sip01.mynetfone.com.au SIP/2.0
	Via: SIP/2.0/UDP 58.173.112.144:5060;branch=z9hG4bK66b69844;rport
	Max-Forwards: 70
	From: <sip:09466676@sip01.mynetfone.com.au>;tag=as262c4b53
	To: <sip:09xxxxxx@sip01.mynetfone.com.au>
	Call-ID: 5cbace9c294d43193dd2710b50939d78@127.0.0.1
	CSeq: 228 REGISTER
	User-Agent: Asterisk PBX 11.7.0~dfsg-1ubuntu1
	Expires: 120
	Contact: <sip:09xxxxxx@58.173.xxxxxxxx:5060>
	Content-Length: 0
	

So Im a bit stumped. Each explanation I think of I can discount as follows:

  1. The responses are blocked by the gateway and never reach asterisk. But my VOIP app would fail if this were the case.

2.The VOIP provider has changed something. But for two different providers to do this at the same time is implausible. Again its unlikely my SIP app would still be OK and asterisk not.

  1. asterisk is suddenly sending invalid registration information. But the registration credentials have not changed in asterisk. For good measure I set up my VOIP app again with the credentials from asterisk and it works just fine (through the same gateway and internet connections).

Its almost as if Ive got a new asterisk installation and Ive messed up the config. Except the system is not new, and has been running fine since 2013. Obviously something has changed somewhere.

Does anyone have any plausible ideas, given the analysis I’ve done above?


#2

The SIP REGISTER looks perfectly fine, it would seem it is either 1 or 2.


#3

Well that’s my point. It cannot be 1 as my phone sip app could not possibly workbif it were. It can’t be 2 either, realitsically speaking. Both mynetphone and clickncall change their registration process in December 2018 and kick off thousands of asterisk users? I really don’t think so!

I think you’re as stumped as me!!

Does anyone have any other suggestions for what I can check?


#4

Have you checked with your providers to see if they are getting the messages from Asterisk? If they are getting the messages is everything OK with them on their side when they get them? And if they are, are they replying? So far you are speculating what could be happening and haven’t laid out any sort run down of how you’ve eliminated items that could be the cause.

I’m seeing this: [bad udp cksum 0xe4e9 -> 0xc347!] SIP, length: 433 <-- That is generally an indication something ain’t right on your system. What that is, I’m not really sure.

However, is there a reason you are running a version of Asterisk that has completely EOL’d over for a year but also a release of it that is over 6 years old? Part of your issue is going to be that not only is Asterisk 11 completely EOL (therefore no fixes, updates) but that Chan_SIP has been deprecated and not supported since Aug 2014. The few updates/fixes it has gotten since then has been community driven.

So are you saying that every user can’t register or connect or that just your providers peers can’t register/connect? You need to be a little more specific on that because if ALL users are being dumped then yeah this is more a system issue than anything else. If it is just your provider peers, then that could be something completely different.


#5

That will be because he is using a long term stable version of Ubuntu that doesn’t expire until April this year, although appears to have been created in 2016.

Linux distributions tend to be conservative about the version of the software they package, and will generally only apply security fixes (although these may be back ports from later versions) during the life of the distribution. Stability of the behaviour of the distribution is paramount, and there has to be a major security issue before they will do anything that risks breaking backwards compatibility or introducing new bugs.

The consequence is probably that this issue should be taken up with the Ubuntu packager, rather than the Asterisk community.


#6

Thanks for the replies.

If I can reply to the last comment re ubuntu first. David is correct and this is the asterisk version that Ubuntu distributes.

While the version of asterisk may be the cause, I cannot see how it would suddenly be a problem on 20 December 2018. I was on the same version of both Ubuntu and asterisk both before and after. I suspect concluding it’s a Ubuntu problem may not get me anywhere with the Ubuntu community largely because one day my registrations worked, the next they did not. While its possible the same version of asterisk and Ubuntu work one day and not the next I personally think they’d see it as a tall story and point me right back at asterisk or my environment/network.

Yes I have contacted the provider. They were less than helpful and would only successfully connect with a soft phone and conclude the problem is therefore not at their end. They added given I can also connect with a soft phone from within my own network its most certainly not them. They do have a point.

They tell me they cannot look to see if they receive my registration requests. We all know that is nonsense, but what can I do? They stubbornly refused to help further.

Regarding not clearly laying out what I’ve done, my apologies if I wasn’t clear. Let me try and clarify:

1 eliminated it being my gateway - I can successfully use a softphone through the same gateway.

2 eliminated it being the comms path by successfully pinging both providers

3 eliminated not receiving registration responses as my softphone receives them

I can’t say I’ve eliminated the following, but I do say these are on the list of least likely (is last on the list of potential causes) to highly improbable:

1 highly unlikely to be the provider in general as both providers suddenly failed on the same day. The providers have nothing whatsoever to do with each other and would.not synchronise changes.

2 highly unlikely to be the provider that made a change as they would both have had to make a change on the same day to make asterisk suddenly stop working.

3 highly unlikely to be the providers that both made a change making asterisk no longer work (if that were the cause) as there would be many other asterisk users worldwide (using Ubuntu) suddenly having the same issues. As far as I know there are none.

I’m hoping that is a bit clearer? But despite my conclusions above, I still.am no nearer solving this puzzle!

I also noted the bad checksum. I have no idea if it occurred before this problem, nor what might cause it to suddenly and consistently happen if so.

However a quick google tells me it isn’t a problem as per this article.https://www.google.com/amp/s/blog.bumpinthe.net/2016/05/18/bad-checksum-in-packet-captures/amp/

Certainly everything above, to me, points to this being either asterisk or the machine it’s running on. But I can’t think what might have changed and how to cause this.


#7

Ubuntu actually distribute different Asterisk versions with different Ubuntu versions. The one in question is on the oldest current version, which is going end of life in April this year.


#8

I can look at upgrading my Ubuntu version, but it’s a major piece of work as not all the software running in my platform is compatible with later versions. I’d need to do extensive testing.

Given the asterisk version I run is supported (right?) what in that version could suddenly stop working? And why would it suddenly do that?

I’m not rejecting the idea, I’m just trying to hit the most likely cause first. I still see this one as unlikely which is fine except it’s a lot of work to update my OS.


#9

Wrong. Full support ended on October 25th 2016 and security fix support ended on October 25th 2017.


#10

Hmm. Ok, well Ill look at upgrading just asterisk to elimenate that as a cause since that seems to be the suggestion for what the problem is. Unlikely though it seems to me, I don’t have any better suggestions!

Assuming it’s straightforward to upgrade asterisk it will prove the version was the problem. Or not as the case emay be. I suppose there’s nothing to lose as it isn’t working any more.


#11

There’s a wiki page[1] which details the dates for the different versions, it’s always wise to consult it periodically.

[1] https://wiki.asterisk.org/wiki/display/AST/Asterisk+Versions


#12

Then the Ubuntu package maintainer is actually fixing the issues in Asterisk 11? Because no one else has and won’t be. Just because the OS is LTS until April doesn’t mean the software running on it is LTS during the same time. Ubuntu having an old version of Asterisk in their packages is like a 3 rd party using Asterisk 11 for their PBX coughIssabellacough. Just because the PBX is LTS doesn’t make the Asterisk version being used also LTS with it.

So again, Asterisk 11 is EOL and unless the individual package maintainers are make direct fixes in Asterisk then the Asterisk version is still EOL. Basically there could be issues in Asterisk 11.7, in fact there were. There have been a few bug fixes since then as Asterisk 11 went out on 11.27 or something, including the last release being a security fix 2 weeks before the SFO expired.


#13

LTS doesn’t make the Asterisk version being used also LTS with it.

That’s a good point. And definately reason to upgrade which I will do.

While starting on that I happened to make a test call wich to my surprise worked. What the hell? So I looked in my logs . It appears that on 12th Jan asterisk registered the providers again and started working.

Its possible that it’s dropping in and out, although as of right now it seems to be registered just fine.

Given its an old version, which I will upgrade, so just as a point of discussion here is the log where it started working again (possibly failing later on)

 out, trying again (Attempt #362)
[Jan 11 18:39:47] NOTICE[2837] chan_sip.c:    -- Registration for '266xxxxxxx@sip.clickncallnow.com' timed
 out, trying again (Attempt #363)
[Jan 11 18:40:07] NOTICE[2837] chan_sip.c:    -- Registration for '266xxxxxxx@sip.clickncallnow.com' timed
 out, trying again (Attempt #364)
[Jan 11 18:40:27] NOTICE[2837] chan_sip.c:    -- Registration for '266xxxxxxx@sip.clickncallnow.com' timed
 out, trying again (Attempt #365)
[Jan 11 18:40:47] NOTICE[2837] chan_sip.c:    -- Registration for '266xxxxxxx@sip.clickncallnow.com' timed
 out, trying again (Attempt #366)
[Jan 11 18:41:34] ERROR[2837] netsock2.c: getaddrinfo("sip01.mynetfone.com.au", "(null)", ...): Temporary 
failure in name resolution
[Jan 11 18:41:34] WARNING[2837] acl.c: Unable to lookup 'sip01.mynetfone.com.au'
[Jan 11 18:41:49] WARNING[2837] chan_sip.c: Retransmission timeout reached on transmission 43db31ad111a591
f350eb25a7c063f4b@127.0.0.1 for seqno 467 (Critical Request) -- See https://wiki.asterisk.org/wiki/display
/AST/SIP+Retransmissions
Packet timed out after 62699ms with no response
[Jan 11 18:41:49] NOTICE[2837] chan_sip.c:    -- Registration for '266xxxxxxx@sip.clickncallnow.com' timed
 out, trying again (Attempt #367)
[Jan 11 18:41:50] NOTICE[2837] chan_sip.c: Peer 'clickncall' is now Reachable. (1010ms / 2000ms)
[Jan 11 18:41:50] NOTICE[2837] chan_sip.c: Peer 'out-clickncall' is now Reachable. (1034ms / 2000ms)
[Jan 11 18:42:09] NOTICE[2837] chan_sip.c:    -- Registration for '09xxxxxxx@sip01.mynetfone.com.au' timed 
out, trying again (Attempt #2)
[Jan 11 18:42:09] NOTICE[2837] chan_sip.c:    -- Registration for '266xxxxxxx@sip.clickncallnow.com' timed
 out, trying again (Attempt #368)
[Jan 11 20:37:55] NOTICE[2837] chan_sip.c: Peer 'out-clickncall' is now Lagged. (2709ms / 2000ms)
[Jan 11 20:38:05] NOTICE[2837] chan_sip.c: Peer 'out-clickncall' is now Reachable. (10ms / 2000ms)
[Jan 11 22:13:10] NOTICE[2837] chan_sip.c: Peer 'out-clickncall' is now UNREACHABLE!  Last qualify: 9
[Jan 11 22:13:20] NOTICE[2837] chan_sip.c: Peer 'out-clickncall' is now Reachable. (10ms / 2000ms)
[Jan 12 00:32:02] NOTICE[2837] chan_sip.c: Peer 'clickncall' is now UNREACHABLE!  Last qualify: 10
[Jan 12 00:32:12] NOTICE[2837] chan_sip.c: Peer 'clickncall' is now Reachable. (16ms / 2000ms)
[Jan 12 03:06:20] NOTICE[2837] chan_sip.c: Peer 'clickncall' is now UNREACHABLE!  Last qualify: 12
[Jan 12 03:06:28] NOTICE[2837] chan_sip.c: Peer 'out-clickncall' is now UNREACHABLE!  Last qualify: 14
[Jan 12 03:07:38] NOTICE[2837] chan_sip.c:    -- Registration for '266xxxxxxx@sip.clickncallnow.com' timed
 out, trying again (Attempt #2)
[Jan 12 03:07:58] NOTICE[2837] chan_sip.c:    -- Registration for '266xxxxxxx@sip.clickncallnow.com' timed
 out, trying again (Attempt #3)
[Jan 12 03:08:18] NOTICE[2837] chan_sip.c:    -- Registration for '266xxxxxxx@sip.clickncallnow.com' timed
 out, trying again (Attempt #4)
[Jan 12 03:08:36] NOTICE[2837] chan_sip.c: Peer 'clickncall' is now Reachable. (12ms / 2000ms)
[Jan 12 03:08:43] NOTICE[2837] chan_sip.c: Peer 'out-clickncall' is now Reachable. (12ms / 2000ms)
[Jan 12 03:45:46] NOTICE[2837] chan_sip.c: Peer 'out-clickncall' is now Lagged. (2170ms / 2000ms)
[Jan 12 03:45:56] NOTICE[2837] chan_sip.c: Peer 'out-clickncall' is now Reachable. (11ms / 2000ms)
[Jan 12 04:03:43] NOTICE[2837] chan_sip.c: Peer 'clickncall' is now UNREACHABLE!  Last qualify: 12
[Jan 12 04:03:53] NOTICE[2837] chan_sip.c: Peer 'clickncall' is now Reachable. (19ms / 2000ms)
[Jan 12 04:40:01] NOTICE[2837] chan_sip.c: Peer 'out-clickncall' is now UNREACHABLE!  Last qualify: 12
[Jan 12 04:40:11] NOTICE[2837] chan_sip.c: Peer 'out-clickncall' is now Reachable. (10ms / 2000ms)
[Jan 12 04:40:56] NOTICE[2837] chan_sip.c: Peer 'clickncall' is now Lagged. (2695ms / 2000ms)
[Jan 12 04:41:06] NOTICE[2837] chan_sip.c: Peer 'clickncall' is now Reachable. (16ms / 2000ms)
[Jan 12 04:58:10] NOTICE[2837] chan_sip.c: Peer 'clickncall' is now UNREACHABLE!  Last qualify: 9
[Jan 12 04:58:32] NOTICE[2837] chan_sip.c: Peer 'clickncall' is now Reachable. (14ms / 2000ms)
[Jan 12 07:38:17] NOTICE[2837] chan_sip.c: Peer 'out-clickncall' is now Lagged. (2261ms / 2000ms)
[Jan 12 07:38:27] NOTICE[2837] chan_sip.c: Peer 'out-clickncall' is now Reachable. (12ms / 2000ms)
[Jan 12 08:19:40] NOTICE[2837] chan_sip.c: Peer 'clickncall' is now Lagged. (2481ms / 2000ms)
[Jan 12 08:19:50] NOTICE[2837] chan_sip.c: Peer 'clickncall' is now Reachable. (12ms / 2000ms)
[Jan 12 13:39:01] NOTICE[2837] chan_sip.c: Peer 'clickncall' is now Lagged. (3879ms / 2000ms)
[Jan 12 13:39:23] NOTICE[2837] chan_sip.c: Peer 'clickncall' is now Reachable. (19ms / 2000ms)
[Jan 12 13:53:28] NOTICE[2837] chan_sip.c: Peer 'clickncall' is now UNREACHABLE!  Last qualify: 12
[Jan 12 13:53:38] NOTICE[2837] chan_sip.c: Peer 'clickncall' is now Reachable. (12ms / 2000ms)
[Jan 12 14:42:41] NOTICE[2837] chan_sip.c: Peer 'clickncall' is now Lagged. (2822ms / 2000ms)
[Jan 12 14:42:51] NOTICE[2837] chan_sip.c: Peer 'clickncall' is now Reachable. (14ms / 2000ms)
[Jan 12 15:46:47] NOTICE[2837] chan_sip.c: Peer 'out-clickncall' is now UNREACHABLE!  Last qualify: 10
[Jan 12 15:46:57] NOTICE[2837] chan_sip.c: Peer 'out-clickncall' is now Reachable. (9ms / 2000ms)
[Jan 12 17:30:47] NOTICE[2837] chan_sip.c:    -- Registration for '266xxxxxxx@sip.clickncallnow.com' timed
 out, trying again (Attempt #2)
[Jan 12 19:04:04] NOTICE[2837] chan_sip.c: Peer 'out-clickncall' is now UNREACHABLE!  Last qualify: 16
[Jan 12 19:04:14] NOTICE[2837] chan_sip.c: Peer 'out-clickncall' is now Reachable. (33ms / 2000ms)
[Jan 12 19:36:19] NOTICE[2837] chan_sip.c: Peer 'out-clickncall' is now UNREACHABLE!  Last qualify: 12
[Jan 12 19:36:29] NOTICE[2837] chan_sip.c: Peer 'out-clickncall' is now Reachable. (11ms / 2000ms)
[Jan 12 21:20:14] VERBOSE[14864] chan_skinny.c:     -- Starting Skinny session from 192.168.4.80
[Jan 12 21:20:14] WARNING[14864] chan_skinny.c: read() returned error: Connection reset by peer
[Jan 12 21:20:14] WARNING[14864] chan_skinny.c: Skinny Client was lost, unregistering
[Jan 12 21:20:14] VERBOSE[14864] chan_skinny.c:     -- Ending Skinny session from 192.168.4.80 (bad input)
[Jan 13 00:45:06] NOTICE[2837] chan_sip.c: Peer 'clickncall' is now UNREACHABLE!  Last qualify: 12
[Jan 13 00:45:16] NOTICE[2837] chan_sip.c: Peer 'clickncall' is now Reachable. (10ms / 2000ms)
[Jan 13 02:28:44] NOTICE[2837] chan_sip.c: Peer 'out-clickncall' is now UNREACHABLE!  Last qualify: 1012
[Jan 13 02:28:54] NOTICE[2837] chan_sip.c: Peer 'out-clickncall' is now Reachable. (9ms / 2000ms)
[Jan 13 02:42:57] NOTICE[2837] chan_sip.c: Peer 'out-clickncall' is now Lagged. (2700ms / 2000ms)
[Jan 13 02:43:07] NOTICE[2837] chan_sip.c: Peer 'out-clickncall' is now Reachable. (11ms / 2000ms)
[Jan 13 03:06:11] NOTICE[2837] chan_sip.c: Peer 'out-clickncall' is now UNREACHABLE!  Last qualify: 13
[Jan 13 03:06:25] NOTICE[2837] chan_sip.c: Peer 'clickncall' is now UNREACHABLE!  Last qualify: 8
[Jan 13 03:07:31] NOTICE[2837] chan_sip.c: Peer 'clickncall' is now Reachable. (11ms / 2000ms)
[Jan 13 03:07:31] NOTICE[2837] chan_sip.c: Peer 'out-clickncall' is now Reachable. (11ms / 2000ms)
[Jan 13 03:14:35] NOTICE[2837] chan_sip.c: Peer 'out-clickncall' is now UNREACHABLE!  Last qualify: 10
[Jan 13 03:14:45] NOTICE[2837] chan_sip.c: Peer 'out-clickncall' is now Reachable. (9ms / 2000ms)
[Jan 13 03:30:36] NOTICE[2837] chan_sip.c: Peer 'clickncall' is now UNREACHABLE!  Last qualify: 14
[Jan 13 03:30:46] NOTICE[2837] chan_sip.c: Peer 'clickncall' is now Reachable. (17ms / 2000ms)
[Jan 13 07:51:57] NOTICE[2837] chan_sip.c: Peer 'clickncall' is now UNREACHABLE!  Last qualify: 10
[Jan 13 07:51:58] NOTICE[2837] chan_sip.c: Peer 'out-clickncall' is now Lagged. (3261ms / 2000ms)
[Jan 13 07:52:07] NOTICE[2837] chan_sip.c: Peer 'clickncall' is now Reachable. (9ms / 2000ms)
[Jan 13 07:52:08] NOTICE[2837] chan_sip.c: Peer 'out-clickncall' is now Reachable. (36ms / 2000ms)
[Jan 13 08:01:00] NOTICE[2837] chan_sip.c: Peer 'mynetfone-home-prv' is now Lagged. (2029ms / 2000ms)
[Jan 13 08:01:00] NOTICE[2837] chan_sip.c: Peer 'out-mynetfone-bbwd' is now Lagged. (2065ms / 2000ms)
[Jan 13 08:01:10] NOTICE[2837] chan_sip.c: Peer 'mynetfone-home-prv' is now Reachable. (29ms / 2000ms)
[Jan 13 08:01:10] NOTICE[2837] chan_sip.c: Peer 'out-mynetfone-bbwd' is now Reachable. (29ms / 2000ms)
[Jan 13 08:02:00] NOTICE[2837] chan_sip.c: Peer 'in-mynetfone-bbwd' is now Lagged. (3024ms / 2000ms)
[Jan 13 08:02:12] NOTICE[2837] chan_sip.c: Peer 'out-clickncall' is now UNREACHABLE!  Last qualify: 13
[Jan 13 08:02:12] NOTICE[2837] chan_sip.c: Peer 'mynetfone-home-prv' is now Lagged. (2025ms / 2000ms)
[Jan 13 08:02:12] NOTICE[2837] chan_sip.c: Peer 'out-mynetfone-bbwd' is now Lagged. (2025ms / 2000ms)
[Jan 13 08:02:22] NOTICE[2837] chan_sip.c: Peer 'out-clickncall' is now Reachable. (9ms / 2000ms)
[Jan 13 08:02:22] NOTICE[2837] chan_sip.c: Peer 'mynetfone-home-prv' is now Reachable. (27ms / 2000ms)
[Jan 13 08:02:22] NOTICE[2837] chan_sip.c: Peer 'out-mynetfone-bbwd' is now Reachable. (27ms / 2000ms)
[Jan 13 08:02:23] NOTICE[2837] chan_sip.c: Peer 'in-mynetfone-bbwd' is now Reachable. (26ms / 2000ms)

It would appear that since Jan 13th both SIPS have been registered OK. What is odd is that I have been away on holiday since 12th. The point being I couldnt have made any changes to make that happen.

I also noted while scrutinising the logs, two erroneous # characters in extensions.conf (which I had put in as comments - I know it should be a ; but Id been doing a lot of bash scripting!). I cant see those making a difference, but I mention them for completeness. I only corrected them just now.

Again this info is just for interest only, I accept this is an unsupported version. Still, its odd behaviour!


#14

^^ That can be the issue right there.


#15

It only occurs once in all the logs, the registration failures and successes are both before and after it.