Possible deadlock on Asterisk 1.6.2.12 and 1.6.2.14

I have been using:

AsteriskNOW 1.7.1 distribution
Asterisk 1.6.2.12
B410P BRI interface card connected to Australian Telco
DAHDI Version: 2.4.0 Echo Canceller: MG2
libpri version: 1.4.11.4

Recently upgraded to:

AsteriskNOW 1.7.1 distribution
Asterisk 1.6.2.14
B410P BRI interface card connected to Australian Telco
DAHDI Version: 2.4.0 Echo Canceller: MG2
libpri version: 1.4.11.5

In both cases, after about 200 to 300 calls on a fleet of about 40 SNOM phones (with lots of BLFs), I’m getting a lockup that I think is probably a deadlock. The phones lose registration (retry is set to 120 seconds), but in-flight calls continue and the CLI continues to run. I have to restart Asterisk to clear the condition.

I am wondering if this is the same as issue 0018310.

Can someone please outline the steps I should take when I observe the condition (before restarting), and how I can tell if this is the same bug as described in 0018310?

I’m assuming that since I’m running an AsteriskNow install updated with yum rather than a build from source that I can’t patch until a resolution is released?

Sorry if these are dumb newbie questions.

Howdy,

They’re not dumb questions.

It seems reasonable that the same sort of thing might be going on. From that issue, there’s a comment from MKemner about seeing similar things from the locks, as output by gdb. The instructions here:
wiki.asterisk.org/wiki/display/ … +Backtrace
will help you get a backtrace, and you can compare your locks to what’s reported there.
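As a rough illustration of the idea, here is a minimal Python sketch that captures a backtrace of every thread from the running process with gdb before you restart. It is only a sketch under assumptions: the binary path `/usr/sbin/asterisk`, the output file name, and the helper names are hypothetical, and the wiki page above remains the authoritative procedure. How useful the output is also depends on whether the installed package carries debugging symbols.

```python
import subprocess

ASTERISK_BIN = "/usr/sbin/asterisk"  # assumed install path; adjust for your system


def build_backtrace_command(pid, binary=ASTERISK_BIN):
    """Return a gdb command line that dumps a backtrace of every thread."""
    return [
        "gdb",
        "-batch",                      # run the commands below, then exit
        "-ex", "thread apply all bt",  # one backtrace per thread
        binary,
        str(pid),                      # gdb attaches to this process ID
    ]


def capture_backtrace(pid, out_path="/tmp/asterisk-backtrace.txt"):
    """Run gdb against the live process and save its output (needs root)."""
    cmd = build_backtrace_command(pid)
    with open(out_path, "w") as out:
        # check=False: keep whatever gdb managed to print even if it fails
        subprocess.run(cmd, stdout=out, stderr=subprocess.STDOUT, check=False)
    return out_path
```

Calling `capture_backtrace(pid)` as root while the phones are unregistered, and before restarting, should leave a text file whose lock-related frames can be compared with the ones posted in the issue.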

Correct, because you’re using RPM based releases of Asterisk, you can’t patch them directly; you’d have to download the source, patch, and then install to test.

Thanks for your reply Malcolm.

I just took a look and it looks like the rpm of 1.6.2.15 has recently become available in the repository so I think I’ll do an update shortly and just see if that addresses the problem - I’d far prefer to be a little more methodical in my testing though!

I will certainly be doing my future builds from source so that I have more options for when things go wrong.

Do you think it would be useful for Digium to put instructions on the site for converting an AsteriskNow install from binary-based to source-based? I’m not sufficiently familiar with Linux/Kernel builds/SRPMS to know if this suggestion even makes sense (my developer days are 20 years in the past)! That way one might get the convenience of a simple distribution install for when everything goes smoothly but can still relatively easily move to a patch/debug approach if the going gets tough.

Thanks for your help.

[quote=“JustIntonation”]
Do you think it would be useful for Digium to put instructions on the site for converting an AsteriskNow install from binary-based to source-based? I’m not sufficiently familiar with Linux/Kernel builds/SRPMS to know if this suggestion even makes sense (my developer days are 20 years in the past)! That way one might get the convenience of a simple distribution install for when everything goes smoothly but can still relatively easily move to a patch/debug approach if the going gets tough.

Thanks for your help.[/quote]

Howdy,

Actually, doing this is counter to what running on an RPM-based system affords you - simple package management. I think doing that could result in unexpected badness down the road for someone that decided to go “back on the reservation” with RPMs.

If you’re an RPMer, the best path is to stay there. If you’re brave enough to venture into source land, you’re also best to stay there.

Hi Malcolm,

I can see what you’re saying - so deciding source/binary really is essentially a one time deal at the time you build your Asterisk server. Pity - as that’s the stage at which you have the least experience to make the decision wisely, at least for the first server. A lot of life is like that I guess. I’ll be rebuilding this server shortly on better hardware and will definitely give a source build a go next time round.

BTW I’ve had 1.6.2.15 running for 48 hours now and no deadlocks after about 600 calls through the system. It also got rid of those irritating “No D-channels available!” messages for my configured but not in service BRIs.

So far I’m loving 1.6.2.15!

Thanks for your replies.

[quote=“JustIntonation”]Hi Malcolm,

I can see what you’re saying - so deciding source/binary really is essentially a one time deal at the time you build your Asterisk server. Pity - as that’s the stage at which you have the least experience to make the decision wisely, at least for the first server. A lot of life is like that I guess. I’ll be rebuilding this server shortly on better hardware and will definitely give a source build a go next time round.
[/quote]

More that going back and forth between them isn’t necessarily the best idea, so it’s best to stick with one.

[quote=“JustIntonation”]
BTW I’ve had 1.6.2.15 running for 48 hours now and no deadlocks after about 600 calls through the system. It also got rid of those irritating “No D-channels available!” messages for my configured but not in service BRIs.

So far I’m loving 1.6.2.15!

Thanks for your replies.[/quote]

Yay :smile:

Sigh… spoke too soon.

When I returned from leave it turns out we are still having the problem with 1.6.2.15 unfortunately.

I have set up an action URL on one of the phones so that when registration is lost we get an SMS (initiated via email) and can restart Asterisk quickly. I have also done some scripting to save some pertinent info and force a core dump with gcore when we run “service asterisk restart”. Should have some hard info soon.
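Something along these lines, as a minimal Python sketch rather than the exact script in use: it saves a few CLI outputs, forces a core dump with gcore, and only then restarts the service. The pid file path, the output directory, the list of CLI commands, and the function names are all assumptions for illustration; `core show locks` in particular is only available when Asterisk was built with DEBUG_THREADS.

```python
import os
import subprocess
import time

PID_FILE = "/var/run/asterisk/asterisk.pid"  # assumed location of the pid file
OUT_DIR = "/var/tmp/asterisk-lockup"          # hypothetical output directory
# CLI commands to snapshot; "core show locks" needs a DEBUG_THREADS build.
CLI_COMMANDS = ["core show channels", "sip show peers", "core show locks"]


def build_steps(pid, stamp, out_dir=OUT_DIR):
    """Return (output_file_or_None, argv) pairs in the order they should run."""
    steps = []
    for command in CLI_COMMANDS:
        name = command.replace(" ", "_") + "-" + stamp + ".txt"
        steps.append((os.path.join(out_dir, name), ["asterisk", "-rx", command]))
    # gcore appends ".<pid>" to the prefix given with -o
    core_prefix = os.path.join(out_dir, "core-" + stamp)
    steps.append((None, ["gcore", "-o", core_prefix, str(pid)]))
    steps.append((None, ["service", "asterisk", "restart"]))
    return steps


def collect_and_restart():
    """Save diagnostics, dump core, then restart Asterisk (run as root)."""
    with open(PID_FILE) as f:
        pid = int(f.read().strip())
    os.makedirs(OUT_DIR, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    for out_file, argv in build_steps(pid, stamp):
        if out_file is None:
            subprocess.run(argv, check=False)
            continue
        with open(out_file, "w") as out:
            try:
                # Time-limited so a hung CLI cannot block the restart
                subprocess.run(argv, stdout=out, stderr=subprocess.STDOUT,
                               check=False, timeout=30)
            except subprocess.TimeoutExpired:
                pass
```

The ordering is the point of the sketch: everything that needs the stuck process alive runs first, and the restart comes last.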

These steps seem to have scared it into submission - no lockups for 24 hours now!

OK, that worked eventually - got 3 backtraces.

Entered as issue 18629.