[HELP] Best way to solve deadlock problem


#1

I have a call center based on Asterisk, in production, that generally works just fine. However, I have noticed that sometimes (once per hour, or so) I get the following warning in the log file:

Apr 13 13:57:19 WARNING[23558] channel.c: Avoided initial deadlock for '0xa62520', 10 retries!

How can I find out what is causing that?
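
A first step might be to see whether the warnings cluster at particular hours. The sketch below counts them per hour; it assumes log lines start with "Mon DD HH:MM:SS" as in the warning above, and the log path in the usage comment is an assumption about the install, not something confirmed in this thread.

```shell
#!/bin/sh
# Count "Avoided initial deadlock" warnings per hour in an Asterisk log.
# Relies on the log being in chronological order, so that uniq can
# collapse adjacent lines from the same hour.
deadlocks_per_hour() {
    grep 'Avoided initial deadlock' "$1" |
        awk '{ print $1, $2, substr($3, 1, 2) ":00" }' |
        uniq -c
}

# Usage (path is an assumption):
#   deadlocks_per_hour /var/log/asterisk/messages
```

Comparing the busiest hours against call volume would show whether the warnings simply follow traffic.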


#2

That is currently a tough nut on the Asterisk bug tracker, see

bugs.digium.com/main_page.php

It MIGHT help to look into the “weight solution” issue posted there.
On the other hand, it might be a resource problem.

Which version are you running?

I am trying 1.2.6 tonight since we have similar problems with the queue app.

It looks like something (stack? variable memory not being freed?) is going badly wrong in the queue application, and channel.c depends on it if you run a call center (like we do).


#3

I’m using 1.2.4, with a patched app_queue.c that includes the following two patches:

Better strategy behaviour (which I think has been a standard part of Asterisk for a few weeks now, not completely sure though):
bugs.digium.com/view.php?id=5577

Multiple periodic announcements:
bugs.digium.com/view.php?id=5273

These deadlocks started to appear just a few days ago, when nothing was changed, although the traffic did increase a lot!

Are there any good strategies for solving deadlocks on a production machine? And even more important: what can such a deadlock do to the system?


#4

[quote]These deadlocks started to appear just a few days ago, when nothing
was changed, although the traffic did increase a lot![/quote]

hehe… this IS exactly the problem:
When a call is, let’s say, not freeing 24 bytes of memory, you won’t notice it with 100 calls a day.

But with 1000 calls you will notice it soon…

It just strengthens my resource-leak theory: something isn’t being cleared there in Asterisk…

Asterisk is supposed to clear every variable on hangup.
I suspect that isn’t working 100%, which is why we run out of resources, causing this problem on heavy-load PBXs.

Well… that’s what I guess, could be wrong though.
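
The leak theory could be checked rather than guessed at by sampling the Asterisk process's resident memory over time. A minimal sketch, assuming a Linux host with /proc; the pid file and output path in the usage comment are hypothetical and depend on the install:

```shell
#!/bin/sh
# Print one "date time pid rss_kb" sample for a process (Linux /proc only).
rss_sample() {
    pid="$1"
    # VmRSS is the resident set size in kB as reported by the kernel.
    rss_kb=$(awk '/^VmRSS:/ { print $2 }' "/proc/$pid/status")
    echo "$(date '+%Y-%m-%d %H:%M:%S') $pid $rss_kb"
}

# Usage (both paths are assumptions), e.g. run from cron every few minutes:
#   rss_sample "$(cat /var/run/asterisk.pid)" >> /var/log/asterisk-rss.log
```

If the last column keeps climbing across days with similar call volume, that would support the theory; if it stays flat, the cause is probably elsewhere.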

So far, you only have two choices:

A cron job rebooting your machine at 6 am.
See your local Webmin interface and create one with the command
nohup shutdown -r now &

and schedule it for 6 am every day.
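
For anyone without Webmin, the same schedule can be written directly as a crontab entry. This is only a sketch: it assumes the entry goes into root's crontab and that shutdown lives in /sbin. Since cron already runs jobs detached from any terminal, nohup and the trailing & should not be needed here.

```
# root's crontab (edit with: crontab -e)
# minute hour day-of-month month day-of-week  command
0 6 * * * /sbin/shutdown -r now
```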

Second choice, which I am trying tonight:
1.2.6 update…

If you go with the cron reboot:
Make sure all needed services (CDR, MySQL etc.) are started automatically on boot!


#5

PS:
Make it 1.2.7, I just saw there is a new release!


#6

Hmm, that sounds scary. My production PBX is working in Moscow 24/7 (thousands of miles away from me, plus it must work all the time), so rebooting the machine is out of the question.

With approx. 1300 calls per day (10-20 deadlocks per day), when can I expect the whole thing to crash?


#7

Uhm… normally the deadlock means rebooting anyway?!
Or are you still able to restart it with the deadlocks?

If you issue a
nohup shutdown -r now

it should reboot safely and you can reconnect via PuTTY etc., no?

So far, until it is fixed, a daily (nightly) reboot via cron is your best bet…

If you then still experience 10-20 deadlocks per day, I really suggest another machine: set it up with the same network config, send it to Moscow and let them swap the machines.

This problem ALSO seems to be system-dependent… so it could help to set up a new machine.


#8

Yep, I’m able to restart it with the deadlocks. No problems at all. Hmm, another machine? Huh… no good news then. :cry:


#9

Well, you know, I always think in economic terms too, not only technical ones.

We admins can’t afford to have production servers serving “experimental purposes”.

The customer needs to be happy, so the machine should be exchanged.

Then you can torture the naughty machine at your place as long as you want… :smiling_imp:

But the production server must RUN.

What did the guy say in the Predator movie?
“I don’t have time to bleed…” :wink:


#10

Yeah, you’re definitely right there.

I’m just wondering how come nothing freezes (thank God, though) the way it is described for deadlocks in voip-info.org/wiki/index.php … k+deadlock, and yet I am experiencing them…


#11

Well, I know that Asterisk uses its own stack with a size of 256 * 1024 bytes.

Since this is a “stack” only for Asterisk, it might not take down the whole machine when a problem arises there.

Be happy it doesn’t… :blush:

I have to reboot my machine; the memory is definitely screwed then, with no chance of a “soft restart”… :cry:


#12

OOH !!!

That brings an idea to my mind…

Make a cron job !

Schedule it to run every 30 minutes.

Command:
asterisk -r -x "restart when convenient"

The quotes are important!
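
As a rough sketch, the command above could be wrapped in a small script so that every request leaves a timestamp behind. The binary path, script path and log path are assumptions, not taken from this thread; the crontab line in the comment is the usual "every 30 minutes" form.

```shell
#!/bin/sh
# Ask Asterisk to restart once it is convenient, and print a timestamp
# so each request can be logged.
# ASTERISK_BIN is an assumption; adjust it to the local install.
ASTERISK_BIN="${ASTERISK_BIN:-/usr/sbin/asterisk}"

request_restart() {
    echo "$(date '+%Y-%m-%d %H:%M:%S') requesting restart when convenient"
    # The quotes keep the three words together as one CLI command.
    "$ASTERISK_BIN" -r -x "restart when convenient"
}

# Hypothetical crontab entry for a script that calls request_restart:
#   */30 * * * * /usr/local/bin/asterisk-restart.sh >> /var/log/asterisk-restart.log 2>&1
```

Without the quotes the shell would pass "restart", "when" and "convenient" as three separate arguments, which is presumably why they matter.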


#13

Hmm, thanks for that. That’s a good idea… Thanx! 8)

Btw: have you been using Asterisk in the call center for a long time now? What are your experiences with it so far?


#14

[quote=“bziherl”]Hmm, thanks for that. That’s a good idea… Thanx! 8)

Btw: have you been using Asterisk in the call center for a long time now? What are your experiences with it so far?[/quote]

It’s amazing!

You can do things with it you can’t dream of.

There is HARDLY anything you would hear a “will not work” from me about, regarding Asterisk…

No problems here: very stable, very good quality, very good server usage (CPU/RAM).

The ONLY problem now is the resource/memory leak, but I will kill this bitch quick.

This is MY server, not Asterisk’s server, so I decide who wastes what RAM and when :laughing: :laughing:

Just need to tell Asterisk that :smiling_imp: