Sometimes Asterisk uses 100% CPU for several minutes

Hello,

We implemented an outbound call center application for a customer. It uses AGI to connect to a Java backend.

About once a week the CPU load rose to 100 and above and the system didn’t respond to anything except ping.
Only a hardware reset solved the problem.
To debug the system we set up a cpuset which limits the asterisk process to one CPU (4 cores) on the dual-CPU system.
Now it mostly runs very well. But sometimes the CPU load rises to 40 and above for 1 to 5 minutes. During this time Asterisk doesn’t
react and all active channels get disconnected. This happens approximately once every two weeks.
Every time this happens, the asterisk process consumes 100% CPU. No other process shows high CPU usage.

We tried to restart the asterisk server every night, but it didn’t solve the problem.
We also tried Asterisk 1.4.22, but with this version the problem occurred more often.

Used Hardware:

  • Two Quad Core Intel® Xeon® E5310 @ 1.6 GHz
  • 4 GB RAM
  • Two Sangoma Technologies Corp. A104d QUAD T1/E1 AFT cards, six E1 lines connected

Installed Versions:

  • OS: Debian 5.0 Lenny
  • Kernel 2.6.22 Custom
  • Asterisk 1.4.21.1
  • Zaptel 1.4.12.1
  • libpri 1.4.5
  • Wanpipe 3.2.7.1

Average number of active clients: approximately 80
Average number of simultaneous active calls: approximately 50
Average number of outbound dials per minute: approximately 100

Regards,

Christian Linden

I once had this happen when I edited a script and it was looping really bad.

Does crap pour into your CLI when this thing is running at 100%?

What do the logs say - is there a loop in there (cannot destroy channel - or something like that)?

Any ideas how many active channels you have going when it does this?
Does it do it under load?
Has it ever done it at night when presumably nobody is on the system?
I had been installing 1.4.17 with zaptel-1.4.10.1 for a while because it was rock solid and there was a lengthy gap between releases (barring any IAX security patches that have since been applied).

P.S.
cpuset looks interesting. And debian 5.0 - when did that happen!!?!! Thanks for posting.

Thanks for answering

@Mark Logan:
We do verbose logging (verbose log level 3) and couldn’t see any looping, but sometimes there are log entries missing. Next time we will take a closer look at that.
The CLI doesn’t work, because asterisk isn’t responding.
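When the CLI is unreachable, the process can still be inspected from outside through /proc. Below is a minimal Python sketch, not something taken from this setup, that takes two snapshots of per-thread CPU ticks and ranks the threads by how much they consumed in between. The function names are made up for illustration, and it assumes a Linux /proc layout. On older kernels every thread may simply be named after the process, so the thread id is the useful part.

```python
import os
import time


def parse_stat_line(line):
    """Return (thread name, CPU ticks used) from one /proc/<pid>/task/<tid>/stat line."""
    # The name sits in parentheses and may itself contain spaces or parentheses,
    # so split on the last ')' rather than on whitespace.
    head, tail = line.rsplit(")", 1)
    name = head.split("(", 1)[1]
    fields = tail.split()
    # tail starts at field 3 (state); utime is field 14 and stime is field 15.
    utime, stime = int(fields[11]), int(fields[12])
    return name, utime + stime


def read_thread_ticks(pid):
    """Map thread id -> (name, ticks) for every thread of the process."""
    ticks = {}
    task_dir = f"/proc/{pid}/task"
    for tid in os.listdir(task_dir):
        try:
            with open(f"{task_dir}/{tid}/stat") as handle:
                ticks[int(tid)] = parse_stat_line(handle.read())
        except OSError:
            continue  # the thread exited between listdir and open
    return ticks


def busiest_threads(before, after, top=5):
    """Rank threads by ticks consumed between two snapshots."""
    deltas = []
    for tid, (name, ticks) in after.items():
        previous = before.get(tid, (name, 0))[1]
        deltas.append((ticks - previous, tid, name))
    return sorted(deltas, reverse=True)[:top]


def report(pid, interval=5.0):
    """Print the threads that used the most CPU during the interval."""
    first = read_thread_ticks(pid)
    time.sleep(interval)
    for used, tid, name in busiest_threads(first, read_thread_ticks(pid)):
        print(f"tid={tid} name={name} ticks={used}")
```

Calling `report(pid)` with the pid of the asterisk process during a spike would show whether one thread is spinning or many are busy at once. The thread id can then be matched against a debugger backtrace.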

@chris.mylonas:
The only error or warning comes from ZAP. But we think this is a result, not the cause, of the high CPU load.
We are monitoring active SIP and ZAP channels with Zabbix once per minute. But there aren’t any higher values before the high load.
It never happens at night.
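A once-per-minute sample can easily miss the lead-up to a spike that only lasts 1 to 5 minutes. As a rough sketch (not part of the Zabbix setup described above), a small Python recorder could sample the load every few seconds, keep a short history, and dump it the moment the load crosses a threshold. The class name, the 5-second interval and the history length are assumptions. The threshold of 40 is taken from the load values mentioned in this thread. Note that the 1-minute load average is itself smoothed, so this only narrows the window.

```python
import collections
import os
import time


class LoadRecorder:
    """Keep the most recent load samples and report when a threshold is crossed."""

    def __init__(self, threshold, history=120):
        self.threshold = threshold
        self.samples = collections.deque(maxlen=history)

    def add(self, timestamp, load):
        """Store one sample; return True only when a spike starts."""
        was_high = bool(self.samples) and self.samples[-1][1] >= self.threshold
        self.samples.append((timestamp, load))
        return load >= self.threshold and not was_high


def watch(recorder, interval=5.0, read_load=lambda: os.getloadavg()[0]):
    """Sample forever; print the retained history each time a spike starts."""
    while True:
        if recorder.add(time.time(), read_load()):
            for timestamp, load in recorder.samples:
                print(f"{timestamp:.0f} load={load:.2f}")
        time.sleep(interval)
```

Running `watch(LoadRecorder(threshold=40))` from a separate process, ideally outside the cpuset used for Asterisk, would leave a record of the minutes just before each incident.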

Of all things Asterisk, AGI is the one I have the least knowledge of.

Are you using asterisk-java for the AGI server (as in FastAGI - this is less of a resource hog)? Is the AGI server on the same box?

Or are the AGI scripts Perl that get spawned a few times per call?

IIRC a new thread is created for each new AGI (that’s why fastagi is the way to go).
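To illustrate the FastAGI idea: Asterisk opens a TCP connection to a server that is already running, sends a block of `agi_name: value` lines ended by a blank line, and then exchanges commands and replies over the same socket, so nothing has to be started per call on the Asterisk side. The sketch below uses Python's standard library as a stand-in. It is not asterisk-java and not the backend from this thread, and the handler and function names are made up. Port 4573 is the usual FastAGI default.

```python
import socketserver


def parse_agi_env(lines):
    """Turn the 'agi_name: value' header lines into a dict, stopping at the blank line."""
    env = {}
    for raw in lines:
        line = raw.strip()
        if not line:
            break
        key, _, value = line.partition(":")
        env[key.strip()] = value.strip()
    return env


class FastAgiHandler(socketserver.StreamRequestHandler):
    """Handle one call per connection inside the already running server."""

    def handle(self):
        # Read the header block that Asterisk sends first.
        env = parse_agi_env(line.decode("utf-8", "replace") for line in self.rfile)
        caller = env.get("agi_callerid", "unknown")
        # Send one AGI command and read the single-line reply.
        self.wfile.write(f'VERBOSE "FastAGI call from {caller}" 1\n'.encode())
        self.wfile.flush()
        self.rfile.readline()


def serve(host="0.0.0.0", port=4573):
    """Run the server until interrupted; each connection gets its own thread."""
    with socketserver.ThreadingTCPServer((host, port), FastAgiHandler) as server:
        server.serve_forever()
```

A dialplan entry pointing at `agi://host/...` would reach a server like this. Whether the Java backend here already works that way is exactly the question above.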


that will give you more of an idea what is going on when it locks up