We’re a SIP provider for some 700 clients, mostly using Linksys ATAs. The OS is 2.6.17-gentoo-r4 #1 SMP and we’re running Asterisk 1.4.19.
For the past several days at least, our Asterisk server has been running at 100% CPU usage, and as far as I can tell, there’s not really anything unusual happening in Asterisk, except that perhaps an unusually high number of SIP channels are open - over 100 where “normal” operation has about 60 or so.
What would I need to do to properly diagnose this problem and find the culprit?
Unfortunately Asterisk requires several days of uptime before it starts doing this, and as such I can’t post this stuff until it starts doing this again. I restarted it yesterday and that “fixes” the problem, but it’s only a short term solution.
In the meantime though, I can definitely say that at the time Top showed that Asterisk was the only process consuming any real resources. Neither Mysql nor any other process was showing any significant CPU time, either in Top or ‘ps ax’.
Currently this is what ‘ps axo time,comm --sort stime’ says:
I am also getting the same problem. Asterisk shows over 100% cpu usage, dialparties.agi also shows up frequently some times with about 70% or more cpu usage so does op_server.pl. What confuses me though is that at times not many calls will be coming in or at times none but cpu % will still be high. Were you able to find a solution to your problem? I have 15 zap channels, 3 sip channels, 35 extensions, 2GB ram and 3.6 GHz processor. Please help me out.
voip2*CLI> show channels
Channel Location State Application(Data)
Zap/20-1 778XXXXXXX@local:2 Up VoiceMail(u778XXXXXXX)
Zap/2-1 (None) Up Bridged Call(SIP/0014BFxxxxxx-
SIP/0014BFxxxxxx-1-b 604XXXXXXX@default:1 Up Dial(Zap/g1/604XXXXXXX)
SIP/xxx.xxx.xx.xx-08 (None) Up Bridged Call(SIP/0014BFxxxxxx-
SIP/0014BFxxxxxx-1-b 1250XXXXXXX@default: Up Dial(SIP/XXXXXXXXXXXXXXX@203.1
SIP/xxx.xxx.xx.xx-08 (None) Up Bridged Call(SIP/0014BFxxxxxx-
SIP/0014BF4EC157-1-b 1250XXXXXXX@default: Up Dial(SIP/XXXXXXXXXXXXXXX@203.1
SIP/xxx.xxx.xx.xx-08 (None) Up Bridged Call(SIP/0014BFxxxxxx-
SIP/0014BFxxxxxx-1-b 1250XXXXXXX@default: Up Dial(SIP/XXXXXXXXXXXXXXX@203.1
Zap/3-1 (None) Up Bridged Call(Zap/19-1)
Zap/19-1 604XXXXXXX@default:1 Up Dial(Zap/g1/604XXXXXXX)
SIP/xxx.xxx.xx.xx-08 (None) Up Bridged Call(SIP/0014BFxxxxxx-
SIP/0014BFxxxxxx-1-b 1406XXXXXXX@default: Up Dial(SIP/XXXXXXXXXXXXXXX@203.1
SIP/xxx.xxx.xx.xx-08 (None) Up Bridged Call(SIP/0014BFxxxxxx-
SIP/0014BFxxxxxx-1-b 1250XXXXXXX@default: Up Dial(SIP/XXXXXXXXXXXXXXX@203.1
SIP/xxx.xxx.xx.xx-0 (None) Up Bridged Call(SIP/0018F8xxxxxx-
SIP/0018F8xxxxxx-2-b 01135XXXXXXX@defa Up Dial(SIP/011358XXXXXXX@204.2
SIP/203.160.32.46-b5 (None) Up Bridged Call(SIP/001C10xxxxxx-
SIP/001C10xxxxxx-1-b 1604XXXXXXX@default: Up Dial(SIP/XXXXXXXXXXXXXXX@203.1
Zap/21-1 604XXXXXXX@local:3 Up VoiceMailMain()
20 active channels
11 active calls
Coincidentally, the stuff behind the XX’s looks pretty normal to me, just incoming and outgoing calls
Well if this is all thats going on then you have a serious problem I have boxes with over 200 calls in progress and I would be worried if the load wne over 0.3 let alone 1.4 as yours does.
You need to set a lab system copy the dialplan and hammer it to see whats going wrong with it. something isnt right, What codecs are you using ?
I’m getting the same issue on one of my asterisk boxes, also running 1.4.19.
Funny thing is, though, that I’m running two other identical Asterisk machines with slightly differing hardware, and neither of them have this issue. Dialplans, etc, are all the same.
One of the boxes is an HP ProLiant DL380 G5 with 2gb RAM and a quadcore Xeon, with a TE110P card.
Another is an HP ProLiant DL380 G3 with 2gb RAM and two single core Xeons, with a Sangoma A101 card.
The affected box, however, is an HP ProLiant ML350 with 4gb RAM and a quadcore Xeon, with a Sangoma A102 card hooked to two T1s. I rebooted it this morning and it was fine for a few hours and then began being affected again.
These machines all run the exact same dialplan, OS versions, kernel versions, and Asterisk versions. The only difference is the hardware.
It doesn’t affect anything other than the reading of callback numbers in voicemails, but this is an incredibly high-maintenance client, and I’d love for a resolution to be found. For the most part, my top looks like his.
This morning, I pulled the server down because I had to go from an A102 to an A102D card (customer complained of echo on the line), and in the process, I removed 2gb RAM bringing the total down to 2gb.
That seems to have fixed the issue.
Is there a known problem running asterisk with 4gb RAM?
I’m having the same problem on 1.4.19.1. It has been happening on a few of my boxes - one with 5GB of ram and approx 230 peers and 4 T1s, another with 3GB of ram and <30 peers. Both use FreePBX for configuration, and both seem to use 100% of a single core (dual hyperthreaded xeons) after being up for several days. It doesn’t seem to happen to another box I have that doesn’t use FreePBX and has 2GB of ram.
I never noticed this behavior on 1.4.18 - it started when we moved to 1.4.19.1.
I have the a similar problem. What I have noticed is that everytime my CPU spikes to 106% is that someone was checking there voicemail and the SIP channel is still open even though they have hung up and the user reports that nothing strange happened to them while they were checking voicemail and that they just hung up when they were done. Their phone also shows that the call was hungup. Here is an example from core show channels:
Channel Location State Application(Data)
SIP/209-b6c89b88 777@custom-from-inte Up VoiceMailMain(209@default)
First I thought it might be a corrupted mail box, but it happens to be different mailboxes every time and always with someone who was checking their voicemail.
Bingo, that’s the problem alright. It’s always someone checking their voicemail. We have our voicemail number set as *98 (with another number that works if they want to check it remotely) and it’s always that number. The application VoiceMailMain() appears to be the culprit, and it seems to be stuck in a loop.
Is there any way to kill the application without restarting Asterisk?