High CPU Load issues

We recently added more call volume to our Asterisk server and are now suffering voice quality issues due to high CPU load. We have turned off some non-essential features such as call recording and call monitoring. The real-time reports in our GUI were also rebuilt to minimize traffic on the MySQL database. Aside from the sound quality issues, we are also having problems with agents not being able to log in or log out consistently. I am not sure whether this is a DTMF issue or a CPU load issue. Supervisors in the call center have the ability to ‘force off’ agents from phones, but this is becoming a chore.

Our vendor has suggested changing all IVR prompts from .gsm format to g711ulaw 8k mono .wav format to help reduce load on the server.

I converted one file but the file size is four times bigger than the same recording in .gsm format. Is this the right thing to change or are there any other suggestions on reducing CPU load and/or fixing our login/logout issue?
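For what it is worth, the size increase is expected. Here is a rough sketch of the arithmetic, assuming standard GSM 06.10 full-rate framing (33 bytes per 20 ms frame) and 8 kHz, 8-bit mu-law, and ignoring file headers. The function names are made up for illustration only.

```python
# Rough payload sizes for the same prompt in GSM and G.711 mu-law.
# Assumptions: GSM 06.10 full rate (33-byte frame every 20 ms) and
# mu-law at 8000 samples/s, 1 byte per sample. Headers are ignored.

GSM_FRAME_BYTES = 33
GSM_FRAMES_PER_SECOND = 50      # one frame per 20 ms
ULAW_SAMPLE_RATE_HZ = 8000
ULAW_BYTES_PER_SAMPLE = 1


def gsm_bytes(seconds):
    """Approximate size in bytes of a GSM-encoded prompt."""
    return seconds * GSM_FRAMES_PER_SECOND * GSM_FRAME_BYTES


def ulaw_bytes(seconds):
    """Approximate size in bytes of a mu-law encoded prompt."""
    return seconds * ULAW_SAMPLE_RATE_HZ * ULAW_BYTES_PER_SAMPLE


def size_ratio():
    """How many times larger mu-law is than GSM for the same audio."""
    return ulaw_bytes(1) / gsm_bytes(1)


if __name__ == "__main__":
    prompt_seconds = 30  # hypothetical prompt length
    print("GSM bytes:  ", gsm_bytes(prompt_seconds))
    print("ulaw bytes: ", ulaw_bytes(prompt_seconds))
    print("ratio:      ", round(size_ratio(), 2))
```

So a mu-law file coming out four to five times larger than the .gsm version is normal. The trade is disk space for CPU: if the calls themselves are mu-law, a prompt already stored as mu-law should not need to be transcoded from GSM every time it is played.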

We are using approximately 80 Nortel phones connected through four Citel gateways, which are in turn connected to the Asterisk server via an 8-port hub.

The Nortel phones have been set up with speed dial buttons for the feature code *904 for login/logout. We have disabled ‘agent pause’ because we were using a custom script that was wreaking havoc with agent status on the server. Even when agents logged out, the switch thought they were still available and kept sending calls to the phone.

Help!!!

Specs on the server?

Not to mention, a hub? Is there a particular need for that? A switch in full-duplex mode may remove some of the latency if there are collisions, etc.

Could you post the specs on your server as well as what the loadavg for the server is when it is at high load?

How many concurrent channels do you have active when this starts to happen?

Server specs are as follows:

Hardware
Processor : Intel® Xeon™ CPU 2.80GHz
Cache : 1024 KB
Bogomips : 22404.26
Kernel : 2.6.9-34.ELsmp (SMP)
Uptime : 2 days 10 minutes
Load Average : 8.39 / 7.39 / 7.12

There are actually 2 Xeon processors in this system, running with hyperthreading enabled. I have been told by other sources that hyperthreading should be turned off, so recommendations here are welcome. The system also has 4 GB of RAM.

Hub? I meant switch. It is an 8-port switch that is connected to an internal IP. The Asterisk server is also connected to an external IP for remote configuration through the GUI.

Load averages as of right now are above. The typical load average prior to the increase in call volume was 5 to 7, but a lot of optimization was done on the real-time reports to get back close to that number.

As for active channels we have seen as many as 60 agents on calls with another 50, 60 or even 70 calls in queue.

The system has 8 PRI’s connected through 2 Digium cards.

The load average when it spiked earlier this week was ridiculously high, in the hundreds.

We had the rx gain set rather high because agents were complaining that they couldn’t hear the callers. We have also been experiencing dropped calls for a couple of weeks now, which we thought were PRI related. Additional info I have found in these forums suggests that keeping the tx and rx gain closer to 0 will reduce the chances of dropped calls. I changed this earlier this morning, but I am not certain whether the Zaptel channels or the telephony server need to be restarted for the change to take effect.

For a single-processor system, a loadavg of 1.0 is 100%. So you are routinely running at 500% of capacity or higher, which is horrible for Asterisk and causes all sorts of problems.

If you want to continue with the single-server approach, you need a faster server with more real (non-HT) CPUs, or you need to think about moving to a distributed architecture of multiple Asterisk servers sharing the load.

We are running two processors on this box, both hyperthreaded, which makes 4 logical processors in total, but I understand that going back to an environment without hyperthreading is ideal. Right?

Even with 2 processors and HT enabled, a sustained 5.0 is still 125%, which is not good.
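To make that arithmetic explicit, here is a small sketch that divides the load average by the number of logical CPUs. The helper name is hypothetical, and keep in mind this is only a rule of thumb: on Linux the load average also counts tasks blocked in uninterruptible I/O, so it is not a pure CPU figure.

```python
import os


def load_percent(load_average, logical_cpus):
    """Express a load average as a percentage of total CPU capacity."""
    if logical_cpus < 1:
        raise ValueError("logical_cpus must be at least 1")
    return 100.0 * load_average / logical_cpus


if __name__ == "__main__":
    # Figures quoted in this thread: a sustained load of 5.0 on
    # 1 CPU versus 4 logical CPUs (2 processors with HT enabled).
    print(load_percent(5.0, 1))   # single-processor reading
    print(load_percent(5.0, 4))   # 2 processors with HT

    # Optional live reading; os.getloadavg() is only available on Unix.
    if hasattr(os, "getloadavg"):
        one_min, five_min, fifteen_min = os.getloadavg()
        cpus = os.cpu_count() or 1
        print("current 1-minute load: %.0f%% of %d logical CPUs"
              % (load_percent(one_min, cpus), cpus))
```

By the same measure, the 8.39 posted above works out to a bit over 200% on 4 logical CPUs.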

As for HyperThreading, we use it on all of our Intel servers and it works fine for us. The best explanation for HT is that it’s like running two processors, each at 60% of your processor’s speed, which yields a slight advantage in some applications. Depending on what you do with Asterisk, it can help or hurt.

Either way, you need to get a faster server or start thinking about a distributed setup.