2000+ Phones cause server to hang when network error occurs


#1

We have a fairly large Asterisk server implementation: almost 3000 peers, of which about 2600 are registered at any one time.

We are running Asterisk 1.4.26.2 on two servers under Linux HA in hot/standby mode. All of our PSTN/external calls go through a SIP trunk to a Cisco router with multiple PRI cards, so the Asterisk server is only handling SIP calls.

We are currently having a lot of network issues throughout our campus, which are being addressed but will take some time to resolve. When one of these issues hits, we lose hundreds (or more) of phones. Now that we are in the 2600 range, we are finding that when this happens the server momentarily hangs while it is busy trying to contact all the phones. This has caused problems such as call quality degradation and corrupted voicemail messages. It has even hung the server long enough that Linux HA failed over to the standby, because the active server did not respond to a Linux HA heartbeat in time.

The server load is not significant (97% idle), except when these network outages occur.

Are there some settings I can tweak that would make this event have less of an impact on the server?

I’m looking at increasing our Linux HA threshold, but are there Asterisk settings we can change to minimize the impact of hundreds or thousands of phones going offline and back online?

qualify=? Would turning this off or down help? What about other SIP timeout settings?

See settings below:

Sip.conf “General” Settings
[general]
allowguest=no
bindport=5060
bindaddr=10.1.1.30
localnet=10.0.0.0/255.248.0.0
disallow=all
allow=ulaw
allow=alaw
dtmfmode=rfc2833
promiscredir=yes
context=default
srvlookup=yes
limitonpeers=yes
jbenable=yes
jbforce=yes
jbmaxsize=80
jbresyncthreshold=1000
jbimpl=adaptive
jblog=no
tos_sip=cs3
tos_audio=ef
tos_video=af41

Sip.conf “typical phone” settings

[5553824]
type=friend
secret=<>
nat=yes
host=dynamic
reinvite=no
canreinvite=no
qualify=yes
callerid=Joe Blow <5553824>
context=UnRestrictedCalling
dtmfmode=rfc2833
mailbox=5553824


#2

Make sure you are using only IP addresses for all your SIP devices/trunks. Asterisk goes belly up when name resolution is not available. The qualify setting in your case seems like a waste. Qualify only makes sense when Asterisk acts as a client behind NAT connecting to some other SIP entity, where it keeps the NAT mapping alive.
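
As a rough sketch of what that could look like in sip.conf, assuming the phone entry posted above: srvlookup and qualify are existing chan_sip options, but whether disabling them is safe here is an assumption. The phones are marked nat=yes, so if any of them really do sit behind NAT, dropping the probes may let their mappings expire.

```
[general]
srvlookup=no        ; do not perform DNS SRV lookups on outbound calls

[5553824]
type=friend
host=dynamic
qualify=no          ; stop sending periodic OPTIONS probes to this phone
```

If the probes have to stay, a numeric value such as qualify=5000 (milliseconds; the figure is only illustrative) raises the response time Asterisk tolerates before it marks a peer unreachable.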


#3

Hi

OK, network issues are a big problem and can stall Asterisk. I also assume that you are using no AGI in the extension dialing, as this will stall things if DNS is not correct or fails.

Also make sure the sets have their registration time set to a level that does not flood the server.
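
The registration interval itself is set on each phone, but the server can bound what it accepts. A minimal sketch for the [general] section of sip.conf, with purely illustrative values that are not tuned for this site:

```
[general]
minexpiry=300       ; shortest registration lifetime accepted (seconds)
maxexpiry=3600      ; longest registration lifetime accepted (seconds)
defaultexpiry=1800  ; lifetime used when the phone does not request one
```

Longer lifetimes mean fewer REGISTER requests per hour across 2600 phones, at the cost of the server taking longer to notice that a phone has gone away.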

Finally, turn off all logging or keep it to a minimum, as this can cause issues. We have seen this on busy conference servers.
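
For example, a pared-down logger.conf might look something like the following. This is a sketch only; the right levels depend on what you need to keep for troubleshooting.

```
[logfiles]
; no debug or verbose output anywhere
console => warning,error
messages => notice,warning,error
```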

Ian
www.cyber-cottage.co.uk
twitter @cyberco