How can I get Asterisk to run for 60 days with no intervention on my part?
- Keep troubleshooting and upgrading.
- Hire someone to take a look at the problem.
- Learn the code and debug it for real. It’s the only way.
- Make and take fewer calls.
I have several Asterisk servers, but one of the locations has never quite worked right. We have been through three motherboards, two TDM400Ps, a Sangoma A200, countless config tweaks, and just about every version of Asterisk from 1.1 to 1.2 to 1.4, but nothing seems to help.
Over the years it has gotten more reliable (I no longer see catastrophic deadlocks, and only rare crashes), but lately it has plateaued and maybe even started to get worse.
The main issue from my perspective is that inbound calls sometimes stay open indefinitely, long after both parties have disconnected. Given enough time, I will run 'show channels' and find that all three of my FXO lines are tied up.
New calls then start coming in over VoIP at 2 cents per minute. I have to either restart Asterisk or soft hangup the stuck channels to free them, and even restarting Asterisk daily doesn't guarantee anything.
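The manual workaround described above (spotting FXO channels that have been up far longer than any real call and soft-hanging them up) could in principle be scripted. Below is a minimal Python sketch of the decision logic only, assuming you already have each channel's name and age in seconds; collecting those from Asterisk is left out because the CLI output format differs between versions. The FXO channel numbers 5-8 come from the zapata.conf in this post, the two-hour limit is an arbitrary assumption, and the `soft hangup` command is the 1.4-era CLI spelling, so check it against your version before automating anything.

```python
import re
from typing import Iterable, List, Tuple

# Assumption: the FXO ports are Zap channels 5-8 (signalling = fxs_ks in the
# zapata.conf in this post). Adjust for your own hardware.
FXO_CHANNELS = {5, 6, 7, 8}

# Assumption: no legitimate inbound call lasts longer than two hours.
MAX_CALL_SECONDS = 2 * 60 * 60

# Zap channel names look like "Zap/5-1": port number, dash, instance number.
ZAP_NAME = re.compile(r"^Zap/(\d+)-\d+$", re.IGNORECASE)


def stuck_fxo_channels(channels: Iterable[Tuple[str, int]],
                       max_seconds: int = MAX_CALL_SECONDS) -> List[str]:
    """Return the names of FXO channels that have been up longer than max_seconds.

    `channels` is an iterable of (channel name, age in seconds) pairs; collecting
    them from Asterisk is left to the caller.
    """
    stuck = []
    for name, age_seconds in channels:
        match = ZAP_NAME.match(name)
        if match and int(match.group(1)) in FXO_CHANNELS and age_seconds > max_seconds:
            stuck.append(name)
    return stuck


def soft_hangup_commands(names: Iterable[str]) -> List[str]:
    """Build one shell command per channel using the 1.4-era 'soft hangup' CLI syntax."""
    return [f'asterisk -rx "soft hangup {name}"' for name in names]


if __name__ == "__main__":
    # Made-up sample data: only Zap/5-1 is both an FXO port and older than the limit.
    sample = [
        ("Zap/5-1", 9000),
        ("Zap/6-1", 120),
        ("Zap/1-1", 9000),
        ("SIP/carmen-0a1b2c3d", 9000),
    ]
    for command in soft_hangup_commands(stuck_fxo_channels(sample)):
        print(command)  # prints: asterisk -rx "soft hangup Zap/5-1"
```

The sketch only prints the commands rather than running them, so they can be reviewed first; with intermittent problems like these, blindly hanging up channels from a cron job could easily cut off a real call.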
When this happens the system is much more likely to go into a state my users affectionately call ‘haywire’.
When the system has gone haywire, many intermittent problems will rear their ugly heads.
- Phantom calls: calls come in but are disconnected the moment we answer at one of our SIP phones.
- Locked SIP channels: I often see a call between a SIP phone and one of the parking extensions stuck in an indefinitely active state.
- Random disconnects when transferring: calls are randomly dropped when parking or transferring.
I was planning on getting a butt set to monitor the incoming lines, but some of the software that came with my Sangoma card already shows that the telco is fluctuating the voltages as it should.
The only difference between this location and another location which works much more reliably (besides the actual building) is that the problematic server uses AT&T as its CLEC, whereas the well-behaved server uses Verizon.
(*Note that this “sister server” is only relatively well-behaved. Occasionally I will see 20 active SIP channels between two of my Snom 360 phones when no one is actually on the line. This doesn’t seem to bother the server, and I just soft hangup or restart when it becomes an issue, but it’s similar enough to this problem that I thought it worth mentioning.)
The server is running Gentoo Linux:
Linux claudia 2.6.18-gentoo-r6 #6 SMP Wed May 9 22:02:20 EDT 2007 x86_64 Dual Core AMD Opteron(tm) Processor 165 AuthenticAMD GNU/Linux
The motherboard is a Tyan S265 series socket 939 with an nForce4 chipset.
I'm using a Sangoma A200 with 4 FXO and 4 FXS ports. When I used a TDM400P (with Asterisk 1.2.x), everything was less reliable in general, with deadlocks on top. Here is my zapata.conf:
[code]
;autogenerated by /usr/local/sbin/config-zaptel do not hand edit
;Zaptel Channels Configurations (zapata.conf)
;
;For detailed zapata options, view /etc/asterisk/zapata.conf.orig

[trunkgroups]

[channels]
context=default
usecallerid=yes
hidecallerid=no
callwaiting=yes
usecallingpres=yes
callwaitingcallerid=yes
threewaycalling=yes
transfer=yes
canpark=yes
cancallforward=yes
callreturn=yes
echocancel=yes
echocancelwhenbridged=no
echotraining=256
group=1
callgroup=1
pickupgroup=1
immediate=no

;Sangoma A200 [slot:9 bus:1 span:1]
context=outbound
group=0
signalling = fxo_ks
channel => 1-4

rxgain=-6
txgain=-6
context=inbound
hanguponpolarityswitch
busydetect=yes
busycount=10
group=1
signalling = fxs_ks
channel => 5-8
[/code]
Here is my dialplan for incoming calls:
[code]
[open]
exten => s,1,Answer
exten => s,n,wait(1)
exten => s,n,Dial(zap/1&zap/2&sip/carmen&sip/aron&sip/maria&sip/register,25)
exten => s,n,voicemail(b0)
exten => s,n,hangup
[/code]
If I call one of my FXO lines and hang up within 5 seconds, Asterisk misses the hangup about 15% of the time, and whoever answers gets dial tone directly on the FXO line. I’m not sure whether this is related to the other problems we’ve been having; my understanding is that every PBX has some trouble with that situation.
The only thing I see in my log that worries me is an occasional:
[May 29 19:58:36] WARNING chan_sip.c: Remote host can't match request BYE to call 'blah@blah'. Giving up.
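To see whether that warning lines up with the stuck channels, one approach is to pull every occurrence out of the log with its timestamp and call-ID, then compare them against the times channels had to be freed by hand. A minimal Python sketch, assuming log lines look like the one quoted above; the optional thread id after `WARNING` and the log file path are assumptions about your logger settings, not something taken from this post:

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# Matches the chan_sip warning quoted above, for example:
# [May 29 19:58:36] WARNING chan_sip.c: Remote host can't match request BYE to call 'blah@blah'. Giving up.
# The optional "[1234]" after WARNING allows for log formats that include a thread id (assumption).
UNMATCHED_REQUEST = re.compile(
    r"^\[(?P<when>[^\]]+)\] WARNING(?:\[\d+\])? chan_sip\.c: "
    r"Remote host can't match request (?P<method>\w+) to call '(?P<callid>[^']*)'"
)


def unmatched_requests(lines: Iterable[str]) -> List[Tuple[str, str, str]]:
    """Return (timestamp, SIP method, call-ID) for each 'can't match request' warning."""
    hits = []
    for line in lines:
        match = UNMATCHED_REQUEST.match(line)
        if match:
            hits.append((match.group("when"), match.group("method"), match.group("callid")))
    return hits


def count_by_day(hits: Iterable[Tuple[str, str, str]]) -> Counter:
    """Count warnings per day using the 'Mon DD' part of each timestamp."""
    return Counter(" ".join(when.split()[:2]) for when, _method, _callid in hits)


def scan_log(path: str) -> List[Tuple[str, str, str]]:
    """Scan a log file; the path is whatever your logger.conf writes to (assumption)."""
    with open(path, errors="replace") as log:
        return unmatched_requests(log)


if __name__ == "__main__":
    demo = [
        "[May 29 19:58:36] WARNING chan_sip.c: Remote host can't match request BYE "
        "to call 'blah@blah'. Giving up.",
        "[May 29 20:01:02] NOTICE chan_sip.c: an unrelated message",
    ]
    hits = unmatched_requests(demo)
    print(hits)                # [('May 29 19:58:36', 'BYE', 'blah@blah')]
    print(count_by_day(hits))  # Counter({'May 29': 1})
```

Running `scan_log()` over the real log and comparing `count_by_day()` with the days a restart or soft hangup was needed would show whether the two actually track each other, or whether this warning is unrelated noise.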
but like every other problem it seems to be totally random.
I’m almost inclined to believe it is something about the location, but it's so intermittent and I am never there. The closest I’ve come to experiencing it first-hand was once when I called in and the call dropped just as I expected someone to answer on their end.
[b]I’m not expecting someone to have a magic bullet, but I’m hoping someone out there has some troubleshooting ideas for me.
If I could have asterisk run for 30 days without any intervention on my part, that would be a huge first step.[/b]
On the other hand, perhaps it is time to throw in the towel. If it were crashing, I would at least have a core dump for someone else to work with, but I don’t even have that. My history with Asterisk has been one of intermittent predictability, so why should anything ever change?
I see no reason to start throwing more money at the problem and swapping more parts, and how would I post about this in the ‘Job Opportunities’ forum? “Bounty: Make my asterisk run for 60 days.” (If I’m paying for it, I expect at least a two month solution).
I’m not a software engineer, and I’m starting to realize features are a lot less important to me and my users than stability. Perhaps Asterisk just isn’t for me, and requires someone a little more talented to administer and troubleshoot.
If this is the case, will someone please tell me? I won’t be offended. Perhaps there is an alternative software solution someone could recommend for us mere mortals who only know enough C to not quit our day jobs.