I have had an ongoing problem with one of our servers: the system hangs or locks up and dumps memory to the console, and we have to manually power off and reboot the machine.
System is a Dell 2850, brand new, running Fedora Core 4 minimal install (mysql and httpd are the only processes running). We are running Asterisk Biz Edition B-1.1 (I know it’s not released publicly, Digium sent me a copy) with a TE410P card, rev 2.
This phone system has been set up on three different machines, and we have probably done a complete reformat/reinstall about a dozen times now. We have gone through Asterisk versions 1.0.7, 1.2.4, 188.8.131.52, ABE A-2.5, and ABE B-1.1. Still having the system lock up.
I have a ticket open with Digium, and it was bumped to tier 2 yesterday. I have been trying to get a core dump when the system crashes, but I cannot get anything to write to disk. I have been manually setting the ulimit for core to unlimited after the machine reboots, but the next crash doesn’t save anything either.
I just wondered if anyone had any suggestions as to what I might try to get some debugging information saved, so that Digium can hopefully determine what the problem might be.
If anyone would like to know more about our particular issue, please let me know - I can write a book on what we’ve done and tried.
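One thing worth checking on the core dump side: a `ulimit -c unlimited` typed into a shell applies only to that shell and the processes started from it, so a daemon launched from an init script would not inherit it. Below is a minimal Python sketch of inspecting and raising the core limit from inside a process, using the standard `resource` module. The helper name `raise_core_limit` is made up for illustration. Core files are written when a user-space process crashes, so a complete kernel hang would probably not produce one either way.

```python
import resource

def raise_core_limit():
    """Raise the soft core-file size limit to the hard limit for this process.

    Returns the (soft, hard) pair in effect after the change.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    # An unprivileged process may raise its soft limit only up to the hard limit.
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_CORE)

if __name__ == "__main__":
    soft, hard = raise_core_limit()
    unlimited = resource.RLIM_INFINITY
    print("core soft limit:", "unlimited" if soft == unlimited else soft)
    print("core hard limit:", "unlimited" if hard == unlimited else hard)
```

If the hard limit itself is 0, the soft limit cannot be raised without root, which would be one explanation for nothing being written.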
That is a nasty one. I was wondering if it actually had something to do with a bad hard drive or controller card but you mentioned you tried 3 different machines so it can’t be that.
Is it possible that you can turn debug mode on and just log everything? Maybe it will give them some indication of what was the last thing going on, not that that is always the cause of the problem, but sometimes it could be.
I get a crash every once in a while when the agents are all logging in at once, and I am getting a “Signal 11 error Segmentation fault” in the core dump, but I have no idea what it means.
Good luck with your troubleshooting, I know the feeling.
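On the "Signal 11" message: on Linux, signal number 11 is SIGSEGV, which the kernel sends when a process touches memory it is not allowed to access. It says the program crashed on a bad memory access, not why. A tiny Python sketch of looking up the name for a signal number on the current platform:

```python
import signal

# Look up the symbolic name for signal number 11 on this platform.
sig = signal.Signals(11)
print(sig.name)
```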
Actually, I ended up getting a hold of derek at digium, and we turned off APIC. We didn’t have any crashes yesterday, and our ‘danger period’ for today is almost half over…that doesn’t necessarily mean anything, but that is the first day in over a week that we haven’t locked up.
In any case, even turning on full logging in asterisk doesn’t show anything worthwhile, or at least I haven’t found anything that would indicate an issue to me. I will check and start comparing logs if it locks up again.
FYI, digium also suggested disabling SMP, but with the load on this box, i REEEEEEEEEEEALLY don’t want to do that…
I’ll update if/when I know more, thanks for the post.
did You always try it on Fedora Core 4 ?
with custom kernel or one from fedora core standard ?
i had a similar problem some time ago.
pentium 4 3.0 GHz
1 GB RAM
2x WD HDD in soft mirror raid
junghanns quadBRI ( one 2B+D line)
ups connected by rs232
faxmodem connected by rs232
nokia cell connected by rs232
additional I/O card 2 x rs232
debian 3.1 kernel 2.6.16
(custom kernel lspci -vv and choosing proper devices)
we have a few similar systems, but without asterisk and quadBRI
once a week - sunday about 2:00 - 4:00 (counting 24h)
the system hung completely and had to be restarted with the power switch
in syslog we’ve got only ‘out of memory’
and the first or second process killed was asterisk
we put in some memory tests to find out what was eating all the RAM
even with this nothing was clear enough
but we got some logs (too often) from gnokii<->nokia cell
gnokii was losing connection with the phone, the rs232 port was on the additional I/O card
when we got rid of that I/O card, the system started running with no errors at all
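The kind of memory test mentioned above can be sketched roughly like this: a small Python script that reads `VmRSS` from each `/proc/<pid>/status` file and lists the biggest consumers, so it can be run from cron and the output compared after a hang. This is only an illustration, not the script that was actually used. The function names `parse_status` and `top_memory_processes` are made up.

```python
import os

def parse_status(status_text):
    """Extract (name, rss_kb) from the text of a /proc/<pid>/status file."""
    name, rss_kb = "?", 0
    for line in status_text.splitlines():
        if line.startswith("Name:"):
            name = line.partition(":")[2].strip()
        elif line.startswith("VmRSS:"):
            rss_kb = int(line.split()[1])  # value is reported in kB
    return name, rss_kb

def top_memory_processes(count=5, proc_dir="/proc"):
    """Return the `count` processes with the largest resident memory."""
    usage = []
    for entry in os.listdir(proc_dir):
        if not entry.isdigit():
            continue  # only numeric directories are processes
        try:
            with open(os.path.join(proc_dir, entry, "status")) as handle:
                name, rss_kb = parse_status(handle.read())
        except OSError:
            continue  # process exited while we were scanning
        usage.append((rss_kb, int(entry), name))
    return sorted(usage, reverse=True)[:count]

if __name__ == "__main__":
    for rss_kb, pid, name in top_memory_processes():
        print(f"{rss_kb:>10} kB  pid {pid:<6} {name}")
```

Kernel threads have no `VmRSS` line and show up as 0 kB, so memory held inside a driver would not appear in this list.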
i don’t really know what can be the cause of Your problem, but
Can You eliminate potentially buggy kernel drivers for hardware ?
Can OS packages have some bad interaction with the hardware ?
since You have done lots of different Asterisk installations i would eliminate Asterisk as a cause of the crash
- i wish You lots of luck with this !
we’re running the stock FC4 SMP kernel…
i agree, i don’t think asterisk is the issue, and i’m fairly certain it’s not server hardware, i think it’s probably some combination of the dell and digium hardware not playing nice when certain drivers/kernels/modules are used.
however, since we have identical servers running identical OS’s, running almost identical asterisk installs with almost identical dialplans and only ONE of them has issues…i really can’t tell you for sure (big sigh).
in any case, i’ll let you all know how the next few days go…ought to be interesting.
did You try swapping the Digium card between the ‘good’ box and the ‘bad’ box ?
this should eliminate or prove card errors
if this ‘bad’ box is still bad … then first in line to check should be the dell hardware
we’ve actually gone through 4 different TE410Ps…we have 10 total, 8 in use, 2 spares…all of the cards seemingly work fine in other boxes.
not sure if i’d mentioned it - when we put a sangoma A104D in, we can leave the server up for WEEKS without an issue (except DTMF, which is why we moved back to digium) - NOTHING else changed. the problem is only present with a Digium card.
i’m putting an A101 in a Dell 1850 tomorrow … what was the DTMF issue ??
whoiswes, it’s a mystery
this one ‘bad’ box has to be different from the others somehow
i would diff lspci -v to check if the machines have the same hardware
as You pointed out, "dell and digium hardware not playing nice when certain drivers/kernels/modules are used"
if OS, kernels, modules are the same, and different digium cards still crash… then the dell hardware looks suspicious
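A rough sketch of the lspci comparison, assuming the output of `lspci -v` has been saved to a text file on each machine and read into strings. It uses Python's standard `difflib`. The function name `diff_hardware` and the sample device lines are made up for illustration.

```python
import difflib

def diff_hardware(good_text, bad_text):
    """Return unified-diff lines between two saved `lspci -v` outputs."""
    return list(difflib.unified_diff(
        good_text.splitlines(),
        bad_text.splitlines(),
        fromfile="good-box",
        tofile="bad-box",
        lineterm="",
    ))

if __name__ == "__main__":
    # Made-up sample text standing in for real `lspci -v` output.
    good = ("00:00.0 Host bridge: Example chipset (rev 09)\n"
            "02:01.0 Network controller: Example T1 card (rev 02)")
    bad = ("00:00.0 Host bridge: Example chipset (rev 0c)\n"
           "02:01.0 Network controller: Example T1 card (rev 02)")
    for line in diff_hardware(good, bad):
        print(line)
```

An empty result means the two listings are identical. Any `-` or `+` lines point at a device, revision, or IRQ assignment that differs between the boxes.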
Maybe it is the RAM. Try swapping it.
fdragowski and filippos:
we have gone through three separate dell boxes, all had the same issue.
it’s not the machine, although i really wish it was…
baconbuttie, the issue with the Sangoma (i was on vacation, so wasn’t here for any of this) was apparently where, out of the blue, it stopped recognizing DTMF. i THINK it’s the hardware echo can, and instead of playing with settings (specifically relaxdtmf=yes, which is needed with the sangomas) they just put the digium cards back in.
i really want to go back to the sangoma cards (they sound better and have MUCH better drivers than the digium cards, IMHO) but probably won’t be able to for quite a while, if ever…