NMI Error Crashing Asterisk


#1

All.

I’m having a problem with a new Asterisk server. After anywhere from a few hours to a few days of operation, I get the following errors in syslog. Shortly after the error, Asterisk dies.

Uhhuh. NMI received for unknown reason 31 on CPU 0.
Dazed and confused, but trying to continue
Do you have a strange power saving mode enabled?
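
For anyone trying to read the message above, here is a minimal Python sketch that pulls the reason byte and CPU number out of the line and checks the two status bits. It assumes the reason is printed in hex and that bit 7 (0x80) flags memory parity/SERR and bit 6 (0x40) flags an I/O check error, which is my understanding of how 2.6-era i386 kernels treat it. The function names are made up for illustration.

```python
import re

# Matches the kernel message quoted above. Assumption: the reason is a
# two-digit hex value, as printed by 2.6-era i386 kernels.
NMI_RE = re.compile(
    r"NMI received for unknown reason ([0-9a-fA-F]{2}) on CPU (\d+)"
)


def parse_nmi_line(line):
    """Return (reason, cpu) as ints, or None if the line is not an NMI report."""
    m = NMI_RE.search(line)
    if m is None:
        return None
    return int(m.group(1), 16), int(m.group(2))


def describe_reason(reason):
    """Check the two reason bits assumed to carry meaning (see lead-in)."""
    flags = []
    if reason & 0x80:
        flags.append("memory parity / SERR")
    if reason & 0x40:
        flags.append("I/O check (IOCHK)")
    return flags or ["unknown (neither bit 7 nor bit 6 set)"]


line = "Uhhuh. NMI received for unknown reason 31 on CPU 0."
reason, cpu = parse_nmi_line(line)
print(f"CPU {cpu}: reason 0x{reason:02x} -> {', '.join(describe_reason(reason))}")
```

Under those assumptions, reason 31 has neither bit set, which would explain why the kernel reports it as "unknown" rather than as a parity or I/O check error.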

Running:
CentOS 4.2, stock kernel 2.6.9-22.0.2.ELsmp
Asterisk 1.2.4
Zaptel 1.2.3
libpri 1.2.3

(This replaced an Asterisk 0.7.0 server with two T400P cards on Red Hat 8.0.)

6 T-1 lines to Adtran 750 Channel Banks
1 E&M T-1 Long distance
1 PRI for Local

CDR logging to MySQL
10-15 Cisco 7960 or PolyCom IP500 phones

Here are the specs:

Gateway 975, dual 2.4 GHz Xeon processors w/ 512 cache
LSI MegaRAID U320-1 one-channel Ultra320 SCSI RAID controller with 64MB cache
3 HDD in RAID5 configuration
Integrated dual channel Ultra320 SCSI (Not used)
Dual integrated Intel® PCI 10/100/1000 twisted-pair Ethernet (1 used)

2 x TE411P Digium cards

lspci output:
00:00.0 Host bridge: Intel Corporation E7501 Memory Controller Hub (rev 01)
00:00.1 Class ff00: Intel Corporation E7500/E7501 Host RASUM Controller (rev 01)
00:03.0 PCI bridge: Intel Corporation E7500/E7501 Hub Interface C PCI-to-PCI Bridge (rev 01)
00:03.1 Class ff00: Intel Corporation E7500/E7501 Hub Interface C RASUM Controller (rev 01)
00:1d.0 USB Controller: Intel Corporation 82801CA/CAM USB (Hub #1) (rev 02)
00:1d.1 USB Controller: Intel Corporation 82801CA/CAM USB (Hub #2) (rev 02)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev 42)
00:1f.0 ISA bridge: Intel Corporation 82801CA LPC Interface Controller (rev 02)
00:1f.1 IDE interface: Intel Corporation 82801CA Ultra ATA Storage Controller (rev 02)
00:1f.3 SMBus: Intel Corporation 82801CA/CAM SMBus Controller (rev 02)
01:0c.0 VGA compatible controller: ATI Technologies Inc Rage XL (rev 27)
02:1c.0 PIC: Intel Corporation 82870P2 P64H2 I/OxAPIC (rev 04)
02:1d.0 PCI bridge: Intel Corporation 82870P2 P64H2 Hub PCI Bridge (rev 04)
02:1e.0 PIC: Intel Corporation 82870P2 P64H2 I/OxAPIC (rev 04)
02:1f.0 PCI bridge: Intel Corporation 82870P2 P64H2 Hub PCI Bridge (rev 04)
03:07.0 Ethernet controller: Intel Corporation 82546EB Gigabit Ethernet Controller (Copper) (rev 01)
03:07.1 Ethernet controller: Intel Corporation 82546EB Gigabit Ethernet Controller (Copper) (rev 01)
03:08.0 RAID bus controller: LSI Logic / Symbios Logic MegaRAID (rev 01)
03:09.0 Communication controller: Unknown device d161:0410 (rev 02)
03:0a.0 Communication controller: Unknown device d161:0410 (rev 02)
04:07.0 RAID bus controller: Adaptec AIC-7902 U320 w/HostRAID (rev 03)
04:07.1 RAID bus controller: Adaptec AIC-7902 U320 w/HostRAID (rev 03)

cat /proc/interrupts output:
           CPU0       CPU1       CPU2       CPU3
  0:    1261976    1180422    1180448    1180399    IO-APIC-edge   timer
  1:          9          0          0          0    IO-APIC-edge   i8042
  8:          1          0          0          0    IO-APIC-edge   rtc
  9:         10          0          0          0    IO-APIC-level  acpi
 12:         70          0          0          0    IO-APIC-edge   i8042
 15:        611      20682      20364        720    IO-APIC-edge   ide1
 50:     532373    1570402     640161    1981671    IO-APIC-level  wct4xxp
169:          0          0          0          0    IO-APIC-level  uhci_hcd
177:       2079      24585          0      16772    IO-APIC-level  uhci_hcd
193:         30          0          0          0    IO-APIC-level  aic79xx
201:         30          0          0          0    IO-APIC-level  aic79xx
209:     473542          0          0          0    IO-APIC-level  eth0
225:       5294       1157       4106       2538    IO-APIC-level  megaraid
233:     432370     500129    2361733    1430366    IO-APIC-level  wct4xxp
NMI:          1          0          0          0
LOC:    4802954    4802828    4802827    4802951
ERR:          0
MIS:          0
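
In case it helps with reading that output, here is a rough Python sketch that parses /proc/interrupts text into per-CPU counts, then prints the NMI row and the totals for the two wct4xxp IRQs. The function name and the trimmed sample are mine, not part of any tool, and the parser only assumes the layout shown above (a CPU header row, then `label: counts... description`).

```python
def parse_interrupts(text):
    """Parse /proc/interrupts text into {label: (per_cpu_counts, description)}."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    ncpu = len(lines[0].split())  # header row: CPU0 CPU1 ...
    table = {}
    for ln in lines[1:]:
        label, _, rest = ln.partition(":")
        fields = rest.split()
        counts = []
        # Leading numeric fields are counters; rows such as ERR have only one.
        for field in fields[:ncpu]:
            if not field.isdigit():
                break
            counts.append(int(field))
        table[label.strip()] = (counts, " ".join(fields[len(counts):]))
    return table


# A few rows copied from the output above (trimmed for brevity).
SAMPLE = """\
CPU0 CPU1 CPU2 CPU3
50: 532373 1570402 640161 1981671 IO-APIC-level wct4xxp
209: 473542 0 0 0 IO-APIC-level eth0
233: 432370 500129 2361733 1430366 IO-APIC-level wct4xxp
NMI: 1 0 0 0
ERR: 0
"""

table = parse_interrupts(SAMPLE)
print("NMI per CPU:", table["NMI"][0])
for irq, (counts, desc) in table.items():
    if desc.endswith("wct4xxp"):
        print(f"IRQ {irq} ({desc}): {sum(counts)} interrupts in total")
```

To run it against a live box, replace SAMPLE with the contents of /proc/interrupts read as a string.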

lsmod:
Module Size Used by
md5 8001 1
ipv6 240097 24
wct4xxp 60352 166
zaptel 196740 357 wct4xxp
crc_ccitt 6081 1 zaptel
dm_mirror 28449 0
dm_mod 58949 1 dm_mirror
button 10449 0
battery 12869 0
ac 8773 0
uhci_hcd 32729 0
hw_random 9557 0
snd_usb_audio 61729 2
snd_pcm_oss 52345 0
snd_mixer_oss 21825 1 snd_pcm_oss
snd_pcm 91973 4 snd_usb_audio,snd_pcm_oss
snd_timer 27973 1 snd_pcm
snd_page_alloc 13641 1 snd_pcm
snd_usb_lib 15681 1 snd_usb_audio
snd_rawmidi 27749 1 snd_usb_lib
snd_seq_device 11849 1 snd_rawmidi
snd 56997 9 snd_usb_audio,snd_pcm_oss,snd_mixer_oss,snd_pcm,snd_timer,snd_rawmidi,snd_seq_device
soundcore 12961 1 snd
e1000 96429 0
floppy 58065 0
sg 38113 0
ext3 118729 2
jbd 59481 1 ext3
aic79xx 187613 0
megaraid_mbox 37073 3
megaraid_mm 17905 1 megaraid_mbox
sd_mod 20545 4
scsi_mod 116429 4 sg,aic79xx,megaraid_mbox,sd_mod

Any ideas?


#2

One more note. Some of the NMI events have been occurring shortly after editing extensions.conf. Not sure if it's related, but it does seem odd.


#3

Like 99.999% of the time, a kernel NMI simply indicates bad memory.


#4

Agreed. Run memtest86 on it overnight…


#5

I am having the same trouble on an Intel server with dual Xeon 2.8 GHz processors.
CentOS 4.3 w/ Kernel-smp-2.6.9-34.12EL, Zaptel 1.2.5
The only time this happens is when I load the Zaptel driver. As soon as I load the Zaptel module, the system spews NMI messages. If I unload the module, all is well again.
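
One way to put numbers on "NMIs only while the module is loaded" is to snapshot the NMI row of /proc/interrupts before and after loading it and compare. Below is a small Python sketch of that comparison. The helper names are hypothetical, the BEFORE row is the one from post #1, and the AFTER row is invented purely to show the arithmetic. It also assumes the NMI watchdog is not enabled, since otherwise the counters rise on their own.

```python
def nmi_counts(interrupts_text):
    """Return the per-CPU NMI counters from /proc/interrupts text."""
    for line in interrupts_text.splitlines():
        if line.strip().startswith("NMI:"):
            fields = line.split(":", 1)[1].split()
            # Some kernels append a text description; keep only the numbers.
            return [int(f) for f in fields if f.isdigit()]
    return []


def nmi_delta(before_text, after_text):
    """Per-CPU increase in NMI count between two snapshots."""
    before = nmi_counts(before_text)
    after = nmi_counts(after_text)
    return [a - b for b, a in zip(before, after)]


BEFORE = "NMI: 1 0 0 0\n"   # row from post #1
AFTER = "NMI: 4 0 1 0\n"    # hypothetical later snapshot, not real data

delta = nmi_delta(BEFORE, AFTER)
print("New NMIs per CPU:", delta)
print("Total new NMIs:", sum(delta))
```

In practice the two snapshots would be the contents of /proc/interrupts read just before and some time after `modprobe zaptel`.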