BUG: spinlock lockup on CPU by Asterisk and MySQL

Dear All,

We have a high-end server running Asterisk PBX services with Sangoma telephony cards. It was previously running fine, but for the past couple of days it has been logging this error:

Mar 19 10:59:46 localhost kernel: BUG: spinlock lockup on CPU#7, asterisk/15200, ffff8100a3cf1028 (Tainted: G )
Mar 19 10:59:46 localhost kernel:
Mar 19 10:59:46 localhost kernel: Call Trace:
Mar 19 10:59:46 localhost kernel: [] _raw_spin_lock+0xcd/0xeb
Mar 19 10:59:46 localhost kernel: [] _spin_lock+0x47/0x52
Mar 19 10:59:46 localhost kernel: [] unix_stream_sendmsg+0x255/0x363
Mar 19 10:59:46 localhost kernel: [] do_sock_write+0xc6/0x102
Mar 19 10:59:46 localhost kernel: [] sock_aio_write+0x4f/0x5e
Mar 19 10:59:46 localhost kernel: [] do_sync_write+0xc7/0x104
Mar 19 10:59:46 localhost kernel: [] autoremove_wake_function+0x0/0x2e
Mar 19 10:59:46 localhost kernel: [] dnotify_parent+0x1f/0x79
Mar 19 10:59:46 localhost kernel: [] vfs_write+0xe1/0x174
Mar 19 10:59:46 localhost kernel: [] sys_write+0x45/0x6e
Mar 19 10:59:46 localhost kernel: [] tracesys+0xd5/0xdf
Mar 19 10:59:46 localhost kernel:
Mar 19 11:12:11 localhost kernel: BUG: spinlock lockup on CPU#7, mysqld/20042, ffff81012c4afc08 (Tainted: G )
Mar 19 11:12:11 localhost kernel:
Mar 19 11:12:11 localhost kernel: Call Trace:
Mar 19 11:12:11 localhost kernel: [] _raw_spin_lock+0xcd/0xeb
Mar 19 11:12:11 localhost kernel: [] _spin_lock+0x47/0x52
Mar 19 11:12:11 localhost kernel: [] unix_stream_sendmsg+0x255/0x363
Mar 19 11:12:11 localhost kernel: [] do_sock_write+0xc6/0x102
Mar 19 11:12:11 localhost kernel: [] sock_aio_write+0x4f/0x5e
Mar 19 11:12:11 localhost kernel: [] do_sync_write+0xc7/0x104
Mar 19 11:12:11 localhost kernel: [] autoremove_wake_function+0x0/0x2e
Mar 19 11:12:11 localhost kernel: [] file_has_perm+0x48/0xa3
Mar 19 11:12:11 localhost kernel: [] vfs_write+0xe1/0x174
Mar 19 11:12:11 localhost kernel: [] sys_write+0x45/0x6e
Mar 19 11:12:11 localhost kernel: [] tracesys+0xd5/0xdf

Below is the system information:

[root@localhost ~]# uname -a
Linux localhost.localdomain 2.6.18-238.19.1.el5debug #1 SMP Fri Jul 15 09:01:56 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux

[root@localhost ~]# lspci
00:00.0 Host bridge: Intel Corporation Core Processor DMI (rev 11)
00:03.0 PCI bridge: Intel Corporation Core Processor PCI Express Root Port 1 (rev 11)
00:05.0 PCI bridge: Intel Corporation Core Processor PCI Express Root Port 3 (rev 11)
00:08.0 System peripheral: Intel Corporation Core Processor System Management Registers (rev 11)
00:08.1 System peripheral: Intel Corporation Core Processor Semaphore and Scratchpad Registers (rev 11)
00:08.2 System peripheral: Intel Corporation Core Processor System Control and Status Registers (rev 11)
00:08.3 System peripheral: Intel Corporation Core Processor Miscellaneous Registers (rev 11)
00:10.0 System peripheral: Intel Corporation Core Processor QPI Link (rev 11)
00:10.1 System peripheral: Intel Corporation Core Processor QPI Routing and Protocol Registers (rev 11)
00:1a.0 USB Controller: Intel Corporation 5 Series/3400 Series Chipset USB2 Enhanced Host Controller (rev 05)
00:1c.0 PCI bridge: Intel Corporation 5 Series/3400 Series Chipset PCI Express Root Port 1 (rev 05)
00:1c.4 PCI bridge: Intel Corporation 5 Series/3400 Series Chipset PCI Express Root Port 5 (rev 05)
00:1d.0 USB Controller: Intel Corporation 5 Series/3400 Series Chipset USB2 Enhanced Host Controller (rev 05)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev a5)
00:1f.0 ISA bridge: Intel Corporation 3400 Series Chipset LPC Interface Controller (rev 05)
00:1f.2 IDE interface: Intel Corporation 5 Series/3400 Series Chipset 4 port SATA IDE Controller (rev 05)
00:1f.5 IDE interface: Intel Corporation 5 Series/3400 Series Chipset 2 port SATA IDE Controller (rev 05)
01:03.0 VGA compatible controller: Matrox Graphics, Inc. MGA G200eW WPCM450 (rev 0a)
02:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5716 Gigabit Ethernet (rev 20)
02:00.1 Ethernet controller: Broadcom Corporation NetXtreme II BCM5716 Gigabit Ethernet (rev 20)
05:00.0 PCI bridge: PLX Technology, Inc. PEX8112 x1 Lane PCI Express-to-PCI Bridge (rev aa)
06:04.0 Network controller: Sangoma Technologies Corp. A104d QUAD T1/E1 AFT card

[root@localhost ~]# cat /proc/interrupts
           CPU0       CPU1       CPU2       CPU3       CPU4       CPU5       CPU6       CPU7
  0:  329604018          0          0          0          0          0          0          0  IO-APIC-edge   timer
  8:          1          0          0          0          0          0          0          0  IO-APIC-edge   rtc
  9:          0          0          0          0          0          0          0          0  IO-APIC-level  acpi
 58:         48          0          0          0          0          0     302492      77909  PCI-MSI-X      eth1-0
 66:         31         67         86       1256          0        575       8253          0  PCI-MSI-X      eth1-1
 74:         22        223        216      21737          0       1018        920          0  PCI-MSI-X      eth1-2
 82:         23       2612       1128        207          0      14616        406        930  PCI-MSI-X      eth1-3
 90:         22        118       7263       7790          0          8          0       7709  PCI-MSI-X      eth1-4
 98:         37      17838        316       3724          0       6702        309      25094  PCI-MSI-X      eth1-5
106:         21        225         95        132          0       6673       7937      34191  PCI-MSI-X      eth1-6
114:         30        923         48         38          0        404        113        323  PCI-MSI-X      eth1-7
122:          2          0          0          0          0          0          0          0  PCI-MSI-X      cnic
169:      43305          0          0  329418631          0          0          0          0  IO-APIC-level  wanpipe1, wanpipe2, wanpipe3, wanpipe4, wanpipe5, wanpipe6, wanpipe7, wanpipe8
217:        131          0          0          0          0          0          0          0  IO-APIC-level  ehci_hcd:usb1, ehci_hcd:usb2
225:      12858          0     271539          0          0          0          0          0  IO-APIC-level  ata_piix
233:        222    3565275          0          0          0          0          0          0  IO-APIC-level  ata_piix
NMI:      40211       9293       9929     153109      34809       9374       9205      14554
LOC:  329603953  329603848  329603776  329510665  329603638  329603562  329603481  329603420
ERR:          0
MIS:          0

[root@localhost ~]# asterisk -V
Asterisk 1.6.2.19

dahdi-linux-complete-2.4.0+2.4.0

MySQL Server version: 5.0.77 Source distribution

Please help me resolve this issue, as this is a production server and the problem recurs periodically. Please let me know if any other information is required.

Thanks

Linux kernel bug.

Thanks Dave,

So Asterisk is fine and I have to update the Linux kernel. One more favor: can I simply run yum update with the -y switch to update everything on my system, and will that also update the kernel?

Cheers!

It is a kernel problem, but not necessarily one that has already been fixed. One can be sure it is a kernel problem because only bugs in kernel-mode code (or hardware faults) can cause most kernel-mode failures. As the failing call doesn't look like a dahdi call, and dahdi isn't on the call stack, the presumption has to be that it is a problem in the kernel proper.
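
One way to check the "not on the call stack" point against a saved log is to filter the stack-frame lines for driver names. This is only a rough sketch: the function name trace_mentions is made up for illustration, and it simply assumes that a frame belonging to a driver would carry the module name (dahdi or wanpipe) somewhere on the line.

```shell
#!/bin/sh
# trace_mentions PATTERN
# Reads kernel log text on stdin, keeps only lines that look like
# stack frames (symbol+0xOFFSET/0xSIZE), and prints those matching
# PATTERN (an extended regex, case-insensitive).
# Exit status is 0 if at least one frame matched, 1 otherwise.
trace_mentions() {
    grep -E '\+0x[0-9a-f]+/0x[0-9a-f]+' | grep -i -E -- "$1"
}

# Example (the log path is an assumption, adjust to where the
# trace was actually saved):
#   trace_mentions 'dahdi|wanpipe' < /var/log/messages \
#       || echo "no dahdi/wanpipe frames found"
```

No output for dahdi or wanpipe, as with the two traces posted above, is consistent with the drivers not being on the stack, though it does not prove they are uninvolved.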

If you replace the kernel, you must use a version of dahdi built against that exact version of the kernel.
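
A quick way to see whether an installed dahdi module matches the running kernel is to compare uname -r with the module's vermagic string. The sketch below assumes the module is installed under the name dahdi and that modinfo is available; the helper name vermagic_matches is hypothetical.

```shell
#!/bin/sh
# vermagic_matches KERNEL_RELEASE VERMAGIC
# The first space-separated field of a module's vermagic string is the
# kernel release it was built for. Succeeds (exit 0) only if that field
# is identical to KERNEL_RELEASE.
vermagic_matches() {
    kernel_release=$1
    module_release=${2%% *}   # strip everything after the first space
    [ "$kernel_release" = "$module_release" ]
}

# Example on a live system (assumes a module named dahdi is installed
# for the running kernel):
#   if vermagic_matches "$(uname -r)" "$(modinfo -F vermagic dahdi)"; then
#       echo "dahdi matches the running kernel"
#   else
#       echo "dahdi was built for a different kernel; rebuild it"
#   fi
```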

(It appears to be on a socket write: the trace goes through unix_stream_sendmsg, which is the send path for Unix domain stream sockets.)