Underrun detected by hardware error

system · June 30, 2014, 3:52pm

Hi all. We are currently running 9 Asterisk systems at remote locations. We are in the process of swapping out our older Asterisk systems for new hardware, and are having some issues with our first test system.

We are using a single TE133 card in a CentOS system with an Intel Atom C2750 processor and 4GB of RAM. The drives are two Intel SSDs in RAID0. Hardware is all new.

We are using Asterisk 11.9.0 and DAHDI 2.9.1.1.

At random times, several times a week, dahdi will crap out and we’ll get the following message dumped into our log file repeatedly:

And dahdi will go down:

[Jun 28 03:03:47] WARNING[2097] sig_pri.c: Span 1: D-channel is down!

This will happen at seemingly random times, even when the system isn’t under load (such as at 3am). Restarting the dahdi service fixes the problem and the phone system is useable again until the next time it happens.
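To pin down when the drops happen (and how far apart they are), the log can be scanned for the warning quoted above. This is only a minimal sketch: the regular expression is modelled on that single sample line, the function name `find_dchannel_drops` is made up for illustration, and the year has to be supplied by hand because the log timestamp does not include one. It also assumes an English locale for the month abbreviation.

```python
import re
from datetime import datetime

# Pattern modelled on the one log line quoted in this thread, e.g.
# [Jun 28 03:03:47] WARNING[2097] sig_pri.c: Span 1: D-channel is down!
DOWN_RE = re.compile(
    r"^\[(?P<ts>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2})\] WARNING\[\d+\] "
    r"sig_pri\.c: Span (?P<span>\d+): D-channel is down!"
)


def find_dchannel_drops(lines, year):
    """Return a list of (datetime, span_number) for each D-channel drop.

    `year` is an assumption supplied by the caller, since the log
    timestamp carries no year.
    """
    events = []
    for line in lines:
        match = DOWN_RE.match(line.strip())
        if match:
            stamp = datetime.strptime(
                f"{year} {match.group('ts')}", "%Y %b %d %H:%M:%S"
            )
            events.append((stamp, int(match.group("span"))))
    return events


def gaps_between(events):
    """Return the time differences between consecutive drop events."""
    times = [stamp for stamp, _span in events]
    return [later - earlier for earlier, later in zip(times, times[1:])]
```

Feeding it the lines of the Asterisk log file (wherever your logger.conf writes it) gives a list of timestamps, and `gaps_between` shows whether the failures cluster around a particular interval or time of day.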

Any help would be appreciated.

Thanks.

malcolmd · June 30, 2014, 5:37pm

Howdy,

Please contact our Support department directly via digium.com/support

Cheers

sruffell · June 30, 2014, 5:59pm

You might want to open a ticket with Digium’s technical support to help troubleshoot this problem.

But, based on what you said here, it sounds like something is happening on your host system which is either preventing the interrupt handler from running in a timely fashion (e.g., is the system going into a low power mode? Is there a framebuffer running, or a slow serial console?) or interrupts are not being routed reliably on this platform.

munozj · August 21, 2014, 7:37pm

I’m having the same issue. What was your resolution?

tipstrade · November 17, 2014, 10:46am

Like munozj, I’m having a very similar problem:

TE133
Asrock C2550D4i motherboard
CentOS (FreePBX 6.5, kernel 2.6.32-431.el6.x86_64)
Asterisk 11.13.1
DAHDI Version: 2.10.0.1 Echo Canceller: HWEC
2xSamsung SSDs (Intel controllers) in RAID-0
All brand new hardware

We were getting hardware under-runs after about 5.5 to 6.5 days, but it too would happen at quiet times. Framebuffers are disabled.

After replacing the PCIe riser card, we thought we had fixed the problem, but now we get hardware under-runs after about 10 days. Our next option is to discard the riser completely, but that involves replacing the case (1U chassis), so before I go ahead and do that, I wanted to know if you resolved the issue.

Many thanks,
John

Edit: Additional info

tipstrade · November 24, 2014, 1:40pm

Update - swapped the system into a new case so I don’t need a riser card, and I got an underrun after less than 4 days - a new record!

I’m beginning to wonder if the card just doesn’t like the motherboard / chipset. I’ve asked Digium support about that and am waiting for a reply. At this rate, I’m going to have a 2nd machine around

tipstrade · December 1, 2014, 1:22pm

Another update - spoke to Digium and there “appears” to be an issue that they are trying to patch.

I also wondered whether this was being caused by power management, so I decided to disable ACPI and APIC by adding this to the kernel configuration line in /boot/grub/menu.lst

A consequence is that the PRI card is no longer on its own IRQ, but is sharing it with a USB hub (not in use) and smbus. However, by monitoring /proc/interrupts, I can see that I’m getting an average of 1,005 interrupts per second (Min: 1,003; Max: 1,010).
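For anyone wanting to take the same measurement, here is a rough sketch of sampling /proc/interrupts and turning the counter difference into a per-second rate. The helper names `irq_count` and `sample_rate` are made up for illustration, and the IRQ label is something you have to look up in /proc/interrupts on your own box; nothing here is specific to DAHDI.

```python
import time


def irq_count(proc_interrupts_text, irq):
    """Sum the per-CPU counters for one IRQ label in /proc/interrupts text."""
    lines = proc_interrupts_text.splitlines()
    ncpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        if label.strip() == str(irq):
            total = 0
            # Only the first `ncpus` fields are counters; the rest is
            # the interrupt type and the device names.
            for field in rest.split()[:ncpus]:
                if not field.isdigit():
                    break
                total += int(field)
            return total
    raise KeyError(f"IRQ {irq!r} not found")


def sample_rate(irq, interval=1.0, path="/proc/interrupts"):
    """Return interrupts per second for `irq`, measured over `interval` seconds."""
    with open(path) as handle:
        before = irq_count(handle.read(), irq)
    start = time.monotonic()
    time.sleep(interval)
    with open(path) as handle:
        after = irq_count(handle.read(), irq)
    elapsed = time.monotonic() - start
    return (after - before) / elapsed
```

Calling `sample_rate(irq)` in a loop and logging the minimum, maximum and mean gives figures comparable to the ones above. Bear in mind that on a shared IRQ the count includes interrupts from every device on that line, not just the PRI card.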

So far it’s been up for almost 7 days - I await the dreaded “John, the phones are down”!

dtobal · March 11, 2015, 8:27pm

Hi Tipstrade (John?),

Did you solve your Digium problem?

I am getting the same condition… my system is a DELL PE T410 … Raid1… no Riser card.

Have you updated your kernel?
I am asking this, because in another forum I found an issue related to kernel version 2.6.32.

Your post is from 01 Dec '14… so… has DAHDI not been updated since 2.10.1? No patches from Digium to solve this issue?

In the meantime… I opened a ticket with Digium, and I am praying the system won’t hang before they respond…

Thanks
Denilson

sruffell · March 31, 2015, 6:37pm

Just a heads up that the commit ‘wcxb: Fix “I/O error reported by firmware” followed by underruns’ was recently added to the master branch of DAHDI and I believe will resolve these types of errors.

It will be in dahdi-linux 2.11 when it is released but you should feel free to run the master branch now if you would like.
