Underrun detected by hardware error

Hi all. We are currently running 9 asterisk systems at remote locations. We are in the process of swapping out our older asterisk systems with new hardware, and are having some issues with our first test system.

We are using a single TE133 card in a CentOS system with an Intel Atom C2750 processor and 4GB of RAM. The drives are two Intel SSDs in RAID0. All hardware is new.

We are using asterisk 11.9.0 and dahdi 2.9.1.1.

At random times, several times a week, dahdi will crap out and we’ll get the following message dumped into our log file repeatedly:

And dahdi will go down:

[Jun 28 03:03:47] WARNING[2097] sig_pri.c: Span 1: D-channel is down!

This will happen at seemingly random times, even when the system isn’t under load (such as at 3am). Restarting the dahdi service fixes the problem and the phone system is useable again until the next time it happens.
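To check whether the outages really are random, one option is to pull the "D-channel is down!" lines out of the Asterisk log and tally them by hour. The following is only a minimal sketch: the function names are made up for illustration, the regex is modelled on the single log line quoted above, and the year has to be passed in because the log timestamps do not include one.

```python
import re
from collections import Counter
from datetime import datetime

# Modelled on the line quoted in this thread:
# [Jun 28 03:03:47] WARNING[2097] sig_pri.c: Span 1: D-channel is down!
DOWN_RE = re.compile(
    r"^\[(?P<stamp>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2})\] "
    r"WARNING\[\d+\] sig_pri\.c: Span (?P<span>\d+): D-channel is down!"
)


def d_channel_down_events(lines, year):
    """Return a list of (datetime, span) for every D-channel-down line.

    `year` is an assumption supplied by the caller, since the log
    timestamps carry only month, day and time.
    """
    events = []
    for line in lines:
        match = DOWN_RE.match(line)
        if match is None:
            continue
        # Collapse the padding Asterisk uses for single-digit days ("Jul  2").
        stamp = " ".join(match.group("stamp").split())
        when = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
        events.append((when, int(match.group("span"))))
    return events


def events_by_hour(events):
    """Count events per hour of day (0-23)."""
    return Counter(when.hour for when, _span in events)


# Example use (the log path is an assumption, adjust to your install):
# with open("/var/log/asterisk/full") as log:
#     print(events_by_hour(d_channel_down_events(log, 2014)))
```

If the counts cluster around a particular hour, a scheduled job (cron, backups, log rotation) becomes a more likely suspect than the card itself.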

Any help would be appreciated.

Thanks.

Howdy,

Please contact our Support department directly via digium.com/support

Cheers

You might want to open a ticket with Digium’s technical support to help troubleshoot this problem.

But, based on what you said here, it sounds like something is happening on your host system which is either preventing the interrupt handler from running in a timely fashion (e.g., is the system going into a low power mode? Is there a framebuffer running, or a slow serial console?) or interrupts are not being routed reliably on this platform.
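One rough way to look into the low-power-mode question on Linux is to read the cpuidle counters in sysfs. This is only a sketch under assumptions: it expects the kernel to expose the standard cpuidle layout (`cpuidle/stateN/` directories containing `name`, `usage` and `time` files under each CPU directory), and `idle_states` is a hypothetical helper name, not part of any DAHDI or Asterisk tool.

```python
from pathlib import Path


def idle_states(cpu_dir):
    """Return [(name, usage, time_us), ...] for one CPU's cpuidle states.

    `cpu_dir` is a directory such as /sys/devices/system/cpu/cpu0.
    `usage` is how many times the state was entered and `time_us` is the
    total time spent in it, in microseconds, as reported by the kernel.
    """
    state_dirs = sorted(
        Path(cpu_dir).glob("cpuidle/state*"),
        key=lambda p: int(p.name[len("state"):]),  # state2 before state10
    )
    states = []
    for state in state_dirs:
        name = (state / "name").read_text().strip()
        usage = int((state / "usage").read_text())
        time_us = int((state / "time").read_text())
        states.append((name, usage, time_us))
    return states


# Example use:
# for name, usage, time_us in idle_states("/sys/devices/system/cpu/cpu0"):
#     print(name, usage, time_us)
```

If most of the time is accumulating in the deeper states, limiting C-states in the BIOS or on the kernel command line would be a reasonable experiment. That is a guess at a next step, not a confirmed fix for this card.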

I’m having the same issue. What was your resolution?

Like munozj, I’m having a very similar problem:

  • TE133
  • Asrock C2550D4i motherboard
  • Centos (FreePBX 6.5, kernel 2.6.32-431.el6.x86_64)
  • Asterisk 11.13.1
  • DAHDI Version: 2.10.0.1 Echo Canceller: HWEC
  • 2xSamsung SSDs (Intel controllers) in RAID-0
  • All brand new hardware

We were getting hardware under-runs after about 5.5 to 6.5 days, but it too would happen at quiet times. Framebuffers are disabled.

After replacing the PCIe riser card, we thought we had fixed the problem, but now we get hardware under-runs after about 10 days. Our next option is to discard the riser completely, but that involves replacing the case (1U chassis), so before I go ahead and do that, I wanted to know if you resolved the issue.

Many thanks,
John

Edit: Additional info

Update - swapped the system into a new case so I don’t need a riser card, and I got an underrun after less than 4 days - a new record!

I’m beginning to wonder if the card just doesn’t like the motherboard / chipset. I’ve asked Digium support about that and am waiting for a reply. At this rate, I’m going to have a 2nd machine around

Another update - spoke to Digium and there “appears” to be an issue that they are trying to patch.

I also wondered whether this was being caused by power management, so I decided to disable ACPI and APIC by adding this to the kernel configuration line in /boot/grub/menu.lst

A consequence is that the PRI card is no longer on its own IRQ, but is sharing it with a USB hub (not in use) and smbus. However, by monitoring /proc/interrupts, I can see that I’m getting an average of 1,005 interrupts per second (Min: 1,003; Max: 1,010).
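For anyone wanting to repeat the /proc/interrupts measurement described above, here is a minimal sketch of one way to do it. The function names are hypothetical, and the string used to find the card's line is an assumption (something like `wcte13xp`; check what actually appears in your own /proc/interrupts). On a shared IRQ the figure includes interrupts from every device on that line.

```python
import time


def irq_count(text, needle):
    """Sum the per-CPU counters on every line of `text` containing `needle`.

    `text` is the contents of /proc/interrupts. Each data line starts with
    an IRQ label ("16:") followed by one counter per CPU, then the
    interrupt controller name and the device names.
    """
    total = 0
    for line in text.splitlines():
        if needle not in line:
            continue
        for field in line.split()[1:]:
            if not field.isdigit():
                break  # reached the controller/device names
            total += int(field)
    return total


def irq_rate(needle, interval=10.0, path="/proc/interrupts"):
    """Return interrupts per second for `needle`, averaged over `interval`."""
    with open(path) as f:
        before = irq_count(f.read(), needle)
    start = time.monotonic()
    time.sleep(interval)
    with open(path) as f:
        after = irq_count(f.read(), needle)
    return (after - before) / (time.monotonic() - start)


# Example use (the needle is an assumption, see above):
# print(round(irq_rate("wcte13xp")))
```

Calling `irq_rate` in a loop and logging the minimum and maximum gives the same kind of figures quoted above, and a sudden drop in the rate would show up before the D-channel goes down.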

So far it’s been up for almost 7 days - I await the dreaded “John, the phones are down”!

Hi Tipstrade (John?),

Did you solve your Digium problem?

I am getting the same condition… my system is a DELL PE T410 … Raid1… no Riser card.

Have you updated your kernel?
I am asking this, because in another forum I found an issue related to kernel version 2.6.32.

Your post is from 01 Dec '14… so… has DAHDI not been updated since 2.10.1? No patches from Digium to solve this issue?

In the meantime… I opened a ticket with Digium, and I am praying the system won’t hang in this interval…

Thanks
Denilson

Just a heads up that the commit “wcxb: Fix ‘I/O error reported by firmware’ followed by underruns” was recently added to the master branch of DAHDI, and I believe it will resolve these types of errors.

It will be in dahdi-linux 2.11 when it is released but you should feel free to run the master branch now if you would like.