[SOLVED] Problem with TDM400P, Asterisk, FXS module lockups


#1

I got a fun one… Probably some sort of hardware issue (I really hope not).

Symptoms: After 5 months of faithful, reliable service, the first FXS module on the TDM400P stops passing all audio. It does not ring, gives no dial tone, can't dial, and Asterisk can't access it. The only fix is a cold reboot of the system; restarting Asterisk, unloading/reloading the modules, and a warm reboot/reset don't work. This now recurs every 1-3 days. After the FXS module locks up, but before I cold boot, the rest of the modules are still usable and the TDM card keeps generating interrupts.

The thing that totally boggles my mind is that this config worked fine for 5 months (June-October), and then, after upgrading Zaptel/Asterisk from 1.0.8 to 1.0.9, the problems started. I tried switching back to 1.0.8, but the problems followed. Unloading the modules and deleting them all from the kernel module directory before recompiling didn't help.

I have never experienced any clicks, pops, hiss, intermittent echo, or other problems that seem to be linked to interrupt issues. But interrupt issues are the only common thread that has stood out in my research on the asterisk-users mailing list and the Asterisk issue tracker.

When I try to restart the interface after an FXS failure, I get the following errors:

[code]Nov 9 21:46:01 [kernel] Zapata Telephony Interface Registered on major 196
Nov 9 21:46:06 [kernel] Freshmaker version: 71
Nov 9 21:46:06 [kernel] Freshmaker passed register test
Nov 9 21:46:09 [kernel] Timeout waiting for calibration of module 0
Last output repeated twice
Nov 9 21:46:11 [kernel] Proslic Failed on Second Attempt to Auto Calibrate
Nov 9 21:46:12 [kernel] Proslic Failed on Second Attempt to Calibrate Manually. (Try -DNO_CALIBRATION in Makefile)
Nov 9 21:46:12 [kernel] Module 0: FAILED FXS (FCC)
Nov 9 21:46:13 [kernel] Module 1: Installed -- AUTO FXS/DPO
Nov 9 21:46:13 [kernel] Module 2: Not installed
Nov 9 21:46:13 [kernel] Module 3: Installed -- AUTO FXO (FCC mode)
Nov 9 21:46:13 [kernel] Found a Wildcard TDM: Wildcard TDM400P REV E/F (4 modules)[/code]
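Since the most reliable sign of the lockup is the FAILED line that shows up when the driver is reloaded, a cron job or watchdog could scan the kernel log for it instead of waiting for someone to notice a dead extension. Here is a minimal Python sketch, assuming the `Module N: <status>` log format shown above. `module_status` and `failed_modules` are hypothetical helper names and are not part of Zaptel or Asterisk.

```python
import re

# Matches the driver's per-module detection lines, e.g.
# "Module 0: FAILED FXS (FCC)" or "Module 2: Not installed".
# Case-sensitive, so "Power alarm on module 2" is not picked up.
MODULE_LINE = re.compile(r"Module (\d+): (.+?)\s*$")


def module_status(log_text):
    """Return {module_number: status}.

    Later lines overwrite earlier ones, so each module keeps the most
    recent status found in the log.
    """
    status = {}
    for line in log_text.splitlines():
        m = MODULE_LINE.search(line)
        if m:
            status[int(m.group(1))] = m.group(2)
    return status


def failed_modules(log_text):
    """Module numbers whose latest status line reports a failure."""
    return sorted(n for n, s in module_status(log_text).items() if "FAILED" in s)


if __name__ == "__main__":
    sample = (
        "Nov 9 21:46:12 [kernel] Module 0: FAILED FXS (FCC)\n"
        "Nov 9 21:46:13 [kernel] Module 2: Not installed\n"
    )
    print(failed_modules(sample))
```

In practice you would feed it the output of `dmesg` or the syslog file captured right after reloading the drivers, and raise an alert when the returned list is non-empty.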

The TDM400P is not sharing its interrupt with any other board. I tried compiling Asterisk/Zaptel CVS HEAD with -DNO_CALIBRATE, but that can't unwedge a stuck FXS module.

TDM400P board with 2 FXS modules and 1 FXO module, hardware revision E/F. Purchased in May, in production since June of this year. Computer details: I was running kernel 2.6.7 successfully when the problems started; I tried 2.6.13 and had the same issues. I have tried both XT-PIC and IO-APIC configurations. Motherboard: ASUS A7V8X-X, 1.5 GB RAM. Processor: Athlon XP 2500+.

zttest reports values between 99.7% and 100%, and zttool doesn't show any missed interrupts or errors. Looking at cat /proc/interrupts after an FXS lockup, nothing stands out; the TDM card is still generating interrupts.

cat /proc/interrupts:

[code]           CPU0
  0:   78985244    IO-APIC-edge   timer
  1:      33581    IO-APIC-edge   i8042
  9:          0    IO-APIC-level  acpi
 12:     208128    IO-APIC-edge   i8042
 14:    1497428    IO-APIC-edge   ide0
 15:         28    IO-APIC-edge   ide1
169:   16786404    IO-APIC-level  serial, nvidia
177:     820153    IO-APIC-level  ide2, ide3
185:         50    IO-APIC-level  ehci_hcd:usb1, uhci_hcd:usb2, uhci_hcd:usb3, uhci_hcd:usb4
193:    1524819    IO-APIC-level  eth0
201:          7    IO-APIC-level  aic7xxx
209:      53662    IO-APIC-level  VIA8233
217:       2728    IO-APIC-level  wctdm
NMI:          0
LOC:   78987647
ERR:          0
MIS:          0[/code]
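You can confirm from /proc/interrupts that the card really has its IRQ to itself. The last column lists every driver registered on a line, and the names are comma-separated when the IRQ is shared (as with `serial, nvidia` on IRQ 169 in the dump above). Here is a minimal Python sketch, assuming a single-CPU dump with one count column. `parse_interrupts`, `shared_irqs` and `irq_of` are hypothetical helper names.

```python
import re

# One numbered IRQ row of a single-CPU /proc/interrupts dump:
# "<irq>: <count> <controller type> <device>[, <device>...]"
ROW = re.compile(r"^\s*(\d+):\s+(\d+)\s+(\S+)\s+(.+)$")

SAMPLE = """\
           CPU0
  0:   78985244    IO-APIC-edge   timer
169:   16786404    IO-APIC-level  serial, nvidia
177:     820153    IO-APIC-level  ide2, ide3
217:       2728    IO-APIC-level  wctdm
NMI:          0
"""


def parse_interrupts(text):
    """Return {irq: (count, controller, [devices])}.

    The CPU header and the NMI/LOC/ERR/MIS rows do not match ROW and are skipped.
    """
    table = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            irq, count, controller, names = m.groups()
            devices = [d.strip() for d in names.split(",") if d.strip()]
            table[int(irq)] = (int(count), controller, devices)
    return table


def shared_irqs(table):
    """IRQ numbers with more than one driver registered on them."""
    return sorted(irq for irq, (_, _, devs) in table.items() if len(devs) > 1)


def irq_of(table, device):
    """Return (irq, drivers on that irq) for a driver name, or None if absent."""
    for irq, (_, _, devs) in table.items():
        if device in devs:
            return irq, devs
    return None


if __name__ == "__main__":
    table = parse_interrupts(SAMPLE)
    print(irq_of(table, "wctdm"))
    print(shared_irqs(table))
```

On the live machine, pass in the contents of /proc/interrupts instead of `SAMPLE`. A wctdm entry that lists only itself rules out IRQ sharing. It says nothing about how much load the other devices put on the system.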

lspci -v says:

[code]00:0e.0 Communication controller: Tiger Jet Network Inc. Intel 537
        Subsystem: Unknown device b100:0003
        Flags: bus master, medium devsel, latency 32, IRQ 217
        I/O ports at 9800 [size=256]
        Memory at f1000000 (32-bit, non-prefetchable) [size=4K]
        Capabilities: [40] Power Management version 2[/code]


#2

Which port number is it? For some reason I have run into problems with port 1 on these cards. You should contact Digium if you suspect a hardware problem. Does the problem follow the line? It's probably not a line problem, but it wouldn't hurt to check.


#3

Yup, that rings a bell. It always seems to be the first module causing problems. To test that, I switched to using module 2 (no hardware moved around: I just changed the config files and moved the handset from port 1 to port 2). I still had a crash on the second module, but when I tried to reload the wcfxo/wcfxs/zaptel modules, it was module 1 that wouldn't reload.

It is an FXS module, so it serves a local extension rather than an incoming line from the telco. I tried switching to a different handset and am still testing; no conclusive answer yet.

Also, I have noticed "power alarm on module 2" errors in dmesg. I don't know whether this points to a power problem.

I contacted Digium support, and they sent me about 2 pages of information on troubleshooting interrupt problems. I don't think it is an interrupt problem, because I'm missing the telltale symptoms such as dropped calls, noise, squeaks, clicks, and hiss, and zttool never reports a single missed interrupt.


#4

Solved this problem.

I got in touch with Digium support. The problem was that I was running the TDM400 card in a computer with an excessive interrupt load from hardware other than the Digium card.

I moved it to its own machine, turned off the X server and framebuffer, and removed all the other hardware and functions the machine didn't need (USB, sound, printer port, serial ports, everything). For reference, the new machine has an ASUS A7V266-E (VIA) motherboard, an AMD Athlon 2400+ processor, and 512 MB RAM. I didn't have to change the hard disk's DMA mode (I left it at UDMA 5), though Digium did recommend doing so.

In summary: the Digium TDM400 card is a total pig for interrupts. If anything else on the machine competes for interrupt service and the card can't get serviced 100% of the time, it will do weird things, including hardware-level lockups like the one described above. It really likes to work alone. The upside is that the minimum hardware requirements are a 500 MHz CPU and PCI 2.x or better (although I have heard of people getting it working with less). So although it really needs its own machine, the hardware cost won't kill you.
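To put a number on "excessive interrupt load", you can sample /proc/interrupts twice and compare per-device rates. Here is a minimal Python sketch, assuming a single-CPU /proc/interrupts layout with one count column. `read_counts`, `interrupt_rates` and `measure` are hypothetical helper names. As a rough reference point, the wctdm count should climb by about 1000 per second, because Zaptel moves audio in 8-sample chunks at 8 kHz. Treat that figure as an assumption to check against your own driver version.

```python
import time


def read_counts(text):
    """Sum IRQ counts per device name from a single-CPU /proc/interrupts dump.

    Devices on a shared IRQ each get that line's combined count; a device on
    several IRQs (e.g. i8042 on 1 and 12) gets the sum over those lines.
    """
    counts = {}
    for line in text.splitlines():
        parts = line.split(None, 3)  # irq, count, controller type, device list
        if len(parts) < 4 or not parts[0].rstrip(":").isdigit():
            continue  # header, NMI/LOC/ERR/MIS, or a row with no device
        for dev in parts[3].split(","):
            dev = dev.strip()
            if dev:
                counts[dev] = counts.get(dev, 0) + int(parts[1])
    return counts


def interrupt_rates(before, after, seconds):
    """Interrupts per second for each device present in both snapshots."""
    old, new = read_counts(before), read_counts(after)
    return {dev: (new[dev] - old[dev]) / seconds for dev in new if dev in old}


def measure(path="/proc/interrupts", seconds=10.0):
    """Print devices sorted by interrupt rate over a live sampling window."""
    with open(path) as f:
        first = f.read()
    time.sleep(seconds)
    with open(path) as f:
        second = f.read()
    rates = interrupt_rates(first, second, seconds)
    for dev, rate in sorted(rates.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{dev:40s} {rate:10.1f}/s")
```

Run `measure()` on the Asterisk box under normal load to see which devices are competing with wctdm. A device firing thousands of times per second, such as a busy NIC or video card, would be a candidate for removal or for moving to another machine, as in the fix above.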