Analog channel is losing dial tone over night with TDM410P

Installed TDM410P with FX0 on port 1&2 and FXS on port 4 in CentOS 4.7 (Xen Dom0). Installed Asterisk 1.6.06 and DAHDI 2.1.04 and GUI 2.0.
After getting the initial configuration with one analog phone, one SIP phone and two trunk lines right, the analog phone is losing the dial tone (and connectivity) over night. I then downgraded from Asterisk 1.6.0.6 to 1.4.23.2,
but no change.
The GUI shows the analog phone (DAHDI/4-1) as busy. When I try to call the analog phone from the SIP, I get the busy:
[Mar 15 09:01:09] WARNING[1702]: app_dial.c:1286 dial_exec_full: Unable to create channel of type ‘DAHDI’ (cause 17 - User busy)
== Everyone is busy/congested at this time (1:1/0/0)
== Auto fallthrough, channel ‘SIP/101-082080c0’ status is ‘BUSY’
[Mar 15 09:01:36] WARNING[24725]: chan_sip.c:13058 handle_response: Remote host can’t match request NOTIFY to call ‘3fc225cd3ceb56291ee8c48e14a7abc5@192.168.25.250’. Giving up.

CLI> dahdi show channel 4
Channel: 4CLI>
File Descriptor: 11
Span: 1
Extension:
Dialing: no
Context: DLPN_DialPlan1
Caller ID: 100
Calling TON: 0
Caller ID name:
Destroy: 0
InAlarm: 0
Signalling Type: FXO Loopstart
Radio: 0
Owner: DAHDI/4-1
Real: DAHDI/4-1
Callwait:
Threeway:
Confno: -1
Propagated Conference: -1
Real in conference: 0
DSP: yes
Relax DTMF: no
Dialing/CallwaitCAS: 0/0
Default law: ulaw
Fax Handled: no
Pulse phone: no
Echo Cancellation: 128 taps, currently OFF
Actual Confinfo: Num/0, Mode/0x0000
Actual Confmute: No
Hookstate (FXS only): Offhook

CLI> dahdi show status
Description Alarms IRQ bpviol CRC4
Wildcard TDM410P Board 1 OK 39 0 0

CLI> core show channels
Channel Location State Application(Data)
DAHDI/4-1 s@DLPN_DialPlan1:1 Rsrvd (None)
1 active channel
0 active calls

CLI> core show channel DAHDI/4-1
– General –
Name: DAHDI/4-1
Type: DAHDI
UniqueID: 1237105341.109
Caller ID: 100
Caller ID Name: (N/A)
DNID Digits: (N/A)
State: Rsrvd (1)
Rings: 0
NativeFormats: 0x44 (ulaw|slin)
WriteFormat: 0x4 (ulaw)
ReadFormat: 0x4 (ulaw)
WriteTranscode: No
ReadTranscode: No
1st File Descriptor: 11
Frames in: 826
Frames out: 0
Time to Hangup: 0
Elapsed Time: 4h53m8s
Direct Bridge:
Indirect Bridge:
– PBX –
Context: DLPN_DialPlan1
Extension: s
Priority: 1
Call Group: 2
Pickup Group: 2
Application: (N/A)
Data: (None)
Blocking in: (Not Blocking)
Variables:
TRANSFERCAPABILITY=SPEECH

CDR Variables:
level 1: clid=100
level 1: src=100
level 1: dst=s
level 1: dcontext=DLPN_DialPlan1
level 1: channel=DAHDI/4-1
level 1: start=2009-03-15 04:22:21
level 1: duration=0
level 1: billsec=0
level 1: disposition=NO ANSWER
level 1: amaflags=DOCUMENTATION
level 1: uniqueid=1237105341.109

dahdi restart
Destroying channels and reloading DAHDI configuration.
– Hungup ‘DAHDI/4-1’ == Unregistered channel -2
== Unregistered channel 1
== Unregistered channel 2
== Unregistered channel 4
… still no dial tone and channel still busy and off-hook

To get the dial tone back I have to stop Asterisk, restart DAHDI and start Asterisk again.

Just found in my syslog the following:

Mar 15 04:22:01 xenserver kernel: wctdm24xxp0: Missed interrupt. Increasing latency to 10 ms in order to compensate.
Mar 15 04:22:01 xenserver kernel: wctdm24xxp0: Missed interrupt. Increasing latency to 11 ms in order to compensate.
Mar 15 04:22:01 xenserver kernel: wctdm24xxp0: Missed interrupt. Increasing latency to 12 ms in order to compensate.
Mar 15 04:22:02 xenserver kernel: wctdm24xxp0: Missed interrupt. Increasing latency to 13 ms in order to compensate.
Mar 15 04:22:02 xenserver kernel: wctdm24xxp0: Missed interrupt. Increasing latency to 14 ms in order to compensate.
Mar 15 04:22:02 xenserver kernel: wctdm24xxp0: Missed interrupt. Increasing latency to 15 ms in order to compensate.
Mar 15 04:22:02 xenserver kernel: wctdm24xxp0: Missed interrupt. Increasing latency to 16 ms in order to compensate.
Mar 15 04:22:02 xenserver kernel: wctdm24xxp0: Missed interrupt. Increasing latency to 17 ms in order to compensate.
Mar 15 04:22:02 xenserver kernel: wctdm24xxp0: Missed interrupt. Increasing latency to 18 ms in order to compensate.
Mar 15 04:22:02 xenserver kernel: wctdm24xxp0: Missed interrupt. Increasing latency to 19 ms in order to compensate.
Mar 15 04:22:03 xenserver kernel: wctdm24xxp0: Missed interrupt. Increasing latency to 20 ms in order to compensate.
Mar 15 04:22:03 xenserver kernel: wctdm24xxp0: Missed interrupt. Increasing latency to 21 ms in order to compensate.
Mar 15 04:22:03 xenserver kernel: wctdm24xxp0: Missed interrupt. Increasing latency to 22 ms in order to compensate.
Mar 15 04:22:03 xenserver kernel: wctdm24xxp0: Missed interrupt. Increasing latency to 23 ms in order to compensate.
Mar 15 04:22:04 xenserver kernel: wctdm24xxp0: Missed interrupt. Increasing latency to 24 ms in order to compensate.
Mar 15 04:22:05 xenserver kernel: wctdm24xxp0: Missed interrupt. Increasing latency to 25 ms in order to compensate.
Mar 15 04:22:05 xenserver kernel: wctdm24xxp0: Missed interrupt. Increasing latency to 26 ms in order to compensate.
Mar 15 04:22:05 xenserver kernel: wctdm24xxp0: Missed interrupt. Increasing latency to 27 ms in order to compensate.
Mar 15 04:22:21 xenserver kernel: wctdm24xxp0: Missed interrupt. Increasing latency to 28 ms in order to compensate.
Mar 15 04:22:21 xenserver kernel: wctdm24xxp0: Missed interrupt. Increasing latency to 29 ms in order to compensate.
Mar 15 04:24:19 xenserver kernel: wctdm24xxp0: Missed interrupt. Increasing latency to 30 ms in order to compensate.

That is exactly the time when the ‘off-hook’ starts. It seems to happen when the cron.daily or weekly with the default configuration starts.

Any hints what’s going on here?

Do not use Xen, it is known to not properly work with hardware.

Well, not using Xen would be the end of my Asterisk research project. I have only one server here and I am the only user on that server. I have 2 Linux DomU and one Windows server 2003 DomU. No heavy load at all. Just a proxy and DNS server and a few files.
H/W: PowerEdge SC1430, dual core, 2GB

Since I realized that the problem always happened when the daily cronjob was running, I changed the time for each DomU when the cron job will run and for the first time, the dial tone was available after the night.

Still getting a latency increase from time to time.
wctdm24xxp0: Missed interrupt. Increasing latency to 6 ms in order to compensate.
/proc/interrupts:
22: 1127046206 Phys-irq wctdm24xxp0
256: 223247127 Dynamic-irq timer0
The TDM410P has by far more interrupts than any other device in the system.
Would be great if the responsible dahdi driver area (voicebus.c) could handle this latency issue better. Like self-recovering, healing and not shutting down the TDM410P port or card.

dahdi...../linux/drivers/voicebus.c: if (DRING_SIZE == latency) { .... } else { VB_PRINTK(vb, INFO, "Missed interrupt. " \ "Increasing latency to %d ms in order to compensate.\n", latency+1); /* Set the minimum latency in case we're restarted...we don't * want to wait for the buffer to grow to this depth again in * that case. */ voicebus_set_minlatency(vb, latency+1); vbb = voicebus_alloc(vb); if (unlikely(NULL == vbb)) { BUG_ON(1); } else { vb->handle_transmit(vbb, vb->context); }

Any idea?

The latency increases is how it handles not being able to receive interrupts. There is a cap for the max latency that the driver will allow at about 30ms. When this is reached the card will not process any longer which is why you were not receiving dial tone. The possible reasons there is a latency cap are because it will start affecting the bridging of calls to other channels, affecting audio, and affecting echo cancellation if it were to continue increasing.

As it seems to be much better now after changing the cronjob you should set a starting latency. Keep it running for some time and find what the latency as has reached. Then set the starting latency to something a little more.

IE. modprobe wctdm24xxp latency=20

Thanks! Will observe the latency changes in the next few days and then set the starting latency.

But it would still be nice if the DAHDI driver could recover from this without a complete restart of DAHDI and Asterisk. I checked the zaptel driver, but the handling seem to be exactly the same.

It works a bit like a time bomb … when it reaches once the 30ms, the system needs a restart. Reached 7ms for the day …
Mar 17 12:40:36 xenserver kernel: wctdm24xxp0: Missed interrupt. Increasing latency to 7 ms in order to compensate.
I am now on dahdi: Version: 2.2.0-rc1

Should I better have tried a non-digium analog telephony card?

HI

I am wondering if there has been any change to this ?

I am getting affected.

Wondering if digium can push out the latency to sysfs (/proc) so that it can be changed on the fly.

I am getting dead virtual loops when my adsl goes down, this soaks up a lot of cpu and …