I have installed a high-availability system with two Asterisk servers.
Four Ethernet cards:
1. Network.
2. Dedicated heartbeat.
3 and 4. Two FoneBRIDGEs for 8 E1s.
The system works fine until the spans go down. The Asterisk logs show:
[May 23 10:19:52] WARNING[11781] chan_dahdi.c: Detected alarm on channel 94: Yellow Alarm
[May 23 10:19:52] WARNING[11781] chan_dahdi.c: Detected alarm on channel 95: Yellow Alarm
.
. (the same for every channel of span 4)
.
[May 23 10:19:52] NOTICE[11780] chan_dahdi.c: PRI got event: Alarm (4) on D-channel of span 4
.
.
.
[May 23 10:19:52] NOTICE[11781] chan_dahdi.c: Alarm cleared on channel 94
[May 23 10:19:52] NOTICE[11781] chan_dahdi.c: Alarm cleared on channel 95
.
. (again, the alarm is cleared on every channel)
.
[May 23 10:19:52] NOTICE[11780] chan_dahdi.c: PRI got event: No more alarm (5) on D-channel of span 4
The kernel logs say:
May 23 10:19:52 ambato kernel: TDMoX: New master: DYN/ethmf/eth1/00:50:c2:65:d7:10/3
The jumps are random: there is no other warning or error message, they follow no order (it can be span 1, 2, 3, 4, etc.), and a span does not jump the same number of times (once, twice, 23 times, etc.). But they only occur when we have telephony traffic: from 7am to 9pm.
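To check whether the jumps really follow no pattern, the D-channel alarm lines can be tallied per span and per hour. This is only a minimal sketch in Python, assuming the log lines look exactly like the excerpts above; the function name `count_span_alarms` is hypothetical and not part of Asterisk or DAHDI.

```python
import re
from collections import Counter

# Matches lines such as:
# [May 23 10:19:52] NOTICE[11780] chan_dahdi.c: PRI got event: Alarm (4) on D-channel of span 4
# The "No more alarm" lines do not match, so each jump is counted once.
ALARM_RE = re.compile(
    r"^\[(?P<month>\w{3})\s+(?P<day>\d{1,2}) (?P<hour>\d{2}):\d{2}:\d{2}\] "
    r"NOTICE\[\d+\] chan_dahdi\.c: PRI got event: Alarm \(\d+\) "
    r"on D-channel of span (?P<span>\d+)"
)


def count_span_alarms(lines):
    """Return a Counter keyed by (span, hour) of D-channel alarm events."""
    counts = Counter()
    for line in lines:
        match = ALARM_RE.match(line.strip())
        if match:
            counts[(int(match["span"]), int(match["hour"]))] += 1
    return counts
```

Passing an open handle on the Asterisk log file to `count_span_alarms` gives one count per (span, hour) pair, which can then be compared with the hours of telephony traffic.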
The project is not finished: we are expecting a lot of people to use the system. For now we are using only the first span, and users lose telephony when span 1 jumps. The other spans are already connected and waiting for traffic.
I have disabled the avahi-daemon (no more avahi messages in the kernel logs) and set the Ethernet card IRQs (no more HDLC messages in the Asterisk logs). The only messages I get in the kernel and Asterisk logs are the examples shown above. Heartbeat has been disconnected and we are working with only one system and 4 spans. The jumps continue as always…
The telco is going to check the spans, but not until tonight… Any suggestions? Has anyone had a similar scenario?