Dead air on recorded calls

Asterisk 1.6.2.16.1 built by root @ pbx.local on a i686 running Linux on 2011-02-13 18:33:31 UTC
DAHDI version is 2.4.0

We have five servers: four using Rhino RCB24 cards connected to POTS-style phones, and one with a Rhino RxT1 card connected to a PRI and two T1s, for a total of 96 phones and 71 outgoing DAHDI channels. All of the services are outbound; there is no inbound calling.

After our recent upgrade from 1.2 to 1.6, we have been experiencing a problem with the recorded audio and with call clarity.

If I “reload” the servers, everything is fine for anywhere from 10 to 20 minutes. After that, the calls start getting “noisy” - the people on the phones call them “spaceship noises”. It’s the same kind of noise you get when you connect the tip on one phone and the ring on another through a butt-set. Then the audio on the calls starts to drop out, sometimes intermittently during a call and sometimes before the call even gets connected.

If I “reload” again, the problem goes away for a while, but it consistently comes back.

The first indication of a problem was the audio quality of our call recordings - we record all outgoing calls, which means we record every call. We convert the wav files produced by MixMonitor to mp3 files. Under the old system, this worked great - the new system’s recordings are barely audible (chopping, clipping, dead air, every audio problem you can imagine).
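To put a number on the "dead air" rather than judging recordings by ear, a minimal Python sketch like the following could report what fraction of a recording is silent. It assumes the MixMonitor output is 16-bit PCM WAV; the function name `silent_fraction`, the 20 ms window, and the amplitude threshold of 200 are all hypothetical choices for illustration, not anything taken from Asterisk.

```python
import array
import sys
import wave


def silent_fraction(path, window_ms=20, threshold=200):
    """Return the fraction of fixed-size windows whose peak amplitude
    stays below `threshold` (assumes 16-bit PCM WAV input)."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch assumes 16-bit PCM samples")
        frames_per_window = max(1, wav.getframerate() * window_ms // 1000)
        total = 0
        silent = 0
        while True:
            raw = wav.readframes(frames_per_window)
            if not raw:
                break
            samples = array.array("h", raw)
            if sys.byteorder == "big":
                samples.byteswap()  # WAV sample data is little-endian
            peak = max(abs(s) for s in samples)
            total += 1
            if peak < threshold:
                silent += 1
    # An empty file has no windows; report it as 0.0 rather than dividing by zero.
    return silent / total if total else 0.0
```

Running this over recordings made right after a reload and over recordings made 20 minutes later would show whether the silent fraction really climbs over time.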

To test Rhino’s theory that the problem was gain on the Rhino cards, I adjusted the tx and rx gain on each of the cards to see if that helped. It didn’t: after the reload, everything is golden for about 10 minutes, and then the problem comes back.

We have a couple of folks that connect via SIP phones from another location - they are seeing exactly the same symptoms as the people here in the office (with the POTS handsets). I decided to try an experiment and changed the outbound route to a “fall-back” SIP trunk to see what would happen.

I reloaded - the problem was gone, for about 20 minutes.

So, regardless of how the CSRs connect to the system, their calls end up choppy, garbled, or just dead air. This tells me that the Rhino cards are not part of the problem.

Note - this isn’t an RTP problem. The only SIP/IAX2 connections in the network (normally) are over the dedicated Gigabit Ethernet that connects the five servers. The firewalls on all of the internal network connections are turned off, so RTP and TOS are not culprits. Also remember that these problems didn’t exist on our old 1.2 system, and we can’t go back to 1.2 because we had to upgrade all of the Rhino cards for the 1.6 upgrade.

The only other place I’ve seen something that sounds like a similar problem was a post from someone who talked about how “synchronous writing to the hda0 drive causes latency” on a file system that is “problematic”. Would moving the /var/spool/asterisk/monitor directory to an NFS drive make any difference (since the writes would then be asynchronous)?
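One way to check the disk-latency theory before moving anything is to time forced synchronous writes in the recording directory. The sketch below is only a rough probe under stated assumptions: the function name `fsync_latencies`, the probe file name, and the chunk size (16000 bytes, roughly one second of 8 kHz 16-bit mono audio) are hypothetical, and it measures how long the disk takes to commit a write, not what Asterisk itself does when it writes a recording.

```python
import os
import time


def fsync_latencies(directory, writes=50, chunk_bytes=16000):
    """Write `writes` chunks to a temporary file in `directory`, forcing
    each one to disk with fsync, and return the per-write times in seconds."""
    path = os.path.join(directory, "fsync_probe.tmp")  # hypothetical probe file
    payload = b"\0" * chunk_bytes
    latencies = []
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        for _ in range(writes):
            start = time.perf_counter()
            os.write(fd, payload)
            os.fsync(fd)  # block until the data has been committed
            latencies.append(time.perf_counter() - start)
    finally:
        os.close(fd)
        os.remove(path)  # clean up the probe file
    return latencies
```

Calling `max(fsync_latencies("/var/spool/asterisk/monitor"))` while calls are active, and again on a candidate alternative location, would give a rough comparison. Worst-case times well above the 20 ms audio frame interval would make the latency theory more plausible.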

We don’t use ChanSpy to monitor calls in progress - all we need is “record()”.