Asterisk ignoring IP phones after a time

I was using asterisk-1.6.1.0-rc5 until this morning, when I downloaded and installed asterisk-1.6.1.0. The rest of this post applies to rc5. I won’t know whether the behavior persists in the release version until later today or tomorrow.

At the moment I have two Asterisk servers. One is being used as a switchboard for some IP phones. It’s a dev PBX for connecting developers/testers to the actual application. The other server is the actual application, with most of the work being done via an AGI daemon.

The SIP phones I’ve been given are Cisco 7905g desk sets. These are kind of old now, but that’s what I have.

Running on Debian 4.0.

Things work pretty well when I bring up both servers. I can call from the IP phones to the dev PBX and have it redirected to the core server where stuff happens and results percolate back to the phone (I hear recordings on the handset). This goes on for some period of time.

If, however, I leave the systems alone for a while (e.g. a long lunch or overnight), the phones will no longer connect to the dev PBX. I tried rebooting the phones, but no dice. I ran tcpdump on the dev PBX server and it clearly shows SIP packets coming in from the phones, but there is no response from Asterisk to the phones. I do see some traffic between the two peers.

I’ve been running with verbose set to 3 and now I’ve added debug of 3, but thus far I’ve seen no messages indicating a problem.

I’ve gone through the console commands and showed everything that made any kind of sense. Nothing jumped out at me.

Is there something else I can do to track this down? That assumes, of course, that the problem continues to occur in 1.6.1.0.

It takes about an hour for Asterisk to lose contact with the phones. As before, tcpdump shows the packet from the phone to Asterisk:

11:45:23.180698 IP 10.104.230.40.sip > vspbx1.devint.marchex.com.sip: SIP, length: 446

but there is never a response back from Asterisk as in the case where the call goes through:

11:49:10.824491 IP vspbx1.devint.marchex.com.sip > 10.104.230.40.sip: SIP, length: 503
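
For longer captures, a small script can tally SIP packets per direction so that a one-way conversation stands out. This is only a minimal Python sketch that assumes the tcpdump summary format shown above; the function name `count_sip_directions` is made up for illustration.

```python
import re
from collections import Counter

# Matches tcpdump summary lines of the form shown above, e.g.
# "11:45:23.180698 IP 10.104.230.40.sip > host.sip: SIP, length: 446"
LINE_RE = re.compile(
    r"^(?P<time>\S+) IP (?P<src>\S+) > (?P<dst>\S+): SIP, length: (?P<length>\d+)$"
)

def count_sip_directions(lines):
    """Count SIP packets for each (source, destination) pair."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:  # silently skip anything that is not a SIP summary line
            counts[(match["src"], match["dst"])] += 1
    return counts

if __name__ == "__main__":
    sample = [
        "11:45:23.180698 IP 10.104.230.40.sip > vspbx1.devint.marchex.com.sip: SIP, length: 446",
        "11:49:10.824491 IP vspbx1.devint.marchex.com.sip > 10.104.230.40.sip: SIP, length: 503",
    ]
    for (src, dst), n in count_sip_directions(sample).items():
        print(f"{src} -> {dst}: {n}")
```

A phone-to-PBX pair with a non-zero count and no matching PBX-to-phone pair is the symptom described here.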

Rebooting the dev PBX (to which the phones connect directly) fixes the problem, as it has all along.

http://bugs.digium.com/view.php?id=15014

Hi

Looking at your sip.conf, I see nat is set to yes, but no localnet, externip, or even bind address is defined.
Also, if the sets really are NATed, then using qualify may be a good idea.

You also need to do a capture with tcpdump and load it into Wireshark to see what’s happening.

Ian

The bug I created has been resolved. There is a lot of data in the bug but the short answer is that chan_iax2 was being started without a configuration file due to autoload=yes in modules.conf. It was starting a timer using res_timing_pthread and the action of the timer was filling up a pipe one byte at a time with no one reading the pipe. The write to the pipe is inside of a crucial lock, so when the pipe fills up the system deadlocks. This only happened on a 64-bit server.
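
To illustrate the failure mode (this is not the Asterisk code), here is a minimal Python sketch, assuming a POSIX system, that writes one byte at a time into a pipe that nothing reads and counts how many writes succeed before the pipe is full. The write end is made non-blocking so the script reports the limit instead of hanging; with a blocking descriptor, as in the bug, the next write would never return. The function name is hypothetical.

```python
import os

def count_bytes_until_pipe_full():
    """Write single bytes into an unread pipe until it refuses more data."""
    read_fd, write_fd = os.pipe()
    try:
        # Non-blocking, so a full pipe raises an error instead of hanging.
        os.set_blocking(write_fd, False)
        written = 0
        while True:
            try:
                written += os.write(write_fd, b"x")
            except BlockingIOError:
                # The pipe is full. A blocking writer would be stuck here,
                # and if it held a lock, everyone waiting on that lock
                # would be stuck with it.
                return written
    finally:
        os.close(read_fd)
        os.close(write_fd)

if __name__ == "__main__":
    print("bytes accepted before the pipe filled:", count_bytes_until_pipe_full())
```

The exact capacity depends on the operating system, which fits with the problem taking a while to appear rather than showing up right after startup.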

A fix has been committed to the Asterisk SVN repository on head and several current branches. In the interim, the workaround is to use noload => chan_iax2.so in modules.conf.

Even if you’re not using IAX (and have no configuration file for it), you may not see the issue. I found that on my 32-bit desktop it never happened because chan_iax2.so was initialized before res_timing_pthread.so, so the timer was never created. Using preload => res_timing_pthread.so in modules.conf on my 32-bit system would activate the bug and cause the deadlock.

Be aware that using autoload=yes in modules.conf loads modules in an unspecified order that may vary depending on platform. I was able to demonstrate different directory traversal order on my desktop and the server (see the bug for details).
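
The general point about directory order can be shown outside Asterisk. The following is a minimal Python sketch, not Asterisk's loader: it assumes a loader that simply takes module files in whatever order the filesystem returns them, and compares that with a sorted listing. The function name and the example path are illustrative only.

```python
import os

def module_orders(directory):
    """Return (.so files in raw directory order, the same files sorted)."""
    # os.listdir() returns entries in arbitrary, filesystem-dependent order.
    raw = [name for name in os.listdir(directory) if name.endswith(".so")]
    return raw, sorted(raw)

if __name__ == "__main__":
    # Hypothetical path; point this at your own modules directory.
    path = "/usr/lib/asterisk/modules"
    if os.path.isdir(path):
        raw, ordered = module_orders(path)
        print("raw order:   ", raw[:5])
        print("sorted order:", ordered[:5])
```

Running this on two machines and comparing the raw order is a quick way to see whether they would walk the same directory differently.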