Asterisk randomly stops processing SIP calls

Hi,

I was hoping someone may be able to help with my issue, which I have so far been unable to resolve.

We have been running Asterisk 1.8.4.2 in production for some time now, on CentOS 5.6 (32-bit) machines, without any issues whatsoever.

We are now trying to replace them with new machines running CentOS 6.4 (64-bit) and Asterisk 1.8.15-cert2, with the same config, dialplan, etc. However, at random, after a period of no load, Asterisk stops processing SIP calls, although the CLI and manager interface both remain fully responsive.

When it happens, the command “netstat -anp |grep 5070” (the port we are listening on) shows lots of packets queued waiting to be processed.
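For anyone wanting to track that backlog over time rather than from a single snapshot, the Recv-Q column can be summed from the netstat output. This is a minimal sketch, assuming the Linux net-tools column layout (Recv-Q in field 2, local address in field 4); the helper name `recvq_for_port` is made up for illustration.

```shell
#!/bin/sh
# Sum the Recv-Q column (field 2) of netstat-style lines read from stdin
# whose local address (field 4) ends in ":<port>".
# Hypothetical helper; column positions assume Linux net-tools output.
recvq_for_port() {
    awk -v port="$1" '
        $4 ~ (":" port "$") { total += $2 }
        END { print total + 0 }
    '
}

# Usage (as root, so -p can show the owning process):
#   netstat -anp | recvq_for_port 5070
```

A value that keeps growing while the CLI stays responsive would fit the picture of a receiving thread that is blocked rather than a process that has died.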

Then, all of a sudden, after 10-15 mins or so, Asterisk springs back into life and the queued messages are processed.

I have the output of “core show locks” and also a backtrace but can’t see a way to attach files here - I can provide links to them if required.

I hope someone has some experience of a similar issue, as I am just about running out of ideas :frowning:

Many thanks in advance,

Charles

tcpdump and Wireshark might help you with this issue.

Thanks for the reply, but that’s the first thing I tried. The SIP messages come in but there is no response from Asterisk while it is in this state. As said before, the messages appear to be queued and are eventually processed when Asterisk wakes up again.

Make sure you are using a current version of DAHDI, or a non-DAHDI timing source. If that is OK, you need to follow the deadlock debugging procedure:

Build with thread debugging enabled and optimisation disabled (the former has a significant performance cost). When it stalls, run the “core show locks” CLI command (start a new CLI connection if necessary), use gcore to get a dump, then get backtraces (google “asterisk wiki backtrace” for details).
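The collection steps above might look roughly like this. It is only a sketch: the binary path `/usr/sbin/asterisk`, the output directory, and the helper names `run` and `collect_stall_info` are assumptions, and setting `DRY_RUN=1` just prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch of the data collection to run while Asterisk is stalled.
# Assumptions (not from the thread): binary path, output directory, helper names.
AST_BIN="${AST_BIN:-/usr/sbin/asterisk}"
OUT_DIR="${OUT_DIR:-/tmp/ast-stall}"

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

collect_stall_info() {
    pid="$1"
    run mkdir -p "$OUT_DIR"
    # Lock state as reported by Asterisk (needs a thread-debugging build).
    run sh -c "$AST_BIN -rx 'core show locks' > $OUT_DIR/locks.txt"
    # Snapshot the running process without stopping it; writes core.<pid>.
    run gcore -o "$OUT_DIR/core" "$pid"
    # Backtrace of every thread, taken from that snapshot.
    run sh -c "gdb -batch -ex 'thread apply all bt' $AST_BIN $OUT_DIR/core.$pid > $OUT_DIR/backtrace.txt"
}

# Usage: DRY_RUN=1; collect_stall_info "$(pidof asterisk)"
```

Running it once with `DRY_RUN=1` first is a cheap way to check the paths before the next stall happens.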

Thanks, David - I am only using DAHDI as a timing source and it is the most current version. I have disabled all other timing modules.

As per my original post, I already have the output of “core show locks” and also a backtrace, but there is nowhere to attach them here and I didn’t like to paste them directly into the message :confused:

I have now placed them here for reference:

dl.dropboxusercontent.com/u/30150555/locks.txt
dl.dropboxusercontent.com/u/301 … hreads.txt

(I have replaced references to any public IPs with xx.xx.xxx.xxx).

Thanks again for your help.

Charles

It appears to be blocked in the “realtime” mysql handler. I believe that is community supported, which could mean that getting a fix will take a long time.

It looks to me as though it is trying to read the table “ast_sip_buddies”, but may have lost the database connection and is trying to re-establish it.

Thanks - I did think that, but Wireshark shows no attempts to reconnect and the MySQL server is up the entire time (it is in use by other Asterisk boxes and I can connect to it manually using the mysql client), so I am not sure what Asterisk is waiting for in this instance.

I will try switching to res_odbc and report back. I have searched before but didn’t really find any definitive answers - is there a significant performance hit from introducing the additional layer of abstraction or is it negligible? I understand it will largely depend on hardware and how heavily the database is used in our configuration, but there must have been some kind of comparison or benchmark performed somewhere in the past?

Cheers :smile:

It looks like Asterisk addons are trying to verify the mysql connection, but the mysql library is not returning from the call to mysql_ping.

I suppose it could also be looping on reconnect.

    } else {
        /* MySQL likes to return an error, even if it reconnects successfully.
         * So the postman pings twice. */
        if (mysql_ping(&conn->handle) != 0 && (usleep(1) + 2 > 0) && mysql_ping(&conn->handle) != 0) {
            conn->connected = 0;
            conn->connect_time = 0;
            ast_log(LOG_ERROR, "MySQL RealTime: Ping failed (%d). Trying an explicit reconnect.\n", mysql_errno(&conn->handle));
            ast_debug(1, "MySQL RealTime: Server Error (%d): %s\n", mysql_errno(&conn->handle), mysql_error(&conn->handle));
            goto reconnect_tryagain;
        }

Yes, it could be, although there are no entries like that in the log to suggest that is what is happening :confused:
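To make the log check repeatable, the error string from the snippet above can be counted directly. A small sketch, assuming the log lives at `/var/log/asterisk/messages` (pass whichever file your logger.conf actually writes to); the helper name `count_ping_failures` is hypothetical.

```shell
#!/bin/sh
# Count occurrences of the "Ping failed" message from the quoted snippet.
# The default log path is an assumption; pass your own as the first argument.
count_ping_failures() {
    logfile="${1:-/var/log/asterisk/messages}"
    # grep -c prints 0 but exits non-zero when nothing matches, hence "|| true".
    grep -c "MySQL RealTime: Ping failed" "$logfile" || true
}

# Usage: count_ping_failures /var/log/asterisk/full
```

A count of 0 is consistent with the code never getting past the first mysql_ping call, since the message is only logged after both pings have returned. The second line in the snippet goes through ast_debug(1, ...), so it will only show up if debug output is enabled.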

I suppose it could be an issue with libmysqlclient - currently, the latest is installed from RPM but I think I will try the one in the CentOS repo (although older, it’s likely to be 100% compatible with the OS, at least).

Update…

  1. Tried changing libmysqlclient to the one in the CentOS repo - no difference.
  2. Switched to using res_odbc for Asterisk realtime - still no difference!

So, it seems the issue is outside of Asterisk, at least. However, I’ll be blowed if I can find any issue whatsoever with the DB or network :frowning:

I realise this is an Asterisk forum, but if anyone does have any ideas as to where to look next, I’d be glad to hear them!

Thank you, David, for your help before :smile: