I was able to make my test system flake out. I changed qualify to no for the gateway and cluster/nodes, then set rtcachefriends to no. Did a sip reload. Got a backtrace here:
So what versions of unixODBC and the MySQL connector for ODBC are you using? I downloaded unixODBC-2.3.4 and version 5.3.6 of the mysql-connector-odbc, but after having so many little issues and not being able to reproduce them in a test env, I’m a little hesitant.
This morning, I received some warnings, and asterisk exited.
[Sep 19 10:03:05] WARNING[797] res_odbc.c: SQL Execute returned an error -1: 08S01: [MySQL][ODBC 5.1 Driver][mysqld-5.5.49-0ubuntu0.14.04.1-log]Lost connection to MySQL server during query (104)
[Sep 19 10:03:05] WARNING[797] res_odbc.c: SQL Execute error -1! Verifying connection to asterisk [asterisk-connector]...
[Sep 19 10:03:05] WARNING[797] res_odbc.c: Connection is down attempting to reconnect...
...
[Sep 19 10:03:10] NOTICE[797] res_odbc.c: Connecting asterisk
[Sep 19 10:03:10] NOTICE[797] res_odbc.c: res_odbc: Connected to asterisk [asterisk-connector]
...
[Sep 19 10:03:10] WARNING[797] res_odbc.c: SQL Execute returned an error -1: 08S01: [MySQL][ODBC 5.1 Driver][mysqld-5.5.49-0ubuntu0.14.04.1-log]Lost connection to MySQL server during query (104)
[Sep 19 10:03:10] WARNING[797] res_odbc.c: SQL Execute error -1! Verifying connection to asterisk [asterisk-connector]...
[Sep 19 10:03:10] WARNING[797] res_odbc.c: Connection is down attempting to reconnect...
...
[Sep 19 10:20:20] Asterisk 13.5.0 built by root @ asterisk-twc01 on a x86_64 running Linux on 2016-08-18 01:31:35 UTC
I’m running 2.3.4 and 5.3.4 respectively. I wouldn’t change anything right now.
I do have something for you to try though.
After upgrading to 13.11.2 but before starting asterisk, delete or rename /var/lib/asterisk/astdb.sqlite3 and blank out the ipaddr, port, regseconds and fullcontact fields in sippeers (in the test environment, of course). Then start asterisk and see if you can reproduce the issue.
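For the database half of that step, here is a rough sketch of what "blank out" could look like. It only covers the sippeers columns named above, not the astdb.sqlite3 rename. Assumptions: the real table lives in MySQL behind ODBC, but the sketch uses an in-memory sqlite3 table as a stand-in so it runs anywhere; the stand-in schema, the sample row, the function names, and the choice of NULL (rather than empty strings or 0) are all illustrative, so check them against your actual schema before running anything similar.

```python
import sqlite3

# Column and table names are the ones mentioned in the post.
REGISTRATION_COLUMNS = ("ipaddr", "port", "regseconds", "fullcontact")


def clear_registration_state(conn):
    """Set the registration columns to NULL for every row in sippeers.

    Returns the number of rows updated. NULL is an assumption; a schema
    with NOT NULL columns would need '' or 0 instead.
    """
    assignments = ", ".join(f"{col} = NULL" for col in REGISTRATION_COLUMNS)
    cur = conn.execute(f"UPDATE sippeers SET {assignments}")
    conn.commit()
    return cur.rowcount


def demo():
    """Run the update against a throwaway in-memory stand-in table."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical minimal schema; a real sippeers table has many more columns.
    conn.execute(
        "CREATE TABLE sippeers ("
        "name TEXT, ipaddr TEXT, port INTEGER, "
        "regseconds INTEGER, fullcontact TEXT)"
    )
    # Made-up sample peer using a documentation-range IP address.
    conn.execute(
        "INSERT INTO sippeers VALUES (?, ?, ?, ?, ?)",
        ("1001", "192.0.2.10", 5060, 1474279385, "sip:1001@192.0.2.10:5060"),
    )
    changed = clear_registration_state(conn)
    row = conn.execute(
        "SELECT name, ipaddr, port, regseconds, fullcontact FROM sippeers"
    ).fetchone()
    conn.close()
    return changed, row


if __name__ == "__main__":
    print(demo())
```

Against the real database the same UPDATE statement would be run through whatever MySQL client you normally use, with asterisk stopped, and only in the test environment.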
Well, rtcachefriends=no is totally broken, as I’ve discovered. It doesn’t appear that it ever worked properly. Just leave it on.
Can you try the delete of astdb and clear of the database fields on a production pbx tonight?
The prod server is still at 13.5. If I delete those, the devices would likely re-register by morning. Is there any other testing I can do in test? Have you narrowed it down to anything?
I may have, but I can’t reproduce the issue without using rtcachefriends=no, and since that’s broken already I can’t tell if I’ve fixed anything. OK, don’t try anything on the prod server and I’ll keep investigating.
I have no problems with the taskprocessors, but I’ve never really had that problem on the test server. To check, I’ll have to load it on the live server briefly. Hopefully, I can do that tonight.
Even if it works, I can’t leave it there due to the rt vm errors. This I can reproduce.