Help finding a memory leak

I was able to make my test system flake out. I changed qualify to no for the gateway and cluster/nodes, then set rtcachefriends to no. Did a sip reload. Got a backtrace here:

https://drive.google.com/open?id=0BxUsrDyPCTFAaW1vX3Btb1N5aVk

Also ran core show locks before the backtrace:

https://drive.google.com/open?id=0BxUsrDyPCTFASVh6bVBQLU9SRGs

Now I have 2 stuck in the queue for on extension.
subp:12966@from-internal-sip-00000015 0 2 0 450 500

No locks, but here’s the current backtrace:

https://drive.google.com/open?id=0BxUsrDyPCTFAQTBQZXZfOXR2THM

I was just able to reproduce it.
Now I gotta fix it without breaking something else.

Sounds like my world. Thank you, and good luck.

So what versions of unixODBC and the MySQL connector for ODBC are you using? I downloaded unixODBC-2.3.4 and version 5.3.6 of the mysql-connector-odbc, but after having so many little issues and not being able to reproduce them in a test env, I’m a little hesitant.

This morning, I received some warnings, and asterisk exited.

[Sep 19 10:03:05] WARNING[797] res_odbc.c: SQL Execute returned an error -1: 08S01: [MySQL][ODBC 5.1 Driver][mysqld-5.5.49-0ubuntu0.14.04.1-log]Lost connection to MySQL server during query (104)
[Sep 19 10:03:05] WARNING[797] res_odbc.c: SQL Execute error -1! Verifying connection to asterisk [asterisk-connector]...
[Sep 19 10:03:05] WARNING[797] res_odbc.c: Connection is down attempting to reconnect...
...
[Sep 19 10:03:10] NOTICE[797] res_odbc.c: Connecting asterisk
[Sep 19 10:03:10] NOTICE[797] res_odbc.c: res_odbc: Connected to asterisk [asterisk-connector]
...
[Sep 19 10:03:10] WARNING[797] res_odbc.c: SQL Execute returned an error -1: 08S01: [MySQL][ODBC 5.1 Driver][mysqld-5.5.49-0ubuntu0.14.04.1-log]Lost connection to MySQL server during query (104)
[Sep 19 10:03:10] WARNING[797] res_odbc.c: SQL Execute error -1! Verifying connection to asterisk [asterisk-connector]...
[Sep 19 10:03:10] WARNING[797] res_odbc.c: Connection is down attempting to reconnect...
...
[Sep 19 10:20:20] Asterisk 13.5.0 built by root @ asterisk-twc01 on a x86_64 running Linux on 2016-08-18 01:31:35 UTC

I’m running 2.3.4 and 5.3.4 respectively. I wouldn’t change anything right now.

I do have something for you to try though.
After upgrading to 13.11.2 but before starting asterisk, delete or rename /var/lib/asterisk/astdb.sqlite3 and blank out the ipaddr, port, regseconds and fullcontact fields in sippeers (in the test environment of course) then start asterisk and see if you can reproduce the issue.

I’m doing it now, but only for the dynamic hosts, right?

update sippeers set ipaddr=’’,port=’’,regseconds=’’,fullcontact=’’ WHERE host=‘dynamic’

yep, sorry. dynamic only

I did that. Once it was restarted, I changed rtcachefriends to no, then ran sip reload.

[Sep 19 13:25:16] WARNING[6948]: taskprocessor.c:887 taskprocessor_push: The 'subp:12966@from-internal-sip-00000a17' task processor queue reached 500 scheduled tasks.
[Sep 19 13:25:16] WARNING[6882]: taskprocessor.c:887 taskprocessor_push: The 'subp:12966@from-internal-sip-00000a19' task processor queue reached 500 scheduled tasks.
[Sep 19 13:25:17] WARNING[6944]: taskprocessor.c:887 taskprocessor_push: The 'subp:12966@from-internal-sip-00000a1b' task processor queue reached 500 scheduled tasks.
[Sep 19 13:25:17] WARNING[6955]: taskprocessor.c:887 taskprocessor_push: The 'subp:12966@from-internal-sip-00000a1d' task processor queue reached 500 scheduled tasks.
[Sep 19 13:25:17] WARNING[6879]: taskprocessor.c:887 taskprocessor_push: The 'subp:12966@from-internal-sip-00000a1f' task processor queue reached 500 scheduled tasks.
[Sep 19 13:25:17] WARNING[6927]: taskprocessor.c:887 taskprocessor_push: The 'subp:12966@from-internal-sip-00000a21' task processor queue reached 500 scheduled tasks.
[Sep 19 13:25:17] WARNING[6915]: taskprocessor.c:887 taskprocessor_push: The 'subp:12966@from-internal-sip-00000a23' task processor queue reached 500 scheduled tasks.
[Sep 19 13:25:17] WARNING[6947]: taskprocessor.c:887 taskprocessor_push: The 'subp:12966@from-internal-sip-00000a25' task processor queue reached 500 scheduled tasks.
[Sep 19 13:25:17] WARNING[6932]: taskprocessor.c:887 taskprocessor_push: The 'subp:12966@from-internal-sip-00000a27' task processor queue reached 500 scheduled tasks.
[Sep 19 13:25:17] WARNING[6945]: taskprocessor.c:887 taskprocessor_push: The 'subp:12966@from-internal-sip-00000a29' task processor queue reached 500 scheduled tasks.
[Sep 19 13:25:17] WARNING[6917]: taskprocessor.c:887 taskprocessor_push: The 'subp:12966@from-internal-sip-00000a2b' task processor queue reached 500 scheduled tasks.
[Sep 19 13:25:17] WARNING[6922]: taskprocessor.c:887 taskprocessor_push: The 'subp:12966@from-internal-sip-00000a2d' task processor queue reached 500 scheduled tasks.
[Sep 19 13:25:17] WARNING[6935]: taskprocessor.c:887 taskprocessor_push: The 'subp:12966@from-internal-sip-00000a2f' task processor queue reached 500 scheduled tasks.
[Sep 19 13:25:17] WARNING[6952]: taskprocessor.c:887 taskprocessor_push: The 'subp:12966@from-internal-sip-00000a31' task processor queue reached 500 scheduled tasks.
[Sep 19 13:25:17] WARNING[6957]: taskprocessor.c:887 taskprocessor_push: The 'subp:12966@from-internal-sip-00000a33' task processor queue reached 500 scheduled tasks.
[Sep 19 13:25:17] WARNING[6933]: taskprocessor.c:887 taskprocessor_push: The 'subp:12966@from-internal-sip-00000a35' task processor queue reached 500 scheduled tasks.
[Sep 19 13:25:17] WARNING[6923]: taskprocessor.c:887 taskprocessor_push: The 'subp:12966@from-internal-sip-00000a37' task processor queue reached 500 scheduled tasks.
[Sep 19 13:25:17] WARNING[6924]: taskprocessor.c:887 taskprocessor_push: The 'subp:12966@from-internal-sip-00000a39' task processor queue reached 500 scheduled tasks.
[Sep 19 13:25:17] WARNING[6946]: taskprocessor.c:887 taskprocessor_push: The 'subp:12966@from-internal-sip-00000a3b' task processor queue reached 500 scheduled tasks.
[Sep 19 13:25:17] WARNING[6956]: taskprocessor.c:887 taskprocessor_push: The 'subp:12966@from-internal-sip-00000a3d' task processor queue reached 500 scheduled tasks.
asterisk-whconfig01*CLI> core show taskprocessors

Processor                                      Processed   In Queue  Max Depth  Low water High water
app_voicemail                                          0          0          0        450        500
ast_msg_queue                                          0          0          0        450        500
CCSS_core                                              0          0          0        450        500
Sorcery                                                0          0          0        450        500
Sorcery-control                                        1          0          1        450        500
sorcery/bucket-00000000                                0          0          0        450        500
sorcery/file-00000001                                  0          0          0        450        500
stasis-core                                          399       3467         73        450        500
stasis-core-control                                 4286          0         73        450        500
subm:ast_bridge_topic_all-cached-0000000d             56          0          3        450        500
subm:ast_channel_topic_all-00000011                  206          0         17        450        500
subm:ast_channel_topic_all-cached-0000000c           307          0         19        450        500
subm:ast_channel_topic_all-cached-0000000e           306          0         19        450        500
subm:ast_device_state_topic-00000002                  48          0          2        450        500
subm:ast_device_state_topic-00000004                  47          0          2        450        500
subm:ast_device_state_topic-00000010                  46          0          2        450        500
subm:ast_parking-00000009                              3          0          1        450        500
subm:ast_presence_state_topic_all-00000005             1          0          1        450        500
subm:ast_security-0000000f                           189          0          1        450        500
subm:ast_system-00000008                              32          0         30        450        500
subm:ast_system-0000000a                              31          0          4        450        500
subm:cdr_engine-00000003                             367          0         19       4500       5000
subm:endpoint_topic_all-cached-00000007             5834          0         19        450        500
subm:manager_topic-00000006                         2430          0         19       2700       3000
subp:12966@from-internal-sip-0000001f                 51         32         79        450        500
subp:12966@from-internal-sip-00000023                 41       1732       1710        450        500
subp:12966@from-internal-sip-00000025                 45       1727       1711        450        500
subp:12966@from-internal-sip-00000026                 33       1738       1083        450        500
subp:12966@from-internal-sip-00000027                 32       1736       1083        450        500
subp:12966@from-internal-sip-00000029                 43       1726       1715        450        500


3670 taskprocessors

Missed the first message, in case it would be helpful. The stasis-core was the first on to receive the warning.

asterisk-whconfig01*CLI> sip reload
 Reloading SIP
  == Parsing '/etc/asterisk/sip.conf': Found
  == Using SIP CoS mark 4
[Sep 19 13:24:53] WARNING[6920]: taskprocessor.c:887 taskprocessor_push: The 'stasis-core' task processor queue reached 500 scheduled tasks.
asterisk-whconfig01*CLI> core show taskprocessors

Processor                                      Processed   In Queue  Max Depth  Low water High water
app_voicemail                                          0          0          0        450        500
ast_msg_queue                                          0          0          0        450        500
CCSS_core                                              0          0          0        450        500
Sorcery                                                0          0          0        450        500
Sorcery-control                                        1          0          1        450        500
sorcery/bucket-00000000                                0          0          0        450        500
sorcery/file-00000001                                  0          0          0        450        500
stasis-core                                          399        590         73        450        500
stasis-core-control                                 1409          0         73        450        500
subm:ast_bridge_topic_all-cached-0000000d             56          0          3        450        500
subm:ast_channel_topic_all-00000011                  206          0         17        450        500
subm:ast_channel_topic_all-cached-0000000c           307          0         19        450        500
subm:ast_channel_topic_all-cached-0000000e           306          0         19        450        500
subm:ast_device_state_topic-00000002                  46          0          2        450        500
subm:ast_device_state_topic-00000004                  45          0          2        450        500
subm:ast_device_state_topic-00000010                  44          0          2        450        500
subm:ast_parking-00000009                              3          0          1        450        500
subm:ast_presence_state_topic_all-00000005             1          0          1        450        500
subm:ast_security-0000000f                           163          0          1        450        500
subm:ast_system-00000008                              32          0         30        450        500
subm:ast_system-0000000a                              31          0          4        450        500
subm:cdr_engine-00000003                             367          0         19       4500       5000
subm:endpoint_topic_all-cached-00000007             1515          0         19        450        500
subm:manager_topic-00000006                          964          0         19       2700       3000
subp:12966@from-internal-sip-0000001f                  9         74         79        450        500
subp:12966@from-internal-sip-00000023                 11        325        311        450        500
subp:12966@from-internal-sip-00000025                 11        324        308        450        500
subp:12966@from-internal-sip-00000026                  9        325        326        450        500
subp:12966@from-internal-sip-00000027                  7        324        284        450        500
...

Well, rtpcachefriends is totally broken as I’ve discovered. It doesn’t appear that it ever worked properly. Just leave it on.
Can you try the delete of astdb and clear of the database fields on a production pbx tonight?

The prod server is still at 13.5. If I delete those, the devices would likely re-register by morning. Is there any other testing I can do in test? Have you narrowed it down to anything?

I may have but I can’t reproduce the issue without using rtpcachefriends=no and since it’s broken already I can’t tell if I’ve done anything. Ok, don’t try anything on the prod server and I’ll keep investigating.

Wanna try a patch?

https://gerrit.asterisk.org/#/c/3962/1

This should apply over 13.11 or the current git 13 branch.

Let me know hot it works.

Absolutely! Will load in test tomorrow. If all goes well, into live tomorrow evening. I’ll let you know. Thanks,

Just so I don’t mess it up, correct me if I’m wrong, 'cause I don’t do this often.

Copy the diff file to the /usr/src/asterisk-13.11.02/ directory, then run…
patch -p 1 -i d3bce352.diff

Correct?

I see there’s still some internal discussion on this patch. Should I wait?

Grab the latest version but no need to wait. At this point the only way to know if this actually works is for you to try it.

Working on it. Had to email one of our network admins to allow git to that address through the proxy. It’s caching and fails to authenticate.

Got it. Installed. Took longer to get than I thought today. Will have to test it.

I have no problems with the taskprocessors, but I haven’t really had a problem on the test server ever with that. In order to check, I’ll have to load it in live briefly. Hopefully, I can do that tonight.

Even if it works, I can’t leave it there due to the rt vm errors. This I can reproduce.