Hint Watchers increment indefinitely

I’m currently investigating a memory leak that I believe to be linked to hints. Looking across our fleet of installs, the common pattern is installs with sideboards that monitor extension state, combined with hints without endpoints. The problem exists in several versions; the testing below is on 16.2.

Orange: more than 1000 handsets, no subscriptions. Blue: fewer than 200 endpoints, a number of hints without endpoints and a lot of subscriptions. The end of the graph is the removal of the hints without endpoints.

Running under valgrind with “don’t optimise” set, my lab environment has several hundred hints and approximately 90 subscriptions to hints without endpoints, plus a couple to hints with endpoints. This results in the following after roughly 24 hours.

Memory start:
89ca75f1acd9 1.66% 613.5MiB / 31.26GiB 1.92% 0B / 0B 174MB / 0B 69

Memory finish:
89ca75f1acd9 1.60% 958.9MiB / 31.26GiB 3.00% 0B / 0B 176MB / 0B 64

Example of watcher growth:

6010@default        : PJSIP/805EC00F6D1F    State:Idle            Presence:not_set         Watchers  0
6011@default        : PJSIP/001565F655C7    State:Idle            Presence:not_set         Watchers  1
db6b57d1-84e3-47c3-a:                       State:Unavailable     Presence:not_set         Watchers  0
7002@default        :                       State:Unavailable     Presence:not_set         Watchers 70
7003@default        :                       State:Unavailable     Presence:not_set         Watchers 70
7000@default        :                       State:Unavailable     Presence:not_set         Watchers 17
7001@default        :                       State:Unavailable     Presence:not_set         Watchers 70
7006@default        :                       State:Unavailable     Presence:not_set         Watchers 70
7007@default        :                       State:Unavailable     Presence:not_set         Watchers 70
7004@default        :                       State:Unavailable     Presence:not_set         Watchers 70
6fa36fe5-2d79-4c1d-9:                       State:Unavailable     Presence:not_set         Watchers  0
7005@default        :                       State:Unavailable     Presence:not_set         Watchers 70
268d8d5f-2def-4714-8:                       State:Unavailable     Presence:not_set         Watchers  0
7008@default        :                       State:Unavailable     Presence:not_set         Watchers 69
7009@default        :                       State:Unavailable     Presence:not_set         Watchers 70
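
For anyone wanting to watch for the same pattern, here is a minimal sketch that parses hint lines in the format shown above and flags hints that have no device but a high watcher count. The function names and the threshold of 10 are hypothetical choices for illustration, and the parsing assumes the line layout matches the listing above exactly.

```python
import re

WATCHERS_RE = re.compile(r"Watchers\s+(\d+)\s*$")


def parse_hint_line(line):
    """Split one hint line into extension, device, state and watcher count."""
    left, sep, right = line.partition("State:")
    if not sep:
        return None  # not a hint line
    match = WATCHERS_RE.search(right)
    if match is None:
        return None
    # The first colon separates the extension from the (possibly empty) device.
    extension, _, device = left.partition(":")
    return {
        "extension": extension.strip(),
        "device": device.strip(),
        "state": right.split()[0],
        "watchers": int(match.group(1)),
    }


def suspicious_hints(lines, threshold=10):
    """Return hints with no device whose watcher count exceeds the threshold.

    The default threshold is an arbitrary value chosen for this example.
    """
    parsed = (parse_hint_line(line) for line in lines)
    return [h for h in parsed if h and not h["device"] and h["watchers"] > threshold]


sample = [
    "6011@default        : PJSIP/001565F655C7    State:Idle            Presence:not_set         Watchers  1",
    "db6b57d1-84e3-47c3-a:                       State:Unavailable     Presence:not_set         Watchers  0",
    "7002@default        :                       State:Unavailable     Presence:not_set         Watchers 70",
    "7000@default        :                       State:Unavailable     Presence:not_set         Watchers 17",
]

flagged = suspicious_hints(sample)
for hint in flagged:
    print(hint["extension"], hint["watchers"])
```

Run against the four sample lines, this flags 7002@default and 7000@default. Comparing the flagged counts between two snapshots taken some time apart would show whether the watcher count is still climbing.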

The attached valgrind.txt doesn’t appear to contain anything relevant. The current workaround is to not populate hints that don’t have endpoints. However, while the hardware can be configured not to subscribe to extensions without endpoints, our softphone solution doesn’t appear to be as flexible.

Doing a “core restart gracefully” frees the memory and resets the watcher count.

Setting a custom device (Custom:NA in the listing below) as the endpoint resolves the issue.

6010@default        : PJSIP/805EC00F6D1F    State:Idle            Presence:not_set         Watchers  0
6011@default        : PJSIP/001565F655C7    State:Idle            Presence:not_set         Watchers  1
db6b57d1-84e3-47c3-a: Custom:NA             State:Idle            Presence:not_set         Watchers  0
7002@default        : Custom:NA             State:Idle            Presence:not_set         Watchers  1
7003@default        : Custom:NA             State:Idle            Presence:not_set         Watchers  1
7000@default        : Custom:NA             State:Idle            Presence:not_set         Watchers  0
7001@default        : Custom:NA             State:Idle            Presence:not_set         Watchers  1
7006@default        : Custom:NA             State:Idle            Presence:not_set         Watchers  1
7007@default        : Custom:NA             State:Idle            Presence:not_set         Watchers  1
7004@default        : Custom:NA             State:Idle            Presence:not_set         Watchers  1
6fa36fe5-2d79-4c1d-9: Custom:NA             State:Idle            Presence:not_set         Watchers  0
7005@default        : Custom:NA             State:Idle            Presence:not_set         Watchers  1
268d8d5f-2def-4714-8: Custom:NA             State:Idle            Presence:not_set         Watchers  0
7008@default        : Custom:NA             State:Idle            Presence:not_set         Watchers  1
7009@default        : Custom:NA             State:Idle            Presence:not_set         Watchers  1

Memory after ~18 hours:

08dd79bf980c        2.73%               478.6MiB / 31.26GiB   1.50%               0B / 0B             15.1MB / 0B         79
08dd79bf980c        1.67%               417.8MiB / 31.26GiB   1.31%               0B / 0B             16.1MB / 0B         64

The documentation doesn’t appear to suggest that a hint without an endpoint is invalid, but leaking memory isn’t desirable.

In the case without a device specified, the hint still works, except that it always produces an Unavailable state. This is perfectly fine and by itself would not cause the watcher count to go up. The watcher count goes up as interested parties subscribe to the hint. This only happens when something actually subscribes; the device state core won’t periodically subscribe or do it itself. Without actually seeing how everything is working together, I can’t really comment further. I suspect that the problem isn’t here but may lie with these sideboards and how they behave when faced with that.

Interesting, I hadn’t considered that. All the devices with sideboards are Yealink T46G or T46S with an EXP40, in some cases two. In my lab it’s a T46S with two, but pretty much all the Yealinks do BLF in the same manner (every single handset line can be a BLF; sideboards just add more). I’ll compare the SIP messages for both scenarios and see what it all looks like.

And that has led me to an open bug report and I can confirm that I see the same behaviour.

SUBSCRIBE sip:x.x.x.95:5060 SIP/2.0
Via: SIP/2.0/UDP x.x.x.104:5060;branch=z9hG4bK975514473
From: "Phone-001" <sip:805EC0017273@somedomain.com.au>;tag=3191372314
To: <sip:6004@somedomain.com.au>;tag=0de9c0a5-7ffc-4116-9290-4bbac5f96259
Call-ID: 0_2914073161@x.x.x.104
CSeq: 3 SUBSCRIBE
Contact: <sip:805EC0017273@x.x.x.104:5060>
Authorization: <redacted>
Accept: application/dialog-info+xml
Max-Forwards: 70
User-Agent: Yealink SIP-T46S 66.83.0.10
Expires: 1800
Event: dialog
Content-Length: 0

SIP/2.0 500 Unhandled by dialog usages
Via: SIP/2.0/UDP x.x.x.104:5060;received=x.x.x.104;branch=z9hG4bK975514473
Call-ID: 0_2914073161@x.x.x.104
From: "Phone-001" <sip:805EC0017273@somedomain.com.au>;tag=3191372314
To: <sip:6004@somedomain.com.au>;tag=0de9c0a5-7ffc-4116-9290-4bbac5f96259
CSeq: 3 SUBSCRIBE
Server: Asterisk PBX 16.2.0
Content-Length:  0
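
The SUBSCRIBE above already carries a tag in its To header, so it is a request inside an existing dialog (a refresh) rather than a new subscription, and it is answered with a 500. Below is a minimal sketch for picking such request/response pairs out of a capture. It assumes the messages have already been split into separate strings, it does not handle compact header forms, and all function names are hypothetical. It only identifies the pattern; it does not show what the phone does next.

```python
def parse_sip_message(text):
    """Parse one SIP message into its start line and a dict of lower-cased headers."""
    lines = [ln for ln in text.strip().splitlines() if ln.strip()]
    if not lines:
        return "", {}
    headers = {}
    for line in lines[1:]:
        name, sep, value = line.partition(":")
        if sep:
            headers[name.strip().lower()] = value.strip()
    return lines[0].strip(), headers


def is_in_dialog_subscribe(start_line, headers):
    """A SUBSCRIBE whose To header already has a tag belongs to an existing dialog."""
    return start_line.startswith("SUBSCRIBE ") and ";tag=" in headers.get("to", "")


def rejected_refreshes(messages):
    """Yield (call_id, cseq, status_line) for in-dialog SUBSCRIBEs answered with a 5xx."""
    pending = set()
    for text in messages:
        start_line, headers = parse_sip_message(text)
        key = (headers.get("call-id"), headers.get("cseq"))
        if is_in_dialog_subscribe(start_line, headers):
            pending.add(key)
        elif start_line.startswith("SIP/2.0 5") and key in pending:
            pending.discard(key)
            yield key[0], key[1], start_line


# Abbreviated copies of the two messages from the trace above.
request = """SUBSCRIBE sip:x.x.x.95:5060 SIP/2.0
Via: SIP/2.0/UDP x.x.x.104:5060;branch=z9hG4bK975514473
To: <sip:6004@somedomain.com.au>;tag=0de9c0a5-7ffc-4116-9290-4bbac5f96259
Call-ID: 0_2914073161@x.x.x.104
CSeq: 3 SUBSCRIBE
Event: dialog
"""

response = """SIP/2.0 500 Unhandled by dialog usages
Call-ID: 0_2914073161@x.x.x.104
To: <sip:6004@somedomain.com.au>;tag=0de9c0a5-7ffc-4116-9290-4bbac5f96259
CSeq: 3 SUBSCRIBE
"""

results = list(rejected_refreshes([request, response]))
for call_id, cseq, status in results:
    print(call_id, "|", cseq, "|", status)
```

Matching on Call-ID plus CSeq keeps the sketch simple; a stricter version would also compare the Via branch parameter.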