PJSIP TLS transport memory leak?

On Saturday morning, I enabled the TLS transport in my PJSIP configuration and turned on SIPS for our Polycom phones and our provider trunks. I wanted to test TLS while call volume is lower over the weekend. Things are working, aside from some non-fatal errors. However, I am concerned that memory usage has been gradually increasing since this change. Here is a graph from Observium showing memory usage over the last 15 days:

I have never seen Asterisk consume memory like this.

I tried to investigate what was filling most of the memory, and it appears to be thousands of copies of the same CA certificates. I have fewer than 30 contacts online this weekend. Should I be concerned? Or should the dirty anonymous private pages eventually be released?

Find the PID…

# ps auxww | grep [/]usr/sbin/asterisk
root     20633  1.0 17.0 5182868 1003084 ?     S<l  Jan29  24:01 /usr/sbin/asterisk -C /var/taskeasy/gap/etc/asterisk.conf -p -f -g -U root -G asterisk

Summarize /proc/20633/smaps to see where the memory usage is:

# ~/bin/smaps-diag.pl 20633
[heap]:
  private        -   [clean]      29.3 M [dirty]
   shared        -   [clean]         -   [dirty]

[mmap]:
  private      1.1 M [clean]     916.3 M [dirty]
   shared      1.4 M [clean]         -   [dirty]

[stack]:
  private        -   [clean]      84.0 k [dirty]
   shared        -   [clean]         -   [dirty]

[vvar]:
  private        -   [clean]         -   [dirty]
   shared        -   [clean]         -   [dirty]

Sort maps by dirty pages, get the largest ones:

# pmap -x 20633 | sort -n -k 4 | tail
00007fafe4000000   65524   65524   65524 rw---   [ anon ]
00007fafd0000000   65532   65532   65532 rw---   [ anon ]
00007fafd4000000   65536   65536   65536 rw---   [ anon ]
00007fafec000000   65536   65536   65536 rw---   [ anon ]
00007fb0d0000000   65536   65536   65536 rw---   [ anon ]
00007fafb8000000   85872   85872   85872 rw---   [ anon ]
00007fafd8000000  131056  131056  131056 rw---   [ anon ]
00007fafc8000000  131060  131060  131060 rw---   [ anon ]
00007fafc0000000  131068  131068  131068 rw---   [ anon ]
total kB         5182868 1003844  972964

Find the page address range for the largest:

# cat /proc/20633/smaps | grep -B 9 'Private_Dirty:[[:space:]]\+131068'
7fafc0000000-7fafc7fff000 rw-p 00000000 00:00 0
Size:             131068 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:              131068 kB
Pss:              131068 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:    131068 kB
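The grep above only works because I already knew the size from pmap. A sketch of a way to find the largest private-dirty mapping without knowing its size first, assuming the usual smaps layout where each mapping starts with a `start-end perms ...` line; the function name `largest_dirty_mapping` is made up for this example.

```shell
# Print "0x<start> 0x<end> <private_dirty_kB>" for the mapping with the
# largest Private_Dirty value in an smaps-format file.
# Usage: largest_dirty_mapping /proc/<pid>/smaps
largest_dirty_mapping() {
    awk '
        # Mapping header line, e.g. "7fafc0000000-7fafc7fff000 rw-p ..."
        /^[0-9a-f]+-[0-9a-f]+ / { range = $1 }
        # Remember the range of the biggest Private_Dirty seen so far
        /^Private_Dirty:/ && (($2 + 0) > max) { max = $2 + 0; best = range }
        END {
            if (best != "") {
                split(best, addr, "-")
                printf "0x%s 0x%s %d\n", addr[1], addr[2], max
            }
        }
    ' "$1"
}
```

The two addresses it prints are already in the form that the gdb `dump binary memory` command takes.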

Dump that memory:

# gdb -p 20633 <<EOF

dump binary memory /var/spool/asterisk/tmp/dirty-shm 0x7fafc0000000 0x7fafc7fff000
EOF

Chop out binary data:

# strings /var/spool/asterisk/tmp/dirty-shm > /var/spool/asterisk/tmp/dirty-shm-strings

105M of the 128M is plain text:

# ls -alh /var/spool/asterisk/tmp/dirty-shm*
-rw-r--r-- 1 root root 128M Jan 31 11:14 /var/spool/asterisk/tmp/dirty-shm
-rw-r--r-- 1 root root 105M Jan 31 11:15 /var/spool/asterisk/tmp/dirty-shm-strings

I looked at the plain text and noticed that it contained SIP protocol text and a huge number of CA certificates. I was curious to see how many times the same certificates were repeated in that one memory region, and the copies number in the tens of thousands:

# openssl crl2pkcs7 -nocrl -certfile /var/spool/asterisk/tmp/dirty-shm-strings | openssl pkcs7 -print_certs -text -noout | grep Subject: | sort | uniq -c
    772         Subject: C=GB, ST=Greater Manchester, L=Salford, O=Comodo CA Limited, CN=AAA Certificate Services
    773         Subject: C=GB, ST=Greater Manchester, L=Salford, O=Sectigo Limited, CN=Sectigo RSA Domain Validation Secure Server CA
      1         Subject: CN=0004F2173F38
   4761         Subject: CN=Polycom Equipment Issuing CA 1
  15537         Subject: CN=Polycom Equipment Issuing CA 2
  20294         Subject: CN=Polycom Equipment Policy CA
    771         Subject: CN=*.taskeasy.com
   1289         Subject: CN=us-west-or.sip.flowroute.com
   1281         Subject: CN=us-west-wa.sip.flowroute.com
   2447         Subject: C=US, O=DigiCert Inc, OU=www.digicert.com, CN=Thawte RSA CA 2018
   2572         Subject: C=US, O=Let's Encrypt, CN=R3
   1070         Subject: C=US, ST=California, L=San Francisco, O=Twilio, Inc., CN=*.pstn.twilio.com
    773         Subject: C=US, ST=New Jersey, L=Jersey City, O=The USERTRUST Network, CN=USERTrust RSA Certification Authority
      1         Subject: O=Polycom Inc., CN=0004F22DECCA
      1         Subject: O=Polycom Inc., CN=0004F258F269

All of that for just 28 contacts.

# asterisk -rx 'pjsip show contacts' | grep ^Objects
Objects found: 28

In the time it took me to write this post, private dirty mmap usage has grown by another 67M:

# ~/bin/smaps-diag.pl 20633
[heap]:
  private        -   [clean]      29.3 M [dirty]
   shared        -   [clean]         -   [dirty]

[mmap]:
  private      1.1 M [clean]     983.9 M [dirty]
   shared      1.4 M [clean]         -   [dirty]

[stack]:
  private        -   [clean]      84.0 k [dirty]
   shared        -   [clean]         -   [dirty]

[vvar]:
  private        -   [clean]         -   [dirty]
   shared        -   [clean]         -   [dirty]
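To put a number on the growth rate without rerunning the full summary, the resident set size can be sampled from /proc/PID/status. A minimal sketch; the function names `rss_kb` and `rss_delta_kb` are made up for this example, and VmRSS covers the whole process rather than only the anonymous mmap regions.

```shell
# Print VmRSS (resident set size, in kB) from a /proc/<pid>/status-format file.
# Usage: rss_kb /proc/<pid>/status
rss_kb() {
    awk '/^VmRSS:/ { print $2 }' "$1"
}

# Print the difference in kB between two samples (after minus before).
# Usage: rss_delta_kb <before_kB> <after_kB>
rss_delta_kb() {
    echo $(( $2 - $1 ))
}
```

For example: take `before=$(rss_kb /proc/20633/status)`, wait ten minutes, take `after` the same way, then run `rss_delta_kb "$before" "$after"`.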

I guess maybe this should be a bug report instead.

After running for a day without TLS, Asterisk is definitely behaving better, and this is even with more contacts online and much higher call volume.

# ~/bin/smaps-diag.pl $(ps auxww | grep [/]usr/sbin/asterisk | awk '{print $2}')
[heap]:
  private        -   [clean]      27.5 M [dirty]
   shared        -   [clean]         -   [dirty]

[mmap]:
  private      2.4 M [clean]      74.5 M [dirty]
   shared     28.0 k [clean]         -   [dirty]

[stack]:
  private        -   [clean]      88.0 k [dirty]
   shared        -   [clean]         -   [dirty]

[vvar]:
  private        -   [clean]         -   [dirty]
   shared        -   [clean]         -   [dirty]

This is what the graph looks like now. Memory usage actually went down a bit as people logged off for the day.

The only difference between now and this past weekend is that none of my contacts and trunks are using TLS, so Asterisk isn’t storing all of those certificates repeatedly.

It looks like this may be a pjproject issue that was fixed in 2.10.

https://trac.pjsip.org/repos/ticket/2244

My tests have proven that version 2.10 of pjproject did indeed fix this memory leak.
