Memory leaks in rtp_engine.c

makdorf · February 16, 2018, 10:21am

This problem is described in ASTERISK-27281, and I’ve run into the same issue.
I did some investigation, and here is what I found.
When I remove the lines related to «realtime» and memory_cache from sorcery.conf, the memory allocated by «rtp_engine.c» does not grow over time. The two pastes below show this:

mem_alloc - sorcery memory cache and realtime dis. (0m. up)
mem_alloc - sorcery memory cache and realtime dis. (35m. up)

When I add the lines related to «realtime» and memory_cache back to sorcery.conf, the memory allocated by «rtp_engine.c» starts to grow over time. The two pastes below show this as well:
mem_alloc - sorcery memory cache and realtime en. (0m. up)
mem_alloc - sorcery memory cache and realtime en. (1h. up)

During the test, the PBX did not receive or make any calls.

My config:
Asterisk certified/13.18-cert2
15 endpoints
6 of them use UDP transport (2 of those are stored in realtime)
9 of them use wss transport (all of those are stored in the realtime DB)

Asterisk and the DB run in separate Docker containers. Asterisk 15.2 and 14.6.2 show the same behavior.
@jcolp you are my hope =)

jcolp · February 16, 2018, 10:53am

Have you used the other CLI commands provided by MALLOC_DEBUG to show precisely what the allocations are?

Nothing comes to mind that would touch rtp_engine.c for the sorcery memory cache.

makdorf · February 16, 2018, 10:58am

No, I haven’t. But I’m open to any suggestions, because I’m stuck.

jcolp · February 16, 2018, 11:00am

memory show allocations rtp_engine.c

Would be the one to use. Afterwards you’ll need to file an issue[1] with configuration, full details, everything.

[1] https://issues.asterisk.org/jira

makdorf · February 16, 2018, 11:02am

The output of memory show allocations rtp_engine.c is huge:

https://pastebin.com/3WmwNzi7

jcolp · February 16, 2018, 11:04am

And do you have many things in the memory cache? You can check using “memory cache show” by tab completing and getting information on the endpoint cache.

makdorf · February 16, 2018, 11:05am

Do you mean sorcery memory cache show?

jcolp · February 16, 2018, 11:05am

Yes, I do. It’s early in the morning.

makdorf · February 16, 2018, 11:09am

AORs:

CLI> sorcery memory cache show res_pjsip/aor
Sorcery memory cache: res_pjsip/aor
Number of objects within cache: 15
Maximum allowed objects: 100
Number of seconds before object expires: 1800
Number of seconds before object becomes stale: 1500
Expire all objects on reload: On

Auth:

CLI> sorcery memory cache show res_pjsip/auth
Sorcery memory cache: res_pjsip/auth
Number of objects within cache: 15
There is no limit on the maximum number of objects in the cache
Number of seconds before object expires: 1800
Number of seconds before object becomes stale: 200
Expire all objects on reload: On

Endpoint:

CLI> sorcery memory cache show res_pjsip/endpoint
Sorcery memory cache: res_pjsip/endpoint
Number of objects within cache: 15
Maximum allowed objects: 100
Number of seconds before object expires: 600
Number of seconds before object becomes stale: 60
Expire all objects on reload: On

Registration:

CLI> sorcery memory cache show res_pjsip/registration
Sorcery memory cache: res_pjsip/registration
Number of objects within cache: 0
Maximum allowed objects: 100
Number of seconds before object expires: 600
Number of seconds before object becomes stale: 60
Expire all objects on reload: On

Transport:

CLI> sorcery memory cache show res_pjsip/transport
Sorcery memory cache: res_pjsip/transport
Number of objects within cache: 3
There is no limit on the maximum number of objects in the cache
Object expiration is not enabled - cached objects will not expire
Object staleness is not enabled - cached objects will not go stale
Expire all objects on reload: On

Identify:

CLI> sorcery memory cache show res_pjsip/identify
Sorcery memory cache: res_pjsip/identify
Number of objects within cache: 2
Maximum allowed objects: 100
Number of seconds before object expires: 3600
Number of seconds before object becomes stale: 60
Expire all objects on reload: On
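
For reference, the endpoint cache settings shown above would correspond to sorcery.conf lines roughly like the following. This is a sketch reconstructed from the CLI output, not a copy of the actual file; in particular, the realtime family name ps_endpoints is an assumption.

```ini
[res_pjsip]
; Memory cache layered in front of the realtime backend for endpoints.
; Values mirror the "sorcery memory cache show res_pjsip/endpoint" output above.
endpoint/cache=memory_cache,maximum_objects=100,object_lifetime_maximum=600,object_lifetime_stale=60,expire_on_reload=yes
; Realtime backend; "ps_endpoints" is an assumed family name.
endpoint=realtime,ps_endpoints
```

Removing both lines for an object type is the "cache and realtime disabled" case from the first post.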

jcolp · February 16, 2018, 11:10am

You’ll have to give further context. Is that after startup? After you believe that memory is leaking?

makdorf · February 16, 2018, 11:14am

If I understood you correctly: this is after 2 hours of uptime.

jcolp · February 16, 2018, 11:14am

I think there is a leak, but it’s not because of the memory cache itself. It just makes it easier to see. Please file an issue on the issue tracker[1].

[1] https://issues.asterisk.org/jira

makdorf · February 16, 2018, 11:16am

Okay, I’ll do it and I’ll post ticket number here.

makdorf · February 16, 2018, 12:02pm

I created a new issue: ASTERISK-27679

makdorf · February 23, 2018, 10:53am

@jcolp hello again. Here is the output of memory show allocations for {xmldoc,pbx,stringfields,astobj2,media_index,format_cap}.c:
xmldoc.txt (451.9 KB)
pbx.txt (258.1 KB)
stringfields.txt (293.6 KB)
astobj2.txt (277.8 KB)
media_index.txt (588.5 KB)
format_cap.txt (500.6 KB)

I’ve posted it here for one reason: I applied the patch that your team provided in Jira, and when I open top I see that the RES counter and MEM% still increase over time. Not as fast as before the patch, but the issue is still there.

The allocation counters for these files are rising more than the others. What additional steps should I take to help you find a solution?

jcolp · February 23, 2018, 1:42pm

You need to provide context for what you’ve done when providing the information, or else it’s impossible to know whether something is abnormal or not. That means the specific scenario you’ve run, how much of it, etc. Bugs always need to go through the issue tracker[1], and you’ll still need to provide the information I mentioned.

[1] https://issues.asterisk.org/jira
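
For anyone collecting this kind of data for an issue report: one way to give the context asked for above is to save the output of memory show summary at two known points in time and report which files grew between them. Below is a minimal Python sketch of that comparison. The line format it parses ("N bytes in M allocations in file X", optionally with a cache figure) is an assumption about the MALLOC_DEBUG summary output and may need adjusting for your Asterisk version; the function names are made up for this example.

```python
import re

# Assumed shape of one "memory show summary" line, e.g.
#   "    123456 bytes in       789 allocations in file rtp_engine.c"
# An optional "(N cache)" figure after "bytes" is tolerated.
LINE_RE = re.compile(
    r"^\s*(\d+) bytes(?: \(\s*\d+ cache\))? in\s+(\d+) allocations in file (\S+)"
)


def parse_summary(text):
    """Return {filename: (bytes, allocations)} for every line that matches."""
    totals = {}
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m:
            totals[m.group(3)] = (int(m.group(1)), int(m.group(2)))
    return totals


def growth(before, after):
    """List (filename, byte_delta, alloc_delta) for files whose byte total
    increased between the two snapshots, largest growth first."""
    rows = []
    for name, (bytes_after, allocs_after) in after.items():
        bytes_before, allocs_before = before.get(name, (0, 0))
        delta = bytes_after - bytes_before
        if delta > 0:
            rows.append((name, delta, allocs_after - allocs_before))
    return sorted(rows, key=lambda r: r[1], reverse=True)
```

To use it, read the two saved snapshots as text, pass each through parse_summary, and hand the results to growth. Posting the resulting list together with the uptime at each snapshot and a note on what the system was doing (idle, number of calls, registrations) makes it easier to judge whether the growth is abnormal.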