Hi there!
I have been struggling with Asterisk memory issues for quite a long time now. The problem varies in nature: sometimes memory gets fully utilized within a couple of days, sometimes within weeks. The one thing in common is that it is never freed on its own; the service has to be restarted, or the OOM killer will step in and do its job.
At first, the affected machines had 4 GB of RAM. Adding another 4 GB only delayed the problem rather than solving it; essentially it just bought us a bit more time before having to restart the service manually. Upgrading to the latest v16.29.0 did not help either. The behaviour is fairly hard to reproduce on a local instance, so this problem is specific to production.
I know that I can rebuild the Asterisk package with the MALLOC_DEBUG flag to retrieve more specific information, but since this concerns production machines it might be problematic. Of course, I can provide more info/configs on demand, but if we could avoid rebuilding the package, that would be great.
$ ps aux --sort -rss
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
asterisk 17914 4.9 83.5 9016412 6692700 ? Ssl Nov18 166:39 /usr/sbin/asterisk -f -C /etc/asterisk/asterisk.conf
...
P.S. I don’t think that setting up a cron job to restart the Asterisk service or drop caches is a good idea; it is rather an ugly hack, so I’m looking forward to any suggestions.
I think for anyone to progress this, you will need to reproduce it on a supported version of Asterisk, and provide details of what you are doing on your system that is unusual. Asterisk 16 has been in security-fix-only maintenance for over a month now.
I don’t understand the reference to “dropping cache”. As far as I know, cache could only really apply to the system cache, which deliberately grows to use most of the otherwise free memory, and does not represent a memory use problem. However, your ps does seem to show a very large usage for Asterisk itself, although I have no idea of the provenance of the graphs.
I don’t think I can point to anything unusual about Asterisk itself.
The production application passes calls to it, using different modules; no Asterisk sorcery cache is configured.
I stripped the module descriptions in order to comply with the 32,000 character limit.
I don’t understand the reference to “dropping cache”.
I meant this: sync; echo 3 > /proc/sys/vm/drop_caches
although I have no idea of the provenance of the graphs.
These are from two production Asterisk machines and should help show that the memory spikes are not consistent over time. I think this might be tied to the number of processed calls: something is not being cleaned up after a call ends, which is why memory keeps growing but is barely ever freed.
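To check that theory, I could log Asterisk’s RSS alongside the call counters at a fixed interval and see whether the growth actually tracks call volume. A rough sketch (the interval and log path are arbitrary):

# Log Asterisk RSS (in KB) together with the call counters every 5 minutes
while true; do
  {
    date
    ps -o rss= -C asterisk
    asterisk -rx "core show calls"
  } >> /var/log/asterisk-rss-trend.log
  sleep 300
done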
I thought 16.x was not that old after all; as you pointed out, it reached security-only maintenance only about a month ago.
That’s accounted to the system, not Asterisk, and, by design, it grows to fill nearly all available memory. It represents memory contents that can be reloaded from disk, so it can be thrown away when an application needs more writeable space. Windows does the same thing.
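You can see the distinction with free and ps: the “buff/cache” column of free is reclaimable page cache, while the RSS that ps reports against the asterisk process is memory the process itself is actually holding. For example:

free -h                        # "buff/cache" is reclaimable; "available" is what applications can still get
ps -o rss,vsz,cmd -C asterisk  # RSS here is memory the Asterisk process itself is holding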
It’s certainly better if you upgrade from 16, but in reality the differences between 16 and 18 are not as many as the numbers might imply, so some information from it is better than none if you can’t upgrade.
MALLOC_DEBUG is probably the way to go. Dump the output of “memory show summary” to a text file and then do so again periodically, or after you think you’ve encountered a leak. If you diff the files it should provide some context as to where it is, because one module may be increasing over time.
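In practice that could look something like this (the file names are arbitrary, and it assumes Asterisk was built with MALLOC_DEBUG so the CLI command is available):

asterisk -rx "memory show summary" > /tmp/mem-before.txt
# ... wait until you believe the leak has grown ...
asterisk -rx "memory show summary" > /tmp/mem-after.txt
diff /tmp/mem-before.txt /tmp/mem-after.txt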
Once you’ve done that, you can narrow it down further using “memory show allocations”, grep for the module name, and then diff at different points in time to show where the leaked memory was allocated.
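For example, if the summary pointed at a particular source file (the name below is only illustrative):

asterisk -rx "memory show allocations" | grep res_pjsip.c > /tmp/alloc-before.txt
# ... later, once the usage has grown further ...
asterisk -rx "memory show allocations" | grep res_pjsip.c > /tmp/alloc-after.txt
diff /tmp/alloc-before.txt /tmp/alloc-after.txt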
At a minimum, that’s probably what’s needed to report an issue.