We have some issues with Asterisk 1.6.2.18 and 1.6.2.19 that I hope someone can help us with.
Customer 1 had .18 from the beginning, but we experienced memory leaks and were glad to read that .19 took care of those issues, so we upgraded to .19.
After the upgrade, the CPU maxed out after a while, which caused a lot of errors. (It also used more CPU than .18 from the beginning.)
We reverted to .18 and scheduled regular restarts of Asterisk to avoid the memory leak problems.
We log Asterisk's activity with top every 15 minutes. In one case, after an unplanned restart, we actually had two asterisk processes running (this only happened once, though). To resolve this, we had to restart Asterisk manually.
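In case it helps anyone reproduce the duplicate-process check, here is a minimal Python sketch of how it could be automated. It is not what we currently run. It assumes the procps `pgrep` tool is installed and that the daemon's process name is exactly `asterisk`; the function names are made up for illustration.

```python
import subprocess

def parse_pids(pgrep_output):
    """Turn pgrep's one-PID-per-line output into a list of ints."""
    return [int(token) for token in pgrep_output.split()]

def asterisk_pids():
    """Return the PIDs of processes named exactly 'asterisk'.

    Assumption: procps pgrep is available on the host. pgrep prints
    nothing and exits with status 1 when no process matches, which
    simply yields an empty list here.
    """
    result = subprocess.run(
        ["pgrep", "-x", "asterisk"],
        capture_output=True,
        text=True,
    )
    return parse_pids(result.stdout)

def has_duplicates(pids):
    """True when more than one asterisk process is running."""
    return len(pids) > 1
```

Calling `has_duplicates(asterisk_pids())` from the same cron job that takes the top snapshot would flag the situation described above without anyone having to read the log.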
Customer 2 has .19, but has fewer clients and no MixMonitor recording. There the CPU usage is somewhat high, but it does not max out.
Any ideas? We are now looking at 1.8.5.0. We are not sure whether it solves our problem, but we have some indications that it might.
Below are the main issues in more detail.
Issues 1.6.2.18:
Memory leak: Asterisk leaks memory over time, which results in untimely restarts.
Although this log doesn't show the restart itself, it has occurred.
A cron job restarts Asterisk at 12:40, and another cron job takes a top snapshot every 15 minutes.
Notice how the memory columns (RES and %MEM) grow over time.
Wed Jul 6 12:45:01 CEST 2011
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
24954 root 20 0 476m 19m 6636 S 0 1.0 0:00.84 asterisk
Wed Jul 6 13:00:01 CEST 2011
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
24954 root 20 0 476m 19m 6680 S 0 1.0 0:02.06 asterisk
Wed Jul 6 13:15:01 CEST 2011
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
24954 root 20 0 615m 50m 7340 S 32 2.5 2:25.31 asterisk
Wed Jul 6 13:30:01 CEST 2011
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
24954 root 20 0 710m 90m 7364 S 30 4.5 7:11.83 asterisk
Wed Jul 6 13:45:01 CEST 2011
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
24954 root 20 0 736m 129m 7368 S 30 6.4 12:10.48 asterisk
Wed Jul 6 14:00:01 CEST 2011
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
24954 root 20 0 762m 169m 7372 S 24 8.5 16:59.23 asterisk
Wed Jul 6 14:15:01 CEST 2011
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
24954 root 20 0 789m 211m 7364 S 20 10.5 22:05.06 asterisk
Wed Jul 6 14:30:01 CEST 2011
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
24954 root 20 0 815m 250m 7360 S 22 12.5 27:12.60 asterisk
Wed Jul 6 14:45:01 CEST 2011
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
24954 root 20 0 839m 289m 7356 S 24 14.4 32:05.90 asterisk
Wed Jul 6 15:00:01 CEST 2011
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
24954 root 20 0 853m 323m 7356 S 12 16.1 36:35.96 asterisk
Wed Jul 6 15:15:01 CEST 2011
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
24954 root 20 0 840m 325m 7340 S 4 16.2 37:25.60 asterisk
Wed Jul 6 15:30:01 CEST 2011
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
24954 root 20 0 879m 367m 7340 S 26 18.3 41:43.53 asterisk
Wed Jul 6 15:45:01 CEST 2011
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
24954 root 20 0 954m 391m 7320 S 26 19.5 46:29.26 asterisk
Wed Jul 6 16:00:01 CEST 2011
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
24954 root 20 0 953m 400m 7312 S 26 20.0 51:13.64 asterisk
Wed Jul 6 16:15:01 CEST 2011
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
24954 root 20 0 953m 406m 7312 S 26 20.3 55:28.70 asterisk
Wed Jul 6 16:30:01 CEST 2011
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
24954 root 20 0 953m 411m 7308 S 18 20.5 59:43.93 asterisk
Wed Jul 6 16:45:01 CEST 2011
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
24954 root 20 0 953m 416m 7276 S 22 20.7 62:34.02 asterisk
|
|
------- An unplanned restart happens between these two points in time.
|
|
Wed Jul 6 17:00:01 CEST 2011
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
23049 root 20 0 501m 26m 7328 S 6 1.3 0:21.56 asterisk
Wed Jul 6 17:15:01 CEST 2011
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
23049 root 20 0 572m 36m 7348 S 6 1.8 1:28.05 asterisk
Wed Jul 6 17:30:01 CEST 2011
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
23049 root 20 0 579m 49m 7356 S 10 2.5 2:40.68 asterisk
Wed Jul 6 17:45:01 CEST 2011
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
23049 root 20 0 589m 62m 7364 S 10 3.1 3:55.15 asterisk
Wed Jul 6 18:00:01 CEST 2011
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
23049 root 20 0 594m 73m 7372 S 6 3.6 5:09.73 asterisk
Wed Jul 6 18:15:01 CEST 2011
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
23049 root 20 0 602m 83m 7372 S 6 4.2 6:21.94 asterisk
Wed Jul 6 18:30:01 CEST 2011
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
23049 root 20 0 611m 97m 7372 S 12 4.8 7:37.94 asterisk
Wed Jul 6 18:45:01 CEST 2011
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
23049 root 20 0 618m 108m 7364 S 6 5.4 8:53.90 asterisk
Wed Jul 6 19:00:01 CEST 2011
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
23049 root 20 0 624m 118m 7364 S 2 5.9 10:06.03 asterisk
Wed Jul 6 19:15:01 CEST 2011
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
23049 root 20 0 625m 118m 7352 S 0 5.9 10:25.13 asterisk
Wed Jul 6 19:30:01 CEST 2011
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
23049 root 20 0 627m 124m 7344 S 8 6.2 11:08.39 asterisk
Wed Jul 6 19:45:01 CEST 2011
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
23049 root 20 0 628m 126m 7340 S 6 6.3 12:20.40 asterisk
Wed Jul 6 20:00:01 CEST 2011
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
23049 root 20 0 692m 128m 7340 S 8 6.4 13:30.57 asterisk
Wed Jul 6 20:15:01 CEST 2011
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
23049 root 20 0 692m 131m 7340 S 4 6.5 14:40.06 asterisk
Wed Jul 6 20:30:01 CEST 2011
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
23049 root 20 0 692m 133m 7336 S 10 6.6 15:45.51 asterisk
Wed Jul 6 20:45:01 CEST 2011
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
23049 root 20 0 692m 135m 7336 S 8 6.8 16:56.46 asterisk
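To put a number on the growth, the snapshots above can be parsed with a short script. This is only a sketch: the function names are invented for illustration, and it assumes the log has exactly the layout shown (a date line, the top header, then one asterisk row), that plain numbers in the RES column are KiB, and that an `m` or `g` suffix means MiB or GiB.

```python
from datetime import datetime

def parse_size_mib(field):
    """Convert a top size field such as '6636', '19m' or '1.2g' to MiB."""
    units = {"m": 1.0, "g": 1024.0}
    suffix = field[-1].lower()
    if suffix in units:
        return float(field[:-1]) * units[suffix]
    return float(field) / 1024.0  # assumption: unsuffixed values are KiB

def parse_snapshots(text):
    """Yield (timestamp, pid, res_mib) for every asterisk row in the log."""
    stamp = None
    for line in text.splitlines():
        tokens = line.split()
        if len(tokens) == 6 and ":" in tokens[3]:
            # Date line, e.g. "Wed Jul 6 12:45:01 CEST 2011".
            # The weekday and the zone name are dropped before parsing.
            stamp = datetime.strptime(
                " ".join(tokens[1:4] + tokens[5:]), "%b %d %H:%M:%S %Y"
            )
        elif (
            stamp is not None
            and len(tokens) == 12
            and tokens[0].isdigit()
            and tokens[-1] == "asterisk"
        ):
            yield stamp, int(tokens[0]), parse_size_mib(tokens[5])

def growth_per_hour(text):
    """Return {pid: RES growth in MiB per hour} between first and last sample."""
    first, last = {}, {}
    for stamp, pid, res in parse_snapshots(text):
        first.setdefault(pid, (stamp, res))
        last[pid] = (stamp, res)
    rates = {}
    for pid, (t0, r0) in first.items():
        t1, r1 = last[pid]
        hours = (t1 - t0).total_seconds() / 3600.0
        if hours > 0:
            rates[pid] = (r1 - r0) / hours
    return rates

if __name__ == "__main__":
    # First and last rows of PID 24954, copied from the log above.
    sample = """Wed Jul 6 12:45:01 CEST 2011
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
24954 root 20 0 476m 19m 6636 S 0 1.0 0:00.84 asterisk
Wed Jul 6 16:45:01 CEST 2011
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
24954 root 20 0 953m 416m 7276 S 22 20.7 62:34.02 asterisk
"""
    for pid, rate in growth_per_hour(sample).items():
        print(f"PID {pid}: {rate:.2f} MiB/hour")
```

Going by the first and last rows of each process in the log, PID 24954 grew from 19m to 416m in four hours (about 99 MiB per hour), and PID 23049 grew from 26m to 135m in 3.75 hours (about 29 MiB per hour).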
Issues 1.6.2.19:
CPU maxes out: We tried 1.6.2.19 at two different customers. The only difference between the two is that one (customer 1) uses MixMonitor to record all calls. At that customer, CPU usage was significantly higher than with 1.6.2.18, and after a while both cores maxed out, with Asterisk spewing a lot of "ERROR[26536] res_timing_timerfd.c: Read error: Bad file descriptor" messages into the logs.
While we haven't had the same amount of trouble with 1.6.2.19 at the other customer, we can still spot a number of bad file descriptor messages there as well.
To minimize our problems, we restart Asterisk twice a day using a cron script, once at 08:00 and again at 12:40. Despite this, we have still hit the issues described above.
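For comparing how noisy the two customers are, the bad file descriptor messages can be counted with a small script. The sketch below is hedged: the function name is invented, the pattern is built only from the one message quoted above, and the log path in the closing note is an assumption about a typical install, not something we have confirmed for these systems.

```python
import re
from collections import Counter

# Built from the message quoted above; the number in brackets varies.
TIMERFD_ERROR = re.compile(
    r"ERROR\[(\d+)\] res_timing_timerfd\.c: Read error: Bad file descriptor"
)

def count_timerfd_errors(lines):
    """Count matching errors per bracketed ID across an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = TIMERFD_ERROR.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts

if __name__ == "__main__":
    # The first two lines repeat the message from our logs; the third is a
    # made-up, unrelated line to show that non-matching lines are ignored.
    sample = [
        "ERROR[26536] res_timing_timerfd.c: Read error: Bad file descriptor",
        "ERROR[26536] res_timing_timerfd.c: Read error: Bad file descriptor",
        "NOTICE[100] hypothetical.c: some unrelated message",
    ]
    counts = count_timerfd_errors(sample)
    print("total:", sum(counts.values()))
    print("per ID:", dict(counts))
```

To run it against a real log, pass an open file object to `count_timerfd_errors`, for example one opened on `/var/log/asterisk/messages` (an assumed path; adjust it to wherever logger.conf writes).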