I have some possibly basic questions about the stasis/m:manager:core task processor and the output from
manager show eventq.
Work incident that prompted the questions below
At work we’ve recently started using AMI over HTTP to pause and unpause members of our support queue. This has worked great so far, so yesterday we tried expanding the scope of our AMI usage by showing pause/unpause status in an internal web app, which occasionally polled our backend to check the “QueueStatus” of the “Member” using the web app. In aggregate this worked out to about 1 req/sec to Asterisk at peak usage, which was perhaps too much.
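For context, the polling amounted to roughly the following (a minimal sketch, not our exact code; the host, port, credentials, and queue name are placeholders, and it assumes the stock /rawman AMI-over-HTTP endpoint enabled via http.conf and manager.conf):

```python
from urllib.parse import urlencode

# Hypothetical helper: builds a URL for Asterisk's AMI-over-HTTP
# /rawman endpoint. Real code would also keep the mansession_id
# cookie from the Login response for subsequent requests.
def rawman_url(base, action, **params):
    return f"{base}/rawman?{urlencode({'action': action, **params})}"

base = "http://pbx.example.internal:8088"  # placeholder host/port
login = rawman_url(base, "Login", Username="webapp", Secret="secret")
poll = rawman_url(base, "QueueStatus", Queue="support")
# The web backend fetched `poll` (via urllib.request.urlopen or
# similar) about once per second at peak.
```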
Over the course of the day we noticed an increasing number of warnings that “The ‘stasis/m:manager:core-00000006’ task processor queue reached” some number of scheduled tasks, and Asterisk also started responding with 503s to our telco partner when customers attempted to call our support queue. We turned off the paused/unpaused polling in our web app, but that didn’t seem to have much effect on the task processor depth; running
core show taskprocessors like stasis/m:manager would routinely show 1000-4000 tasks “In Queue”, even hours after we turned off the polling.
We were getting more and more 503s and weren’t sure what else to do, so we did a
core restart now, and then everything started working again. Later in the day we still saw a fair number of 503s (nowhere near as many as earlier in the day, but far more than we ordinarily have), and we continued to see a large number of warnings about the task processor queue depth.
What is the expected output of
manager show eventq? Running it just now on our production Asterisk server (mid-morning in our timezone) produced 27M of output, triggered another of those warnings in our logs (“The ‘stasis/m:manager:core-00000006’ task processor queue reached 3000 scheduled tasks”), and even caused us to send ~20 503 SIP responses to customers trying to call our queue. Why are these events still in the eventq, or am I misunderstanding something? I’m not sure why they’re all being stored there.
Also, is there any way to avoid these 503s? As far as I can tell, our Asterisk server never seemed particularly stressed during yesterday’s issues (fairly normal CPU and memory usage), and e.g. just now when I ran
manager show eventq we were nowhere near peak traffic.
Finally, is there an Asterisk command that shows which tasks are occupying
stasis/m:manager:core, rather than just the number of tasks? I would love to understand better what it’s struggling with!