Confused about stasis/m:manager:core warnings, `manager show eventq`, and AMI

Hi everyone,

I have some possibly basic questions about the `stasis/m:manager:core` task processor and the output of `manager show eventq`.

Work incident that prompted the questions below

At work we’ve recently started using AMI over HTTP to pause and unpause members of our support queue. This has worked great so far, so yesterday we tried expanding the scope of our AMI usage: an internal web app now shows pause/unpause status by periodically polling our backend, which in turn issues a QueueStatus action to check on the Member using the web app. In aggregate this worked out to about 1 req/sec to asterisk at peak usage, which was perhaps too much.
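For context, our polling was roughly like the sketch below (host, port, credentials, and queue/member names are placeholders, not our real config). Asterisk’s HTTP manager exposes AMI actions at `/rawman?action=...`, with a session cookie from the Login action reused on later requests:

```python
# Rough sketch of the kind of QueueStatus polling we were doing.
# All names/hosts below are placeholders.
import http.cookiejar
import urllib.parse
import urllib.request

BASE = "http://pbx.example.com:8088/rawman"  # placeholder host/port

def rawman_url(action, **headers):
    """Build a /rawman URL for one AMI action and its headers."""
    params = {"action": action, **headers}
    return BASE + "?" + urllib.parse.urlencode(params)

def make_session(username, secret):
    """Log in once; the returned opener carries the mansession cookie."""
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(http.cookiejar.CookieJar()))
    opener.open(rawman_url("Login", Username=username, Secret=secret))
    return opener

def poll_queue_status(opener, queue, member):
    """One poll: QueueStatus filtered to a single queue member."""
    with opener.open(rawman_url("QueueStatus", Queue=queue, Member=member)) as resp:
        return resp.read().decode()
```

Each poll like this generates a burst of manager events, which (as we learned) all pass through the `stasis/m:manager:core` task processor.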

Over the course of the day we noticed an increasing number of warnings that the “stasis/m:manager:core-00000006” task processor queue had reached so many tasks, and asterisk also started responding with 503s to our telco partner when customers attempted to call our support queue. We turned off the pause/unpause polling in our web app, but it didn’t seem to have much effect on the task processor depth; running `core show taskprocessors like stasis/m:manager` would routinely show 1000-4000 tasks “In Queue”, even hours after we turned off the polling.

We were getting more and more 503s and weren’t sure what else to do, so we did a `core restart now` and then everything started working again. Later in the day we still saw a fair number of 503s (nowhere near as many as earlier in the day, but far more than we ordinarily have), and we continued to see a large number of warnings about stasis/m:manager:core tasks.

Questions

What is the expected output of `manager show eventq`? Running it just now on our production asterisk server (mid-morning in our timezone) produced 27M of output, triggered another one of those warnings in our logs (“The ‘stasis/m:manager:core-00000006’ task processor queue reached 3000 scheduled tasks”), and even caused us to send ~20 503 SIP responses to customers trying to call our queue. Why are these events still in the eventq, or am I misunderstanding something? I’m not sure why they’re all being stored there :thinking:

Also, is there any way to avoid these 503s? As far as I can tell, our asterisk server never seemed to be too stressed during our issues yesterday (fairly normal CPU and memory usage), and e.g. just now when I ran `manager show eventq` we were nowhere near peak traffic.

Finally, is there an asterisk command to show some info on which tasks are occupying `stasis/m:manager:core`, rather than just the number of tasks? I would love to understand better what it’s struggling with!

I would really really really suggest not using the HTTP AMI interface. It sees virtually no use, hasn’t been touched since it was created, and is just not how people use AMI.

As for examining the stasis queue - there isn’t a way currently. And the eventq for manager should decrease, but I don’t know how that works with HTTP.


Ah, interesting! Shoot, that’s good to know—I’ll switch us over to the TCP interface and see if things look any better.
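In case it helps anyone else landing here: the TCP interface is just CRLF-delimited key/value frames on port 5038, each action terminated by a blank line. A minimal pause/unpause sketch (host, credentials, queue, and interface names are placeholders, and the response handling is deliberately rough):

```python
# Minimal AMI-over-TCP sketch. Placeholder host/credentials; no real
# response parsing -- a production client should read and match
# Response/ActionID frames instead of a single recv().
import socket

def ami_frame(action, **headers):
    """Serialize one AMI action: CRLF-separated headers, blank line at end."""
    lines = [f"Action: {action}"]
    lines += [f"{k}: {v}" for k, v in headers.items()]
    return ("\r\n".join(lines) + "\r\n\r\n").encode()

def pause_member(host, user, secret, queue, interface, paused=True):
    """Log in, pause/unpause one queue member, log off."""
    with socket.create_connection((host, 5038)) as s:
        s.sendall(ami_frame("Login", Username=user, Secret=secret))
        s.sendall(ami_frame("QueuePause", Queue=queue, Interface=interface,
                            Paused="true" if paused else "false"))
        s.sendall(ami_frame("Logoff"))
        return s.recv(4096).decode()  # rough: just grab whatever came back
```

Setting `Events: off` in the Login action (or using `eventfilter` in manager.conf) also keeps a control-only connection from accumulating events it never reads.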
