Hello. I’m compiling some information regarding warnings we have received from the Stasis Task Processor during some tests. We have a server application connected to our Asterisk instance via ARI, and it requests around twelve hundred originates over the span of a couple of minutes. We eventually reach the high water value:
Are there other consequences we should be wary of when we reach the high water value? The blog post mentions an increase in CPU and memory load. Is it possible to actually lose messages from Asterisk if the queue grows beyond a certain point, for example? Or to be unable to successfully perform other requests until the queue is drained enough?
We use pretty much vanilla configurations for the most part. Is it safe to proceed with the mindset that adopting custom values for the stasis threadpool and ensuring unused modules are disabled will help avoid reaching this specific high water mark, then? (It may seem a fairly obvious question, but it goes some way toward helping me better grasp what I’m dealing with here.)
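For reference, this is the section in stasis.conf I have in mind tuning; the values below are just placeholders to show what we would be touching, not recommendations:

```
; stasis.conf -- threadpool tuning (values here are placeholders, not recommendations)
[threadpool]
initial_size = 5        ; threads created at startup
idle_timeout_sec = 20   ; seconds before an idle thread is destroyed
max_size = 50           ; maximum number of threads in the pool
```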
It’s not possible to lose messages, but could it affect other requests? Possibly. It really depends on what your application does.
Before you start tweaking stasis threadpool stuff, take a look at your ARI application. It sounds like it’s not keeping up with the call rate you’re attempting. One possible solution is to run multiple instances of your application behind an “ari proxy” that can distribute requests from Asterisk to multiple app instances. There are multiple implementations of an ari proxy service, but a Google search will bring them up.
If you do decide to tweak the threadpool, just remember that there is still a finite number of resources, so make small, controlled changes and keep track of what the effects are.
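One way to keep track of the effects is to watch the task processor queues from the CLI while you run your tests; on recent versions the output includes the queue depth and the low/high water marks for each processor:

```
*CLI> core show taskprocessors
```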
I sometimes see the queue message on relatively small systems where not really a lot is going on.
One of these systems currently has about 200 registered phones, and it always shows the message at Asterisk startup but not afterwards (only a few MWI etc. messages), if the virtual machine is running on a single processor. There are no messages with 4 or more CPUs. Since there is no real load on these systems, I think it could be that tasks waiting for the CPU are ultimately responsible, but this is only a guess. It would imply that one should check CPU resources before doing anything else.
It would be interesting to know whether my assumption makes any sense.
Yeah, one CPU really isn’t enough to handle anything more than a few phones. It’s not a matter of CPU cycle availability but of parallel thread availability. In such a case, you can set the threadpool as high as you want, but there’s still only one CPU thread doing all the work.
Thank you for the answers. I will look into ways to verify the capacity on our end too.
To clarify the points raised: the architecture that was tested and yielded these results is effectively a dialer-type application. It uses ARI to send the originates with parameterized variables and consumes UserEvents in the dialplans it uses in order to record the call results according to our system’s specifications. The machines running Asterisk have 8 cores and are dedicated to running Asterisk.
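For reference, the originate side looks roughly like the minimal sketch below, using plain HTTP against ARI; the host, credentials, endpoint, dialplan context, and variable names are placeholders, not our actual values:

```python
# Minimal sketch of the kind of originate we send, using plain HTTP against ARI.
# The host, credentials, endpoint, context, and variable names are placeholders.
import requests

ARI_URL = "http://asterisk.example.com:8088/ari"
AUTH = ("ari_user", "ari_password")

def originate(phone_number: str, campaign_id: int) -> dict:
    """Ask Asterisk to originate a call into our dialer dialplan context."""
    params = {
        "endpoint": f"PJSIP/{phone_number}",
        "context": "dialer-outbound",  # dialplan context that records results via UserEvent
        "extension": "s",
        "priority": 1,
        "timeout": 30,
    }
    # Channel variables are passed in the JSON body under "variables".
    body = {"variables": {"CAMPAIGN_ID": str(campaign_id)}}
    resp = requests.post(f"{ARI_URL}/channels", params=params, json=body, auth=AUTH)
    resp.raise_for_status()
    return resp.json()

if __name__ == "__main__":
    print(originate("5551234567", 42))
```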
It is actually being modified right now in another branch to make primary use of AMI instead, and we will have a proxy layer and a load balancer between the requester and the Asterisk instances. We will have ARI involved as well, but only on dialplans effectively making use of Stasis(), so I’m led to think it will have less of an impact on its Task Processor. In the long run, we hope to have everything running on Kubernetes and scaling as needed.
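The dialplans that will keep ARI involved look roughly like this sketch (the context, Stasis application, and variable names are placeholders; CALL_RESULT stands in for whatever the Stasis app sets before returning):

```
[dialer-outbound]
exten => s,1,NoOp(Dialer call for campaign ${CAMPAIGN_ID})
 same => n,Stasis(dialer-app)                ; hand the channel to our ARI application
 same => n,UserEvent(CallResult,Campaign: ${CAMPAIGN_ID},Result: ${CALL_RESULT})
 same => n,Hangup()
```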
On that matter, are there any concerns I should raise with the rest of the dev team here regarding this throughput via AMI? We happen to have some people with experience building applications that handle a large number of simultaneous calls using just AMI, just not in our current development toolkit.
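For context, the kind of action we would be sending looks roughly like the sketch below (the channel, context, and variable values are placeholders). From what I understand, Async: true matters at this rate so each Originate returns immediately instead of blocking the AMI session until the call completes:

```
Action: Originate
ActionID: dialer-0001
Channel: PJSIP/5551234567
Context: dialer-outbound
Exten: s
Priority: 1
Timeout: 30000
Async: true
Variable: CAMPAIGN_ID=42
```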