Stasis bridging limitations

Our Stasis app uses ARI bridging to create voice conferences. As more participants are added, we see heavy CPU usage during mute/unmute operations. With 60 participants in a conference, top shows CPU usage of 90%+ (and sometimes over 100%). The CPU load does fall afterwards, but unpredictably - so it can remain uncomfortably high after such an operation.

It would be helpful to know a couple of things about the way bridging is implemented on the processor.

Are we skirting disaster here, or is this a normal pattern for garbage collection? Can we assume that we are operating within safe levels, and resources will be released on demand?

Presumably each bridge must run on a single thread. If we increase the processor speed, will that help us? Would adding a voice compression card help? Would it be sensible to break each conference into sub-conferences, so that several ‘child’ bridges each have an originated call to a ‘master’ bridge? (We’d have more co-ordination code, but it seems possible.)

Bridging is indeed done on a single thread; splitting participants into separate bridges would spread them across multiple threads.

You’ll need to be specific about muting though - what precisely are you doing?

We are doing a group mute operation, which consists of issuing a one-way mute to a group (usually most) of the callers, so that there is background silence for a single speaker to be heard. This is implemented as multiple POST /channels/{channelId}/mute requests, issued from a .NET parallel for loop. (It would be nice to have a POST whose body could contain a list of channels - a batch mute.)
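For context, the fan-out looks roughly like this - a minimal Python sketch, not our actual .NET code. The ARI base URL is a placeholder, and the `post` callable is injected purely so the fan-out logic can be exercised without a live Asterisk:

```python
# Sketch of the group-mute flood described above: one POST per channel,
# issued concurrently. ARI_BASE is a hypothetical endpoint; `post` stands
# in for an authenticated HTTP POST (e.g. a wrapper around requests.post).
from concurrent.futures import ThreadPoolExecutor

ARI_BASE = "http://localhost:8088/ari"  # placeholder ARI endpoint

def group_mute(channel_ids, post, direction="in", workers=8):
    """Issue one mute request per channel, in parallel.

    Mirrors ARI's POST /channels/{channelId}/mute with a `direction`
    query parameter; returns the per-channel responses in input order.
    """
    def mute_one(channel_id):
        return post(f"{ARI_BASE}/channels/{channel_id}/mute",
                    {"direction": direction})

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(mute_one, channel_ids))
```

With 50 muted channels, this is 50 near-simultaneous HTTP requests - the burst the discussion below refers to as the “flood of mutes”.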

We can definitely look at splitting into linked subconferences. Better hardware support wouldn’t help?

The problem is likely that flood of mutes, not the bridge itself. Profiling and deeper investigation would be needed to confirm, but in the normal bridge case, where that flood doesn’t occur, I haven’t heard of anyone experiencing the same behaviour.

Would the flood of mutes result in the persistence of very high CPU usage long after the commands were actually sent? Here’s an example.

  1. We have approx. 50 muted participants and 10 unmuted; CPU runs at 66%.
  2. After a minute, we unmute the 50 participants (everyone now unmuted); CPU runs at 85%.
  3. After a couple of minutes, we mute everyone again; CPU now runs at 68%.

So the CPU is not changing with the commands being issued, but rather with the state into which the channels have been placed.

I guess the real question is more to do with the danger of crashing Asterisk. Back to the original question - is a reading of 90% dangerous in a situation like this? Like I said, I have seen CPU usage in top of over 100%. Is Asterisk capable of handling peaks by smoothing out resource use, or are we really running close to our limits? We’d be willing to install a compression card, or get a faster server, if this would make a significant difference.

It depends; I haven’t profiled or looked at that specific use case and the ramifications of what it is doing, so I can’t really answer in detail. Asterisk itself isn’t responsible for resource usage and allocation; that’s up to the scheduling of the underlying system. You can also see readings over 100% if there are multiple CPUs or cores in the system - 100% could mean an entire core is in use while others are free.
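To illustrate that last point: top reports a process’s %CPU summed across its threads, so a multi-threaded process on a multi-core box can legitimately read over 100%. A toy calculation (not a measurement; the function name is made up):

```python
# top-style per-process %CPU is the sum of each thread's busy fraction of
# one core, so the total can exceed 100% on a multi-core system.
def top_style_cpu_percent(per_thread_fractions):
    """per_thread_fractions: each thread's busy fraction (0..1) of one core."""
    return 100.0 * sum(per_thread_fractions)
```

So a reading of, say, 130% on a 4-core machine means roughly a third of total capacity, not an overloaded system.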

You can narrow it down yourself somewhat, though. If there are 60 channels in a bridge, all unmuted, and CPU usage is still high, then muting isn’t the cause.

You keep bringing up a compression card - are you using G.729?

Thank you for all this. We’re not using G.729, but my thinking was that if we were, and we had a compression card, it might free up the system CPU to handle more of the Asterisk admin. At the moment, we are thinking about the subconferences idea, which seems like a way to run one conference across a number of cores.
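As a rough sketch of that layout - chunk size and bridge names here are illustrative assumptions, and the actual ARI calls (POST /bridges to create each bridge, originating the link call, POST /bridges/{bridgeId}/addChannel to add members) are omitted:

```python
# Sketch of the sub-conference layout: split participant channels into
# child bridges of bounded size. Each child bridge would additionally hold
# one leg of a link call originated into the master bridge (not shown),
# so audio from every child reaches the single speaker path.
def plan_subconferences(channel_ids, max_per_bridge=20):
    """Return (master_bridge_name, child_plans), where child_plans maps an
    illustrative child-bridge name to the participant channels it holds."""
    children = {}
    for i in range(0, len(channel_ids), max_per_bridge):
        name = f"child-{i // max_per_bridge}"
        children[name] = channel_ids[i:i + max_per_bridge]
    return "master", children
```

Since each bridge runs on its own thread, 60 participants split into three child bridges of 20 should spread the mixing work across three cores instead of one, at the cost of the extra co-ordination code mentioned earlier.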