Performance CPU confbridge clean system

Hi all, this is my first time at the community. Until today never had relevant issues with asterisk systems that I operate, but recently (and probably under special circunstances… covid19) I’m experiencing some performance issues with confbridge solution.

Previously we have an active-standby cluster for all asterisk stuff (audioconference, voicemail, recording, “intelligent” dialing, voice token…). Due to intensive use of audioconference, system recently starts to generate dumps… so we deploy a virtual dedicate server to use only for that.

4 Core (Intel® Xeon® Gold 6252 CPU @ 2.10GHz)
8GB of RAM

Fresh clean asterisk 16.3 certified and only confbridges and trunks with another internal pbx. No more running on the server, no database connections, no dependencies from other systems… but still growing on load and cpu usage is still a plane around 100% with 50 calls causing bad audio experience. A single call to an empty confbridge without more participants (only music on hold) and sounds terrible.

Log shows a lot of warnings…

WARNING[32418][C-00000476]: channel.c:1124 __ast_queue_frame: Exceptionally long voice queue length queuing to CBAnn/2005-00000074;1

And some locks…

=======================================================================
=== certified/16.3-cert1
=== Currently Held Locks

===
=== <lock#> (): (times locked)

=== Thread ID: 0x7f4ca1f0f700 LWP:13757 (pbx_thread started at [ 4699] pbx.c ast_pbx_start())
=== —> Lock #0 (bridge_channel.c): MUTEX 223 ast_bridge_channel_lock_bridge bridge 0x7f4cc0061de0 (1)
[0x5139fb] asterisk lock.c:272 __ast_pthread_mutex_lock()
[0x45f908] asterisk astobj2.c:241 __ao2_lock()
[0x48697c] asterisk bridge.h:423 _ast_bridge_lock()
[0x486d86] asterisk bridge_channel.c:224 ast_bridge_channel_lock_bridge()
[0x488131] asterisk bridge_channel.c:650 bridge_channel_write_frame()
[0x48d21b] asterisk bridge_channel.c:2625 bridge_handle_trip()
[0x48d61b] asterisk bridge_channel.c:2760 bridge_channel_wait()
[0x48dde1] asterisk bridge_channel.c:2909 bridge_channel_internal_join()
[0x471fe5] asterisk bridge.c:1749 ast_bridge_join()
[0x7f4ce36af807] app_confbridge.so app_confbridge.c:2595 confbridge_exec()
[0x5451e9] asterisk pbx_app.c:492 pbx_exec()
[0x530e71] asterisk pbx.c:2927 pbx_extension_helper()
[0x534cc8] asterisk pbx.c:4177 ast_spawn_extension()
[0x53597a] asterisk pbx.c:4351 __ast_pbx_run()
[0x537124] asterisk pbx.c:4674 pbx_thread()
[0x5c8933] asterisk utils.c:1249 dummy_start()
[0x7f4d44c172de] libpthread.so.0 :0 __libpthread_freeres()
[0x7f4d43de2133] libc.so.6 :0 clone()
=== -------------------------------------------------------------------

=== Thread ID: 0x7f4ca06a5700 LWP:13830 (pbx_thread started at [ 4699] pbx.c ast_pbx_start())
=== —> Waiting for Lock #0 (bridge_channel.c): MUTEX 223 ast_bridge_channel_lock_bridge bridge 0x7f4cc0061de0 (1)
[0x5139fb] asterisk lock.c:272 __ast_pthread_mutex_lock()
[0x45f908] asterisk astobj2.c:241 __ao2_lock()
[0x48697c] asterisk bridge.h:423 _ast_bridge_lock()
[0x486d86] asterisk bridge_channel.c:224 ast_bridge_channel_lock_bridge()
[0x488131] asterisk bridge_channel.c:650 bridge_channel_write_frame()
[0x48d21b] asterisk bridge_channel.c:2625 bridge_handle_trip()
[0x48d61b] asterisk bridge_channel.c:2760 bridge_channel_wait()
[0x48dde1] asterisk bridge_channel.c:2909 bridge_channel_internal_join()
[0x471fe5] asterisk bridge.c:1749 ast_bridge_join()
[0x7f4ce36af807] app_confbridge.so app_confbridge.c:2595 confbridge_exec()
[0x5451e9] asterisk pbx_app.c:492 pbx_exec()
[0x530e71] asterisk pbx.c:2927 pbx_extension_helper()
[0x534cc8] asterisk pbx.c:4177 ast_spawn_extension()
[0x53597a] asterisk pbx.c:4351 __ast_pbx_run()
[0x537124] asterisk pbx.c:4674 pbx_thread()
[0x5c8933] asterisk utils.c:1249 dummy_start()
[0x7f4d44c172de] libpthread.so.0 :0 __libpthread_freeres()
[0x7f4d43de2133] libc.so.6 :0 clone()
=== — ---> Locked Here: bridge_channel.c line 223 (ast_bridge_channel_lock_bridge)
=== -------------------------------------------------------------------

=== Thread ID: 0x7f4c9fa70700 LWP:13838 (pbx_thread started at [ 4699] pbx.c ast_pbx_start())
=== —> Waiting for Lock #0 (bridge_channel.c): MUTEX 223 ast_bridge_channel_lock_bridge bridge 0x7f4cc0061de0 (1)
[0x5139fb] asterisk lock.c:272 __ast_pthread_mutex_lock()
[0x45f908] asterisk astobj2.c:241 __ao2_lock()
[0x48697c] asterisk bridge.h:423 _ast_bridge_lock()
[0x486d86] asterisk bridge_channel.c:224 ast_bridge_channel_lock_bridge()
[0x488131] asterisk bridge_channel.c:650 bridge_channel_write_frame()
[0x48d21b] asterisk bridge_channel.c:2625 bridge_handle_trip()
[0x48d61b] asterisk bridge_channel.c:2760 bridge_channel_wait()
[0x48dde1] asterisk bridge_channel.c:2909 bridge_channel_internal_join()
[0x471fe5] asterisk bridge.c:1749 ast_bridge_join()
[0x5c5996] asterisk threadstorage.h:178 ast_threadstorage_get()
[0x514455] asterisk lock.c:478 __ast_pthread_mutex_unlock()
asterisk ()
res_timing_timerfd.so ()
res_timing_timerfd.so ()
=== — ---> Locked Here: bridge_channel.c line 223 (ast_bridge_channel_lock_bridge)
=== -------------------------------------------------------------------

=== Thread ID: 0x7f4ca05ab700 LWP:13839 (pbx_thread started at [ 4699] pbx.c ast_pbx_start())
=== —> Waiting for Lock #0 (bridge_channel.c): MUTEX 223 ast_bridge_channel_lock_bridge bridge 0x7f4cc0061de0 (1)
[0x5139fb] asterisk lock.c:272 __ast_pthread_mutex_lock()
[0x45f908] asterisk astobj2.c:241 __ao2_lock()
[0x48697c] asterisk bridge.h:423 _ast_bridge_lock()
[0x486d86] asterisk bridge_channel.c:224 ast_bridge_channel_lock_bridge()
[0x488131] asterisk bridge_channel.c:650 bridge_channel_write_frame()
[0x48d21b] asterisk bridge_channel.c:2625 bridge_handle_trip()
[0x48d61b] asterisk bridge_channel.c:2760 bridge_channel_wait()
[0x48dde1] asterisk bridge_channel.c:2909 bridge_channel_internal_join()
=== — ---> Locked Here: bridge_channel.c line 223 (ast_bridge_channel_lock_bridge)
=== -------------------------------------------------------------------

=== Thread ID: 0x7f4cd84b7700 LWP:13869 (pbx_thread started at [ 4699] pbx.c ast_pbx_start())
=== —> Waiting for Lock #0 (bridge_channel.c): MUTEX 223 ast_bridge_channel_lock_bridge bridge 0x7f4cc0061de0 (1)
[0x5139fb] asterisk lock.c:272 __ast_pthread_mutex_lock()
[0x45f908] asterisk astobj2.c:241 __ao2_lock()
[0x48697c] asterisk bridge.h:423 _ast_bridge_lock()
[0x486d86] asterisk bridge_channel.c:224 ast_bridge_channel_lock_bridge()
[0x488131] asterisk bridge_channel.c:650 bridge_channel_write_frame()
[0x48d21b] asterisk bridge_channel.c:2625 bridge_handle_trip()
[0x48d61b] asterisk bridge_channel.c:2760 bridge_channel_wait()
[0x48dde1] asterisk bridge_channel.c:2909 bridge_channel_internal_join()
[0x471fe5] asterisk bridge.c:1749 ast_bridge_join()
[0x7f4ce36af807] app_confbridge.so app_confbridge.c:2595 confbridge_exec()
[0x5451e9] asterisk pbx_app.c:492 pbx_exec()
[0x530e71] asterisk pbx.c:2927 pbx_extension_helper()
[0x534cc8] asterisk pbx.c:4177 ast_spawn_extension()
[0x53597a] asterisk pbx.c:4351 __ast_pbx_run()
[0x537124] asterisk pbx.c:4674 pbx_thread()
[0x5c8933] asterisk utils.c:1249 dummy_start()
[0x7f4d44c172de] libpthread.so.0 :0 __libpthread_freeres()
[0x7f4d43de2133] libc.so.6 :0 clone()
=== — ---> Locked Here: bridge_channel.c line 223 (ast_bridge_channel_lock_bridge)
=== -------------------------------------------------------------------

=======================================================================

Someone with similar problems before? Any ideas or suggestions are really welcome.

Thanks for your help

Those locks are expected, they’re done in the normal operation of media flow inside of bridges. Do you always have DEBUG_THREADS enabled?

As well the underlying performance characteristics of the virtualization platform can have consequences on ConfBridge. If the timing of the system drifts and is not consistent it can cause audio issues in conference bridges.

I’ve enabled DEBUG_THREADS in the new server only to be more agile te get detailed info if dumps come back again as in previous server.

Don’t have direct control over the virtualization platform, and I’m expecting to increase cpu capacity into 8 core, but need to confirm if that’s only a “kick the problem forward” or there’s another problem that should be resolve from my side.

You refer to the timing of the system drifts… Can I make some SO configuration or virtualization platform suggestion to mitigate?

I don’t have data or suggestion in regards to that, perhaps someone else who runs such a platform can give tips and tricks.

I will say that increasing CPU cores inside the VM may not actually help. ConfBridge works by waking up every 20 milliseconds, getting audio from each participant, and sending it out. If the VM isn’t able to sustain that then the audio can be mixed sporadically. Adding CPU alone likely wouldn’t change that, and each conference bridge uses only a single thread (thus CPU) to do such an operation. It can also be problematic if it takes longer than 20 milliseconds to do all of that work.

For other types of work this doesn’t matter in practice (someone browsing a website wouldn’t notice) but for realtime applications it matters.

1 Like

That’s my point of view, just adding cpus is getting into a loop of neverending upsizing. That’s because I’m looking for another options…

Increase that 20ms when confbridge gets audio from participants is possible? (that applies to the only 1 participant with music on hold confbridge? That’s one of the reasons that I’m focus on performance)
Give asterisk even more priority on the system?
Ask virtualization operators to “reserve” cpu capacity only for that virtual machine?

Any other suggestions that can I tune on asterisk configuration (read about disable rtp checksum) are welcome.

As Asterisk is a real time application, you’d want dedicated unshared resources for the best result.

Disabled debug flags on recompiling and having 8 cpus. Just a test call between old and new asterisk going directly to a confbridge

[Mar 30 23:58:35] WARNING[31280][C-00000091]: channel.c:1124 __ast_queue_frame: Exceptionally long voice queue length queuing to PJSIP/cucm1-00000090

CPU usage under 1%
load system under 1%

¿Where is the problem with that on a fully unload fresh compiled system?

Maybe is an obvious parameter, but after few days of work I have little options to try

Thanks in advance

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.