Been doing some load testing on Asterisk 22.8.2 and wanted to share what I found, and also pick your brains on whether there's anything I can tune to squeeze more out of it.
My setup:
- Asterisk 22.8.2 running in a Docker container — 4 vCPU, 4 GB RAM
- Load testing from a separate container — 8 vCPU, 8 GB RAM
- Both containers on the same Docker network
- Using SIPp for generating load
The call flow is: SIPp sends INVITE → hits extensions.conf → enters a Stasis app → ExternalMedia channel gets created → audio streams over WebSocket.
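For reference, the dialplan leg of that flow is just a short hand-off into Stasis — a minimal sketch (the context name, extension pattern, and app name here are illustrative, not my exact config):

```
[from-sipp]
exten => _+X.,1,NoOp(Load-test call)
 same => n,Answer()
 same => n,Stasis(media_bench) ; the Stasis app then creates the ExternalMedia channel via ARI
 same => n,Hangup()
```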
What I’m testing:
I’m trying to figure out how many concurrent calls Asterisk can handle when every call has an ExternalMedia channel streaming audio over websocket.
SIPp command I’m using:
sipp <asterisk_ip>:49868 -sf uac.xml -s +21124656 -i <sipp_ip> -p 5060 -r 10 -m 150 -trace_err -trace_msg -trace_logs
(10 calls/sec, 150 total target)
What I observed:
Things work fine up to about 90ish calls. Smooth audio, no issues. But once I cross ~100 concurrent calls and try to place a real call alongside the load, the audio on the real call starts lagging — like it slows down for about 4-5 seconds, then catches back up and plays normally after that. Almost like a buffer hiccup.
Checked resource usage and it lines up — at 100 calls the CPU pegs at 400% (all 4 vCPUs maxed). When it briefly spikes to 410-416%, that’s exactly when the audio degrades.
So from what I can tell, with ExternalMedia over WebSocket in this setup, Asterisk tops out around 90 concurrent calls on 4 vCPU before audio quality starts suffering.
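Putting rough numbers on it: 100 calls saturating 4 vCPUs works out to about 4% of one core per call, which lands right at the ~90-call ceiling once you leave a little headroom. A back-of-envelope sketch (the inputs are my observed figures above; the 10% headroom is just an assumption):

```python
# Back-of-envelope capacity math from the observations above.
cpu_cores = 4
total_cpu_pct = 400.0     # observed at saturation (all 4 cores busy)
concurrent_calls = 100    # call count where audio started degrading

per_call_pct = total_cpu_pct / concurrent_calls  # % of one core per call
print(f"per-call cost: ~{per_call_pct:.1f}% of one core")

# Leaving ~10% headroom for spikes (the 410-416% moments):
usable_pct = cpu_cores * 100 * 0.9
safe_calls = int(usable_pct / per_call_pct)
print(f"safe concurrency at 90% utilization: ~{safe_calls} calls")
```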
Questions for the community:
- Does this number seem about right to people who've worked with ExternalMedia at scale? Or should I be getting more out of 4 vCPUs?
- Are there any Asterisk config tweaks that could help here? I'm thinking taskprocessors, thread pool settings in Stasis, or anything related to the WebSocket handling.
- Would adjusting res_http_websocket settings make any difference? Or is this purely a CPU-bound situation?
- Has anyone experimented with timerfd vs. other timer backends for better performance under load?
- Is there anything on the Stasis/ARI side that I should be tuning — like the number of ARI worker threads?
I haven’t touched much on the config side yet, mostly running defaults. Would love to hear from anyone who has pushed ExternalMedia further or done similar benchmarking.
Appreciate any pointers. Happy to share more details about my dialplan or Stasis app setup if that helps.
Off the top of my head…
Is the websocket app running in the same container as Asterisk? Does it do more than just echo?
What codec is being used on both the SIP and WebSocket call legs?
ExternalMedia is only in play during call setup, and it's just a convenience wrapper at that, so by itself it adds no real overhead.
To understand more about where the bottlenecks might be there are a few things you can try.
- If the websocket app is running in the same container, move it someplace else.
- Make the codecs on both legs of the call the same.
- To eliminate the WebSocket bits, set up a SIPp UAS outside the Asterisk container and use that. You'd have to change your app to use the ARI channel originate call instead of externalMedia. Just a reminder: you don't have to use externalMedia to use chan_websocket.
Those are just suggestions for investigation.
timerfd is about the best you’re going to get.
Thanks for the suggestions, appreciate the direction.
To answer your questions:
WebSocket app — it's running in a separate container, not the same one as Asterisk, so it shouldn't be adding to the CPU load on the Asterisk side.
Codecs — this is probably the key piece I should’ve mentioned upfront. The incoming SIPp INVITEs come in as ulaw, but the ExternalMedia channel is set up with slin16. So yes, there is transcoding happening on every call — ulaw to slin16. That codec translation is definitely eating CPU cycles.
The thing is, in my actual production scenario this transcoding is unavoidable. Calls come in as ulaw from the carrier side, and the external websocket app needs slin16 to do its processing. So I can’t really make both legs the same codec — the whole point of the ExternalMedia channel here is to get linear audio out to the websocket app.
Regarding the SIPp UAS suggestion — I understand the idea of eliminating the websocket bits to isolate the bottleneck, but in my case the transcoding is part of the real-world flow I’m trying to benchmark. If I bypass it, the numbers won’t reflect what actually happens in production. I’m specifically trying to find the ceiling for this exact call path: SIP (ulaw) → Asterisk → ExternalMedia (slin16) → WebSocket app.
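For concreteness, the ExternalMedia leg is created through ARI's POST /channels/externalMedia with format=slin16, which is exactly where the transcode gets forced. A rough sketch of that request (the ARI host, credentials, app name, and WebSocket app address are placeholders, not my real values):

```python
# Sketch of the ARI request that creates the ExternalMedia channel.
# Host, app name, and external_host below are placeholders.
import urllib.parse

ARI_BASE = "http://asterisk:8088/ari"    # assumed ARI address
params = {
    "app": "media_bench",                # Stasis app name (illustrative)
    "external_host": "ws-app:9000",      # WebSocket media app's address
    "format": "slin16",                  # forces the ulaw -> slin16 transcode
}
url = ARI_BASE + "/channels/externalMedia?" + urllib.parse.urlencode(params)
print(url)
# In the real app this is an authenticated POST, e.g. with requests:
# requests.post(url, auth=("ariuser", "arisecret"))
```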
timerfd — good to know that’s already the best option there.
So it sounds like the transcoding from ulaw to slin16 on every call is probably the biggest contributor to the CPU hitting 400%. Does that match what others have seen? Roughly 90-100 concurrent calls with per-call transcoding on 4 vCPU seems like it could be in the right ballpark?
If the transcoding is the main bottleneck, I’m guessing the only real options to scale further would be:
- Throw more CPU at it (vertical scaling)
- Spread calls across multiple Asterisk instances (horizontal scaling)
- Or somehow get the carrier side to send a codec that's closer to what the websocket app needs, which isn't really in my control
Am I missing anything else config-wise that could reduce the transcoding overhead? Or is this just the reality of per-call codec translation at this scale?
Sorry, I should have been clearer… I wasn’t suggesting changing the codecs in production but just as a test to get a better idea of what’s contributing to the utilization. If you eliminate the transcoding and are still seeing high CPU utilization then that gives you something else to look at.
You can also use the Linux “perf” command to get a picture of where CPU cycles are being spent…
Start asterisk with perf record like so…
$ sudo /usr/bin/nice -n -15 \
perf record -q -D 5000 -e instructions \
--latency --call-graph fp -o "/tmp/perf.out" \
-- /usr/sbin/asterisk -fcg
The -D 5000 tells perf to wait 5 seconds before starting to record so the startup activities don’t skew the results. Run man perf-record to see what the rest of the options mean.
Run your test scripts and stop asterisk.
When asterisk has stopped, generate the report with…
$ sudo perf report --force -i "/tmp/perf.out" \
--call-graph=none -c asterisk \
--percentage relative > "/tmp/perf-report.txt"
This will give you an accounting of instructions executed by each function, including any functions it may call. You can add the --no-children option to perf report to see only instructions executed by a function not including functions it may call.
You can also use grep, sed or awk to filter out the kallsyms and libc.so functions so you only see the asterisk functions.
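For example, a quick filter equivalent to that grep, sketched here in Python (the three sample rows are made up to show the shape of the report output, not real measurements):

```python
# Drop kernel and libc symbol rows from a perf report so only
# asterisk's own functions remain (equivalent to grep -v).
def asterisk_rows(lines):
    skip = ("[kernel.kallsyms]", "libc.so", "libc-")
    return [ln for ln in lines if not any(s in ln for s in skip)]

sample = [
    "    12.34%  asterisk  asterisk           [.] ast_translate",
    "     8.90%  asterisk  [kernel.kallsyms]  [k] copy_user_generic",
    "     4.56%  asterisk  libc.so.6          [.] memcpy",
]
for ln in asterisk_rows(sample):
    print(ln)
```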
I’m actually working on a page for the docs site with info on how to diagnose high CPU issues. I should have it published by the end of next week.
Thanks, I will check this and get back to you.