I have been searching for days through documentation, conf files, and forums, and could find nothing.
Please, how/where do you set the low/high water values that affect the pjsip/distributor taskprocessors?
We run a busy system, but above roughly 130 calls we often start to see queuing on the pjsip/distributor taskprocessors [only], resulting in call quality issues.
We really need to set the high water value higher than 500.
You don't. Those values exist to trigger alerts; the only place that actually listens for the alerts and does anything is PJSIP, and that behaviour is configurable[1]. They don't control the size of the queues or have any direct impact on call quality.
The number of threads IS configurable in pjsip.conf as well[2]. Whether that does anything for your situation, I don't know. If the distributor threads are getting backed up, it's usually because of realtime or DNS.
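For reference, both of those knobs live in the system section of pjsip.conf. A minimal sketch, with illustrative values rather than recommendations (tune to your own load):

[system]
type=system
; what acts on taskprocessor overload alerts: none, global, or pjsip
taskprocessor_overload_trigger=global
; sizing of the SIP servant thread pool
threadpool_initial_size=0
threadpool_auto_increment=5
threadpool_idle_timeout=60
threadpool_max_size=50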
Ah, I did forget that the pool of distributor taskprocessors for PJSIP is fixed[1]. Changing that is a bit of a sledgehammer, though, without first understanding why the distributors are so busy and piling up.
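(If memory serves, that pool size is a compile-time constant, DISTRIBUTOR_POOL_SIZE in res/res_pjsip/pjsip_distributor.c, so raising it means patching the source and rebuilding rather than changing a config file. Worth verifying against the source tree for your version.)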
My understanding was that if PJSIP queues, it stops processing, resulting in issues.
The call quality issues only occur while we see queuing: audio drops, warbles, clicks. Once the queues are cleared, audio is back to normal. The queuing seems to relate to a CPU spike (to a load of 15-25, running on a 12 core bare metal server). The spike in turn seems to relate to receiving approximately 8-10 calls within a few seconds. If we get another 8-10 calls within a few seconds, before the load from the prior spike has come down, it causes a jump to a load of 25-35.
If we do not get 8-10 calls within a few seconds, our load remains around 1-2.
We have the following set in pjsip.conf:
taskprocessor_overload_trigger=global
threadpool_max_size=300
pjsip/distributor is the only place I have managed to see queuing occur.
The taskprocessor report above covers a week of running, so you can see we do not go wildly above the limits, but we exceed them enough to experience call issues when we get 8+ calls within a few seconds.
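For anyone else looking, the kind of report we are referring to comes from the taskprocessor CLI, e.g. (the like filter may depend on your Asterisk version):

*CLI> core show taskprocessors like pjsip/distributor

which lists, per taskprocessor, the processed count, current in-queue depth, max depth, and the low/high water marks.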
If the distributor pool cannot be increased and the high water value cannot be increased…
is it not a risk to set taskprocessor_overload_trigger=none?
We are running a 6 core / 12 thread bare metal server with 32G memory.
Memory usage has never exceeded 15G, and the 15-minute CPU load average varies from 2 to 15 (with the 1-minute average ranging from 2 to 35).
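To be concrete, the change we are considering is just this one line in the system section of pjsip.conf (a sketch of how we understand it, in case we have it wrong):

[system]
type=system
; instead of the global we currently have set
taskprocessor_overload_trigger=none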
There is inherent risk in setting it to none, as the queues then have no limit. The only other option is to determine why the queues are growing so large in the first place.
There is no magical CLI option or anything else that will outright tell you, no. It requires someone doing the work to orchestrate things and understand what is going on.