I have an issue where the system gets into a state in which Asterisk becomes unavailable. One log entry that I see is “ast_queue_frame: Exceptionally long voice queue length queuing to Local”.
I’m in the process of replacing chan_sip with PJSIP. I’ve encountered this on RHEL 7 with multiple versions of Asterisk, from the latest 13 up to 16.7.
Is there anything I can do to avoid getting into a state where the system goes down?
Not really. This indicates either system overload or a deadlock whereby a channel is blocked. Without determining what specifically in your environment triggers it, or isolating and resolving the issue, there’s nothing specific that can be said.
It would be somewhere in Asterisk, based on something you are doing. A backtrace at the time[1] would show what is going on, and would need to be looked at by someone familiar with the codebase.
Hello,
We have had exactly the same problem on version 16.7.0 for a few days.
The message “Exceptionally long voice queue length queuing to Local” occurs each time just after the bridge message.
The problem occurs at 15 CPS as well as at 100 CPS.
We have temporarily fixed the problem by configuring CDR in batch mode, but we don’t understand the link between the two.
High CPU load and/or low IOPS on the server, caused by a web server and local database, could be your problem.
Try increasing your server capacity, or move the database and web server to another machine. That will reduce resource consumption on the Asterisk server.
I’m already running my database on different nodes. I’m also using realtime.
When you changed CDR to batch mode, did that make any difference? This is a race condition of some sort, triggered when something is exercised that leads down the dark path.
Yes, there are no errors when we are in batch CDR mode, even at 200 CPS.
Again, we don’t understand the link, because like you our CDRs are stored on another server, and we don’t have a web server on this machine, only Asterisk:
24-core CPU at 5% usage
idle time = 90%
2 PDUs of 1100 W
So it is not a resource problem on the server, because it is doing nothing.
Curious as to what got you to change to batch mode? I see the issue when I get to about 30 CPS, and I see no load. My hosts have 2 vCPUs at about 12% usage. The cluster is VMware in AWS, and the underlying hardware would be i3 bare-metal equivalents with vSAN and all-SSD storage.
When realtime or a database is involved with Asterisk, it can block critical paths, resulting in problems. For example, in chan_sip UDP traffic is single threaded, and by default ODBC is single threaded if that is used, so the result is that chan_sip is blocked when it has to query the database. If that slows down even for a moment, it can cause issues. This also occurs elsewhere, such as CDR, which can block all CDR handling while records are being stored if batch mode is not used.
I don’t recommend tightly coupling a database to Asterisk unless you know the precise characteristics of the database and can do performance analysis in combination with Asterisk when used.
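For readers hitting the same thing: batch mode is enabled in cdr.conf. A minimal sketch; the size and time thresholds below are illustrative values, not recommendations, and should be tuned against your own call volume:

```ini
; /etc/asterisk/cdr.conf
[general]
enable = yes
; Queue CDRs in memory and post them to the backend in batches,
; instead of blocking call handling on every individual record.
batch = yes
; Post a batch once 100 records are queued...
size = 100
; ...or every 300 seconds, whichever comes first.
time = 300
; Flush any pending records on shutdown so queued CDRs are not lost.
safe_shutdown = yes
```

Restart or reload Asterisk after changing this for it to take effect.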
With chan_sip we discovered that, under overload, the BYE message may not be sent correctly or on time while the CDRs are being inserted into the database, which can cause billing discrepancies.
In our case we don’t use realtime or a database: it’s a basic setup that generates a PJSIP call via AMI and connects it to a Local channel when the call is answered.
We use an external database only for the CDRs.
But since we moved the CDRs to batch mode, we no longer see the “Exceptionally long voice queue…” message.
Each time, the message appears when Asterisk starts to bridge both legs, but we don’t understand the link with batch-mode CDR at this point in call processing…
Having multiple connections can be beneficial, but it is still dependent on the performance of the underlying database itself. If your queries are slow, they can still block the system. If you have multiple connections and those end up slow as well, you can end up blocking it even further, or in different ways.
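For those using ODBC, the connection count per class is set in res_odbc.conf. A minimal sketch, assuming a reasonably recent Asterisk that supports the max_connections option; the class name, DSN, and credentials are placeholders for your own setup:

```ini
; /etc/asterisk/res_odbc.conf
[asterisk]                    ; class name referenced from func_odbc/realtime (placeholder)
enabled => yes
dsn => asterisk-connector     ; DSN defined in /etc/odbc.ini (placeholder)
username => asterisk          ; placeholder credentials
password => secret
pre-connect => yes            ; open a connection at module load
max_connections => 5          ; allow concurrent queries; the default of 1
                              ; serializes every query through this class
```

With the default single connection, every query through that class serializes behind the previous one, which is exactly the kind of blocking described above.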