Exceptionally long queue length queuing to PJSIP/

I’ve just experienced an Asterisk server becoming completely unresponsive, as in every single service on it became unavailable. Not only Asterisk, but also a set of other services the machine runs for a variety of other things - I was also unable to SSH into it.

The only thing I found in the logs were these (repeated many, many times):

[Jun 18 15:33:56] WARNING[13337] channel.c: Exceptionally long queue length queuing to PJSIP/registrar-0000d96a

The Asterisk instance is configured to core dump, but no dump was generated.

What I’m pondering is, since the instance was completely dead, including non-Asterisk-related services, whether it was Asterisk that somehow managed to pull down the instance, or whether those warnings were a result of some other failure on the instance?

Is this a virtual environment? If so, what kind of resources are on the host, and what did you assign to the guest(s) on the host?

Ah yes, vital information.
It’s hosted on AWS, running on a c5.xlarge instance. That’s 4 cores and 8 GB of RAM. These servers host at most around 30 calls; this one was serving 16 when it blew up.

I’m running 5 of these, and I’ve only seen this happen on this particular instance.

So I just want some input on how I could gather more information in case this is something that happened to Asterisk, or whether the consensus might be that the physical hardware just suffered some sort of disruption. The only reason I’m concerned is that I saw another crash on this exact instance last week as well.

Even if Asterisk perhaps deadlocked itself, I don’t believe this would take down the entire OS with it? Just looking for experience or insight.

I’ve seen the behavior you describe in virtual environments where the host CPU is oversubscribed. In ESXi it’s the co-stop and wait stats that give you an indication that this is the problem (not sure if there’s an equivalent in AWS).

Right off the bat I’d pull back to a single core for your instance, see if that helps.
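If it helps, one rough way to check for that from inside the guest is to watch the “steal” column in /proc/stat over a short interval. A quick sketch, assuming a Linux guest (the 5-second window is arbitrary):

```python
#!/usr/bin/env python3
"""Rough check for CPU steal inside a guest (Linux only).

High steal means the hypervisor is withholding CPU from the guest,
which can stall the whole box the way described above. On AWS you
would cross-check the instance's CloudWatch CPU graphs as well.
"""
import time

def cpu_times():
    # First line of /proc/stat:
    # cpu user nice system idle iowait irq softirq steal guest guest_nice
    with open("/proc/stat") as f:
        values = [int(v) for v in f.readline().split()[1:]]
    steal = values[7] if len(values) > 7 else 0
    return sum(values), steal

total1, steal1 = cpu_times()
time.sleep(5)
total2, steal2 = cpu_times()

delta = (total2 - total1) or 1
print(f"steal over last 5s: {100.0 * (steal2 - steal1) / delta:.1f}%")
```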

So I’ve done some more digging after having had the issue again yesterday, and it seems there is a memory leak under some pretty brutal circumstances.

It seems we have a user with a UAC (MicroSIP) which sometimes goes absolutely haywire, and in response to the INVITE (from Asterisk), will just explode and send its 180 Ringing reply in a loop. We’re talking thousands of them. I can’t even retrieve them all from Homer, it just cuts off after 100. But those 100 were sent within a timespan of 17 milliseconds.

But judging from the network monitoring, we’re talking megabytes upon megabytes of 180 Ringing replies. Asterisk’s memory usage shot up by 2 GB each time this happened, until I suppose it just used all of the available memory and locked up the system somehow. The memory usage doesn’t go down even after several hours.
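For what it’s worth, one way to count the 180s per dialog straight from a capture instead of relying on Homer is something like this sketch (it assumes tshark is installed; flood.pcap is just a placeholder file name):

```python
#!/usr/bin/env python3
"""Count 180 Ringing responses per Call-ID in a capture (sketch).

Assumes tshark is installed and 'flood.pcap' is a placeholder name;
handy for confirming how many retransmitted 180s one dialog produced.
"""
import subprocess
from collections import Counter

out = subprocess.run(
    ["tshark", "-r", "flood.pcap",
     "-Y", "sip.Status-Code == 180",
     "-T", "fields", "-e", "sip.Call-ID"],
    capture_output=True, text=True, check=True,
).stdout

counts = Counter(line for line in out.splitlines() if line)
for call_id, n in counts.most_common(10):
    print(f"{n:8d}  {call_id}")
```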

While this happens it seems that Asterisk starts printing these lines (hundreds per second):
Exceptionally long queue length queuing to PJSIP/registrar-0000d96a

The channel noted there is the one belonging to the outbound call placed to the UAC that craps out and floods 180 Ringing replies back.

I’ve moved the offending customer to a separate Asterisk 16.11.1 instance (the crashes have happened on 16.6.1).

Is there anyone who can provide some guidance on what I should do to gather more information, or on what I can provide to create a proper issue ticket for this? Unless it’s already been addressed in 16.11.1, this seems like it could be a DoS attack vector that should be mitigated somehow.

Some kind of packet capture, logs, everything. Issues should be filed on the issue tracker[1]. I’m not really sure though that there is a way to mitigate such a problem. You ultimately end up consuming resources to process such things, even just the amount you have to process in order to decide to block. The only thing we could possibly do is add an option for a fixed-size work queue, but the result of that is you then potentially drop legitimate SIP traffic, so you’re still going to potentially have problems there. There’s no real “aha!” fix for such things.

[1] https://issues.asterisk.org/jira
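Just to illustrate the trade-off: a minimal sketch of what a fixed-size work queue ends up doing (this is not Asterisk’s actual taskprocessor code, purely the concept):

```python
import queue

# Bounded queue: once the backlog reaches the cap, new items are
# dropped instead of growing memory without limit. The cost is that
# legitimate messages arriving during a flood get discarded too.
work = queue.Queue(maxsize=1000)
dropped = 0

def enqueue(item):
    global dropped
    try:
        work.put_nowait(item)
    except queue.Full:
        dropped += 1  # shed load instead of queueing indefinitely
```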

Oh I agree; while the flooding happens, it is what it is. I’m working on getting our Kamailio server in front of Asterisk to drop these replies in some way.
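Not the actual Kamailio config (that would live in its routing script, probably built on the pike or htable modules), but the idea is roughly a per-dialog rate limit on provisional responses, something like this sketch:

```python
import time
from collections import defaultdict

# Sketch of the idea only: forward at most MAX_PER_SECOND provisional
# responses per Call-ID and drop the rest of the flood.
MAX_PER_SECOND = 10
window = defaultdict(lambda: [0.0, 0])  # Call-ID -> [window start, count]

def should_forward(call_id, now=None):
    if now is None:
        now = time.monotonic()
    start, count = window[call_id]
    if now - start >= 1.0:
        window[call_id] = [now, 1]  # new one-second window
        return True
    if count < MAX_PER_SECOND:
        window[call_id][1] = count + 1
        return True
    return False  # excess 180s for this dialog are dropped
```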

However, with each flood of replies, Asterisk permanently increased its memory footprint significantly, and it didn’t go down. When it happened yesterday I managed to isolate this Asterisk server before it actually crashed out. Normally Asterisk uses something like 5 MB of RAM. After I’d isolated it, it was sitting at 6 GB of usage, and it didn’t go down. I restarted it and reinstated it in its cluster this morning, 12 hours after I’d isolated it.

So I’m more concerned there is a memory leak that happens when this “attack” occurs.
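In the meantime, a crude way to watch for it is to log the asterisk process RSS once a minute and correlate the jumps with the floods; a sketch (assumes a single asterisk process findable via pidof):

```python
#!/usr/bin/env python3
"""Log the asterisk process RSS once a minute (sketch).

Assumes a single 'asterisk' process findable via pidof; useful for
correlating memory growth with the 180 Ringing floods.
"""
import subprocess
import time

def asterisk_rss_kb():
    pid = subprocess.run(["pidof", "-s", "asterisk"],
                         capture_output=True, text=True).stdout.strip()
    if not pid:
        return None
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1])  # value is in kB
    return None

while True:
    rss = asterisk_rss_kb()
    print(time.strftime("%F %T"), f"{rss} kB" if rss else "asterisk not running", flush=True)
    time.sleep(60)
```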

Anyways, I’ll create an issue on the issue tracker! Is there some way I can provide you guys with a pcap privately? The one I’ve got contains IP addresses of our customers and such, which I don’t want to share publicly.

PCAPs can be sent to asteriskteam@digium.com

Issue created ASTERISK-28962
