Outage after taskprocessor_push: The 'subm:rtp_topic-000000aa' task processor queue reached 500 scheduled tasks

Hi!

I am facing this error on my production setup for a 95-endpoint “callcenter”:
“taskprocessor.c:888 taskprocessor_push: The ‘subm:rtp_topic-000000aa’ task processor queue reached 500 scheduled tasks”

After this error, no registrations are possible, which effectively drops all endpoints. New incoming calls from the SBC (IP auth) still reach Asterisk but then fail because the called endpoint is not registered.

Environment:

  • Debian Stretch
  • Asterisk 13.18.2
  • mostly extensions.ael
  • PJSIP only
  • HDD is not full
  • 8G RAM, 2G Swap
  • 4 vCores on “Intel Xeon E312xx (Sandy Bridge, IBRS update)”

Is this an overload problem or a known bug?

This setup had been fine since Jan 2018, but I noticed the same problem yesterday and today at approximately the same time. The only change during this period was a BIOS update on the hypervisor to mitigate the Intel CPU flaws; the VMs have since been running on a patched version of QEMU-KVM.

Thank you very much.

There is a blog post[1] explaining what a task processor “queue reached” message means.

[1] https://blogs.asterisk.org/2016/07/13/asterisk-task-processor-queue-size-warnings/

I found and read this blog post, but I have to admit I am not sure how to proceed.

This problem seems to be similar to:

Core dumps are enabled, but this is not a crash, so I am unable to show a backtrace (DONT_OPTIMIZE and BETTER_BACKTRACES are set).

I will try to create a manual core dump when this happens again.
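My plan is to use gcore from the gdb package, which (as far as I understand) can snapshot the running process without stopping it. A minimal sketch, assuming the main process is simply named “asterisk” and gdb is installed:

```python
#!/usr/bin/env python3
"""Sketch: grab a core of the running Asterisk without stopping it.

Assumes the gdb package (which provides gcore) is installed and that
the main process is named "asterisk"; adjust names/paths as needed.
"""
import subprocess
import time

# Find the PID of the running Asterisk process.
pid = subprocess.check_output(["pidof", "-s", "asterisk"]).decode().strip()

# gcore writes <prefix>.<pid> without killing the process, so the
# call centre keeps running while the dump is collected.
prefix = "/tmp/asterisk-core.{}".format(int(time.time()))
subprocess.check_call(["gcore", "-o", prefix, pid])
print("core written to {}.{}".format(prefix, pid))
```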

Do you have a hint for me in the meantime?

I’d first suggest upgrading to the latest version, as we do fix and tweak things. Secondly, you have to determine what is causing the system to be slow at processing and why.
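In the meantime, watching “core show taskprocessors” on the CLI will show you which queues are backing up before the 500-task warning is hit. A rough sketch of a watcher, assuming “asterisk -rx” is usable by the monitoring user and that the column layout matches recent 13.x (adjust the parsing if not):

```python
#!/usr/bin/env python3
"""Sketch: log task processor queue depths so the backlog is visible
before the 500-task warning fires.

Assumes "asterisk -rx" is runnable by this user; the output columns of
"core show taskprocessors" can differ between versions, so the parsing
below is only a guess and may need adjusting.
"""
import subprocess
import time

THRESHOLD = 100   # log anything noticeably backed up
INTERVAL = 10     # seconds between samples

while True:
    out = subprocess.check_output(
        ["asterisk", "-rx", "core show taskprocessors"]).decode()
    for line in out.splitlines():
        parts = line.split()
        # Guessed layout: name, processed, in-queue, max-depth, ...
        if len(parts) >= 3 and parts[2].isdigit() and int(parts[2]) >= THRESHOLD:
            print("{} {}".format(time.strftime("%F %T"), line.strip()))
    time.sleep(INTERVAL)
```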

OK, I am already preparing 13.21-cert2, but it is not production-ready with my adjustments yet.
I will monitor the VM’s behaviour and add more RAM and CPUs to it, as the hardware is dedicated to this VM (it is only virtualised to make migration between hosts easy).

This is only a workaround, but it might lower the frequency of the problem in the meantime.

Thanks for your feedback!

After some further debugging with journalctl, I noticed that both MySQL servers were backed up (LVM snapshot) 10 minutes before the outage. One holds the endpoints and main tables (replicated), and the other, with local-only storage, holds the CDRs. This might have introduced lag during prime time.

Is my assumption correct that ODBC/realtime lookups can also produce blocked tasks when the database is slow?

Yes, that can cause things to get blocked.
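To confirm it, you could time a trivial realtime query around the backup window; if the LVM snapshot stalls MySQL, the latency spike will be obvious. A rough sketch with pyodbc, where the DSN name “asterisk” and the ps_endpoints table are only assumptions (use whatever your res_odbc.conf and realtime configuration actually point at):

```python
#!/usr/bin/env python3
"""Sketch: time a trivial realtime query so a stalled MySQL server
(e.g. during the LVM snapshot) shows up as a latency spike.

The DSN name "asterisk" and the table "ps_endpoints" are assumptions;
substitute your own DSN and realtime table names.
"""
import time
import pyodbc

conn = pyodbc.connect("DSN=asterisk", timeout=5)

while True:
    start = time.monotonic()
    conn.cursor().execute("SELECT id FROM ps_endpoints LIMIT 1").fetchall()
    elapsed = time.monotonic() - start
    print("{} query took {:.3f}s".format(time.strftime("%F %T"), elapsed))
    if elapsed > 1.0:
        print("  -> realtime lookups this slow will back up task processors")
    time.sleep(10)
```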