Stasis task processor queue warnings and channel.c FRACK

dixoncb · July 28, 2017, 6:09am

When I issue a MUTE to a number of channels in my Stasis application, I’m seeing TP queue warnings:

[Jul 28 06:09:09] WARNING[22694][C-00000077] taskprocessor.c: The ‘subm:voice_2-0000005f’ task processor queue reached 500 scheduled tasks again.

Often followed by

[Jul 28 06:22:37] ERROR[22806][C-0000001a] channel.c: FRACK!, Failed assertion bad magic number 0x0 for object 0x7f5e3c198738 (0)

This happens when I’m muting/unmuting just a small number of channels (>15).

The application is driven by an attached .NET app and makes heavy use of batch conference bridge moves and mute/unmute commands. We’ve had some issues stabilising the interoperation of the two boxes, but these appear to have been ameliorated by increasing websocket_write_timeout very significantly. We are left with this residual and nagging issue.

Attachments:
A_procs.txt: the taskprocessor list from immediately after the issue (10.7 KB)
A_messages.txt: from the messages log (25.2 KB)
A_backtrace.txt: a backtrace from the time of the issue (376.0 KB)

The Stasis application (voice_2) is the busiest queue. I’m running the test using SIPP with 120 clients (well above operational levels), but without the RTP traffic we’d normally see in everyday use. I’m running on a fast quad-core system, but as an aside, it would be interesting to know if adding further processing power (cores, memory) would help performance without separating the Stasis app into different instances.

jcolp · July 31, 2017, 2:09pm

What version of Asterisk are you using? There was a bug[1] that was fixed which would cause that FRACK to occur if mute was done multiple times on the same channel.

[1] https://issues.asterisk.org/jira/browse/ASTERISK-27016

dixoncb · August 10, 2017, 12:24pm

Hi. Sorry for the delay. We are using 14.3. Should we upgrade?

jcolp · August 10, 2017, 1:01pm

Yes, if it’s the same issue then that will solve it.

dixoncb · August 16, 2017, 3:24pm

Thank you. We certainly can’t get it to crash in the same way after the upgrade.

We are, however, still seeing task processor queue warnings under certain conditions when we mute callers. All our callers are in bridges. When there are fewer than 10 callers in a bridge, we see no warnings, but as the numbers increase, simply muting/unmuting 10 participants at a time causes the warnings. The message is:

WARNING[XXXX][C-XXXXXXXXX] : taskprocessor.c:888 taskprocessor_push: The ‘subm_voice_2-0000005f’ task processor queue reached 500 scheduled tasks again.

Does this mean that each of the tasks was executed, albeit with a delay, or does it mean that some of the mute/unmute requests were not fulfilled?

Thank you once more.

jcolp · August 16, 2017, 3:51pm

It means that the requests will be executed, eventually, but at the point that one was queued the queue had reached 500 scheduled tasks.
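
To illustrate the distinction in that answer, here is a minimal toy model of a queue that warns at a high-water mark but never discards work. It is not Asterisk's taskprocessor code: the class name `WarnOnlyQueue`, the `simulate` helper, and the single-threaded push-then-drain ordering are all assumptions made for illustration. Only the threshold of 500 comes from the log message above.

```python
from collections import deque


class WarnOnlyQueue:
    """Toy model (hypothetical): warns at a high-water mark, never drops tasks."""

    def __init__(self, high_water=500):
        self.high_water = high_water  # 500 matches the figure in the warning
        self.tasks = deque()
        self.warnings = 0
        self.executed = 0

    def push(self, task):
        # The task is always accepted, however long the backlog is.
        self.tasks.append(task)
        if len(self.tasks) == self.high_water:
            # Reaching the mark only records a warning; nothing is discarded.
            self.warnings += 1

    def drain(self):
        # Run every queued task in the order it was pushed.
        while self.tasks:
            task = self.tasks.popleft()
            task()
            self.executed += 1


def simulate(n_tasks, high_water=500):
    """Push n_tasks without draining, then drain; return what happened."""
    queue = WarnOnlyQueue(high_water)
    results = []
    for i in range(n_tasks):
        queue.push(lambda i=i: results.append(i))
    queue.drain()
    return queue.warnings, queue.executed, results
```

In this model, pushing 600 tasks before any are processed trips the warning once, yet all 600 still run in order. That matches the answer above: the warning signals a backlog and therefore delay, not lost mute/unmute requests.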