It just doesn’t happen in a way we can reproduce. This post seems to describe an identical issue, without a resolution: Crash at taskprocessor.c:1171
It happens about once a month, with the following output from ast_coredumper:
Thread 1 (Thread 0x7fb11e7e1700 (LWP 2955687)):
#0 0x000055710463aee0 in taskprocessor_push (t=<optimized out>, tps=0x7fb1904ea730) at taskprocessor.c:1239
#1 ast_taskprocessor_push (tps=0x7fb1904ea730, task_exe=<optimized out>, datap=0x7fb1a814c5f0) at taskprocessor.c:1245
#2 0x00007fb1e081af5f in chan_pjsip_hangup (ast=0x7fb1a80d4b70) at chan_pjsip.c:2578
#3 0x000055710452aaaa in ast_hangup (chan=chan@entry=0x7fb1a80d4b70) at channel.c:2612
#4 0x00007fb1e1e8f520 in wait_for_answer (in=in@entry=0x7fb1c46de010, out_chans=out_chans@entry=0x7fb11e7dcf70, to=to@entry=0x7fb11e7dcf38, peerflags=peerflags@entry=0x7fb11e7ddae0, opt_args=opt_args@entry=0x7fb11e7dd2f0, pa=pa@entry=0x7fb11e7dd390, num_in=<optimized out>, result=<optimized out>, dtmf_progress=<optimized out>, mf_progress=<optimized out>, mf_wink=<optimized out>, sf_progress=<optimized out>, sf_wink=<optimized out>, hearpulsing=<optimized out>, ignore_cc=<optimized out>, forced_clid=<optimized out>, stored_clid=<optimized out>, config=<optimized out>) at app_dial.c:1426
So the crashing line appears to be:
tps->listener->callbacks->task_pushed(tps->listener, was_empty);
And when inspecting the coredump, I see that the taskprocessor has already been cleaned up:
{ stats = {max_qsize = 3, _tasks_processed_count = 7},
local_data = 0x0,
tps_queue_size = 0,
tps_queue_low = 2250, tps_queue_high = 2500,
tps_queue = {first = 0x0, last = 0x0},
listener = 0x0,
thread = 18446744073709551615,
executing = 0,
high_water_warned = 0, high_water_alert = 0,
suspended = 0,
subsystem = 0x7f6dc0211e52 "",
name = 0x7f6dc0211e20 "pjsip/outsess/compass-proxy-00.00.00.00-000e91" (edited),
<incomplete sequence \340>}
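The dump shows `listener = 0x0`, and that is the pointer the crashing statement dereferences. Below is a minimal standalone model of the "read the listener under the lock" idea. It uses plain pthreads, and `toy_tps`, `toy_listener`, `toy_push` and `toy_shutdown` are hypothetical stand-ins, not Asterisk’s real definitions. It assumes shutdown clears the listener pointer while holding the same lock. The push copies the pointer before unlocking and returns an error if it is already NULL. The callback still runs unlocked, because holding the lock across it could deadlock if the callback re-enters the taskprocessor.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins, NOT Asterisk's real structs. */
struct toy_listener;

struct toy_listener_callbacks {
	void (*task_pushed)(struct toy_listener *listener, int was_empty);
};

struct toy_listener {
	const struct toy_listener_callbacks *callbacks;
	int pushed_calls; /* demo only: how many times task_pushed ran */
};

struct toy_tps {
	pthread_mutex_t lock;
	struct toy_listener *listener; /* assumed: cleared under lock on shutdown */
	int queue_size;
	int executing;
};

void toy_tps_init(struct toy_tps *tps, struct toy_listener *listener)
{
	pthread_mutex_init(&tps->lock, NULL);
	tps->listener = listener;
	tps->queue_size = 0;
	tps->executing = 0;
}

/* Demo callback: just counts invocations. */
static void count_pushed(struct toy_listener *listener, int was_empty)
{
	(void)was_empty;
	listener->pushed_calls++;
}

const struct toy_listener_callbacks count_callbacks = { count_pushed };

/* Enqueue and snapshot the listener while holding the lock, then call the
 * listener after unlocking, so a callback that takes tps->lock won't deadlock. */
int toy_push(struct toy_tps *tps)
{
	struct toy_listener *listener;
	int was_empty;

	assert(tps != NULL);
	pthread_mutex_lock(&tps->lock);
	listener = tps->listener;
	if (!listener) {
		/* Already shut down: fail instead of dereferencing NULL. */
		pthread_mutex_unlock(&tps->lock);
		return -1;
	}
	was_empty = tps->executing ? 0 : tps->queue_size == 0;
	tps->queue_size++;
	pthread_mutex_unlock(&tps->lock);

	/* Without a reference on `listener`, it could still be freed right here. */
	listener->callbacks->task_pushed(listener, was_empty);
	return 0;
}

/* Shutdown detaches the listener under the same lock the push uses. */
void toy_shutdown(struct toy_tps *tps)
{
	pthread_mutex_lock(&tps->lock);
	tps->listener = NULL;
	tps->queue_size = 0;
	pthread_mutex_unlock(&tps->lock);
}
```

This only turns a cleared listener into an error return. It does nothing if the listener, or the taskprocessor struct itself, has already been freed, which is what reference counting would have to cover.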
We are running Asterisk 20.5. I don’t see this issue mentioned in the changelogs for newer releases, nor any changes to the code that makes the bad access to the torn-down taskprocessor, and the issue seems to have been around in some form since Asterisk 16.
I’ll add info as I research it; I’m fairly new to the Asterisk code and know there are some points I can explore further. We want to deploy a patch soon, though, to see if it improves things: for example, keeping the TPS locked until after the listener has been accessed, adding a missing reference increment, or something else. I’m not yet sure where in the process the TPS is being destroyed.
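The "missing reference increment" idea can be sketched as a standalone toy model. It uses C11 atomics and pthreads, not Asterisk’s ao2 API, and `rc_listener`, `rc_tps`, `rc_push` and `rc_shutdown` are hypothetical names. The pusher takes its own reference on the listener while the lock proves it is still attached. A concurrent shutdown can then only drop the taskprocessor’s reference, and whichever side finishes last frees the memory. The same rule would apply one level up: whoever calls the push (here, the path from `chan_pjsip_hangup`) would need to hold a reference on the taskprocessor itself for the duration of the call. Otherwise even reading `tps->listener` is a use-after-free.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical refcounted listener (Asterisk would use an ao2 object). */
struct rc_listener {
	atomic_int refs;
	int pushed_calls; /* demo only, not thread-safe */
};

int rc_listener_destroyed; /* demo counter: listeners actually freed */

struct rc_listener *rc_listener_alloc(void)
{
	struct rc_listener *l = calloc(1, sizeof(*l));
	if (l) {
		atomic_init(&l->refs, 1); /* caller owns the first reference */
	}
	return l;
}

void rc_listener_ref(struct rc_listener *l)
{
	atomic_fetch_add(&l->refs, 1);
}

void rc_listener_unref(struct rc_listener *l)
{
	int before = atomic_fetch_sub(&l->refs, 1);
	assert(before > 0); /* catches an unbalanced unref */
	if (before == 1) {
		free(l);
		rc_listener_destroyed++;
	}
}

struct rc_tps {
	pthread_mutex_t lock;
	struct rc_listener *listener; /* tps owns one reference while set */
};

/* Takes ownership of the caller's listener reference. */
void rc_tps_init(struct rc_tps *tps, struct rc_listener *listener)
{
	pthread_mutex_init(&tps->lock, NULL);
	tps->listener = listener;
}

/* Bump the listener's refcount while the lock shows it is still attached,
 * then call it unlocked; our reference keeps it alive even if shutdown
 * runs in between. */
int rc_push(struct rc_tps *tps)
{
	struct rc_listener *l;

	pthread_mutex_lock(&tps->lock);
	l = tps->listener;
	if (l) {
		rc_listener_ref(l);
	}
	pthread_mutex_unlock(&tps->lock);

	if (!l) {
		return -1; /* already shut down */
	}
	l->pushed_calls++; /* stands in for callbacks->task_pushed(l, was_empty) */
	rc_listener_unref(l);
	return 0;
}

/* Shutdown detaches the listener and drops only the tps's own reference. */
void rc_shutdown(struct rc_tps *tps)
{
	struct rc_listener *l;

	pthread_mutex_lock(&tps->lock);
	l = tps->listener;
	tps->listener = NULL;
	pthread_mutex_unlock(&tps->lock);
	if (l) {
		rc_listener_unref(l);
	}
}
```

For example, a pusher that already holds a reference when `rc_shutdown` runs keeps the listener alive until its own `rc_listener_unref`, instead of racing the free.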
I’ll also open an issue on GitHub soon if that’s welcome. I know we originally tried to share this on ASTERISK-28834: Segfault in taskprocessor_push, but I suppose that didn’t get migrated over.