Segfault on Asterisk 16.18.0

Hi!

I get a segmentation error in Asterisk when I try more than one call. I’m testing vosk-asterisk (https://github.com/alphacep/vosk-asterisk) but the error I see is too generic and I can’t associate it with vosk… I Really don’t know how to read a coredump but i would appreciate if someone could point me to in the right direction to find out what is causing the segfault.

This is the backtrace I get after executing ast_coredumper:

core-thread1.txt:

!@!@!@! thread1.txt !@!@!@!

$1 = {si_signo = 6, si_errno = 0, si_code = -6, _sifields = {_pad = {472, 0 <repeats 27 times>}, _kill = {si_pid = 472, si_uid = 0}, _timer = {si_tid = 472, si_overrun = 0, si_sigval = {sival_int = 0, sival_ptr = 0x0}}, _rt = {si_pid = 472, si_uid = 0, si_sigval = {sival_int = 0, sival_ptr = 0x0}}, _sigchld = {si_pid = 472, si_uid = 0, si_status = 0, si_utime = 0, si_stime = 0}, _sigfault = {si_addr = 0x1d8, _addr_lsb = 0, _addr_bnd = {_lower = 0x0, _upper = 0x0}}, _sigpoll = {si_band = 472, si_fd = 0}}}
Signal        Stop	Print	Pass to program	Description
SIGABRT       Yes	Yes	Yes		Aborted

Thread 1 (Thread 0x7fb2268dd700 (LWP 508)):
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
        set = {__val = {134238211, 140403390683590, 140403127732955, 140403127732368, 140406703357953, 140402816483560, 140402816483560, 140402816483560, 140402816483560, 140402816483562, 140402816484072, 140402816483560, 140402816484072, 0, 0, 0}}
        pid = <optimized out>
        tid = <optimized out>
        ret = <optimized out>
#1  0x00007fb236349537 in __GI_abort () at abort.c:79
        save_stage = 1
        act = {__sigaction_handler = {sa_handler = 0x71404e85e, sa_sigaction = 0x71404e85e}, sa_mask = {__val = {909777350, 0, 0, 94431789537109, 94431789201464, 1799591297023, 140403390730362, 8192, 140402816450688, 28, 7, 480, 466, 94431789222725, 94431789238880, 94431789237990}}, sa_flags = 7, sa_restorer = 0x55e29d408b45}
        sigs = {__val = {32, 0 <repeats 15 times>}}
#2  0x00007fb2363a2768 in __libc_message (action=action@entry=do_abort, fmt=fmt@entry=0x7fb2364b0e2d "%sn") at ../sysdeps/posix/libc_fatal.c:155
        ap = {{gp_offset = 24, fp_offset = 21986, overflow_arg_area = 0x7fb2268dcae0, reg_save_area = 0x7fb2268dca70}}
        fd = <optimized out>
        list = <optimized out>
        nlist = <optimized out>
        cp = <optimized out>
#3  0x00007fb2363a9a5a in malloc_printerr (str=str@entry=0x7fb2364b3230 "double free or corruption (!prev)") at malloc.c:5347
No locals.
#4  0x00007fb2363ab09c in _int_free (av=0x7fb20c000020, p=0x7fb20c130850, have_lock=<optimized out>) at malloc.c:4317
        size = 144
        fb = <optimized out>
        nextchunk = 0x7fb20c1308e0
        nextsize = <optimized out>
        nextinuse = <optimized out>
        prevsize = <optimized out>
        bck = <optimized out>
        fwd = <optimized out>
        __PRETTY_FUNCTION__ = "_int_free"
#5  0x00007fb2366f7519 in json_delete () from /usr/lib/x86_64-linux-gnu/libjansson.so.4
No symbol table info available.
#6  0x000055e29d2576e9 in __ao2_ref (user_data=0x7fb20c3cb4d0, delta=-1, tag=0x55e29d408b45 "", file=0x55e29d43f569 "stasis_message.c", line=137, func=0x55e29d43f580 <__PRETTY_FUNCTION__.0> "stasis_message_dtor") at astobj2.c:614
        obj = 0x7fb20c3cb4b8
        obj_mutex = <optimized out>
        obj_rwlock = <optimized out>
        obj_lockobj = <optimized out>
        current_value = 0
        ret = <optimized out>
        weakproxy = <optimized out>
        lock_state = <optimized out>
        __PRETTY_FUNCTION__ = "__ao2_ref"
#7  0x000055e29d2576e9 in __ao2_ref (user_data=0x7fb20c25e2d8, delta=-1, tag=0x55e29d408b45 "", file=0x55e29d43ce50 "stasis.c", line=1266, func=0x55e29d43d610 <__PRETTY_FUNCTION__.45> "dispatch_exec_async") at astobj2.c:614
        obj = 0x7fb20c25e2c0
        obj_mutex = <optimized out>
        obj_rwlock = <optimized out>
        obj_lockobj = <optimized out>
        current_value = 0
        ret = <optimized out>
        weakproxy = <optimized out>
        lock_state = <optimized out>
        __PRETTY_FUNCTION__ = "__ao2_ref"
#8  0x000055e29d3653d1 in dispatch_exec_async (local=<optimized out>) at stasis.c:1266
        sub = <optimized out>
        message = 0x7fb20c25e2d8
        __PRETTY_FUNCTION__ = "dispatch_exec_async"
#9  0x000055e29d382c50 in ast_taskprocessor_execute (tps=tps@entry=0x55e29fedc3b0) at taskprocessor.c:1235
        local = {local_data = 0x55e29fedc240, data = 0x7fb20c25e2d8}
        t = 0x7fb20c0e8370
        size = <optimized out>
        __PRETTY_FUNCTION__ = "ast_taskprocessor_execute"
#10 0x000055e29d382ce6 in default_tps_processing_function (data=data@entry=0x55e29fedc350) at taskprocessor.c:209
        listener = 0x55e29fedc350
        tps = 0x55e29fedc3b0
        pvt = 0x55e29fedc2d0
        sem_value = 32690
        res = <optimized out>
        __PRETTY_FUNCTION__ = "default_tps_processing_function"
#11 0x000055e29d391d93 in dummy_start (data=<optimized out>) at utils.c:1299
        __cancel_buf = {__cancel_jmp_buf = {{__cancel_jmp_buf = {0, 8893875941801763512, 140736685560270, 140736685560271, 140403127734144, 516096, 3444153936049382072, 8893871115461501624}, __mask_was_saved = 0}}, __pad = {0x7fb2268dced0, 0x0, 0x0, 0x0}}
        __cancel_routine = <optimized out>
        __cancel_arg = 0x7fb2268dd700
        __not_first_call = <optimized out>
        ret = <optimized out>
        a = {start_routine = 0x55e29d382ca6 <default_tps_processing_function>, data = 0x55e29fedc350, name = 0x7fb2268dce30 ""}
        __PRETTY_FUNCTION__ = "dummy_start"
#12 0x00007fb23668aea7 in start_thread (arg=<optimized out>) at pthread_create.c:477
        ret = <optimized out>
        pd = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140403127736064, -3411376823876148552, 140736685560270, 140736685560271, 140403127734144, 516096, 3444153936022119096, 3444187235992642232}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = 0
#13 0x00007fb236421def in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
No locals.

It crashed on freeing a stasis message. That’s the extent of what can be said. You’d need to reproduce the issue on the latest version without a third party module to have it be accepted in the issue tracker, as we can’t look at external code.

I understand.
But, if somebody could tell me how can I know/print/log that stasis message to try to know what is provoking the error…

You’d have to open the core dump in gdb and do:

thread 1
frame 8
print *message

And then it might possibly be valid and give information. Your backtrace shows lots of optimization so it may or may not work/be useful. This is directly looking at the state of the memory though, so you have to have an understanding of how things are structured in order to understand the actual result.

1 Like

Thank you so much.

I can see the message, seems like a voice frame:

$1 = {timestamp = {tv_sec = 1629199122, tv_usec = 985073}, type = 0x55e2a023a438, eid_ptr = 0x7fb20c25e300, data = 0x7fb20c3cb4d0, eid = {eid = "^y\276\365\310", <incomplete sequence \356>}}

Would it be possible to do print ‘data’ in gdb?
I have to invest more time learning gdb… :upside_down_face:

You can cast data to the actual type, but you would need to know the intended destination to know that type.

1 Like

Stasis messages are not voice frames. Type is the type of the message, and would be the best place to look for knowing the type:

print message->type

You really do need gdb knowledge and Asterisk internals knowledge for this stuff.

1 Like

Yep, it’s what I’m seeing… I need to work more with gdb and read more asterisk code :slight_smile:

Btw, I could not print data but could print message->type->name and I get “agi_exec_start_type”.
It seems that it has nothing to do with the vosk module but with the execution of an AGI script…

That may or may not be true. That’s where the message came from, that doesn’t mean the creation of that message is the problem. It could be memory corruption elsewhere (such as the vosk module) which corrupted the memory that components of the message used.

1 Like

This isn’t actually a segfault! It is an abort, which means user space code has explicitly declared a crash. Specifically a memory free request did so, which generally means that the memory management meta data is corrupt, which, in turn is most often the fault of a thread other than the the one issuing the abort having corrupted memory.

The actual error message is “double free or corruption (!prev)”.

1 Like

Thank you both for your time!
It seems like a dead end to me. I guess it’s a matter of review pointers and allocs in the vosk module.

It may or may not be. My experience with external code that hasn’t gone through our code review process is that in many cases they can have issues. In other posts people stopped using it and issues went away. I can’t say with any certainty that is the case here.

1 Like

It seems like a dead end to me.

If you are able to reproduce this problem I can certainly take a look.

You need to report an issue on vosk-asterisk github project

1 Like

From the issue it appeared the author modified asterisk sources himself too. Should be resolved too in

I’d add that the title is wrong. This is a user space code initiated abort, not a segmentation fault.