We’re using Asterisk 20.9.2 on an AlmaLinux 8.9 server (elevated from CentOS 7).
From time to time Asterisk crashes, always with the same error.
Once that happens it’s impossible to get it working again. It doesn’t matter whether we restart Asterisk or even the whole server: it starts and then stops a few seconds later.
The gdb bt reports this:
(gdb) bt
#0 0x00007f552afd0efa in ast_strlen_zero (s=0x650072005f0065 <error: Cannot access memory at address 0x650072005f0065>) at /usr/src/asterisk-20.9.2/include/asterisk/strings.h:67
#1 0x00007f552afd0efa in find_request_serializer (rdata=0x7f54cc148318) at res_pjsip/pjsip_distributor.c:131
#2 0x00007f552afd0efa in distributor (rdata=0x7f54cc148318) at res_pjsip/pjsip_distributor.c:518
#3 0x00007f552afd0efa in distributor (rdata=0x7f54cc148318) at res_pjsip/pjsip_distributor.c:482
#4 0x00007f55a752c9a7 in pjsip_endpt_process_rx_data (endpt=endpt@entry=0x1667aa8, rdata=rdata@entry=0x7f54cc148318, p=p@entry=0x7f5514c78910, p_handled=p_handled@entry=0x7f5514c788ec) at ../src/pjsip/sip_endpoint.c:938
#5 0x00007f55a752cb16 in endpt_on_rx_msg (endpt=0x1667aa8, status=<optimized out>, rdata=0x7f54cc148318) at ../src/pjsip/sip_endpoint.c:1080
#6 0x00007f55a7533c2d in pjsip_tpmgr_receive_packet (mgr=<optimized out>, rdata=rdata@entry=0x7f54cc148318) at ../src/pjsip/sip_transport.c:2200
#7 0x00007f55a75367a6 in udp_on_read_complete (key=0x1e2fe50, op_key=<optimized out>, bytes_read=<optimized out>) at ../src/pjsip/sip_transport_udp.c:193
#8 0x00007f55a75adb00 in ioqueue_dispatch_read_event (ioqueue=<optimized out>, h=0x1e2fe50) at ../src/pj/ioqueue_common_abs.c:605
#9 0x00007f55a75adb00 in ioqueue_dispatch_read_event (ioqueue=<optimized out>, h=0x1e2fe50) at ../src/pj/ioqueue_common_abs.c:433
#10 0x00007f55a75af54b in pj_ioqueue_poll (ioqueue=0x1e2f670, timeout=timeout@entry=0x7f5514c78e00) at ../src/pj/ioqueue_epoll.c:792
#11 0x00007f55a752c64d in pjsip_endpt_handle_events2 (endpt=0x1667aa8, max_timeout=max_timeout@entry=0x7f5514c78e40, p_count=p_count@entry=0x0) at ../src/pjsip/sip_endpoint.c:745
#12 0x00007f55a752c6e7 in pjsip_endpt_handle_events (endpt=<optimized out>, max_timeout=max_timeout@entry=0x7f5514c78e40) at ../src/pjsip/sip_endpoint.c:777
#13 0x00007f552afaf833 in monitor_thread_exec (endpt=<optimized out>) at res_pjsip.c:2270
#14 0x00007f55a75b0680 in thread_main (param=0x21a4308) at ../src/pj/os_core_unix.c:649
#15 0x00007f55a69cb1ca in start_thread (arg=<optimized out>) at pthread_create.c:479
#16 0x00007f55a46dc8d3 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
The only way to get Asterisk working again is to recompile it. Same version, just recompiled.
How is PJSIP built? Bundled? Is there also a system version installed?
The code itself is quite old, fairly simple, and well tested. The only cases of this I’ve seen are ones where there is a mismatch between the PJSIP that Asterisk was built against and the version it runs against. If they don’t match, then things don’t line up and you end up with weird crashes like this.
There’s no option exactly to just outright show… you have to poke around some.
First up would be to examine the packages on the system and the files on the filesystem to see if there are PJSIP/pjproject ones installed. Secondly you could do:
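A rough sketch of both checks, not necessarily the exact commands meant above. The helper names `pj_libs_in` and `check_pjsip` are made up for illustration, and the module directory in the usage line is an assumption that depends on how Asterisk was configured. As far as I know, a bundled build links `res_pjsip.so` against Asterisk's own `libasteriskpj`, so seeing a separate system `libpj*` library resolved here would be the thing to look at more closely.

```shell
# Keep only the PJSIP-related lines from `ldd` output read on stdin.
# Matches both libasteriskpj (bundled build) and libpj* (system pjproject).
pj_libs_in() {
    grep -E 'lib(asterisk)?pj' || true
}

# Run the checks for one module file (path passed as the first argument).
check_pjsip() {
    module="$1"

    # 1. Any distribution packages that ship pjproject? (RPM-based systems)
    rpm -qa | grep -i -E 'pjproject|pjsip' || echo "no pjproject packages installed"

    # 2. Any pjproject libraries lying around on the filesystem?
    find /usr/lib /usr/lib64 /usr/local/lib -name 'libpj*' 2>/dev/null

    # 3. Which PJSIP libraries does the module actually resolve at run time?
    ldd "$module" | pj_libs_in
}
```

Usage would be something like `check_pjsip /usr/lib64/asterisk/modules/res_pjsip.so`, adjusting the path to your install prefix.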
This could be related to realtime PJSIP endpoints, because if we stop the database, Asterisk starts OK, but when we start the database and the endpoints begin to load, it crashes.
Loading endpoints would allow traffic to be recognized and further processing to occur, while not loading them would cause traffic to be rejected early and the code in question to not execute.
Was Asterisk built as needed for a proper backtrace?
The thing is, there are other endpoints in pjsip.conf that load properly. There are just a few of them, while in realtime there are a couple of thousand.
Yes, it looks like our Asterisk is able to produce a proper backtrace. Does the file I uploaded look incomplete or lacking information?
It shows lots of things as optimized, thus my question.
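For a quick sense of how much is missing, you can count the `<optimized out>` markers in a saved backtrace. This is only a sketch: `count_optimized` is a made-up helper name and the file name is whatever you saved the gdb output as. As far as I recall, the Asterisk backtrace documentation suggests enabling the `DONT_OPTIMIZE` and `BETTER_BACKTRACES` compiler flags in menuselect and rebuilding, after which this count should drop to zero or close to it.

```shell
# Count how many values in a saved gdb backtrace were optimized away.
# $1 is the path to the text file holding the `bt` output.
count_optimized() {
    # grep -o prints each match on its own line, so wc -l counts matches,
    # not lines; tr strips the padding some wc implementations add.
    grep -o '<optimized out>' "$1" | wc -l | tr -d ' '
}
```

Usage would be something like `count_optimized backtrace.txt`.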
And your statement regarding realtime doesn’t really alter things. Traffic could very well still be rejected early, so the fact that you’re using realtime is probably just a coincidence.
If you want you can file an issue[1] with all details and the files as provided by the script we provide. There is no time frame on when it would actually get looked into if an issue exists, or even if it will.