PJSIP Call processing issue

Hi all,

So I have a really weird one that I’m hoping you could help out with. I have Asterisk 16.18 compiled on CentOS 7. Asterisk is connected to a MySQL server via unixODBC 2.3.1. The issue I’m having is that, intermittently, a bunch of registered endpoints become unreachable and no new registrations are possible. This eventually self-corrects after a few minutes. However, when a new endpoint then registers and places a call, Asterisk processes all traffic right up until the Dial() command, at which point no INVITE is sent to the upstream trunk. The only way to get things back on track is to restart the service. This is a production machine that runs an average of 150 simultaneous calls. Any thoughts/suggestions?

You’d need to get a backtrace[1] to show where Asterisk is hanging. That being said, if you couple Asterisk with a database then ANY issues with the database become an issue in Asterisk. If the database goes down, Asterisk is kaput. If the database is slow, Asterisk can hang. If the database has latency, Asterisk can hang.

[1] Getting a Backtrace - Asterisk Project Wiki

Hey JColp,

Yeah, the DB is 100%. It’s on a beast of a machine shared with a bunch of other Asterisk servers, so I’m not worried about that side. The service never segfaults or anything, so sadly there’s no core dump to grab that backtrace from.

A core dump isn’t needed; the wiki has instructions for collecting a backtrace from a deadlocked Asterisk.

Getting a Backtrace - Asterisk Project Wiki
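
For anyone finding this later, here is a rough sketch of the general idea of pulling a backtrace from a process that is still running. It is not the wiki's procedure, which remains the authoritative route. It only builds a plain gdb command that attaches to a PID, dumps every thread's backtrace, and exits. The helper names and the output path are made up for illustration. Attaching normally needs root, and the process is paused while gdb is attached.

```python
import subprocess


def gdb_backtrace_command(pid):
    """Build a gdb command that attaches to a live PID and prints all thread backtraces.

    -p attaches to the process, -batch exits after the commands run,
    -ex runs the given gdb command.
    """
    return ["gdb", "-p", str(pid), "-batch", "-ex", "thread apply all bt"]


def capture_backtrace(pid, out_path):
    """Run gdb against the PID and write its output to out_path (hypothetical helper)."""
    with open(out_path, "w") as out:
        subprocess.run(
            gdb_backtrace_command(pid),
            stdout=out,
            stderr=subprocess.STDOUT,
            check=True,
            timeout=120,  # do not keep a production process paused indefinitely
        )
```

Something like `capture_backtrace(1234, "/tmp/asterisk-bt.txt")` would be run while the system is stuck and before anyone restarts the service, with 1234 replaced by the real Asterisk PID.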

Thanks, I’ll attach it as soon as it happens again.

Just an update here. Unfortunately, a decision was taken to offload some users from this particular server, so the issue no longer occurs as regularly. I’ll leave this post open until COB 2022-02-17, and if nothing comes of it I’ll delete it so that it’s not just another useless post with no resolution out on the web.

Gonna need to leave this open a bit longer, sorry all. The system managed to recreate the issue, but the service was restarted before I could pull the backtrace :(

Hey J,

Here are those files

In the current environment I’m having to restart the Asterisk service regularly, as it just stops processing SIP packets.

core-asterisk-running-2022-03-13T16-07-36+0200-brief.txt (38.7 KB)
core-asterisk-running-2022-03-13T16-07-36+0200-full.txt (106.4 KB)
core-asterisk-running-2022-03-13T16-07-36+0200-info.txt (825.4 KB)
core-asterisk-running-2022-03-13T16-07-36+0200-locks.txt (704 Bytes)
core-asterisk-running-2022-03-13T16-07-36+0200-thread1.txt (1.8 KB)

In the provided backtrace I see no issue with the thread that reads SIP traffic from the network. The problem appears to be DNS. You are using the system resolver, which is limited to a single DNS resolution at a time; if that resolution takes a long time, it causes other things to block. If you install the unbound development libraries, re-run configure, and rebuild Asterisk, you will gain better DNS resolution support that won’t block in this way. You should still investigate why DNS is taking so long, though. From the backtrace, it was resolving the SRV records for something.
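
To illustrate the "one resolution at a time" point, here is a toy model in Python. It is not Asterisk's code and it performs no real DNS queries. A lock stands in for the serialized resolver, and the host names and delays are invented. It only shows how one slow lookup holds up an unrelated fast one that arrives just after it.

```python
import threading
import time


class SerializedResolver:
    """Toy stand-in for a resolver that can only run one lookup at a time."""

    def __init__(self, delays):
        # delays: hypothetical lookup time in seconds for each name
        self._delays = delays
        self._lock = threading.Lock()

    def resolve(self, name):
        with self._lock:  # every caller queues behind the lookup in progress
            time.sleep(self._delays[name])
            return name


def run_lookups(resolver, names, stagger=0.05):
    """Start one thread per name, in order; return seconds each caller spent waiting."""
    waited = {}

    def worker(name):
        start = time.monotonic()
        resolver.resolve(name)
        waited[name] = time.monotonic() - start

    threads = []
    for name in names:
        thread = threading.Thread(target=worker, args=(name,))
        thread.start()
        threads.append(thread)
        time.sleep(stagger)  # let earlier names take the lock first
    for thread in threads:
        thread.join()
    return waited


if __name__ == "__main__":
    resolver = SerializedResolver({"slow-srv.example": 0.5, "fast.example": 0.01})
    results = run_lookups(resolver, ["slow-srv.example", "fast.example"])
    for name, seconds in results.items():
        print(f"{name}: waited {seconds:.2f}s")
```

In this model the fast lookup ends up waiting nearly as long as the slow one, which is roughly the effect described above: work that is unrelated to the slow SRV query still stalls behind it.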

I had a feeling it was something along those lines, but I didn’t realise there was an alternative to the system DNS resolver. I’ll give that a go and report back. Thanks man
