Asterisk freezes at reload


#1

I’ve been using Asterisk on CentOS 6.5 for a few years now. Lately, it often “freezes” when doing a “core reload”.

After the “freeze”, the Recv-Q for port 5060/UDP fills up, and all registrations and devices on UDP go offline.
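For anyone trying to reproduce this, the Recv-Q figure can also be read straight from /proc/net/udp, which is, as far as I know, where netstat gets it. Below is a minimal sketch, assuming a Linux host and IPv4 sockets only; the function name `udp_rx_queue` is made up for illustration.

```python
def udp_rx_queue(proc_net_udp_text, port=5060):
    """Return the rx_queue values of UDP sockets bound to `port`.

    Expects the text of /proc/net/udp (IPv4 only). The values are the
    receive-queue sizes as reported by the kernel.
    """
    sizes = []
    for line in proc_net_udp_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 5:
            continue
        # fields[1] is local_address in the form HEXIP:HEXPORT
        local_port = int(fields[1].rsplit(":", 1)[1], 16)
        if local_port != port:
            continue
        # fields[4] is tx_queue:rx_queue, both hexadecimal
        rx_hex = fields[4].split(":")[1]
        sizes.append(int(rx_hex, 16))
    return sizes
```

Feeding it the contents of /proc/net/udp every few seconds around a “core reload” should show whether the queue starts growing right after the reload or only later.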

The only way to get it up and running again is to kill the asterisk process and start it again.

We have had this issue for a few months now. In the beginning it was occasional, but last week it occurred in more than 50% of the reloads.

We tried different Asterisk versions, from 11.7 to 11.25.

What could we do to troubleshoot this issue?

Thanks,


#2

Have you tried using the Asterisk 13.X LTS version?


#3

Asterisk 13.5.0 has the same issue.


#4

Getting a backtrace[1] would show the state of the system and why it is hung up. If you can provide that we can take a look.

[1] https://wiki.asterisk.org/wiki/display/AST/Getting+a+Backtrace#GettingaBacktrace-GettingInformationForADeadlock
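As a rough illustration of what capturing that information can look like while the process is hung, here is a minimal sketch. It assumes gdb is installed, that the binary lives at /usr/sbin/asterisk, and that it runs as root; the function names and output file names are arbitrary, and the linked wiki page remains the authoritative procedure.

```python
import subprocess


def deadlock_capture_commands(pid, asterisk_bin="/usr/sbin/asterisk"):
    """Build the two commands used to capture state from a hung Asterisk."""
    return {
        # "core show locks" is only available when Asterisk was built
        # with DEBUG_THREADS.
        "core-show-locks.txt": [asterisk_bin, "-rx", "core show locks"],
        # Attach gdb to the running process and dump every thread's stack.
        "backtrace-threads.txt": [
            "gdb", "--batch", "-ex", "thread apply all bt", "-p", str(pid),
        ],
    }


def capture(pid, out_dir="/tmp", timeout=120):
    """Run each command and write its output into out_dir."""
    for filename, cmd in deadlock_capture_commands(pid).items():
        try:
            result = subprocess.run(cmd, stdout=subprocess.PIPE,
                                    stderr=subprocess.STDOUT, timeout=timeout)
            output = result.stdout
        except subprocess.TimeoutExpired as exc:
            # A hung Asterisk may never answer the CLI; keep what we got.
            output = exc.stdout or b""
        with open(f"{out_dir}/{filename}", "wb") as f:
            f.write(output)
```

Calling `capture(pid)` with the PID of the frozen asterisk process, before killing it, would leave both files in /tmp.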


#5

Not sure, but this sounds like a DNS issue when reading your configuration after the reload. Honestly, I’m not certain about it.


#6

Backtrace attached.

I noticed it’s only the traffic on port 5060/UDP that stops being processed. Devices on 5060/TCP are not experiencing any problems.
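One way to detect the moment the UDP side stops answering is to send a SIP OPTIONS request and see whether anything comes back. This is only a minimal sketch: the `probe` user, the helper names, and the timeout are assumptions, and any SIP response at all (even an error status) is treated as “alive”.

```python
import socket
import uuid


def build_options(host, port=5060, local_ip="127.0.0.1", local_port=5070):
    """Build a minimal SIP OPTIONS request as bytes."""
    branch = "z9hG4bK" + uuid.uuid4().hex[:16]
    lines = [
        f"OPTIONS sip:{host}:{port} SIP/2.0",
        # rport asks the server to reply to the packet's source port
        f"Via: SIP/2.0/UDP {local_ip}:{local_port};rport;branch={branch}",
        "Max-Forwards: 70",
        f"From: <sip:probe@{local_ip}>;tag={uuid.uuid4().hex[:8]}",
        f"To: <sip:{host}:{port}>",
        f"Call-ID: {uuid.uuid4().hex}@{local_ip}",
        "CSeq: 1 OPTIONS",
        "Content-Length: 0",
        "", "",  # blank line that terminates the headers
    ]
    return "\r\n".join(lines).encode("ascii")


def udp_alive(host, port=5060, timeout=2.0):
    """Return True if a SIP response arrives over UDP within `timeout`."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.bind(("", 0))  # let the OS pick a free source port
        local_port = sock.getsockname()[1]
        sock.sendto(build_options(host, port, local_port=local_port),
                    (host, port))
        try:
            data, _ = sock.recvfrom(4096)
        except OSError:  # covers timeouts and ICMP-induced errors
            return False
        return data.startswith(b"SIP/2.0")
```

Running `udp_alive()` against the server before and after each reload would give a simple yes/no signal that can be logged alongside the reload time.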

core-show-locks.txt (61.5 KB)
backtrace-threads.txt (102.1 KB)


#7

Are you using pbx_ael or pbx_realtime?


#8

Don’t know.
How can I check this?


#9

How are you configuring your dialplan?


#10

It’s a FreePBX installation.


#11

They don’t use AEL or realtime. There seems to be some sort of deadlock situation. I’d confirm it exists under the latest version of 13 (13.15.0 is the latest) and then file an issue[1] with the backtrace and console log.

[1] https://issues.asterisk.org/jira


#12

No DNS requests are done after the reload, just checked it.


#13

Just tried to reproduce it in 13.15.0; the problem does not seem to be present there.

However, our FreePBX version isn’t compatible with 13.15.0, so I have to go back to 11.25.0.

Any other ideas to debug the problem here?


#14

You would need to identify the fix that resolved the issue and backport it.
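Since the thread reports 13.5.0 as affected and 13.15.0 as not, the fix presumably landed somewhere in between, and a binary search over the releases keeps the number of test installs small. The following is a minimal sketch of that search; the `first_good` helper and the idea of testing only the 13.x.0 releases are assumptions, and the version used in the usage lines is a made-up example, not a finding.

```python
def first_good(versions, is_good):
    """Binary-search an ordered list for the first version that is good.

    Assumes versions[0] is known bad, versions[-1] is known good, and
    that the behaviour flips exactly once in between.
    """
    lo, hi = 0, len(versions) - 1  # lo is known bad, hi is known good
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_good(versions[mid]):
            hi = mid
        else:
            lo = mid
    return versions[hi]


# Usage with a pretend answer: suppose the freeze stopped in 13.9.0.
releases = [f"13.{n}.0" for n in range(5, 16)]  # 13.5.0 .. 13.15.0
pretend = lambda v: int(v.split(".")[1]) >= 9
print(first_good(releases, pretend))
```

In practice `is_good` would be a manual step (install the version, run a series of reloads, see whether UDP keeps working). Once the first good release is known, the commits between it and the previous release are the candidates for the backport.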