Hi, I am having an issue I hope you can help me with. I am running FreePBX 14.0.1.1 with Asterisk 13.17 and PJSIP 2.6. Most of my endpoints connect over PJSIP using SIP-TLS and SRTP, and they drop their connections when config changes are applied.
When I first installed the system, things were great, but about 2 weeks ago the system started dropping all TLS connections when I applied config changes. Initially I thought this was a FreePBX issue, but after much investigation I am now sure it is not. I can simply and reliably reproduce the issue this way.
Initially, I run asterisk -rx 'core reload'. Asterisk reloads and there is no issue with the endpoints, BUT if there are any config changes in /etc/asterisk and I do a core reload, all SIP-TLS connected endpoints drop. None of the SIP/UDP connected endpoints are affected.
To take the test one step further, I tried running touch /etc/asterisk/*. I made NO changes to the file contents, just touched them and ran a core reload, and the connections dropped.
This has forced me to batch changes at night. I hope someone can point me in the right direction.
FYI, I have attached a verbose log of a core reload that does not drop connections and a log of a core reload that does. Attachment: core reload before touch.txt (587.6 KB)
PJSIP doesn’t have the ability to ‘reload transports’. A hack (behind an option named allow_reload) was added to allow this, but it has to be explicitly enabled; it is not the default. The hack essentially tears down the transport and creates a new one, which terminates all active connections. This is why it is not on by default. The code should be determining that no changes have been made to the transport, but it may not be doing a deep enough check. You can file an issue[1] on the issue tracker with that information.
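For anyone following along, this is roughly where allow_reload sits in a transport definition. It is only a sketch: the section name, bind address, and certificate paths are placeholders and are not taken from this thread, and on FreePBX the transport sections are generated for you, so the real file will look different.

```ini
; pjsip.conf transport sketch (section name and paths are hypothetical)
[transport-tls]
type=transport
protocol=tls
bind=0.0.0.0:5061
cert_file=/path/to/asterisk.crt      ; placeholder path
priv_key_file=/path/to/asterisk.key  ; placeholder path
allow_reload=no   ; the default: a reload should leave this transport alone
```

With allow_reload=yes, a reload tears the transport down and recreates it, which drops connection-oriented sessions such as TLS. Checking the generated transport section for this option may help narrow down why the connections drop.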
jcolp, thanks for this information. Do you know why this was not a problem initially but started after some time? Is it possibly linked to the number of extensions defined, etc.?
It’s possible that the termination/start process was quicker with fewer connections, allowing it to work. The process itself has a delay in it to wait for the old transport to stop, but based on your logs the old transport still isn’t done stopping by the time the new one is started.
Please attach the logs so they remain with the issue. Other than that, at first glance it seems fine. It’ll go through normal triage, and if anything additional is needed it’ll be asked for.