Hung Channel Detection

I’m using Asterisk 13 in a primarily mobile environment with PJSIP and heavy use of ConfBridge conference rooms. My endpoints routinely connect and disconnect from the server throughout the day due to the mobile nature. I often get hung channels from endpoints that have had an interruption in their network connectivity while in a conference. I also have some non-mobile endpoints that are connected to these conferences (intentionally) for the entire day - so using a timeout setting to reset these channels is not an option.

To work around the issue, I created a small cron script that runs “core show channels concise” to get the currently “active” channels, then for each one runs “pjsip show aor <>” to determine whether that device is still registered. If not, I hang up the channel.
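A minimal sketch of that kind of cron check, for illustration only (this is not the poster's actual script). It assumes that the first "!"-separated field of "core show channels concise" is the channel name, that PJSIP channel names look like PJSIP/<endpoint>-<sequence>, that each AOR has the same name as its endpoint, and that a bound contact appears in "pjsip show aor" output as a "Contact:" line containing "<aor>/". The helper names (parse_concise, aor_has_contact, stale_channels, asterisk_rx) are hypothetical, and the output layout should be checked against your own Asterisk version before relying on it.

```python
import subprocess


def parse_concise(output):
    """Return (channel, endpoint) pairs for every PJSIP channel listed."""
    pairs = []
    for line in output.splitlines():
        # Assumption: fields are "!"-separated and the first one is the channel name.
        channel = line.split("!", 1)[0].strip()
        if not channel.startswith("PJSIP/"):
            continue
        # Assumption: names look like PJSIP/<endpoint>-<sequence>. Split on the
        # last hyphen so endpoint names that contain hyphens survive intact.
        endpoint = channel[len("PJSIP/"):].rsplit("-", 1)[0]
        pairs.append((channel, endpoint))
    return pairs


def aor_has_contact(aor_output, aor):
    """Heuristic: True if the output shows a contact bound to this AOR."""
    for line in aor_output.splitlines():
        text = line.strip()
        # Assumption: real contact rows read "Contact:  <aor>/<uri> ...";
        # the column header row does not contain " <aor>/".
        if text.startswith("Contact:") and f" {aor}/" in text:
            return True
    return False


def stale_channels(pairs, is_registered):
    """Return channels whose endpoint is reported as not registered."""
    return [channel for channel, endpoint in pairs if not is_registered(endpoint)]


def asterisk_rx(command):
    """Run one CLI command through 'asterisk -rx' and return its output."""
    result = subprocess.run(
        ["asterisk", "-rx", command],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def main():
    pairs = parse_concise(asterisk_rx("core show channels concise"))

    def is_registered(endpoint):
        # Assumption: the AOR is named after the endpoint.
        return aor_has_contact(asterisk_rx(f"pjsip show aor {endpoint}"), endpoint)

    for channel in stale_channels(pairs, is_registered):
        asterisk_rx(f"channel request hangup {channel}")
```

Calling main() from a cron-driven wrapper would reproduce the workaround. As described in the thread, this only catches channels whose device is currently unregistered; it cannot tell a live channel from a hung one once the device has registered again.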

This works, but is not ideal. If a channel disconnects while in a call then reconnects before the cron job runs, I don’t hang up the channel because I don’t know if it’s really still in use or not - which brings me to my question. Is there any way you can tell which channel is currently in use by a given extension? For example, my “core show channels” command may list 3 different channels used by the same endpoint. One of those 3 may be the real, currently in use channel and the other 2 are hung (or all 3 may be hung). Is there a command I can use to sort them out and tell me which channels are hung and which are not?

I’d recommend enabling SIP Session Timers or RTP Timers so that channels will get hung up if the PBX does not receive data from the far end.

My understanding of those timers was that they were chan_sip related only, and not applicable to PJSIP. Is that not the case? The only timers I knew of were in the chan_sip configuration file (sip.conf), which I don’t believe would apply.

Check these options:

timers_min_se
Minimum session timer expiration period. Time in seconds.

timers
One of: no, yes, required, always, forced (alias of always).

timers_sess_expires
Maximum session timer expiration period. Time in seconds.

https://wiki.asterisk.org/wiki/display/AST/Asterisk+13+Configuration_res_pjsip

That indeed looks like what I need, and I had only the “timers=yes” line defined (not the others). Based on the documentation though, it looks like the min/max should have defaulted to 90 and 1800 seconds respectively - and I’m seeing those hung channels stick around for days… making me think the timers are not having the desired impact in my case.

Perhaps I should note that I’m only seeing the stuck channels in the conference rooms themselves - not in the list of registered endpoints. Should these timers be killing the channel in the confbridge as well?

Session timers are per-call. We’d need to see a SIP trace of one to see if it is actually working.
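As a rough illustration of what to look for in such a trace: under RFC 4028, a session timer is negotiated through the Session-Expires header (compact form "x"), optionally with a refresher parameter, and the Min-SE header. The sketch below pulls those values out of a single captured SIP message. The function name session_timer_headers is hypothetical, and the parsing is deliberately simplistic (no folded headers, no validation).

```python
def session_timer_headers(sip_message):
    """Extract Session-Expires / Min-SE values from one raw SIP message."""
    found = {}
    # Skip the request or status line, then read headers up to the blank line.
    for line in sip_message.splitlines()[1:]:
        if not line.strip():
            break  # a blank line ends the header section
        name, sep, value = line.partition(":")
        if not sep:
            continue
        key = name.strip().lower()
        if key in ("session-expires", "x"):  # "x" is the compact form
            params = [p.strip() for p in value.split(";")]
            found["session_expires"] = int(params[0])
            for param in params[1:]:
                if param.lower().startswith("refresher="):
                    found["refresher"] = param.split("=", 1)[1].strip()
        elif key == "min-se":
            found["min_se"] = int(value.split(";")[0].strip())
    return found
```

If the 200 OK for a call yields an empty result, that call probably has no session timer running, which would be consistent with a hung channel never being cleaned up by it.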

There is also an RTP timeout you can set which will terminate the call if RTP is not received for a period of time.