Need advice: CHAN_SIP MWI mailboxes & a deadlock

I’m working on a typical hotdesk solution. A user logs into a phone, and that phone should subscribe to the mailbox(es) associated with that user. I’m using realtime SIP peers via ODBC. My problem is that Asterisk doesn’t know when the peer’s mailbox changes, so the MWI status is outdated or wrong.

I had it working by executing the “sip reload” command. (Can someone tell me if this is safe to do?) This seemed to nudge Asterisk into updating realtime. I could see the correct mailboxes showing with “sip show subscriptions”. All was well until I discovered today that running “sip reload” introduces a nasty deadlock in the application. Asterisk stops receiving phone calls, and running “sip reload” again yields: Previous sip reload not yet done.

I can reproduce the deadlock reliably and don’t want to risk it in production. I don’t have the time to investigate it further right now, so does anyone have any solutions to my MWI problem? I could try switching over to static config files and hope the deadlock goes away, or I could subscribe to an external SIP server for each mailbox, but that seems like a bit of a kludge.

I’m using Asterisk certified/13.8-cert1, along with some el cheapo Grandstream GXP2130 phones.

Thanks so much for reading!

I was having a very similar issue with deadlocks using chan_sip and realtime voicemail.

The patch to the 13 branch seemed to fix my problem.
https://gerrit.asterisk.org/#/c/3962/4

Also, I found, before the patch, that I was losing the Mailbox on the sip peer and would need to:
sip prune realtime peer 12345
sip show peer 12345 load

I would need to do that anywhere from 1 to 3 times, then the Mailbox would reload. As soon as I tried to access it, the mailbox would disappear from the peer again.
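For anyone who wants to script that prune-and-reload workaround, here is a rough Python sketch. It only wraps the two CLI commands quoted above and retries up to three times. The function names are made up for illustration, and the check for a non-empty "Mailbox" line in the "sip show peer" output is an assumption about the output layout, so verify it against your own version first.

```python
import subprocess
from typing import Callable, List


def refresh_commands(peer: str) -> List[str]:
    """Return the two CLI commands from the post, in order."""
    return [
        f"sip prune realtime peer {peer}",
        f"sip show peer {peer} load",
    ]


def run_cli(command: str) -> str:
    """Run one Asterisk CLI command via 'asterisk -rx' and return its stdout."""
    result = subprocess.run(
        ["asterisk", "-rx", command],
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout


def mailbox_present(show_peer_output: str) -> bool:
    """Assumption: the peer listing has a 'Mailbox : value' line."""
    for line in show_peer_output.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Mailbox" and value.strip():
            return True
    return False


def refresh_peer_mailbox(
    peer: str,
    runner: Callable[[str], str] = run_cli,
    max_attempts: int = 3,
) -> bool:
    """Prune and reload the peer until a mailbox shows up or attempts run out."""
    for _ in range(max_attempts):
        output = ""
        for command in refresh_commands(peer):
            output = runner(command)  # keep the output of the last command
        if mailbox_present(output):
            return True
    return False
```

Calling `refresh_peer_mailbox("12345")` on the Asterisk host would run the same two commands by hand, just in a loop. This is a stopgap and does nothing about the underlying deadlock.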

I ran the 13 branch with the patch applied since Friday. So far, nothing queueing in the taskprocessor.

I think you’re on to something!

who use realtime with peers that have mailboxes were experiencing runaway situations that manifested as a continuous stream of taskprocessor congestion errors, memory leaks and an unresponsive chan_sip

I was receiving error messages that certain tasks named like “SIP/000B8293C057-00000043” reached the ‘in queue’ limit of 500.
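If you want to watch for this before the warnings start, a small Python sketch like the one below can flag taskprocessors at or above the 500 limit. It assumes you feed it text from “core show taskprocessors” and that each data row starts with the processor name followed by the processed count and the in-queue count. That column order is an assumption, and the function names are hypothetical, so check the layout on your own build.

```python
from typing import Dict, List


def parse_queue_depths(cli_output: str) -> Dict[str, int]:
    """Map processor name -> in-queue count.

    Assumed row layout: name, processed, in_queue, then any other columns.
    Header and summary lines are skipped because their second and third
    fields are not plain integers.
    """
    depths: Dict[str, int] = {}
    for line in cli_output.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        if not (fields[1].isdigit() and fields[2].isdigit()):
            continue
        depths[fields[0]] = int(fields[2])
    return depths


def congested(depths: Dict[str, int], limit: int = 500) -> List[str]:
    """Return the names of processors whose queue is at or above the limit."""
    return sorted(name for name, depth in depths.items() if depth >= limit)
```

With the patch applied, the list returned by `congested(parse_queue_depths(output))` should stay empty after calls complete.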

I just switched to static configs and the queue problem is gone and so is the deadlock. When tomorrow rolls around, I’ll try out the patch and report back. Do you have any worries about running the 13 branch in production? Thanks again!

I didn’t have much of a choice. I was running 13.5, and was going through 10% of 16 GB of RAM daily. So far, I’ve run over 1500 calls through, and there’s nothing in the queues. My dialplan is pretty extensive, and the only problem I’ve seen is with ODBC briefly losing its connection. It didn’t affect the system, but I did see a warning. There were changes to the ODBC driver to fix multi-threading, but my version of the ODBC driver doesn’t support it. I need to upgrade. Otherwise, it seems perfectly stable.

I grabbed branch 13 and it seems to have done the trick! No more deadlocks and after a few calls I have no queued tasks. All is well, thanks!

Awesome! We should be making a new 13 release soonish (a few weeks).