About every 20 days or so, I hit a problem where phones and trunks look registered but nothing works, and the phones show as offline.
I've tried to unload and reload chan_sip, but it won't. Only a core restart resolves it.
In the logs I have errors like:
2019-12-03T16:07:52.747169-06:00 vmctel22.multiservice.com asterisk[67371]: ERROR[67420]: chan_sip.c:4321 in __sip_reliable_xmit: Serious Network Trouble; __sip_xmit returns error for pkt data
2019-12-03T16:08:06.746635-06:00 vmctel22.multiservice.com asterisk[67371]: ERROR[67420]: chan_sip.c:4321 in __sip_reliable_xmit: Serious Network Trouble; __sip_xmit returns error for pkt data
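To see whether these errors cluster around a particular time, something like the following could tally them per hour. It is only a rough sketch: it assumes every line follows the same syslog layout as the two samples above, and `count_by_hour` is a name made up for illustration.

```python
import re
from collections import Counter

# Assumed layout, taken from the two sample lines above:
# <ISO timestamp> <host> asterisk[pid]: LEVEL[thread]: file.c:line in func: message
LINE_RE = re.compile(
    r"^(?P<hour>\d{4}-\d{2}-\d{2}T\d{2}):\d{2}:\d{2}\.\d+[+-]\d{2}:\d{2}\s+"
    r"(?P<host>\S+)\s+asterisk\[\d+\]:\s+"
    r"(?P<level>[A-Z]+)\[\d+\]:\s+"
    r"(?P<src>\S+:\d+)\s+in\s+(?P<func>\w+):\s+(?P<msg>.*)$"
)


def count_by_hour(lines, needle="Serious Network Trouble"):
    """Count log lines whose message contains `needle`, keyed by date and hour."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match and needle in match.group("msg"):
            counts[match.group("hour")] += 1
    return counts


SAMPLE = [
    "2019-12-03T16:07:52.747169-06:00 vmctel22.multiservice.com asterisk[67371]: "
    "ERROR[67420]: chan_sip.c:4321 in __sip_reliable_xmit: Serious Network Trouble; "
    "__sip_xmit returns error for pkt data",
    "2019-12-03T16:08:06.746635-06:00 vmctel22.multiservice.com asterisk[67371]: "
    "ERROR[67420]: chan_sip.c:4321 in __sip_reliable_xmit: Serious Network Trouble; "
    "__sip_xmit returns error for pkt data",
]

if __name__ == "__main__":
    for hour, total in sorted(count_by_hour(SAMPLE).items()):
        print(hour, total)
```

Feeding it the real log file instead of `SAMPLE` (for example `count_by_hour(open(path))`, with `path` pointing at wherever syslog writes the Asterisk messages) should show whether the errors start abruptly or build up gradually.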
Since I’m on 16.4, I was planning to update to 16.6.2 now that it’s out. Any ideas on the chan_sip issue? Looking at the release notes, I don’t see anything that is a clear-cut match, but I see ASTERISK-28282, which might be part of the problem. I’m not 100% sure.
Additionally, when the event occurs (it is now happening every 2 days), I can’t put a call through to a phone, but the existing calls don’t drop. It also looks like calls continue to come into the system; I just can’t send them to the endpoints.
I understand. It has to be something environmental, but finding it is a bugger.
I have a change planned for 16.6.2, as I know it has some fixes, but I’ve been on 16.4.0 since June 10 and this issue only started in the last 2 weeks. I’ve been on vacation for a good part of that time, so I know it’s not changes that I’ve made.
If you had been running Asterisk 16.4.0 for a few months but this issue only started a couple of weeks ago, then I’d try to focus on what changed in the last couple of weeks.
Did the Asterisk configuration change? Something on the network, or the network configuration? Are you using realtime? If so, any database-related changes? New endpoints? Same or different trunks? etc…
What kind of transports are the endpoints configured to use? Does it happen for all endpoints?
Any other warnings/errors in the log? If you haven’t already, try enabling debug and setting it to at least level 3. Anything of note in the output?
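If it helps, the debug level can be raised without attaching to the console by passing a CLI command through `asterisk -rx`. A small sketch, assuming the `asterisk` binary is on the PATH and the script runs as a user allowed to talk to the running instance; the helper names are made up for illustration. Debug messages only reach a log file if logger.conf routes the debug level somewhere.

```python
import subprocess


def asterisk_cli_argv(command):
    """Build the argv for running one Asterisk CLI command non-interactively."""
    return ["asterisk", "-rx", command]


def set_debug_level(level=3):
    """Ask the running Asterisk instance to raise its core debug level."""
    argv = asterisk_cli_argv(f"core set debug {level}")
    # check=True raises CalledProcessError if the command exits non-zero.
    done = subprocess.run(argv, capture_output=True, text=True, check=True)
    return done.stdout


if __name__ == "__main__":
    print(set_debug_level(3))
```

Remember to turn it back down afterwards (`core set debug off`), since level 3 can be noisy on a busy system.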
I’ll look at the fd side. Memory has been looking good. I still won’t be surprised to find some weird item in the AWS / VMware hypervisor. It is running in an AWS region with a VMware software-defined datacenter.
I have not found anything to suggest an issue in the environment. However, it’s likely different from what most users have. It is mostly VMware running on AWS i3 bare-metal hardware. The VMware version is newer than what you can buy for on-prem usage. It’s a service, so VMware manages that layer, and some items require VMware techs to make changes.
Servers are running RHEL 7 as the OS, with current patches.
I’m close to what would be a normal high period and I see fd usage up in the 900 range, so not very far from the 1024 default. I’ve increased the limits to see if I get up past that.
One thing for sure is that I’ve not seen any "too many open files" messages in the logs around the events. But if anyone has any ideas, by all means let me know.
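For keeping an eye on this between events, a rough sketch along these lines could compare the open descriptor count against the soft limit the process is actually running with. It assumes Linux with /proc mounted and needs to run as root or as the Asterisk user to read another process's fd directory; the function names are made up for illustration.

```python
import os


def parse_soft_nofile(limits_text):
    """Extract the soft 'Max open files' limit from /proc/<pid>/limits text."""
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            # Fields after splitting: Max, open, files, <soft>, <hard>, <units>
            soft = line.split()[3]
            return None if soft == "unlimited" else int(soft)
    return None


def fd_usage(pid="self"):
    """Return (open_fd_count, soft_limit) for a process id, or for this process."""
    open_fds = len(os.listdir(f"/proc/{pid}/fd"))
    with open(f"/proc/{pid}/limits") as fh:
        soft_limit = parse_soft_nofile(fh.read())
    return open_fds, soft_limit


if __name__ == "__main__":
    used, limit = fd_usage()  # pass the Asterisk PID instead of the default
    print(f"open fds: {used}, soft limit: {limit}")
```

Reading the limit from /proc rather than from `ulimit -n` in a shell matters here, because a raised limit only applies to the Asterisk process if it was in effect when that process started.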
Does ASTERISK-28561, which is fixed in 16.7.0, apply to chan_sip or chan_pjsip? I’m using chan_sip, not pjsip.
The affected components suggest chan_pjsip.
It should apply to either, meaning the problem might occur when using either channel driver without the patch applied. Note, though, that the problem can only occur if you are initiating a call using a fast originate (Async=true). If you are not originating calls that way, then this issue wouldn’t affect you.
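For reference, a fast originate over the manager interface (AMI) is an Originate action carrying an `Async: true` header. The sketch below only builds the message text so you can recognise the pattern in your own integrations; the channel, context and extension values are hypothetical placeholders, and `build_originate` is a made-up helper name.

```python
def build_originate(channel, context, exten, priority=1, fast=True, action_id=None):
    """Format an AMI Originate action; fast=True adds 'Async: true'."""
    headers = [
        ("Action", "Originate"),
        ("Channel", channel),
        ("Context", context),
        ("Exten", exten),
        ("Priority", str(priority)),
        ("Async", "true" if fast else "false"),
    ]
    if action_id is not None:
        headers.append(("ActionID", action_id))
    # AMI messages are 'Key: Value' lines ending in CRLF, closed by a blank line.
    return "".join(f"{key}: {value}\r\n" for key, value in headers) + "\r\n"


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    print(build_originate("SIP/1001", "from-internal", "1002"))
```

If nothing that talks to your manager interface sends Originate with `Async: true`, then by the explanation above this particular issue should not be the one you are hitting.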