I have this problem occurring throughout the day. Multiple endpoints become unreachable at the same time, then after a few seconds become reachable again. Generally the disconnected endpoints share the same incoming IP address, and this happens to different users from different IPs, not just one specific address.
There is nothing in the firewall that could cause this issue. Also, if an endpoint is in a call and becomes unreachable, the call stays up and is not dropped.
OS version: Ubuntu 22.04.1
Asterisk version: 18.15.1
PJSIP version: 2.12.1
[2022-12-19 11:33:26] VERBOSE[372] res_pjsip/pjsip_options.c: Contact 1401/sip:cr78jqmi@196.50.14.90:57755;transport=ws;x-ast-orig-host=r7ajg6im8a25.invalid:0 has been deleted
[2022-12-19 11:33:26] VERBOSE[372] res_pjsip/pjsip_configuration.c: Endpoint 1401 is now Unreachable
[2022-12-19 11:33:26] VERBOSE[20451] res_pjsip/pjsip_options.c: Contact 1588/sip:l4jskccr@196.50.14.90:56911;transport=ws;x-ast-orig-host=qbpck0eqrao0.invalid:0 has been deleted
[2022-12-19 11:33:26] VERBOSE[20451] res_pjsip/pjsip_configuration.c: Endpoint 1588 is now Unreachable
[2022-12-19 11:33:26] VERBOSE[4003] res_pjsip/pjsip_options.c: Contact 1544/sip:7l56ss8e@196.50.14.90:58861;transport=ws;x-ast-orig-host=cqm5d07sn6oe.invalid:0 has been deleted
[2022-12-19 11:33:26] VERBOSE[4003] res_pjsip/pjsip_configuration.c: Endpoint 1544 is now Unreachable
.......
Sometimes I see these errors relating to WebSocket and SSL before or while the problem is occurring, either one of them or both at the same time, but that's not always the case.
[2022-12-19 11:27:01] ERROR[62] iostream.c: SSL_shutdown() failed: error:00000005:lib(0):func(0):DH lib, Underlying BIO error: Bad file descriptor
[2022-12-19 11:27:01] VERBOSE[62] res_http_websocket.c: WebSocket connection from '196.50.14.90:36770' forcefully closed due to fatal write error
That was my first assumption. That is why we moved our Asterisk to AWS from another cloud provider; we thought there might be a network problem with that specific provider, but the problem still occurs even after changing cloud providers.
I forgot to mention that Asterisk runs in a Docker container. Could that be related to the problem? Does Docker create some kind of bottleneck? We were hesitant to use Docker with Asterisk, but a lot of people use it without problems, so we made the jump.
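For what it's worth, one way we plan to rule Docker's network layer out (just a sketch, with my-asterisk-image as a placeholder for our actual image): run the same container with host networking, so no NAT or userland proxy sits between the clients and Asterisk.

# Hypothetical test: bypass Docker's bridge network entirely.
# If the drops stop, the bridge/NAT layer was involved; if not,
# Docker is probably not the cause.
docker run --network host \
  -v /etc/asterisk:/etc/asterisk \
  my-asterisk-image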
What makes me think it is specific to the SIP side is that we don't have problems with audio. If the network or Docker were dropping packets, we would see a lot of audio problems, which we don't.
Is it possible that the web server within Asterisk that handles WebSocket connections can't support a lot of traffic, and so closes some connections when overloaded?
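Under that theory, one knob worth checking (an assumption on my part, not something I've confirmed is the cause) is sessionlimit in http.conf, which caps concurrent sessions on Asterisk's built-in HTTP server and defaults to 100. A sketch, with the certificate path as a placeholder:

; http.conf - sketch assuming the stock ws/wss ports (8088/8089)
[general]
enabled=yes
bindaddr=0.0.0.0
bindport=8088
tlsenable=yes
tlsbindaddr=0.0.0.0:8089
tlscertfile=/etc/asterisk/keys/asterisk.pem  ; placeholder path
sessionlimit=500  ; default is 100; many persistent ws clients may hit it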
An endpoint sends a REGISTER, Asterisk removes the contact for that endpoint "due to request" and sends a 200 OK; immediately the endpoint becomes unreachable, and then other endpoints that have the same IP address become unavailable.
[2022-12-19 12:49:34] VERBOSE[20451] res_pjsip/pjsip_options.c: Contact 1587/sip:vqqsfald@REMOTE_IP:62730;transport=ws;x-ast-orig-host=tokp14avjl56.invalid:0 has been deleted
[2022-12-19 12:49:34] VERBOSE[20451] res_pjsip/pjsip_configuration.c: Endpoint 1587 is now Unreachable
[2022-12-19 12:49:34] VERBOSE[20451] res_pjsip_registrar.c: Removed contact 'sip:lhl9hicd@REMOTE_IP:55761;transport=ws;x-ast-orig-host=fvrqt4qpullo.invalid:0' from AOR '1524' due to shutdown
[2022-12-19 12:49:34] VERBOSE[4003] res_pjsip_registrar.c: Removed contact 'sip:3i4cnauh@REMOTE_IP:51369;transport=ws;x-ast-orig-host=r9e4n067qlbo.invalid:0' from AOR '1552' due to shutdown
[2022-12-19 12:49:34] VERBOSE[64003] res_pjsip_registrar.c: Removed contact 'sip:cr78jqmi@REMOTE_IP:57755;transport=ws;x-ast-orig-host=r7ajg6im8a25.invalid:0' from AOR '1401' due to shutdown
[2022-12-19 12:49:34] VERBOSE[75690] res_pjsip/pjsip_options.c: Contact 1524/sip:lhl9hicd@REMOTE_IP:55761;transport=ws;x-ast-orig-host=fvrqt4qpullo.invalid:0 has been deleted
[2022-12-19 12:49:34] VERBOSE[20451] res_pjsip_registrar.c: Removed contact 'sip:m3me8hqg@REMOTE_IP:59313;transport=ws;x-ast-orig-host=fvis0e38vsnp.invalid:0' from AOR '1407' due to shutdown
...........
All the endpoints removed are from the same IP address as the first one that was removed; endpoints with a different IP aren't removed.
Shouldn't that remove only the endpoint that sent the unregister, not every endpoint from that IP address? I see the same pattern repeating: a REGISTER request is sent for one endpoint, that endpoint is removed, then every endpoint with the same IP is removed immediately afterwards.
I'm thinking maybe the problem is caused by the option remove_existing=yes in the AOR config. I'm setting it to no for all users (a sketch of the change is below) and hoping this headache ends here.
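For reference, a minimal sketch of the AOR change; the section name '1401' and the other values are illustrative, not our exact config:

; pjsip.conf - AOR section
[1401]
type=aor
max_contacts=1        ; one registration per AOR
remove_existing=no    ; was yes; don't displace an existing contact on REGISTER
qualify_frequency=30  ; OPTIONS keepalive driving the Reachable/Unreachable state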
If one of the users disconnects for whatever reason, its contact is removed, but so is the contact of the other user coming from the same IP; pjsip show contacts then comes back empty.
This is the problem, I think: Asterisk removes all of the contacts from the same IP address if just one is disconnected.
On the client side, I see that the WebSocket connection is still up while Asterisk is removing the contact and considering it unreachable. It becomes reachable again when the client sends a new REGISTER request, on that same open connection!
Are there any reasons for this message other than a dropped connection?
Removed contact 'sip:mcu7vmm4@X.X.X.X:50178;transport=ws;x-ast-orig-host=ijou1s2tent9.invalid:0' from AOR '1017' due to shutdown
Also, is there a CLI command to show WebSocket connections? I couldn't find one.
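In the meantime I've been inspecting the sockets from the OS side instead; a sketch assuming the default WSS port 8089 (8088 for plain ws). pjsip show contacts lists the registered ws contacts, but not the raw socket state.

# established TCP connections to the WSS listener
ss -tn state established '( sport = :8089 )'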
We are seeing something very similar, with Asterisk shutting down WebSocket transports while the socket itself appears to stay open on the client side. It only appears to be an issue with the WebSocket transport; other transports are working as expected. The endpoints go unreachable until they re-register. I can confirm this behavior was introduced in 18.15.1 and that downgrading to 18.15.0 resolves the issue, but I don't like that as a solution since 18.15.1 is a security update.
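For anyone needing the interim workaround, a sketch of the downgrade (a plain source build from the official tarball; adjust configure options and paths to your install):

wget https://downloads.asterisk.org/pub/telephony/asterisk/releases/asterisk-18.15.0.tar.gz
tar xzf asterisk-18.15.0.tar.gz
cd asterisk-18.15.0
./configure && make && sudo make install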
Since you've isolated it down to that, I've created an issue[1], and we'll be holding off on releases of the current release candidates so we can determine what is going on.