I have this problem occurring throughout the day. Multiple endpoints become unreachable at the same time, then become reachable again after a few seconds. The disconnected endpoints generally share the same incoming IP address. This happens to different users from different IPs, not just one specific address.
Nothing in the firewall could cause this issue. Also, if an endpoint is in a call when it becomes unreachable, the call stays up and is not dropped.
OS version: Ubuntu 22.04.1
Asterisk version: 18.15.1
PJSIP version: 2.12.1
[2022-12-19 11:33:26] VERBOSE res_pjsip/pjsip_options.c: Contact 1401/sip:email@example.com:57755;transport=ws;x-ast-orig-host=r7ajg6im8a25.invalid:0 has been deleted
[2022-12-19 11:33:26] VERBOSE res_pjsip/pjsip_configuration.c: Endpoint 1401 is now Unreachable
[2022-12-19 11:33:26] VERBOSE res_pjsip/pjsip_options.c: Contact 1588/sip:firstname.lastname@example.org:56911;transport=ws;x-ast-orig-host=qbpck0eqrao0.invalid:0 has been deleted
[2022-12-19 11:33:26] VERBOSE res_pjsip/pjsip_configuration.c: Endpoint 1588 is now Unreachable
[2022-12-19 11:33:26] VERBOSE res_pjsip/pjsip_options.c: Contact 1544/sip:email@example.com:58861;transport=ws;x-ast-orig-host=cqm5d07sn6oe.invalid:0 has been deleted
[2022-12-19 11:33:26] VERBOSE res_pjsip/pjsip_configuration.c: Endpoint 1544 is now Unreachable
Sometimes I see these errors relating to WebSocket and SSL before or while the problem is occurring, either one of them or both at the same time, but that’s not always the case.
[2022-12-19 11:27:01] ERROR iostream.c: SSL_shutdown() failed: error:00000005:lib(0):func(0):DH lib, Underlying BIO error: Bad file descriptor
[2022-12-19 11:27:01] VERBOSE res_http_websocket.c: WebSocket connection from '18.104.22.168:36770' forcefully closed due to fatal write error
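One way to check whether the forced WebSocket closes really line up with the endpoints going unreachable is to scan the log for both message types and pair them by time. The following is only a rough sketch in Python: it assumes the log format shown in this thread, and the function name `unreachable_near_close` and the 5-second window are made up for illustration.

```python
import re
from datetime import datetime, timedelta

# Timestamp prefix as it appears in the log lines quoted in this thread.
TS_RE = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\]")
CLOSED_MARK = "forcefully closed due to fatal write error"
UNREACHABLE_RE = re.compile(r"Endpoint (\S+) is now Unreachable")


def parse_ts(line):
    """Return the line's timestamp, or None if it has no [..] prefix."""
    m = TS_RE.match(line)
    return datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S") if m else None


def unreachable_near_close(lines, window_seconds=5):
    """For each forced WebSocket close, list endpoints that went
    Unreachable within window_seconds before or after it.
    The window size is an arbitrary assumption, not an Asterisk setting."""
    closes, drops = [], []
    for line in lines:
        ts = parse_ts(line)
        if ts is None:
            continue
        if CLOSED_MARK in line:
            closes.append(ts)
        else:
            m = UNREACHABLE_RE.search(line)
            if m:
                drops.append((ts, m.group(1)))
    window = timedelta(seconds=window_seconds)
    return [
        (close_ts, [ep for ts, ep in drops if abs(ts - close_ts) <= window])
        for close_ts in closes
    ]
```

Passing an open log file to `unreachable_near_close` returns one entry per forced close. An empty endpoint list for most closes would suggest the two symptoms are not directly linked.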
That was my first assumption. That is why we moved our Asterisk to AWS from another cloud provider: we thought there might be a network problem with that specific provider, but the problem still occurs even after changing cloud providers.
I forgot to mention that Asterisk is in a Docker container. Can it be related to the problem? Does Docker create some kind of bottleneck? We were hesitant to use Docker with Asterisk, but a lot of people use it without a problem, so we made the jump.
What makes me think it may be specific to the SIP side is that we don’t have problems with audio. If the network or Docker were dropping packets, we would have a lot of audio problems, which we don’t have.
Is it possible that the web server within Asterisk that handles WebSocket can’t support a lot of traffic, and so closes some connections when overloaded?
An endpoint sends a REGISTER, Asterisk removes the contact for that endpoint “due to request” and sends a 200 OK. Immediately the endpoint becomes unreachable, and then other endpoints that have the same IP address become unreachable.
[2022-12-19 12:49:34] VERBOSE res_pjsip/pjsip_options.c: Contact 1587/sip:vqqsfald@REMOTE_IP:62730;transport=ws;x-ast-orig-host=tokp14avjl56.invalid:0 has been deleted
[2022-12-19 12:49:34] VERBOSE res_pjsip/pjsip_configuration.c: Endpoint 1587 is now Unreachable
[2022-12-19 12:49:34] VERBOSE res_pjsip_registrar.c: Removed contact 'sip:lhl9hicd@REMOTE_IP:55761;transport=ws;x-ast-orig-host=fvrqt4qpullo.invalid:0' from AOR '1524' due to shut
[2022-12-19 12:49:34] VERBOSE res_pjsip_registrar.c: Removed contact 'sip:3i4cnauh@REMOTE_IP:51369;transport=ws;x-ast-orig-host=r9e4n067qlbo.invalid:0' from AOR '1552' due to shutd
[2022-12-19 12:49:34] VERBOSE res_pjsip_registrar.c: Removed contact 'sip:cr78jqmi@REMOTE_IP:57755;transport=ws;x-ast-orig-host=r7ajg6im8a25.invalid:0' from AOR '1401' due to shut
[2022-12-19 12:49:34] VERBOSE res_pjsip/pjsip_options.c: Contact 1524/sip:lhl9hicd@REMOTE_IP:55761;transport=ws;x-ast-orig-host=fvrqt4qpullo.invalid:0 has been deleted
[2022-12-19 12:49:34] VERBOSE res_pjsip_registrar.c: Removed contact 'sip:m3me8hqg@REMOTE_IP:59313;transport=ws;x-ast-orig-host=fvis0e38vsnp.invalid:0' from AOR '1407' due to shut
All the removed endpoints are from the same IP address as the first one that was removed. Other endpoints with different IPs aren’t removed.
Shouldn’t that remove only the endpoint that sent the unregister, not every endpoint from that IP address? I see the same pattern repeat: a REGISTER request is sent for one endpoint, that endpoint is removed, then every endpoint with the same IP is removed immediately afterwards.
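To confirm the "everything from one IP goes at once" pattern across a whole day of logs rather than by eye, the registrar lines can be grouped by timestamp and remote host. This is a minimal sketch in Python, assuming the `res_pjsip_registrar.c` line format quoted above. The helper names `group_removals` and `bursts` are hypothetical, and the host pattern is a simplification that would not handle bracketed IPv6 addresses.

```python
import re
import sys
from collections import defaultdict

# Matches the registrar lines quoted in this thread. The removal reason is
# optional because the pasted lines are cut off mid-word.
REMOVED_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] VERBOSE res_pjsip_registrar\.c: "
    r"Removed contact 'sip:[^@']+@(?P<host>[^:;']+)[^']*' "
    r"from AOR '(?P<aor>[^']+)'(?: due to (?P<reason>.*))?$"
)


def group_removals(lines):
    """Map (timestamp, remote host) -> AORs removed at that moment."""
    groups = defaultdict(list)
    for line in lines:
        m = REMOVED_RE.match(line.strip())
        if m:
            groups[(m.group("ts"), m.group("host"))].append(m.group("aor"))
    return dict(groups)


def bursts(groups, min_size=2):
    """Keep only the groups where several AORs were removed together."""
    return {key: aors for key, aors in groups.items() if len(aors) >= min_size}


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (the log path is whatever your logger.conf writes to):
    #   python removals.py /path/to/asterisk/log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        for (ts, host), aors in sorted(bursts(group_removals(fh)).items()):
            print(f"{ts} {host}: {len(aors)} contacts removed -> {', '.join(aors)}")
```

If every burst maps to a single host, that supports the idea that one shared connection or transport is being torn down, rather than individual contacts expiring.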
On the client side, I see that the WebSocket connection is still up while Asterisk is removing the contact and considering it unreachable. It becomes reachable again when the client sends a new REGISTER request, on that same open connection!
Are there any other reasons for this message other than a dropped connection?
Removed contact 'sip:mcu7vmm4@X.X.X.X:50178;transport=ws;x-ast-orig-host=ijou1s2tent9.invalid:0' from AOR '1017' due to shutdown
Also, is there a CLI command to show WebSocket connections? I couldn’t find any.
We are seeing something very similar, with Asterisk shutting down WebSocket transports while the socket itself appears to stay open on the client side. It only appears to be an issue with the WebSocket transport; other transports are working as expected. The endpoints go unreachable until they re-register. I can confirm this behavior was introduced in 18.15.1 and that downgrading to 18.15.0 resolves the issue, but I don’t like that as a solution, as 18.15.1 is a security update.