Endpoints become unavailable every few minutes

I’m thinking the problem may be caused by the remove_existing=yes option in the AOR config. I’m setting it to no for all users and hoping this headache ends here.

I’d look more at the significance of “shutdown” in the messages.

I think you need to get tcpdump/wireshark traces and see how many TLS connections are being set up, when they are taken down, and by whom.

I don’t think it’s a connection problem. I could reproduce the same problem on another server.

I have two users connected from the same IP address. When I run pjsip show contacts, I see a separate contact for each user.

  Contact:  <Aor/ContactUri..............................> <Hash....> <Status> <RTT(ms)..>
==========================================================================================

  Contact:  1017/sip:hb14hf8l@MY_IP_ADDRESS:46418;transpo 5f391ede42 Avail        89.639
  Contact:  1052/sip:43q2g0nn@MY_IP_ADDRESS:32940;transpo 27eef1bc06 Avail       219.331

If one of the users disconnects for whatever reason, its contact is removed, but so is the contact of the other user from the same IP, and pjsip show contacts comes back empty.

I think this is the problem: Asterisk removes all of the contacts from the same IP address if just one is disconnected.
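
To check that suspicion over time, the contact listing can be grouped by source address. The following is only a rough sketch: it assumes the row layout shown in the listing above, uses the documentation address 198.51.100.7 in place of MY_IP_ADDRESS, handles IPv4 or hostname contacts only, and the function name `contacts_by_host` is made up for illustration.

```python
import re
from collections import defaultdict

# Matches a data row of the listing above, for example:
#   Contact:  1017/sip:hb14hf8l@198.51.100.7:46418;transpo 5f391ede42 Avail  89.639
# The header row does not match because it has no "/sip:" part.
CONTACT_ROW = re.compile(
    r"^\s*Contact:\s+(?P<aor>[^/\s]+)/sips?:[^@\s]+@(?P<host>[^:;\s]+):(?P<port>\d+)\S*"
    r"\s+(?P<hash>[0-9a-f]+)\s+(?P<status>\S+)\s+(?P<rtt>\S+)\s*$"
)

def contacts_by_host(cli_output):
    """Group (aor, port, status) tuples by the host part of the contact URI."""
    groups = defaultdict(list)
    for line in cli_output.splitlines():
        m = CONTACT_ROW.match(line)
        if m:
            groups[m["host"]].append((m["aor"], int(m["port"]), m["status"]))
    return dict(groups)

# Sample input modelled on the listing in this thread (address is a placeholder).
SAMPLE = """
  Contact:  <Aor/ContactUri..............................> <Hash....> <Status> <RTT(ms)..>
==========================================================================================

  Contact:  1017/sip:hb14hf8l@198.51.100.7:46418;transpo 5f391ede42 Avail        89.639
  Contact:  1052/sip:43q2g0nn@198.51.100.7:32940;transpo 27eef1bc06 Avail       219.331
"""

if __name__ == "__main__":
    for host, contacts in contacts_by_host(SAMPLE).items():
        print(host, contacts)
```

Running this on snapshots taken before and after one client disconnects would show whether every contact behind that address disappears at once.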

Asterisk doesn’t remove all contacts from the same IP address. It doesn’t work like that, and we’ve had no other reports of such an issue.

I’d suggest doing what David mentioned and getting a packet capture to determine who is shutting down the connections.

OK, thank you @jcolp and @david551. I’ll look into it.

On the client side, I see that the websocket connection is still up while Asterisk is removing the contact and considering it unreachable. It becomes reachable again when the client sends a new REGISTER request, on that same open connection!

Are there any reasons for this message other than a dropped connection?

Removed contact 'sip:mcu7vmm4@X.X.X.X:50178;transport=ws;x-ast-orig-host=ijou1s2tent9.invalid:0' from AOR '1017' due to shutdown
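
To see how often this happens and for which AORs, log lines of this shape can be pulled apart with a small script. This is a sketch that assumes the message format is exactly as in the line above; the reason is captured as free text, since the full set of possible reasons is not listed in this thread, and the helper names are hypothetical.

```python
import re
from collections import Counter

# Pattern based on the single log line quoted above; other log prefixes
# (timestamp, level, file) may precede it, so search() is used, not match().
REMOVED = re.compile(
    r"Removed contact '(?P<uri>[^']+)' from AOR '(?P<aor>[^']+)' due to (?P<reason>.+?)\s*$"
)

def parse_removal(line):
    """Return a dict with uri, aor and reason, or None if the line does not match."""
    m = REMOVED.search(line)
    return m.groupdict() if m else None

def removals_per_aor(lines):
    """Count removals per (aor, reason) pair across many log lines."""
    counts = Counter()
    for line in lines:
        info = parse_removal(line)
        if info:
            counts[(info["aor"], info["reason"])] += 1
    return counts

LINE = ("Removed contact 'sip:mcu7vmm4@X.X.X.X:50178;transport=ws;"
        "x-ast-orig-host=ijou1s2tent9.invalid:0' from AOR '1017' due to shutdown")

if __name__ == "__main__":
    print(parse_removal(LINE))
    print(removals_per_aor([LINE, LINE, "unrelated line"]))
```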

Also, is there a CLI command to show websocket connections? I couldn’t find any.

Thanks

Since they are TCP connections, they would show up in the output of the shell command “netstat”.
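
As a rough illustration of using that output, the sketch below lists the remote peers with an established TCP connection to one local port, which can then be compared against the contacts Asterisk has removed. It assumes the column layout printed by `netstat -tn` on Linux (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State); the port 8089 and all addresses in the sample are placeholders, not values from this thread.

```python
def established_peers(netstat_output, local_port):
    """Return foreign address:port strings for ESTABLISHED TCP connections
    whose local side is bound to local_port."""
    peers = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip banner and header lines and anything that is not a TCP row.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        local, foreign, state = fields[3], fields[4], fields[5]
        if state == "ESTABLISHED" and local.rsplit(":", 1)[-1] == str(local_port):
            peers.append(foreign)
    return peers

# Hypothetical sample in the assumed `netstat -tn` layout.
NETSTAT_SAMPLE = """\
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 203.0.113.10:8089       198.51.100.7:46418      ESTABLISHED
tcp        0      0 203.0.113.10:8089       198.51.100.7:32940      ESTABLISHED
tcp        0      0 203.0.113.10:22         192.0.2.44:51514        ESTABLISHED
tcp        0      0 203.0.113.10:8089       192.0.2.99:40000        TIME_WAIT
"""

if __name__ == "__main__":
    for peer in established_peers(NETSTAT_SAMPLE, 8089):
        print(peer)
```

If a contact has been removed "due to shutdown" while its address and port still appear here as ESTABLISHED, the socket was not actually dropped at the TCP level.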

Didn’t think of that, thanks a lot.

We are seeing something very similar, with Asterisk shutting down websocket transports while the socket itself appears to stay open on the client side. It only appears to be an issue with the websocket transport; other transports are working as expected. The endpoints go unreachable until they re-register. I can confirm this behavior was introduced in 18.15.1 and that downgrading to 18.15.0 resolves the issue, but I don’t like that as a solution because 18.15.1 is a security update.

Since you’ve isolated it down to that I’ve created an issue[1] and we’ll be holding off on releases of the current release candidates so we can determine what is going on.

[1] [ASTERISK-30369] res_pjsip: Websockets from same IP shut down when they shouldn't be - Digium/Asterisk JIRA

Yes, downgrading to 18.14 resolved the issue for me as well.

There’s a new CLI command, pjsip show transport-monitors. See if that gives you any additional information.

I’ll note we haven’t noticed anything specific to the IP the devices are connecting from, just an increase in websocket transports getting shut down and clients going offline more often than normal. I’m currently attempting to isolate the issue by applying the patches in 18.15.1 to our 18.15.0 build one at a time.

I’m having the same behavior here. Reverting to 20.0.0 solves the problem. Tested every commit in pjsip and the behavior starts to happen on rev ed45a9182d.

I’ve isolated the same problem on three different systems with WebRTC clients. The problem was definitely introduced in this change: “pjsip_transport_events: Fix possible use after free on transport”.

If one peer disconnects, all the others become unavailable.

I’m working on the issue.

Great! Let me know if you need something!

There are patches up on Gerrit that should fix the issue. Please give them a try:
https://gerrit.asterisk.org/q/topic:ASTERISK-30369

Thanks… I installed the provided patches and they resolved my issue.
Other endpoints are no longer being disconnected.
