My WebRTC pages use call.mydomain.com to communicate with the Asterisk server at ip _AST_IP_.
The end users ip is _USR_IP_
All other domain names of *.mydomain.com are on ip _MAIN_IP_.
As of Asterisk 13.10.0 (Asterisk 13.9.1 is just fine) I get an error either at the beginning of a call or at the end (usually at the end).
When this happens at the start of the call, the call is not processed (that’s bad).
When a call is processed, it can last for hours and not show the message, as soon as the call is terminated, the message will apear (I can live with that).
VERBOSE[38298] res_http_websocket.c: WebSocket connection from ’ _USR_IP_:45947’ for protocol ‘sip’ accepted using version '13’ VERBOSE[38298] chan_sip.c: Registered SIP ‘1’ at _USR_IP_:45947 … CALL PROCESS … WARNING[37921] chan_sip.c: sip_xmit of 0x7f42f001e250 (len 575) to _MAIN_IP_:45947 returned -1: Operation not permitted
Working configuration is Asterisk 13.9.1, Jansson 2.8, pjproject 2.5.5
I tested Asterisk 13.9.1, 13.10.0, 13.11.1, 13.11.2 with Jansson 2.8/2.7 pjproject 2.5.5/2.4.5.
To be sure I enabled/disabled all firewall rules during testing.
My questions :
Why is the ip address of the wrong server used in sending a critical message, instead of sending it to the end user ?
Where in the code should I start looking to try and create a bug fix ?
Looking at the difference between those two versions there’s minimal changes and it doesn’t appear as though there’s anything that would have caused behavior like that. As for your questions you’d need to isolate which change introduced the problem and go from there. If you have a git clone you can use “git log 13.9.1…13.10.0 channels/chan_sip.c” to show the changes between those two versions, but like I said… there’s not a huge amount or anything that sticks out. Providing the console output in full may also show something.
I did a test with “sip set debug on” and the result made me wonder if it is a timing issue.
As soon as the call ends the user logs out and on versions later then 13.9.1, this seems to be handled a lot quicker, resulting in the call ending and the unregister process to clash.
I need to investigate further, but I hope this information is usefull for others.
I was mistaking in believing the problem was only related to versions after 13.10.0, it just did not pop up under certain instances.
I have investigated further and think the problem is related to https://issues.asterisk.org/jira/browse/ASTERISK-25237
As soon as the real time peer unregisters it is removed from the “sip show peer” list, but probably some pointer still points to the memory involved as when the process that checks the NAT runs, the peer is still processed, but with incorrect values.
Only after the messages “stasis_cache.c: Attempting to remove an item from the SIP/ _MY_PEER_-cached cache that isn’t there: ast_endpoint_snapshot_type SIP/ _MY_PEER_” is logged, the NAT settings for the peer are no longer checked.
As to why I think this is the problem:
I also figured out when the _MAIN_IP_ is used instead of _USR_IP_ as follows :
In the CLI I do the following when the user is logged on
sip show peer _MY_PEER_
sip prune realtime _MY_PEER_
sip show peer _MY_PEER_
===> Peer _MY_PEER_ not found.
sip show peer _MY_PEER_ load
===> All values are identical except for the following :
Addr->IP : _USR_IP_:36656
Prim.Transp. : WS
Reg. Contact : sips:_MY_PEER_@df7jal23ls0d.invalid;rtcweb-breaker=no;transport=wss
becomes
Addr->IP : _MAIN_IP_:36656
Where does the system get this value.
Prim.Transp. : UDP
Reg. Contact : sips:_MY_PEER_@df7jal23ls0d.invalid;rtcwe
Looks like the value is truncated for some reason
As to why _MAIN_IP_ is used I have no idea.
Additionally the Asterisk log with sip log shows the system contacting each peer in order to reestablish NAT settings.
While the peer is registered the following messages appears : VERBOSE[67189] chan_sip.c: Reliably Transmitting (NAT) to _USR_IP_:53445:
As soon as the peer unregisters the following messages appears : VERBOSE[67050] chan_sip.c: Reliably Transmitting (NAT) to _MAIN_IP_:53445:
As it appears you are using WebRTC when they unregister do they also close the WebSocket connection? If so chan_sip may be trying to fall back for some reason, as without a WebSocket connection we can’t connect back.
I don’t think the WebSocket is closed.
I use sipML5 api from Doubango and have not dug into their code.
I now modified the web page to log out after every call.
So now I register at the start of the call and unregister at the end of the call…
The WebSocket is closed by Asterisk at some point see the log, placing another call works
[Sep 20 07:59:18] VERBOSE[124390] chan_sip.c: Unregistered SIP '_MY_PEER_'
[Sep 20 07:59:18] WARNING[114542] chan_sip.c: sip_xmit of 0x7f753000ef80 (len 575) to _MAIN_IP_:38755 returned -1: Operation not permitted
[Sep 20 07:59:19] WARNING[114542] chan_sip.c: sip_xmit of 0x7f753000ef80 (len 575) to _MAIN_IP_:38755 returned -1: Operation not permitted
[Sep 20 07:59:20] WARNING[114542] chan_sip.c: sip_xmit of 0x7f753000ef80 (len 575) to _MAIN_IP_:38755 returned -1: Operation not permitted
[Sep 20 07:59:21] WARNING[114542] chan_sip.c: sip_xmit of 0x7f753000ef80 (len 575) to _MAIN_IP_:38755 returned -1: Operation not permitted
[Sep 20 07:59:22] WARNING[114542] chan_sip.c: sip_xmit of 0x7f753000ef80 (len 575) to _MAIN_IP_:38755 returned -1: Operation not permitted
[Sep 20 07:59:22] NOTICE[114542] chan_sip.c: Peer '_MY_PEER_' is now UNREACHABLE! Last qualify: 155
[Sep 20 07:59:32] ERROR[114542] stasis_cache.c: Attempting to remove an item from the SIP/_MY_PEER_-cached cache that isn't there: ast_endpoint_snapshot_type SIP/_MY_PEER_
[Sep 20 07:59:50] VERBOSE[114542] res_http_websocket.c: WebSocket connection from '_USR_IP_:38755' forcefully closed due to fatal write error
[Sep 20 08:00:15] ERROR[114542] stasis_cache.c: Attempting to remove an item from the SIP/_MY_PEER_-cached cache that isn't there: ast_endpoint_snapshot_type SIP/_MY_PEER_