Asterisk does not answer REGISTER requests from some clients

Hello,

I upgraded Asterisk from 18.20.0 to 18.20.1 and now 2 clients can’t connect anymore. REGISTER requests are reaching the server (sngrep as well as tcpdump see them) but Asterisk doesn’t see them: pjsip set logger host displays nothing, and neither does core set debug. The IPs are not banned.

Asterisk is running in a VM on Debian 11, with nftables as the firewall. A lot of other users get through without any problem, over VPN as well as public IP. This problem arises from time to time, and restarting the server (not the virtual machine) does solve it. But restarting a server is not a real solution, so what could be wrong here?

Thanks for your support

Daniel

To add: on one customer’s side, the fiber link as well as all phones were restarted.

You haven’t given enough information to diagnose the problem. You need to show the configuration and a SIP trace as well as the current status (e.g. pjsip show contacts, …).

It could be that the problem has nothing to do with the small update, but that there is a problem with the 2 clients or the connection to the 2 clients.

pjsip show contacts shows nothing, as they are not registered :wink: The problem doesn’t lie in the small update; it shows up more or less every time Asterisk is restarted.

Sample of a configuration:
[CreSWM-spec]
;
type = endpoint
accountcode = creswm
context = from-CreSWM
language = fr
device_state_busy_at = 2
subscribe_context = CreSWMSubscribe
callerid = "MyCID" <+33123456789>
allow = !all,g722,ulaw,alaw
set_var = mySubscriptions=4
set_var = myPrivateEnv=CreSWM
set_var = myPrivateVM=100
set_var = myOnNOANSWER=main
rewrite_contact = yes
mailboxes = 100@CreSWM
voicemail_extension = 090
geoloc_outgoing_call_profile = none
geoloc_incoming_call_profile = none

[creswm-aor]
;
type = aor
max_contacts = 10
remove_existing = yes
qualify_frequency = 0 ;5000
default_expiration = 3600

[creswm101]
;
call_group = 1
pickup_group = 1
force_rport = yes
direct_media = no
rtp_symmetric = yes
transport = transport-udp
dtmf_mode = rfc4733
auth = creswm101
aors = creswm101

[creswm101]
;
username = creswm101
password = mypassw

[creswm101]
;
type = identify
endpoint = creswm101
match = XX.YY.180.223 ; Public IP

Thanks for your support

What do you get if you use a relatively short qualify time? Since we don’t know anything about your network, it could also be a NAT-related issue.

The main question is why other clients using a similar configuration don’t have this problem. I agree that it is a NAT-related issue within Asterisk. I restarted the VM and everything is back to life. Before this I restarted Asterisk (core stop now && systemctl restart asterisk) but saw no change.

Again, requests are arriving on port 5060, as shown by sngrep and tcpdump, but Asterisk silently ignores them.

Daniel

Also, to send a qualify, phones have to be registered, but as the problem appears after asterisk is restarted, no chance :wink:

Again, you haven’t shown us anything that allows us to see what is going on. The usual answer to such problems is that something else has been overlooked. I would recommend tackling this systematically by following the traffic in both directions.

Asterisk does not have any NAT related problems, but you might.

That said, I do not have the foggiest idea what you are trying to say. OPTIONS dialogs are not related to registrations and PJSIP contacts are also something different.

As said in my original post, I did: the REGISTER comes in on the VM interface, as shown by sngrep and tcpdump, but Asterisk NEVER answers, i.e. no traffic comes out of Asterisk. This was confirmed by the fact that pjsip set logger host as well as core set debug 1 never show any information about the 2 IPs.

Hmm, only with 2 IPs, and everything works well after a restart of the VM? Plenty of other customers are connected the same way. To me, it looks like some related connections in Asterisk are not released when Asterisk is restarted.

Where did I say that? :wink:

Can you check whether it could be a firewall issue, as tcpdump/sngrep see traffic before the firewall?

And it happened again last night with another customer. Output of a tcpdump for the source address (udp and port 5060 and src or dst 2001:db8::738) this morning:

10:36:40.705312 IP6 2001:db8::738.sip > pabx18.sip: SIP: OPTIONS sip:[2001:db8::5811]:5060 SIP/2.0
10:36:43.203904 IP6 2001:db8::738.sip > pabx18.sip: SIP: OPTIONS sip:[2001:db8::5811]:5060 SIP/2.0
10:36:43.704950 IP6 2001:db8::738.sip > pabx18.sip: SIP: OPTIONS sip:[2001:db8::5811]:5060 SIP/2.0
10:36:44.705087 IP6 2001:db8::738.sip > pabx18.sip: SIP: OPTIONS sip:[2001:db8::5811]:5060 SIP/2.0
10:36:44.705088 IP6 2001:db8::738.sip > pabx18.sip: SIP: OPTIONS sip:[2001:db8::5811]:5060 SIP/2.0
10:36:46.705325 IP6 2001:db8::738.sip > pabx18.sip: SIP: OPTIONS sip:[2001:db8::5811]:5060 SIP/2.0
10:36:50.705268 IP6 2001:db8::738.sip > pabx18.sip: SIP: OPTIONS sip:[2001:db8::5811]:5060 SIP/2.0
10:36:54.705925 IP6 2001:db8::738.sip > pabx18.sip: SIP: OPTIONS sip:[2001:db8::5811]:5060 SIP/2.0
10:36:58.706317 IP6 2001:db8::738.sip > pabx18.sip: SIP: OPTIONS sip:[2001:db8::5811]:5060 SIP/2.0
10:37:02.706710 IP6 2001:db8::738.sip > pabx18.sip: SIP: OPTIONS sip:[2001:db8::5811]:5060 SIP/2.0
10:37:06.706539 IP6 2001:db8::738.sip > pabx18.sip: SIP: OPTIONS sip:[2001:db8::5811]:5060 SIP/2.0

As you can see, there is no answer. The source Asterisk was restarted (Asterisk as well as the server), with no change. Other customers were not impacted. Debian 11, Asterisk 18.20.1.
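
The spacing of the OPTIONS packets in such a capture is informative in itself. Below is a minimal sketch, assuming the tcpdump text format shown above, that prints the gap between consecutive packets; the function name `packet_gaps` is made up for illustration, and the sample lines are copied from the capture. Gaps of roughly 0.5, 1, 2 and then 4 seconds would be consistent with the non-INVITE retransmission timers of RFC 3261 (T1 = 500 ms, doubling up to T2 = 4 s), which is what a sender does when it never receives a reply.

```python
from datetime import datetime


def packet_gaps(lines):
    """Return the gaps (in seconds) between consecutive tcpdump lines.

    Assumes each non-empty line starts with an HH:MM:SS.ffffff timestamp
    and that the capture does not cross midnight.
    """
    times = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        stamp = line.split()[0]  # e.g. "10:36:43.203904"
        times.append(datetime.strptime(stamp, "%H:%M:%S.%f"))
    return [round((b - a).total_seconds(), 3) for a, b in zip(times, times[1:])]


# Sample lines taken from the capture above.
capture = """\
10:36:43.203904 IP6 2001:db8::738.sip > pabx18.sip: SIP: OPTIONS sip:[2001:db8::5811]:5060 SIP/2.0
10:36:43.704950 IP6 2001:db8::738.sip > pabx18.sip: SIP: OPTIONS sip:[2001:db8::5811]:5060 SIP/2.0
10:36:44.705087 IP6 2001:db8::738.sip > pabx18.sip: SIP: OPTIONS sip:[2001:db8::5811]:5060 SIP/2.0
10:36:44.705088 IP6 2001:db8::738.sip > pabx18.sip: SIP: OPTIONS sip:[2001:db8::5811]:5060 SIP/2.0
10:36:46.705325 IP6 2001:db8::738.sip > pabx18.sip: SIP: OPTIONS sip:[2001:db8::5811]:5060 SIP/2.0
10:36:50.705268 IP6 2001:db8::738.sip > pabx18.sip: SIP: OPTIONS sip:[2001:db8::5811]:5060 SIP/2.0
"""

print(packet_gaps(capture.splitlines()))
```

This only describes the sender’s behavior; it does not say where on the receiving side the packets get lost.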

The tcpdump/sngrep captures are done ON the Asterisk server, which is also running nftables. I see the packets coming in but no answer. No problem for other customers, the IP is not in fail2ban, and so on. Restarting the server makes things work again without modifying any parameter.

New step: on the server that can’t connect (the one that sends the OPTIONS above without getting an answer) I ran nc -uv 2001:db8::5811 5060
and got Connection to 2001:db8::5811 5060 port [udp/sip] succeeded!

This means no firewall problem !
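
One caveat on this test: UDP is connectionless, so, as far as is known for the common netcat variants, nc -uv reports "succeeded" whenever no ICMP error comes back. That alone may not prove that a packet got past nftables to the Asterisk socket. A stricter check is to send a real SIP OPTIONS and wait for any reply. The following is a minimal sketch, assuming Python 3 is available on the client; the function names `build_options` and `probe`, the user part `probe`, and the generated tag, branch and Call-ID values are all made up for illustration.

```python
import socket
import uuid


def _bracket(host):
    """Wrap IPv6 literals in brackets, as SIP URIs and Via headers require."""
    return f"[{host}]" if ":" in host else host


def build_options(target_host, target_port, local_host, local_port):
    """Build a minimal SIP OPTIONS request as bytes."""
    target = _bracket(target_host)
    local = _bracket(local_host)
    branch = "z9hG4bK" + uuid.uuid4().hex  # magic-cookie prefix from RFC 3261
    lines = [
        f"OPTIONS sip:{target}:{target_port} SIP/2.0",
        f"Via: SIP/2.0/UDP {local}:{local_port};branch={branch};rport",
        "Max-Forwards: 70",
        f"From: <sip:probe@{local}>;tag={uuid.uuid4().hex[:8]}",
        f"To: <sip:{target}:{target_port}>",
        f"Call-ID: {uuid.uuid4().hex}",
        "CSeq: 1 OPTIONS",
        "Content-Length: 0",
        "",
        "",
    ]
    return "\r\n".join(lines).encode()


def probe(target_host, target_port=5060, timeout=5.0):
    """Send one OPTIONS; return the reply's status line, or None if no reply."""
    family = socket.AF_INET6 if ":" in target_host else socket.AF_INET
    with socket.socket(family, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.connect((target_host, target_port))
        local_host, local_port = sock.getsockname()[:2]
        sock.send(build_options(target_host, target_port, local_host, local_port))
        try:
            data = sock.recv(4096)
        except (socket.timeout, ConnectionRefusedError):
            return None
    return data.split(b"\r\n", 1)[0].decode(errors="replace")
```

Calling `probe("2001:db8::5811")` from the affected host returns the first line of whatever answer comes back. Any SIP status line, even a rejection, shows that the request reached a SIP stack and that the reply found its way back; `None` means nothing came back within the timeout.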

I did a core set verbose 5 and core set debug 5 and pjsip set debug host 2001:db8::738: no trace at all from this IP in the logs. In the meantime, tcpdump shows OPTIONS packets coming from this IP to port 5060. Well well well …

Can you sketch the complete network path for a phone that shows problems? There are too many unknowns in the equation to say something meaningful. It was new to me that there is IPv6 and that means that the usual (forward) NAT related problems are likely irrelevant, but that depends on your setup.

It could also be a problem with outgoing traffic, but that could be more predictable. If you pack everything into containers, there’s another network involved and that can also cause problems. We just do not have enough information.

Once you have a network plan, I’d suggest sketching where traffic flows and where it does not. Ingress and egress ports are important. If you think you know a priori what the error is, you’ll never find the reason.

Hi Ek,

I opened the ticket on Monday, having seen this behavior for 2 clients on IPv4. Their access is:
Internet > our Physical Server > VM under KVM running Asterisk & nftables
The problem was solved by restarting the VM.

Yesterday the problem arose for an IPv6 customer. His access is:
customer asterisk > local net > customer gateway > WireGuard VPN > our Physical Server > VM under KVM running Asterisk & nftables
For him, the problem exists for IAX and SIP and ONLY in the direction from the customer to us (he is NOT registering). The other way, no problem. We switched him to SIP/IPv4 and it works.

The problem arose on Monday after restarting Asterisk following the upgrade from 18.20.0 to 18.20.1. Yesterday, for the IPv6 customer, there wasn’t any network failure (local or Internet) that could explain why the problem arose. Remember, plenty of other customers are connected with or without VPN (OpenVPN and WireGuard), over IPv4 or IPv6, without any issue, including our office.

In both cases, the customers have been connected as shown for years and have never faced this problem.

For yesterday’s 18.20.2 version I read “res_pjproject: ast_sockaddr_cmp() always fails on sockaddrs created by ast_sockaddr_from_pj_sockaddr()”: could the problem come from there?

As the customer has a working setup, I didn’t restart the VM, so the problem still exists on IPv6 if you want me to run some checks or tests, keeping in mind that the server is in production.

No. The problem would not be resolved from that change (the issue itself existed for years and was only exposed through using the function in a new way), and none of the changes in .1 or .2 are in any area that would cause this.

Guess what: at 6:00:02 am this morning both SIP and IAX over IPv6 came back to life!!! On neither server is there a cron job that could explain why it started working again :frowning:
Any idea?

Not really, but almost certainly something outside of Asterisk if both SIP and IAX behaved that way.

You could be right, but it wouldn’t explain why the 2 other customers showed the same behavior and never recovered on Monday.