We are running Asterisk 18.16.0 with IAX and PJSIP in a KVM VM running Debian 11. The host also runs Debian 11, in a data center, and uses a floating IP that redirects port 5060 to the Asterisk VM's IP. fail2ban runs alongside Asterisk; host and VM both use nftables and support IPv4 and IPv6. This setup has been working for ages.
We have lots of customers connecting to the Asterisk server from behind a router over IPv4. Some of them connect through a VPN, others over IAX with or without a VPN.
Now our problem: if we restart Asterisk, all customer devices on IPv4 that do not use IAX or a VPN can’t connect anymore. sngrep shows their SIP REGISTER requests, but Asterisk never replies. Restarting the Asterisk VM changes nothing, and neither does reloading the firewall rules on the physical host or restarting the libvirtd daemon. Restarting the phones and router on the customer side changes nothing either.
The only way to get it working properly again is to physically restart the host.
Any idea why Asterisk behaves this way, or what the source of this problem could be?
Thanks for any hint
If I were faced with something like this, I’d first check whether it is related to the router’s NAT and firewall state tables. When you restart Asterisk, all current states related to Asterisk become invalid. How quickly things recover then depends on the details of the protocols.
A reboot does not have this problem as everything is fresh.
If you can verify this, you’d also need to invalidate all router entries in addition to restarting Asterisk. This may or may not be acceptable, depending on what else is running. My routers run either pfSense or OPNsense and I know how to script things like that there, but in general I wouldn’t know what to do.
I am not sure what you mean by “host”, but I also do not know the exact details of your network configuration. If you have configured direct connections (macvtap or similar), you should not need to restart the hypervisor. Restarting the Asterisk server might be an acceptable solution, but you’d still need to look at how your NAT tables behave. You might need 1:1 NAT or outbound NAT rules (pfSense jargon) to make things work reliably. Again, this depends on the details of the configuration.
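On a Linux box doing the NAT, one way to look at those states is the connection tracking table. The following is only a sketch, assuming the conntrack-tools package is installed and the output of `conntrack -L` has been saved as text. The sample line uses documentation addresses, not anything from this thread, and the helper names are made up for illustration. It picks out UDP entries towards port 5060 that never saw a reply:

```python
# Illustrative line in the layout printed by `conntrack -L`; the addresses
# are documentation examples, not taken from this thread.
SAMPLE = (
    "udp      17 25 src=203.0.113.7 dst=198.51.100.10 sport=5060 dport=5060 "
    "[UNREPLIED] src=192.168.122.50 dst=203.0.113.7 sport=5060 dport=5060 "
    "mark=0 use=1"
)


def parse_entry(line):
    """Split one conntrack line into protocol, flags and the two tuples."""
    fields = line.split()
    entry = {"proto": fields[0], "flags": [], "original": {}, "reply": {}}
    for field in fields[1:]:
        if field.startswith("[") and field.endswith("]"):
            entry["flags"].append(field.strip("[]"))
        elif "=" in field:
            key, value = field.split("=", 1)
            if key in ("src", "dst", "sport", "dport"):
                # First occurrence is the original direction, second the reply.
                side = "original" if key not in entry["original"] else "reply"
                entry[side][key] = value
    return entry


def unreplied_sip(lines, port="5060"):
    """Return UDP entries to `port` that are still flagged UNREPLIED."""
    found = []
    for line in lines:
        if not line.strip():
            continue
        entry = parse_entry(line)
        if (entry["proto"] == "udp"
                and entry["original"].get("dport") == port
                and "UNREPLIED" in entry["flags"]):
            found.append(entry)
    return found


if __name__ == "__main__":
    for entry in unreplied_sip([SAMPLE]):
        print(entry["original"], "->", entry["reply"])
```

If entries like this survive an Asterisk restart and keep pointing at a stale target, deleting them (for example with `conntrack -D -p udp --dport 5060`) would be the Linux counterpart of resetting states on the router. Whether that applies here is an assumption that would have to be checked against the real table.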
By host I mean the physical server hosting the VM. The setup is as follows:
customer ipv4 <> Internet <> host redirecting 5060 to ipv4 IP asterisk <> asterisk VM
As stated, a restart of the Asterisk VM does NOT solve the problem. Also, if it were a NAT or router problem, IAX customers should have the same problem.
Keep in mind that the REGISTER packets are reaching Asterisk, as shown by sngrep and tcpdump, but Asterisk does not reply. This could be explained by an nftables or fail2ban bug, e.g. table entries not being reset even after a reboot of the VM.
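To make "request seen, no reply" checkable from a capture, something like the sketch below could be run over SIP messages exported as text. It is only an illustration: the function names and the sample messages are made up, and it assumes full header names (not the compact `i:` form for Call-ID). It reports the Call-IDs of REGISTER requests for which no response with the same Call-ID and CSeq was captured:

```python
def parse_sip(raw):
    """Return (start_line, headers) for one SIP message given as text."""
    lines = raw.replace("\r\n", "\n").split("\n")
    headers = {}
    for line in lines[1:]:
        if not line:  # a blank line ends the header section
            break
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return lines[0], headers


def unanswered_registers(messages):
    """Call-IDs of REGISTER requests for which no response was captured."""
    pending = {}  # (call_id, cseq) -> call_id, in order of first appearance
    for raw in messages:
        start, headers = parse_sip(raw)
        call_id, cseq = headers.get("call-id"), headers.get("cseq")
        if call_id is None or cseq is None:
            continue
        key = (call_id, " ".join(cseq.split()))
        if start.startswith("REGISTER "):
            pending.setdefault(key, call_id)  # retransmissions count once
        elif start.startswith("SIP/2.0 "):
            pending.pop(key, None)  # any response settles the request
    return list(dict.fromkeys(pending.values()))


if __name__ == "__main__":
    # Made-up messages for illustration only (not from our captures).
    req_a = ("REGISTER sip:pbx.example.com SIP/2.0\r\n"
             "Call-ID: a1@phone\r\nCSeq: 1 REGISTER\r\n\r\n")
    rsp_a = ("SIP/2.0 401 Unauthorized\r\n"
             "Call-ID: a1@phone\r\nCSeq: 1 REGISTER\r\n\r\n")
    req_b = ("REGISTER sip:pbx.example.com SIP/2.0\r\n"
             "Call-ID: b2@phone\r\nCSeq: 7 REGISTER\r\n\r\n")
    print(unanswered_registers([req_a, rsp_a, req_b, req_b]))
```

Run against a capture taken inside the VM, a non-empty result would confirm that the requests arrive and nothing goes back out from that point.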
Thanks for your help
It is not enough to forward the packets for port 5060, but first I’d like to know which VM software you are using. If it is qemu/kvm I might be able to help more. If not, I can give only general hints. I’m trying to find out whether you’re using a hypervisor virtual network or a direct connection.
Also what router are you using? Are you able to get a dump of all firewall states?
Another stupid question. Are the external phones registered again after a few minutes, or does everything end up in a black hole?
As stated in the original post, we are using KVM on Debian 11 (host and VM) with nftables. I upgraded Asterisk to the latest 18.17.0 and did the following:
. make install ; for 18.17.0
. core stop now ; on running asterisk 18.16.0
. systemctl restart asterisk (start should be the same but routine ;))
. flushed the nftables rules and applied them again
All phones are connecting correctly.
If nftables were the culprit, I don’t understand why a reboot of the Asterisk VM did not do the job. We will now see what happens in the future.
Black hole: they never get registered, and Asterisk doesn’t send a response to the REGISTER request.
Macvtap device or virtual network (‘default’)? Do local phones work all the time? What happens if you disable nftables for testing?
Virtual network. Other SIP phones connected via VPN, and PABXs connected over IAX, are working without problems. The affected phones also work flawlessly once registered. The problem arises only after running reload in the CLI or systemctl restart asterisk from the shell.
Since the SIP and IAX protocols differ, I am not surprised that there are differences in some situations. To me it currently looks like a firewall issue, and I would look at a pcap trace on the hypervisor to see what arrives there before anything gets to the VM.
If you restart a phone on the other end, is it able to connect? If the problem is related to the endpoints reusing old connections, that are no longer valid when Asterisk restarts, I expect this to work.
I already restarted phones at a few other sites, including the routers (see original post), with no change. Again, why take a pcap trace on the hypervisor side when the one taken in the Asterisk VM shows the packets coming in but Asterisk never replying?
Following my post #4, I agree with you that the problem seems to be the firewall (nftables); we have to validate this the next time we reload/restart Asterisk. In the meantime we will recheck on our test server, which runs version 20.2.0.
I am not sure whether it is the nftables configuration. I described how I would proceed. I would start by tracking all communication at every point where things might change. The next step would be an analysis.
I could either ask for the detailed configuration of your entire network, which might or might not include one or more NAT barriers, or I could suggest generating pcap traces. Sometimes SIP does not work as people expect it to work. Let’s say there is a NAT barrier and the other side reuses a previous connection in terms of IP and port. There might be something that keeps this path open, but a restart of Asterisk might now result in a different port. The pcap traces at various points show what is going on, and in the end you would know who is responsible for which state in the firewalls.
Let’s say you first see that registration to IP1:PORT1 works, but after a restart you see that IP1:PORT2 has been offered while the other side still sends to IP1:PORT1. Then you would know what to do. I am not saying that this is exactly your problem, but it could be. So the next steps depend on the analysis of the communication.
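As a rough sketch of that comparison, assuming the source and destination of each packet have already been extracted from a trace into "ip:port" strings, the function below lists the senders that still target something other than the address currently offered. The `Packet` type, the function name and all addresses are made up for illustration:

```python
from collections import namedtuple

# One observed UDP packet from a trace: who sent it and where it was sent.
Packet = namedtuple("Packet", "src dst")


def still_using_old_target(packets_after_restart, offered_now):
    """Map each sender to the destinations it used that differ from offered_now."""
    stale = {}
    for pkt in packets_after_restart:
        if pkt.dst != offered_now:
            stale.setdefault(pkt.src, set()).add(pkt.dst)
    return stale


if __name__ == "__main__":
    # Documentation addresses and arbitrary ports, for illustration only.
    trace = [
        Packet("203.0.113.7:40001", "198.51.100.10:5060"),  # old IP1:PORT1
        Packet("203.0.113.9:40002", "198.51.100.10:5062"),  # new IP1:PORT2
    ]
    print(still_using_old_target(trace, offered_now="198.51.100.10:5062"))
```

An empty result would rule this particular scenario out, and the analysis would move on to the next capture point.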
There is no static and fixed description of what to do, but you can try to find out what is actually happening.
PS: I never use virtual networks for production systems with my VMs. It would be one network component more to worry about.