I would like to ask for help in diagnosing a problem we have with one of our Asterisk boxen which has gotten a bit sick.
It’s an Asterisk 1.8 installation with about 225 peers. It has worked fine for years, but a few weeks ago it started having trouble. At random intervals it logs that several peers are unreachable or lagged, and exactly 10 seconds later they are logged as reachable again. At first we thought this was a network problem, but that does not seem to be the case.
Right now I am looking at a full Asterisk SIP debug log next to a tcpdump capture of that box’s traffic. What I see is that, when this problem occurs, Asterisk logs that it is sending packets out, but these packets do not show up in the tcpdump capture.
This happens with the original OPTIONS packet and a couple of retransmits: they all seem to disappear. They do not show up in the tcpdump trace, in any case. (So, since the packet is never sent out, there is no reply, so Asterisk logs an unreachable state. That much makes sense, at least.)
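For anyone who wants to repeat the comparison, this is roughly how the two files can be lined up. It is only a minimal sketch: it assumes both the SIP debug log and the capture are available as plain text with one SIP header per line (for example a text dump of the capture), and the function names are made up for illustration.

```python
import re

# Matches a SIP Call-ID header line in a plain-text dump of SIP messages.
CALL_ID_RE = re.compile(r"^\s*Call-ID:\s*(\S+)", re.IGNORECASE | re.MULTILINE)


def call_ids(text):
    """Return the set of Call-ID values found in a plain-text dump."""
    return set(CALL_ID_RE.findall(text))


def missing_from_capture(asterisk_log_text, capture_text):
    """Return Call-IDs that appear in the Asterisk log but not in the capture."""
    return sorted(call_ids(asterisk_log_text) - call_ids(capture_text))
```

Feeding it the contents of the debug log and of the capture dump gives the Call-IDs that Asterisk logged but that never reached the wire, which should include the OPTIONS requests in question.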
It should not be the firewall: all outgoing traffic is accepted. We’ve tried rebooting the box and upgrading Asterisk to 1.8.28cert5, but neither made any difference.
If we route all calls through one of our other servers, the problem still happens, so it should not be related to an overloaded box. (In any case, the load is zero and most of its 2GB of memory is free.)
These are the interface statistics:
eth1 Link encap:Ethernet HWaddr 00:25:90:24:6b:30
          inet addr:xx.xx.xx.xx Bcast:xx.xx.xx.xx Mask:255.255.255.128
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:16096224 errors:0 dropped:129512 overruns:0 frame:0
          TX packets:15853537 errors:0 dropped:0 overruns:0 carrier:0
          RX bytes:3636660310 (3.3 GiB) TX bytes:5201292764 (4.8 GiB)
The TX side looks good, I think (although I might need to look at those dropped packets on the receiving side).
So, the basic question is: how can SIP packets that Asterisk is sending get lost on the same box?
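To put the RX drops in proportion, here is a small sketch that pulls the counters out of the RX line above and computes the drop ratio. The function name is made up, and it assumes the older ifconfig layout shown here (`RX packets:N ... dropped:N` on one line).

```python
import re


def rx_drop_ratio(ifconfig_text):
    """Fraction of received packets that the interface counted as dropped."""
    m = re.search(r"RX packets:(\d+).*?dropped:(\d+)", ifconfig_text)
    if m is None:
        raise ValueError("no RX counters found")
    packets, dropped = int(m.group(1)), int(m.group(2))
    return dropped / packets if packets else 0.0


stats = "RX packets:16096224 errors:0 dropped:129512 overruns:0 frame:0"
print(f"{rx_drop_ratio(stats):.2%}")  # about 0.8% of received packets
```

That is a small fraction overall, but these are cumulative counters, so it does not say whether the drops are spread evenly or clustered around the moments the peers go unreachable.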
I’m unsure where to look next. Any help would be really appreciated.
Thanks a lot,