I'm trying to achieve network-based failover for two Asterisk servers using keepalived.
My setup is very simple and very basic with just two phones and using TLS.
The phones get registered to the active server alright.
The failover works perfectly while the phones are idle, i.e. I can make and receive calls from either phone.
When I fail over during an ongoing call, the phones do get registered to the new server, but when I then try to make a call, sometimes it gets through and sometimes I get a CONGESTION message at the Asterisk console.
I'm wondering if anybody has done this, or is it just not possible to achieve?
Maybe someone can share their keepalived.conf settings with me.
I’m confused by this, as you talk about failover during an ongoing call, but then talk about a new call.
Registration only affects new, outgoing calls. Signalling addresses and media addresses are sent in the INVITE at the start of a call, and override those provided by any registrations, for that call.
If you don’t need Asterisk to see media/DTMF during the bridged phase of a call, enabling direct media, and ensuring there is nothing that conflicts with it, will mean the media will survive the loss of the PABX, at least until there is a signalling event (e.g. session timers, or attempt to hold a call).
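As a rough illustration of the direct media setting mentioned above: the channel driver has not been stated in this thread, so this sketch assumes chan_pjsip and a hypothetical endpoint named phone1. With chan_sip, the equivalent peer option is directmedia in sip.conf.

```ini
; pjsip.conf (assumes chan_pjsip; "phone1" is a hypothetical endpoint name)
[phone1]
type=endpoint
; allow RTP to flow directly between the phones once the call is bridged
direct_media=yes
```

Anything that requires Asterisk to stay in the media path for the call would still prevent this from taking effect.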
You need to provide the actual log, from /var/log/asterisk/full, which you should enable in logger.conf, with a verbosity of at least 3. You also need to tell us which channel driver you are using. You may need to provide the protocol debugging/logging for your channel driver.
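A minimal sketch of what that could look like in logger.conf, assuming the usual [logfiles] section and a configuration directory of /etc/asterisk; the exact list of levels is a suggestion, not a requirement:

```ini
; /etc/asterisk/logger.conf (path is an assumption; adjust for your install)
[logfiles]
; write notices, warnings, errors, and verbose messages up to level 3
; to the "full" log file in the Asterisk log directory
full => notice,warning,error,verbose(3)
```

After editing the file, running "logger reload" at the Asterisk CLI should apply the change without a restart.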
I’m not sure that CONGESTION gets included in the logs without specific dialplan, so we may need your dialplan. I think the standard output for congestion is the “everyone is busy” message from Dial(), with the four numbers: number busy, number congested, number unavailable, and total outgoing legs attempted. In any case, you need to get logging for the failure from before that message, to understand the reason.
As David said, “You may need to use tcpdump/wireshark to work out what is happening at the TLS level.” I would suggest doing that to understand what is going on and get a clear picture.
Can you please advise whether keepalived is the way to go to achieve a basic level of redundancy at the Asterisk level, or is there some other, better way to achieve this goal?
I am aware there are people who have used keepalived, at least in the past. Others have used multiple Asterisk instances behind a SIP proxy, and then made the SIP proxy redundant too. I don’t have a guide or suggestions; it really depends on expectations, needs, and even experience.