Asterisk HA with Keepalived

I'm trying to achieve network-based failover for two Asterisk servers using keepalived.

My setup is very simple and very basic with just two phones and using TLS.

The phones register with the active server without problems.

Failover works perfectly while the phones are idle, i.e. I can make and receive calls from either phone.

But when I fail over during an ongoing call, the phones do register with the new server, yet when I then try to make a call, sometimes it gets through and sometimes I get a CONGESTION message on the Asterisk console.

Has anybody done this, or is it just not possible to achieve?

Maybe someone can share their keepalived.conf settings with me.

Thanks.

I’m confused by this, as you talk about failover during an ongoing call, but then talk about a new call.

Registration only affects new, outgoing calls. Signalling addresses and media addresses are sent in the INVITE at the start of a call and override those provided by any registrations, for that call.

If you don’t need Asterisk to see media/DTMF during the bridged phase of a call, enabling direct media, and ensuring there is nothing that conflicts with it, will mean the media will survive the loss of the PABX, at least until there is a signalling event (e.g. session timers, or attempt to hold a call).
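As a rough sketch of the direct media suggestion with PJSIP, the endpoint option looks like the fragment below. The endpoint name 906 is taken from the log later in this thread; everything else about the endpoint (aors, auth, context, transport) is omitted and would have to come from the real pjsip.conf. Whether direct media is actually used also depends on Dial() options and other settings that force Asterisk to stay in the media path, so treat this as an illustration rather than a complete recipe.

```
; pjsip.conf (fragment only; aors, auth, context etc. are omitted here)
[906]
type = endpoint
direct_media = yes            ; allow RTP to flow phone-to-phone once the call is bridged
direct_media_method = invite  ; redirect the media with a re-INVITE
```

The same setting would be needed on the other phone's endpoint as well.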

Thanks david551 for the reply.

I don't mind the call being disconnected when the MASTER node fails.

Let me rephrase the scenario step by step.

ARCHITECTURE:

2x Ubuntu servers with the same Asterisk config.
Keepalived installed for network-based failover with a VIP (Virtual IP).

2x phones registered with Server1, which is currently MASTER.

I can make and receive calls.

SCENARIO 1:

I pull the network cable from Server1 to initiate failover.

VIP is assigned to Server2.

Phones get registered with Server2 because it is now the MASTER.

I can make and receive calls alright.

SCENARIO 2:

Server1 is MASTER with the VIP.

I initiate a call from one phone to the other. During the call I pull the network cable to initiate the failover.

The call is dropped.

Server1 becomes BACKUP.

VIP is assigned to Server2 which is now the MASTER.

Phones get registered with Server2.

I try to initiate the call again and get a CONGESTION message on the Server2 Asterisk console.

This is the problem. Am I doing something wrong with the keepalived config, or is there something I need to do at the Asterisk end?

Hope my problem is clear now.

You need to provide the actual log from /var/log/asterisk/full, which you should enable in logger.conf, with a verbosity of at least 3. You also need to tell us which channel driver you are using. You may need to provide the protocol debugging/logging for your channel driver.
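A minimal sketch of enabling that log, assuming a stock logger.conf; the list of log levels follows the sample configuration and can be trimmed if debug output is not wanted:

```
; logger.conf (fragment)
[logfiles]
; write /var/log/asterisk/full with everything needed to diagnose a failed call
full => notice,warning,error,debug,verbose
```

After editing the file, running "logger reload", "core set verbose 3" and, for PJSIP, "pjsip set logger on" at the Asterisk CLI should apply the change and add the SIP messages to the log.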

I’m not sure that CONGESTION gets included in the logs without specific dialplan, so we may need your dialplan. I think the standard output for congestion is the "everyone is busy" message from Dial(), with the four numbers: number busy, number congested, number available, and total outgoing legs attempted. In any case, you need to get logging for the failure from before that message to understand the reason.

The log file is attached in the link below as I cannot upload it here as a “new user”.

We are using PJSIP.

The dialplan is just one line:

exten => _9XX,1,Dial(PJSIP/${EXTEN})
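To make the outcome of a failed attempt visible in the log file, the one-line dialplan above could be extended along these lines. This is only a sketch: the two extra priorities are an addition for illustration, not part of the original configuration. DIALSTATUS and HANGUPCAUSE are channel variables set once Dial() returns.

```
exten => _9XX,1,Dial(PJSIP/${EXTEN})
 ; reached only if Dial() returns without the caller hanging up (e.g. CONGESTION, BUSY)
 same => n,Log(NOTICE,Dial to ${EXTEN} ended with DIALSTATUS=${DIALSTATUS} HANGUPCAUSE=${HANGUPCAUSE})
 same => n,Hangup()
```

With this in place, a call that fails after the failover should leave a NOTICE line in the full log showing the status and hangup cause, independent of the console verbosity.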

Server1 IP: 192.168.2.91
Server2 IP: 192.168.2.92
VIP: 192.168.2.100

It looks like it broke in this section:

[Apr  2 09:50:20] DEBUG[3653] res_pjsip_session.c:  PJSIP/906-00000009 TSX State: Calling  Inv State: CALLING
[Apr  2 09:50:20] DEBUG[3653] res_pjsip_session.c:  Topology: Pending:  <0:audio-0:audio:sendrecv (g722)>  Active: (null topology)
[Apr  2 09:50:20] DEBUG[3653] res_pjsip_session.c:  
[Apr  2 09:50:20] DEBUG[3653] chan_pjsip.c:  RC: 0
[Apr  2 09:50:20] DEBUG[3350] res_pjsip_session.c:  PJSIP/906-00000009 Event: TSX_STATE  Inv State: DISCONNCTD

I’m not sure of the significance of TSX_STATE, but I assume the B side TLS connection got closed, presumably by the remote end.

The INVITE is shown as sent.

You may need to use tcpdump/wireshark to work out what is happening at the TLS level.

Yes, a transport error occurred and the transport was disconnected.

Thanks a lot david551.
Your help is really appreciated.

Ok.

So that means what?

Am I doing something wrong?

Or is it even possible to use TLS in this kind of arrangement?

If yes, how can it be achieved?

As David said, “You may need to use tcpdump/wireshark to work out what is happening at the TLS level.” I would suggest doing that to understand what is going on and get a clear picture.

Use sngrep for a better call trace with all the packets.

It looks like your second IP phone didn’t re-register with the new server immediately after the VIP switched.

To check that, I watch the events in sngrep in real time.

If it hadn’t registered in time for the call, Asterisk would not have known what address to put into the INVITE!

[Apr  2 09:50:20] VERBOSE[3653] res_pjsip_logger.c: <--- Transmitting SIP request (1039 bytes) to TLS:192.168.2.106:50193 --->
INVITE sip:906@192.168.2.106:50193;transport=TLS;ob SIP/2.0

The phones obviously take some time to register with the new server, but they do register, sooner or later.

Can you please tell me whether keepalived is the way to go for a basic level of redundancy at the Asterisk level, or whether there is some other, better way to achieve this goal?

Thanks.

I am aware there are people who have used keepalived, at least in the past. Others have used multiple Asterisk instances behind a SIP proxy, and then made the SIP proxy redundant too. I don’t have a guide or suggestions; it really depends on expectations, needs, and even experience.