VIP Management With keepalived

This post assumes a basic familiarity with High Availability concepts but makes no assumptions regarding practical implementation. For example, you should already be familiar with terms like active-active and active-passive but don’t need to have heard of keepalived before this point.

This article is mainly intended for people who want hands-on experience with keepalived for some reason, such as managing an already existing instance or simply understanding how it all works for their own benefit. In general, I would have to recommend against using anything based on VRRP for anything important, since there are solutions out there with much more robustness and security.

What is keepalived?

In its most common configuration, keepalived is the system daemon which implements the binary-formatted Virtual Router Redundancy Protocol[1] (VRRP) and takes actions according to events described by that protocol. VRRP itself is (by default) a multicast protocol that sits directly on top of IP rather than UDP or TCP, and is only distinguishable by the protocol number (112) in the transmitted IP packets.
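If you want to see this on the wire, a capture filter on that IP protocol number is enough. A minimal sketch (the interface name is an assumption; VRRP advertisements go to the multicast group 224.0.0.18 by default):

```shell
# Watch VRRP advertisements on the local segment; 112 is the IP
# protocol number IANA assigned to VRRP.
tcpdump -i eth0 -n proto 112
```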

Basics of VRRP

I’m not going to go into an exhaustive description of VRRP since that would be pretty boring. A basic understanding of the protocol is helpful though just so that you can more easily understand what your configuration of keepalived is actually doing.

The first thing to learn is the terminology used with VRRP.

Each node in a VRRP cluster must exist on the same subnet; the nodes are referred to as “physical routers” (not to be confused with an actual network router). Collectively, the physical routers compose a single “virtual router” that manages the given VIP.

Virtual routers are identified by a numeric (1-255)[2] “Virtual Router ID” (or “VRID”), which is also used as the last octet of the virtual MAC address (00-00-5E-00-01-{VRID}) that answers (R)ARP requests for VIPs managed by VRRP.
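For example, the virtual MAC for a given VRID can be computed with nothing more than printf. A quick sketch, using the VRID 51 from the configuration later in this post:

```shell
vrid=51
# The IANA-reserved VRRP MAC prefix is 00:00:5e:00:01, followed by
# the VRID rendered as a single hex octet.
printf '00:00:5e:00:01:%02x\n' "$vrid"
# → 00:00:5e:00:01:33
```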

When each cluster starts, the VIP must be placed on one of the physical routers, so an election takes place, with the winner being labeled “master” and all others being labeled “backup.” The election works by comparing the static priority assigned to each physical router: the router with the highest priority becomes the master. If a priority isn’t assigned by an administrator in the configuration, a default of 100 is assumed. If two different routers have the same priority, the system with the highest IP address is selected as the master.[3]

For as long as a master remains operational, it will periodically send out “advertisements” to all of the backup nodes letting them know it’s still around. If a configurable amount of time passes without the backup nodes receiving a complete advertisement, another election takes place.[3] If a master is going down gracefully, it sends a final advertisement setting its own priority to zero, which forces a new election among the remaining nodes so that the current master “loses” the election just prior to shutting down.[4]
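The failover timeout isn’t arbitrary: per the VRRP specification, a backup declares the master dead after roughly three advertisement intervals plus a priority-derived “skew time” of (256 − priority) / 256 seconds, so higher-priority backups time out sooner and win the ensuing election. A quick sketch of the arithmetic, using values matching the backup node configured later in this post:

```shell
advert_int=1   # seconds between advertisements
priority=50    # this backup node's priority
# Master_Down_Interval = 3 * advert_int + (256 - priority) / 256
awk -v a="$advert_int" -v p="$priority" \
  'BEGIN { printf "%.4f seconds\n", 3 * a + (256 - p) / 256 }'
# → 3.8047 seconds
```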

Brief Note on LVS

In addition to basic VIP management, keepalived also contains a notion of “Linux Virtual Servers” (or “LVS”). I’m not going to bother with virtual servers in this article, but essentially LVS performs the actual load balancing of traffic to the end nodes.

LVS exists and is deployed in many environments; however, the more common pattern seems to be to use something like nginx or haproxy for load balancing and relegate keepalived to strictly managing VIPs. This is anecdotal, but it’s likely due to how finicky keepalived’s configuration parsing seems to be. Once you get VIP management to work, you’re typically in the mood to start working with something other than keepalived.

At a later date I will make another post describing a keepalived+haproxy stack for highly available load balancers.

Example keepalived Configuration

Let’s take an example of a two-node cluster. Since keepalived is basically just a VRRP client, it doesn’t do any actual VIP management on its own. That part is simple enough, so we’re going to write our own glue there. Each server will have the following VIP assignment script placed on the servers/physical routers at /etc/keepalived/notify.sh:

#!/bin/bash
# Add or remove the VIP based on the VRRP state passed in as $1.

vipAddress="192.168.121.100/24"

if [[ "$1" == "master" ]]; then
  ip address add dev eth1 "${vipAddress}"
else
  ip address del dev eth1 "${vipAddress}"
fi

The keepalived.conf configuration for the master is as follows:

vrrp_instance VI_1 {

  state MASTER
  interface eth0
  garp_master_delay 10
  smtp_alert
  virtual_router_id 51
  priority 100
  vrrp_unicast_bind 192.168.121.51
  vrrp_unicast_peer 192.168.121.52
  advert_int 1

  authentication {
    auth_type PASS
    auth_pass testpass
  }

  notify_master "/etc/keepalived/notify.sh master"
  notify_backup "/etc/keepalived/notify.sh backup"

}

The above is pretty simple to understand. Essentially, we create a new cluster (“vrrp_instance”) called VI_1. Each keepalived instance can keep track of multiple VRRP clusters; a node may be the master for some clusters while acting merely as a backup for a different VIP in others.
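As a hypothetical sketch of that, a second instance could be appended to the same keepalived.conf with the roles reversed, so that each node is master for one of the two VIPs (the VRID, priority, and script path here are made up for illustration):

```
vrrp_instance VI_2 {
  state BACKUP
  interface eth0
  virtual_router_id 52
  priority 50
  advert_int 1
  notify_master "/etc/keepalived/notify2.sh master"
  notify_backup "/etc/keepalived/notify2.sh backup"
}
```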

We then configure this daemon to come up expecting to be the master, with a virtual router ID of 51.

Since many networks filter multicast messages and we know the IP address of the only other peer in our simple use case, we manually specify the other node with vrrp_unicast_peer and set the primary IP address for keepalived to use for elections with vrrp_unicast_bind.

We then set a password (sent in cleartext) for this virtual router to be testpass. Since it’s sent neither encrypted nor hashed, passwords with VRRP are largely just to avoid accidental joins by otherwise honest parties. More reliable protection can be achieved through network filtering. It’s basically impossible to have a truly secure VRRP instance, though.
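One hedged sketch of such filtering with iptables would be to accept VRRP (IP protocol 112) only from the known peer and drop it from everyone else (the interface and peer address match this post’s example; adapt to your network):

```shell
# Accept VRRP advertisements only from the known peer...
iptables -A INPUT -i eth0 -p 112 -s 192.168.121.52 -j ACCEPT
# ...and drop VRRP from any other source.
iptables -A INPUT -i eth0 -p 112 -j DROP
```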

Finally, we register two handlers for state transitions. The first handler (“notify_master”) is for handling transitions into the MASTER state, whereas the second one (“notify_backup”) is for handling transitions into the BACKUP state. In our simple use case, we’re just adding and removing IP addresses, so we’ve put it all in a single script (listed at the top of this section) and just pass in the state we’re transitioning into via command line arguments.

The keepalived.conf configuration for the backup router is pretty much a mirror of the MASTER router:

vrrp_instance VI_1 {

  state BACKUP
  interface eth0
  garp_master_delay 10
  smtp_alert
  virtual_router_id 51
  priority 50
  vrrp_unicast_bind 192.168.121.52
  vrrp_unicast_peer 192.168.121.51
  advert_int 1

  authentication {
    auth_type PASS
    auth_pass testpass
  }

  notify_master "/etc/keepalived/notify.sh master"
  notify_backup "/etc/keepalived/notify.sh backup"

}

Obviously, your implementation can get as elaborate as you need. For instance, your notify.sh script might send an email alert notifying admins of the state change, or something similar.

Once configured, you should be able to run a web server on each node listening on port 80 on all IPs. Taking the master node down gracefully should result in the VIP immediately becoming available on the backup node. Forcing the power off instead should result in web access becoming temporarily unavailable before coming back once the backup notices the missed advertisements.

Are there alternatives to keepalived?

keepalived is one of the most popular VRRP daemons available, but it’s not the only VRRP daemon, nor is it the only daemon capable of HA VIP management.

A brief listing of some alternatives would be:

* uvrrpd, another FOSS (GPL) implementation of the VRRP protocol. It only does VIP management and is not as popular as keepalived.

* CoroSync, a High Availability suite that uses the Totem protocol instead of VRRP. This implementation is still widely deployed and forms the basis of Red Hat’s ClusterSuite product offering. It doesn’t have the same security issues as keepalived, since Totem messages can be authenticated and encrypted. Together with Pacemaker, it’s also able to manage a large number of resources, ranging from VIPs to filesystems. This is the form of High Availability clustering most often deployed in enterprise environments due to its flexibility and increased security.

* ucarp, a BSD-licensed implementation of the CARP protocol for VIP management. It’s not as popular as keepalived or CoroSync, but it’s still more popular than uvrrpd. The daemon itself only does VIP management, but it’s a userspace implementation that has also been ported to various other Unix platforms. Unlike VRRP-based daemons, it authenticates messages with a shared key, giving it a level of security comparable to CoroSync.

What Next?

As stated above this post is just intended to start you down the road of VIP management with keepalived. The ambitious can try to extend my keepalived configuration above by implementing higher level health checks so that nodes fence themselves if they can sense something’s not quite right.

Overall, I would direct most people towards learning a different system such as CoroSync, since that’s the more secure and flexible arrangement. Another potential choice would be to look into ucarp. Knowing how keepalived works is useful information to have since it still has a large install base, but I would refrain from performing any new installs with it.

  1. Virtual Router Redundancy Protocol (Wikipedia)
  2. Virtual Router Redundancy Protocol (VRRP) Version 3 :: Protocol State Machine
  3. VRRP Router Election Rules
  4. Virtual Router Redundancy Protocol (VRRP) Version 3 :: Protocol State Machine :: Master