Usually, you can use keepalived + VRRP to make HAProxy redundant and provide good service availability.
In day-to-day operations, there are many cases where you have to take down a whole HAProxy host: server reboots (because of kernel/OS updates), hardware maintenance, OS distribution upgrades, or installing/upgrading HAProxy itself.
For these types of planned maintenance, DNS failover or other DNS-based solutions aren’t an option because you can’t control DNS caching (ISP resolvers, browsers, etc.) on the client side.
If you want to do planned maintenance on the MASTER of a keepalived/VRRP failover pair, the VIPs/services have to be failed over to the BACKUP node. Even with the lowest VRRP timers, you get a downtime of 3.6 seconds while keepalived fails the VIP over to the other node. Even worse, all current TCP sessions are terminated and your service is disrupted.
This problem can be solved at layer 3 with the help of routing protocols. In the solution I’ve implemented, the service IPs (VIPs) are configured as /32 addresses on the loopback interface of the HAProxy servers. These IPs are then announced from multiple HAProxy nodes with Quagga (you can use BIRD as well) as OSPF routes to the upstream layer 3 devices. This technique is also called “Route Health Injection” and is used here as a sort of anycast.
The upstream layer 3 devices then have multiple paths to these IPs and can balance the traffic over all available paths (with ECMP, for example) or use a preferred path.
I would have preferred BGP as the routing protocol for this use case, but it wasn’t available on the Cisco network gear in this project. If you can use BGP, you can also use a lightweight solution (ExaBGP/GoBGP) instead of Quagga/BIRD to announce the IPs.
This design is an active/passive setup of the HAProxy boxes. The HAProxy nodes are multihomed (connected to two upstream layer 3 devices) for redundancy. Which node is active is controlled by the different OSPF costs announced by Quagga. If you can use ECMP, you can turn this into an active/active solution and scale the HAProxy nodes horizontally (scale-out). The general network setup is shown in the figure below:
The setup above uses a VLAN interface on each L3 switch (due to the existing network architecture). In other network environments you can use a real point-to-point L3 connection for every link.
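If you go for the active/active (ECMP) variant mentioned above, both HAProxy nodes announce the service IPs with the same OSPF cost and the upstream devices install multiple equal-cost paths. A minimal sketch for the Cisco side (the maximum-paths limit and its default are platform/IOS-version dependent, so verify this on your gear):
router ospf 10
maximum-paths 4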
Configuration of the upstream layer 3 devices (Cisco)
sw01:
interface Vlan150
ip address 10.25.150.1 255.255.255.0
ip ospf hello-interval 5
ip ospf dead-interval 60
router ospf 10
router-id 10.25.150.1
log-adjacency-changes detail
passive-interface default
no passive-interface Vlan150
network 10.25.150.0 0.0.0.255 area 51
distribute-list prefix AllowedOSPFRoutes in
ip prefix-list AllowedOSPFRoutes seq 5 deny 0.0.0.0/0
ip prefix-list AllowedOSPFRoutes seq 6 deny 10.25.160.0/24
ip prefix-list AllowedOSPFRoutes seq 10 permit 0.0.0.0/0 le 32
sw02:
interface Vlan160
ip address 10.25.160.1 255.255.255.0
ip ospf hello-interval 5
ip ospf dead-interval 60
router ospf 10
router-id 10.25.160.1
log-adjacency-changes detail
passive-interface default
no passive-interface Vlan160
network 10.25.160.0 0.0.0.255 area 51
distribute-list prefix AllowedOSPFRoutes in
ip prefix-list AllowedOSPFRoutes seq 5 deny 0.0.0.0/0
ip prefix-list AllowedOSPFRoutes seq 10 deny 10.25.150.0/24
ip prefix-list AllowedOSPFRoutes seq 15 permit 0.0.0.0/0 le 32
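To double-check that the route filtering is in place on the switches, you can inspect the prefix list and the OSPF distribute-list settings (output omitted here):
sw01# sh ip prefix-list AllowedOSPFRoutes
sw01# sh ip protocols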
Configure Loopback IPs on HAProxy nodes
haproxy01:
auto lo
iface lo inet loopback
up ip addr add 10.46.46.46/32 dev lo
# to sw01
auto eth0
iface eth0 inet static
address 10.25.150.10
netmask 255.255.255.0
# to sw02
auto eth1
iface eth1 inet static
address 10.25.160.10
netmask 255.255.255.0
haproxy02:
auto lo
iface lo inet loopback
up ip addr add 10.46.46.46/32 dev lo
# to sw01
auto eth0
iface eth0 inet static
address 10.25.150.11
netmask 255.255.255.0
# to sw02
auto eth1
iface eth1 inet static
address 10.25.160.11
netmask 255.255.255.0
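After an ifup of the interfaces (or a reboot), a quick sanity check makes sure the /32 service IP is really bound to the loopback:
ip addr show dev lo
The output should list 10.46.46.46/32 in addition to 127.0.0.1/8.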
Quagga configuration
haproxy01:
!
hostname haproxy01
log file /var/log/quagga/zebra.log
log file /var/log/quagga/ospfd.log
!
password PLEASECHANGEME
enable password PLEASECHANGEME
!
interface eth0
ip ospf dead-interval 60
ip ospf hello-interval 5
ip ospf priority 0
ipv6 nd suppress-ra
no link-detect
!
interface eth1
ip ospf dead-interval 60
ip ospf hello-interval 5
ip ospf priority 0
ipv6 nd suppress-ra
no link-detect
!
interface lo
ip ospf cost 300
ip ospf priority 0
no link-detect
!
router ospf
ospf router-id 1.1.1.15
log-adjacency-changes detail
passive-interface default
no passive-interface eth0
no passive-interface eth1
no passive-interface lo
network 10.25.150.0/24 area 0.0.0.51
network 10.25.160.0/24 area 0.0.0.51
network 10.46.46.46/32 area 0.0.0.51
!
line vty
exec-timeout 0 0
!
haproxy02:
!
hostname haproxy02
log file /var/log/quagga/zebra.log
log file /var/log/quagga/ospfd.log
!
password PLEASECHANGEME
enable password PLEASECHANGEME
!
interface eth0
ip ospf dead-interval 60
ip ospf hello-interval 5
ip ospf priority 0
ipv6 nd suppress-ra
no link-detect
!
interface eth1
ip ospf dead-interval 60
ip ospf hello-interval 5
ip ospf priority 0
ipv6 nd suppress-ra
no link-detect
!
interface lo
ip ospf cost 400
ip ospf priority 0
no link-detect
!
router ospf
ospf router-id 1.1.1.20
log-adjacency-changes detail
passive-interface default
no passive-interface eth0
no passive-interface eth1
no passive-interface lo
network 10.25.150.0/24 area 0.0.0.51
network 10.25.160.0/24 area 0.0.0.51
network 10.46.46.46/32 area 0.0.0.51
!
line vty
exec-timeout 0 0
!
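Once zebra and ospfd are running on both nodes, you can verify from the HAProxy side that the OSPF adjacencies to both switches come up, for example via vtysh:
vtysh -c 'show ip ospf neighbor'
vtysh -c 'show ip ospf route'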
Setting the default route on the HAProxy Boxes
Because the servers are multihomed, you can’t simply configure a single static default gateway on one interface. One way to distribute a default route to the HAProxy nodes is OSPF:
sw01:
router ospf 10
default-information originate always metric 15
sw02:
router ospf 10
default-information originate always metric 20
With these settings, the HAProxy nodes receive two default routes and install the one with the lower metric.
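You can check on the HAProxy nodes which of the two originated default routes was actually installed (it should point to sw01 as long as its metric is the lower one):
ip route show default
vtysh -c 'show ip route 0.0.0.0/0'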
A disadvantage of this solution is that it can lead to asymmetric routing (the return traffic is sent out over an interface different from the incoming one). This can be difficult to debug and can cause problems when state is involved (firewalls, for example). Therefore, I’ve used a static configuration without OSPF to manage the default routes. This includes the use of Linux policy routing to send the return traffic out through the interface on which the traffic came in.
Configure “rp_filter” (also needed if the IP is only configured on a loopback device)
haproxy01 / haproxy02:
for i in /proc/sys/net/ipv4/conf/*/rp_filter; do echo 2 > $i; done
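The loop above does not survive a reboot. A minimal sketch to make the loose-mode setting persistent, assuming a sysctl.d-based system (the file name is just an example):
cat /etc/sysctl.d/99-rp-filter.conf:
net.ipv4.conf.all.rp_filter = 2
net.ipv4.conf.default.rp_filter = 2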
Configure iptables connection/packet marking based on the incoming interface
haproxy01 / haproxy02:
iptables -t mangle -A PREROUTING -j CONNMARK --restore-mark
iptables -t mangle -A OUTPUT -j CONNMARK --restore-mark
iptables -t mangle -A INPUT -i eth0 -j MARK --set-mark 1
iptables -t mangle -A INPUT -i eth1 -j MARK --set-mark 2
iptables -t mangle -A INPUT -j CONNMARK --save-mark
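These rules are not persistent either; save them with whatever your distribution provides (iptables-persistent/netfilter-persistent on Debian, for example). To see that connections are actually being marked, the packet counters in the mangle table are helpful:
iptables -t mangle -L -n -v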
Configure Linux policy routing (enabling “ip_forward” is not required in this setup, nor desired)
haproxy01 / haproxy02:
echo 100 vl150 >> /etc/iproute2/rt_tables
echo 200 vl160 >> /etc/iproute2/rt_tables
ip rule add prio 100 from all fwmark 1 lookup vl150
ip rule add prio 110 from all fwmark 2 lookup vl160
ip route add table vl150 default via 10.25.150.1 dev eth0 metric 100
ip route add table vl160 default via 10.25.160.1 dev eth1 metric 100
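A quick way to verify that the marks select the intended tables is to look at the rules and the per-table routes:
ip rule show
ip route show table vl150
ip route show table vl160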
Verification of the setup
After all configuration steps are completed, verify that the routes are injected properly:
sw01# sh ip route ospf
10.0.0.0/8 is variably subnetted, 4 subnets, 2 masks
O 10.46.46.46/32 [110/301] via 10.25.150.10, 01:53:46, Vlan150
sw02# sh ip route ospf
10.0.0.0/8 is variably subnetted, 4 subnets, 2 masks
O 10.46.46.46/32 [110/301] via 10.25.160.10, 00:04:55, Vlan160
Service Healthcheck
If you use route health injection, you also have to monitor the HAProxy process itself.
If the HAProxy process fails, the routes to this node have to be withdrawn from OSPF; otherwise traffic is still sent to the node even though HAProxy isn’t running, and you are “blackholing” the traffic.
If you use BGP, there are many solutions with integrated healthcheck capabilities (ExaBGP/GoBGP, for example). If you use BIRD + BGP/OSPF, take a look at anycast-healthchecker.
I haven’t found something similar for Quagga, so I created something simple with monit. The following is a simple example that stops Quagga when the HAProxy process fails. When Quagga is stopped, the OSPF neighbors detect this and remove the routes (after the failure/timeout threshold).
‘monit’ configuration on haproxy01 / haproxy02
cat /etc/monit/conf.d/haproxy-quagga:
check process haproxy with pidfile /var/run/haproxy.pid
  if does not exist for 5 cycles then exec "/etc/init.d/quagga stop"
  else if succeeded for 6 cycles then exec "/etc/init.d/quagga start"
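After placing the file in /etc/monit/conf.d/, reload monit and check that the new check is picked up:
monit reload
monit summary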
Stopping the Quagga process is only one possibility; there are several ways to make the reaction more granular, for example:
- only set a higher OSPF cost instead of shutting down the Quagga process (see the sketch after this list)
- only remove specific routes for specific IPs
- remove routes when HAProxy has no backends available in a pool
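As a rough sketch of the first option (the script name and cost values are only examples for haproxy01, not a tested implementation), a small check script could raise the OSPF cost of the loopback via vtysh while HAProxy is down and restore it afterwards:
cat /usr/local/bin/haproxy-ospf-cost.sh:
#!/bin/sh
# keep announcing the service IP, but make this node unattractive while haproxy is down
if pidof haproxy >/dev/null; then
  vtysh -c 'conf t' -c 'interface lo' -c 'ip ospf cost 300'
else
  vtysh -c 'conf t' -c 'interface lo' -c 'ip ospf cost 1000'
fi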
How to do zero-downtime maintenance
Simply go to the active node (haproxy01 in our case) and set the OSPF cost higher than on haproxy02:
root@haproxy01:~# vtysh -c 'conf t' -c 'int lo' -c 'ip ospf cost 500'
After executing the command above, wait a few seconds until the upstream devices have switched their preferred path for the service IPs to haproxy02. Not a single packet is lost when the upstream devices change the path.
Before switchover:
sw01# sh ip route ospf
10.0.0.0/8 is variably subnetted, 4 subnets, 2 masks
O 10.46.46.46/32 [110/301] via 10.25.150.10, 3w2d, Vlan150
After switchover:
sw01# sh ip route ospf
10.0.0.0/8 is variably subnetted, 4 subnets, 2 masks
O 10.46.46.46/32 [110/401] via 10.25.150.11, 00:00:06, Vlan150
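Once the maintenance on haproxy01 is finished, you can fail back by restoring its original cost (300 in this setup); the switchover back is just as hitless:
root@haproxy01:~# vtysh -c 'conf t' -c 'int lo' -c 'ip ospf cost 300'
Keep in mind that changes made via vtysh only affect the running configuration; save them with 'write memory' in vtysh if they should survive a Quagga restart.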