Testing new HAProxy versions with some sort of A/B Testing

Recently, I have been thinking about better ways to bring new HAProxy versions into production without taking on too much risk. We’re running HAProxy 1.5 in production and want a smooth migration to 1.6.
As you know, every update can be critical, even more so on a critical component like a load balancer, and especially when it is a major version update.

We have staging/testing environments in place to test such updates, but you can never cover all situations/variants, and staging traffic is never 100% identical to real production traffic. The plan was to shift only a small percentage of production traffic (10% as a starting point) to another server with HAProxy 1.6 installed, configured with a duplicate of the current HAProxy 1.5 configuration.

If no problems occur while the new version handles its 10%, the percentage is increased stepwise. You could call it some sort of A/B testing.

I’ve looked at the following solutions to accomplish the task:

  • Gor, duplicator, em-proxy, iptables TEE, etc. (or other traffic-duplication solutions): only for cloning/mirroring the traffic; mostly too intrusive (deployed directly in the traffic path), too many caveats, and needs strongly isolated test environments
  • BGP load balancing or other (weighted) route load balancing, (weighted) ECMP: network gear with the respective features is needed, as is direct access to the network-layer configuration; with BGP you need direct BGP session(s) to the HAProxy nodes
  • (Weighted) DNS round robin: you have to use a DNS provider with this feature (AWS Route 53 WRR, for example), and DNS changes are not instant (TTLs, caching)
  • Iptables + NAT + statistic module: “iptables as a load balancer”; we had many problems with this approach while testing, no persistence possible, etc.
  • LVS: looked very promising; I wanted to use LVS-NAT, but this requires changes in the network infrastructure (the LVS server has to be the default gateway of the balanced nodes, or you have to fiddle with Linux policy routing on the balanced nodes)
  • HAProxy: as a traffic forwarder (on the existing production node or as a dedicated server in front of the HAProxy nodes)

In our environment, using HAProxy itself was the best, simplest and most practicable solution (every other option had its problems or wasn’t possible).

HAProxy config (you can even use one “listen” block if you like) on the existing node:

frontend fe_preselect
 mode tcp
 bind PUBLIC_IP_REDACTED:PORT
 use_backend be_preselect

backend be_preselect
 mode    tcp
 balance source
 hash-type map-based
 #hash-type consistent

 server LOCAL_haproxy  127.0.0.1:4899 check send-proxy-v2 weight 90
 server REMOTE_haproxy 192.168.50.100:80 check send-proxy-v2 weight 10

frontend fe_http_in
 mode http
 # Coming from preselector frontend
 bind 127.0.0.1:4899 accept-proxy

New HAProxy node (1.6):

frontend fe_http_in
 mode http
 bind  192.168.50.100:80 accept-proxy
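
Only the frontend of the new 1.6 node is shown; the rest of its configuration is a duplicate of the production 1.5 setup, as mentioned above. For completeness, a minimal hypothetical sketch of the missing piece (backend name, server name and address are placeholders):

```haproxy
backend be_app
 mode http
 server app1 10.0.0.11:8080 check   # placeholder application server
```

The frontend above would then need a matching "default_backend be_app" line.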

In the above configuration, a dedicated TCP frontend forwards traffic to the local HAProxy (1.5) and to a remote server running HAProxy 1.6. With the “weight” option we direct only 10% of the traffic to the new version, so we can minimize the risk and slowly move more traffic to the server with the new version. With “balance source” we get persistence (a client forwarded to the new server should stay on that server). Note that with “hash-type map-based”, changing the weights reshuffles the source-to-server mapping; the commented-out “hash-type consistent” keeps most existing clients on their current server when weights change.
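
If you want to move the percentage without reloading HAProxy, the weights can also be changed at runtime through the stats socket (this requires a "stats socket /var/run/haproxy.sock level admin" line in the global section). A sketch of a stepwise rollout, assuming that socket path and these step sizes; it only prints the "set weight" commands, the commented socat line would actually apply them:

```shell
# Shift weight stepwise from the old (local) to the new (remote) HAProxy node
# via the runtime API. Backend/server names match the config above; the socket
# path and the step sizes are assumptions.
SOCKET=/var/run/haproxy.sock
OUT=""
for NEW_WEIGHT in 10 25 50 75 100; do
  OLD_WEIGHT=$((100 - NEW_WEIGHT))
  for CMD in \
      "set weight be_preselect/LOCAL_haproxy $OLD_WEIGHT" \
      "set weight be_preselect/REMOTE_haproxy $NEW_WEIGHT"; do
    echo "$CMD"
    # Apply it live, without a reload:
    # echo "$CMD" | socat stdio "$SOCKET"
    OUT="$OUT$CMD
"
  done
done
```

Setting the local server's weight to 0 in the last step drains it: no new connections are directed to the old version anymore.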

You can even use this method for products/software other than HAProxy itself to accomplish a smooth migration.
