Load Balancing For High Availability With haproxy

Load balancing an application requires some forethought. Some applications are simple and can handle their load on their own, taking traffic directly from users, and if they go down from time to time then oh well, just bring it back up soon, I guess.

Other applications need to scale more dynamically with higher availability requirements. Automatically scaling the application might mean spreading the load across a static number of nodes or some solution that allows us to take nodes in and out as needed (often programmatically).

As you can imagine, the solution will increase in complexity in a manner that’s directly proportional to the performance requirements of the application. For this reason it’s always helpful to have an understanding of how it all fits together and judge for yourself what your availability requirements are. In this article I’m going to try my best to cover the main areas you’re likely to care about. You may care about some more than others, so I would ask that you be judicious in which sections you choose to read.

Comparison with nginx Load Balancing

Since both share a “single process event-driven” model of operation, if you have experience with nginx load balancing then you may be wondering how haproxy compares and when you would use one over the other.

Features that haproxy has that nginx does not:

  • Non-HTTP health checks are easier than with nginx in that they often come pre-made, as opposed to nginx which often forces you to write your own match directive.
  • Fewer instances where you run into paywalled features. Many of the more advanced features in nginx are incredibly useful, especially for enterprise users, but you’ll often run into Plus-only features seemingly at random.

Features that nginx has that haproxy does not:

  • Full HTTP web server implementation. haproxy can only route HTTP traffic. It lacks support for protocols such as FastCGI or uwsgi for communicating with web applications.
  • A full-featured response cache. haproxy can only cache small objects (more on that later), meaning if your dynamic content changes rarely it’s still retrieved from the backend for each and every request.
  • As a consequence of the above, using nginx for both your load balancers and application servers makes your overall architecture easier to support by re-using knowledge and experience from one for the other.
  • Load balancing for UDP protocols such as syslog or DNS.
  • In my opinion, development of the core nginx product (both Plus and OSS) seems to be proceeding at a pace that exceeds haproxy’s development.

My general rule of thumb would be to use nginx load balancing if at all possible unless you legitimately need advanced load balancing features and can’t afford the Plus version of nginx. Ultimately, given the options available nginx load balancing is just more comprehensive and personally I find its configuration much more intuitive.

How haproxy Works

Before starting anything with haproxy, it’s important to understand roughly how haproxy configuration is structured. It’s not the most intuitive system so without a reference point, a lot of this can be hard to wrap your head around initially.

Configuration Structure

Like nginx, haproxy configuration is primarily directive/keyword based, with whitespace being largely optional with the exception of the newline character (which marks the end of a directive definition). By convention though, sections are usually separated by an extra newline and non-section keywords are indented so that it’s easy to see which section they belong to. Sections aren’t explicitly ended; instead a new section keyword implicitly indicates that the preceding section has ended. Keyword order is preserved, though in certain cases (due to how the keywords function) order won’t matter.

Let’s examine a few of the more fundamental keywords:

  • defaults: marks the beginning of a section of default values for various parameters. Some parameters can be placed in a defaults section while others can’t. You can have multiple defaults sections, but each new one completely resets the defaults for the sections that follow it (i.e the effect isn’t cumulative).
  • listen: marks the beginning of a section that defines the frontend connection, the backend servers, and any relevant processing that must be done on the load balancer.
  • option: configures a parameter for the given mode that you’re operating within. Many options can be negated by prefixing the option keyword with “no”, for instance no option checkcache disables the checkcache option underneath http mode.
  • acl: defines a particular Access Control List entry. Despite its name, the actual function of the acl keyword is to match text from the request or response and set the ACL’s name equal to either a boolean true or a boolean false. This boolean value can then be used in conditional logic for subsequent keywords.
  • http-request: performs operations either on or with the HTTP request. Once haproxy has finished processing the request, all data related to it is no longer available.
  • http-response: performs operations after the response from the backend server has begun coming back (and therefore, by necessity, after http-request above).

Which keywords can be used in which sections can only be determined by checking the documentation.
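To make the structure concrete, here’s a minimal sketch tying these keywords together (the section name, hostnames, and timeout values are purely illustrative):

defaults
  mode http
  timeout connect 5s
  timeout client 30s
  timeout server 30s

listen example
  bind *:80
  acl is_admin path_beg /admin
  http-request deny if is_admin
  server app01 app01:80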

If the effect you’re going for requires extracting data from a request or response then haproxy provides mechanisms known as fetch methods. Some of the more fundamental fetch methods would be:

  • req.hdr / res.hdr extracts the given header from the HTTP request or response (respectively). If a particular header contains one or more commas this is (per RFC) interpreted as multiple values for the given header being given on the same line. There’s an optional second argument for which occurrence you want to return. Example Usage:
    • req.hdr(Host) to extract the Host header sent by the client.
    • req.hdr(X-Custom,2) returns the second value for the X-Custom header in the HTTP request.
  • req.fhdr / res.fhdr are identical to the above methods except commas are not treated specially. As mentioned before, this deviates from RFC but is required for things such as User-Agent, which will often contain a comma for historical reasons and should be considered only one value. This also takes which occurrence you’re interested in as an optional second argument but will only consider fully new lines that start with the same header name as a second occurrence.
  • req.cook / res.cook extracts the given cookie from the HTTP request or response (respectively).
    • Example Usage: req.cook(visitingUser) returns any cookie in the request called visitingUser.
  • path returns the path in the URL.
    • For the URL http://example.com/topDir/myPage.php the path fetcher will return /topDir/myPage.php
  • method returns the HTTP verb (POST, GET, PUT, etc) used in the request.
  • status returns the status code in the HTTP response generated by the backend.
  • var returns the variable given as an argument. For example var(txn.my_var) will yield the contents of the txn.my_var variable.
  • url_param returns the value of the given URL parameter.
    • Example Usage: for the GET /?myVar=block&otherVar=fit HTTP request the url_param(otherVar) will return the string fit.

Once you’ve fetched the data, you can run converters on-the-fly to transform it into something else by adding a comma and a pre-defined converter name immediately after the fetcher (no space). This operates in a manner similar to command line pipes, where the output of one operation (the fetcher) is used as input for another operation (the converter). Some fundamental converters would be:

  • map(map_file) opens the whitespace-delimited file specified by map_file (at process startup), takes the fetcher’s output, locates the first line whose first field matches that output, and returns the string in that line’s second field.
    • If a given fetcher returns someData and map_file contains the line someData returnMe then the string the fetcher+converter would return is returnMe
  •  ipmask(netmask) takes an IP address as input and returns the network address that IP address belongs to for the given netmask.
    • If the source IP address of a request is 192.168.34.22 then a fetcher+converter of src,ipmask(255.255.255.0) would yield 192.168.34.0. This is useful for later comparison by being able to summarize IP’s by the network rather than individually.
  • lower/upper converts the string input from the fetcher to either lower or upper case respectively.
  • regsub(matchingRegex,substituteText[, flag]) executes regular expression substitution on the input string. The arguments work similarly to the sed command on Unix/Linux. The optional third parameter flag is the same as well, for example a flag of g causes the regex to match all occurrences and not just the first occurrence. Some examples:
    • req.hdr(X-Header),regsub(\s,,g) strips all whitespace from the X-Header request header. haproxy has special processing for \s that prevents the configuration parser from non-intuitive interpretation of the backslash.
    • req.hdr(X-Header),regsub(\\x2C,,g) strips all commas from the X-Header request header. This is normally problematic as a literal comma in your regular expression would cause a syntax error. By matching the ASCII code for a comma (via hex) we sidestep this issue entirely. You’ll notice that hex codes need to be double backslashed, otherwise haproxy’s config parser will consume the single backslash as escaping the x (unnecessary but syntactically correct) and send the literal string x2C to the PCRE library.

For fetch methods that take arguments, it’s important not to have any spaces between arguments. The haproxy config parser is notoriously finicky and will fail with vague errors if you do. The above is just a short list of the methods I’ve found most useful; please consult the documentation for the full lowdown on all the fetch methods haproxy supports.
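Putting fetchers and converters together, a hypothetical line (assuming a lookup file exists at /etc/haproxy/hosts.map) might be:

http-request set-header X-Pool %[req.hdr(Host),lower,map(/etc/haproxy/hosts.map)]

Here the Host header is fetched, lowercased, then used as the lookup key into the map file, with the resulting value written into the X-Pool request header.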

As mentioned above, haproxy configuration also has the notion of scoped variables. Variable names can contain alphanumeric characters and underscores but cannot begin with a digit. Additionally, every variable has to fall within one of five pre-defined scopes that govern how long the variable will live:

  • proc for variables that need to last for the length of a particular haproxy process.
  • txn for variables specific to a particular transaction (request/response pair) that can be deleted once the transaction is over.
  • sess for variables that only need to exist for the length of a particular session (series of transactions).
  • req for variables that only need to exist during the request processing portion of a transaction. For each transaction, variables scoped here will be garbage collected before any http-response directives are evaluated.
  • res for variables that only need to exist during the response formation portion of a transaction. For each transaction, variables scoped here won’t exist when any http-request directives are evaluated.

As an example of variable usage: variables are often useful for taking data available within haproxy’s HTTP request processing stage and utilizing it within the HTTP response stage, by storing the data in a variable with txn scope. Take this snippet for example:

http-request set-var(txn.request_method) method
http-response set-header X-Request-Method %[var(txn.request_method)]

The above takes the output of the method fetcher (mentioned earlier) and saves it to a variable called request_method in the txn variable scope, which will survive the entire transaction. You’ll notice the introduction of the %[] construct. This construct will execute a fetch method (along with any converters) and return the data as a string.

OK, phew. Now that all that’s out of the way and we’re a bit more familiar with how haproxy configuration works, let’s actually start doing something useful.

Basic HTTP Load Balancing

Let’s start off simple with a regular round-robin load balance. With haproxy you can either define a frontend instance that ties itself to a particular backend instance, or combine the two in a single listen section as we do here. A simple haproxy.cfg configuration might simply be:

listen http-default
  bind *:80
  mode http
  server node01 node01:80
  server node02 node02:80

The above is a fully functional round robin load balance. Breaking it down:

  • We create a new listen section called http-default
  • We bind to port 80 on all available IP addresses (the *)
  • We turn http mode on for this load balance to give haproxy application-layer visibility. The default mode of tcp would also work but lacks the ability to do anything intelligent with HTTP.
  • We then specify two backend servers with the server directive. The first argument is the name haproxy will use for the backend service (more on where that shows up later) and the second the hostname and TCP port for connecting to the backend service.

Considerations for HTTP Headers and Logging

Introducing a load balancer fixes some problems but it also introduces others. One such problem is the backend server’s lack of visibility on the client connection due to said connection now being terminated on the load balancer instead of the application server.

Two of the most important consequences of this are:

  • If your frontend terminates an HTTPS connection but your backend connection is HTTP, the application server won’t be aware of that.
  • The backend application server itself can now no longer see the IP address of each remote client, as each application server’s TCP connection is now with the load balancer instead. This complicates log creation (if you do that on the application server) and prevents the application server from targeting particular IP addresses.

To work around these issues we can introduce some custom HTTP headers. In the case of the lost client IP this means setting an X-Forwarded-For request header (you can name it anything but that’s the standard name) on the load balancer before the request is relayed to the backend application server. You may additionally need to create an X-Real-IP header (another common name) with the same value, depending on your application’s requirements.

For identifying the client’s connection protocol, the standard practice is to set the X-Forwarded-Proto header on the load balancer so that the backend server can take whatever steps are required to do the right thing. One example would be adding logic to the wp-config.php for a WordPress website or settings.php for Drupal, which allows the application to construct links back to itself that contain the proper protocol for clients to use.

Modified with our new criteria, our haproxy.cfg configuration file should look like this:

listen http-default
  bind *:80
  mode http

  option forwardfor
  reqadd X-Forwarded-Proto:\ http

  server node02 node02:80
  server node03 node03:80

This looks pretty much the same as the first one, except we’ve added two new keywords to this load balance:

  • option forwardfor modifies the HTTP handling (invoked by our use of mode http) such that a new header named X-Forwarded-For is added to the request before being handed off to the backend. This header will be set to the visiting user’s IP address.
  • reqadd is used to also add the X-Forwarded-Proto header to the request before it’s sent to the backend. You’ll note that we had to escape the space between the colon and the header value. This would be the case for any tabs or spaces in the header value.
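One portability note: reqadd was deprecated and eventually removed in newer haproxy releases (the 2.x series). On those versions the http-request form does the same job, and it also makes it easy to set the header correctly when a single frontend binds both HTTP and HTTPS (a sketch, borrowing the certificate setup from the SSL section below):

listen http-default
  bind *:80
  bind *:443 ssl crt /etc/haproxy/server.pem
  mode http

  option forwardfor
  http-request set-header X-Forwarded-Proto https if { ssl_fc }
  http-request set-header X-Forwarded-Proto http unless { ssl_fc }

  server node02 node02:80
  server node03 node03:80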

Access Control Lists

Don’t let the name “access control lists” fool you. With haproxy, ACL’s are used for evaluating any sort of conditional logic by reducing all decisions into TRUE/FALSE results that can then be embedded into other directives as arguments in order to modify their behavior. Most directives support this conditional execution via if which only evaluates the given directive if the ACL result was TRUE and unless which only evaluates the directive if the result was FALSE. 

The general format for an ACL entry is:

acl <aclName> <fetch method>,<optional converter> <pattern>

More complicated ACL’s will take advantage of more advanced aspects of acl syntax, but that’s a good mental model to start with. Each component is fairly obvious: you name the ACL, you specify a fetch method (optionally running it through a comma-separated chain of converters), and finally you have the pattern to match against the output of that fetch method + optional converter coupling.

Let’s illustrate this idea with an ACL inside of a listen block:

listen testsite
  mode http
  bind 0.0.0.0:80

  acl urlBegin path_beg /index.html
  http-response set-header X-Test-Header Index\ Page if urlBegin
  http-response set-header X-Test-Header Some\ Other\ Page unless urlBegin

  server node01 node01:80
  server node02 node02:80
  server node03 node03:80

In the above, we’ve added two new keywords to our repertoire:

  • acl marks the beginning of an access control entry. Here we create a new ACL called urlBegin and use the path_beg fetch method (an ACL-only version of the path method mentioned above) to match the beginning of the request URL against the static string /index.html
  • We use http-response to interact with haproxy’s HTTP response back to the client.
    • Specifically we’re using set-header to add an X-Test-Header response header.
    • At the end of each http-response directive we add conditional logic in the form of an if on the first one and an unless (the inverse) on the second one, so that X-Test-Header will return different values based on the result of the urlBegin ACL.

The above example probably isn’t useful on its own but it shows you the MO with acl directives. You use the matching/extraction capabilities of acl to establish the truth or falsehood of a particular fact and embed the conditional expression in the relevant keyword’s argument list.

Let’s create a new ACL for setting the same header based on the requesting user’s subnet:

listen testsite
  mode http
  bind 0.0.0.0:80

  acl userNetwork src,ipmask(255.255.255.0) 192.168.122.0
  http-response set-header X-Test-Header Local\ User if userNetwork
  http-response set-header X-Test-Header Internet\ User unless userNetwork

  server node01 node01:80
  server node02 node02:80
  server node03 node03:80

You’ll notice it’s much the same syntax as before, except our ACL has changed by using some of the fetch methods we’ve described before. Of course http-response isn’t the only directive you can use (most directives will support if and unless conditions) but I kept the two examples as similar as possible to illustrate the structure and how to bring this all together.

As this article progresses, you’ll see more advanced and meaningful use of access control lists but the above should give you a solid idea of how to get everything done.

More information about haproxy Access Control Lists…

Routing Traffic

Weighted Preference and Backup Nodes

OK, so now that we know how haproxy is structured and how to do a basic load balance, let’s push it a bit further. Let’s say you knew particular backends were more powerful than others. For instance, you may have two backend servers, one on a new machine with two sockets and four cores each, while the other is a single socket dual core. Obviously, if that’s all you have to work with then that’s all you have.

That presents a problem though. In our current configuration we’re redirecting traffic equally to each node even though we already know one is usually going to be the better choice. Ideally, you’d want the dual socket machine to handle most requests, with the single socket machine just easing the pressure off its big brother or potentially take over the workload entirely if the higher capacity backend needs to be restarted.

To get there, we need to begin shaping the routing decisions by introducing the concept of weight and backup nodes. These both function pretty intuitively, let’s look at an example listen block:

listen testsite
  mode http
  bind 0.0.0.0:80

  server node01 node01:80
  server node02 node02:80 weight 2
  server node03 node03:80 backup

You’ll notice a few things here:

  • The backend server node01 is as vanilla as it gets. All it specifies is the backend host and port. Since weight isn’t specified it’s assigned a weight of 1 by default.
  • Our node02 backend however now has a weight of 2 assigned. Since node01‘s weight is one, the ratio of requests served by node02 to those served by node01 is 2:1, making node02 service twice the load given to node01.
  • Our final node is similarly not given a weight but is marked as backup, meaning it won’t participate in the load balance unless both the nodes above have failed their health checks (more on that later). This configuration is referred to as active-passive since the backup node is just sitting there passively doing nothing. This enables you to economize by having the same backup system for different application servers.

Sticky Sessions

OK so now we can route traffic with pre-determined preference and establish an active-passive configuration for our backend servers by specifying one node as the passive node. Sometimes though you need to continuously route the same user to the same backend to preserve user session data (for applications that can’t or don’t support sharing user sessions). To do this we need to make user sessions temporarily “stick” to a particular application server.

There are two generally used means of doing this:

  • Using the source IP address to route the traffic to the backend node. This has the benefit of being protocol agnostic; however, for mobile clients (such as WiFi or cellular) the IP address may conceivably change during the life of the session, which will obviously break things if the user goes to the next building over and tries to resume what they were doing before.
  • Using an HTTP cookie to tie the user to a backend service. This has the benefit of staying with the user no matter the change in their networking situation, but it’s obviously limited to the HTTP protocol, and even then only to clients that will accept the cookies you try to set.

Let’s look at each option in detail.

Balance Source

Let’s look at a simple example listen block:

listen testsite
  mode http
  bind 0.0.0.0:80
  balance source

  server node01 node01:80
  server node02 node02:80
  server node03 node03:80

You’ll notice we’ve only added a single directive here and it’s balance source. This is pretty straightforward: it changes the load balancing algorithm from round robin to one where the client’s source IP address is hashed and divided by the total weight of the running servers to determine which backend serves the request. The same client IP therefore always lands on the same backend (and server weights still shape the overall workload distribution).

Introducing Stick Tables

The above approach to IP-based routing has an issue though. While it’s definitely the easiest way to get IP-based load balancing, due to the internal implementation if anything about the load balancing pool changes (availability, weight, algorithm, etc) then clients may be reassigned to new servers. Granted, this is ideally a rare occurrence and technically results in no downtime, but your users may be annoyed by the lost session. We can take more granular control over the load balancing process by using haproxy’s stick tables.

Put simply, stick tables are in-memory databases that store client information. If you’ve configured peer groups (mentioned later) the stick tables can even be shared amongst them (allowing stickiness to persist even if you lose a load balancer). Stickiness isn’t their only use though and that will be expanded upon in later sections.

So let’s look at an example configuration:

listen testsite
  mode http
  bind 0.0.0.0:80

  stick-table type ip size 1m expire 60m
  stick on src

  server web01 web01:80 check
  server web02 web02:80 check
  server web03 web03:80 check

We’ve only introduced two new keywords here:

  • stick-table which creates and defines the in-memory database.
    • If you don’t give it a table name, it will default to naming the stick table after the listen/backend section name. In our case above, it will default to testsite.
    • We specify a table type of ip to signify that the key we’ll be searching for on each request will be an IP address. There are many types you may set your table to (string, binary, integer, etc) but using the ip type specifically (instead of say string) allows haproxy to make optimizations and search the table more quickly.
    • The table will store a maximum of 1,048,576 entries via size 1m
      • Confusingly they use size suffixes to denote maximum entries. For example, size 4g would store a maximum of 4,294,967,296 entries and would not indicate a table of 4GB in size.
    • Each entry will expire after an hour (60m)
  • Finally, we instruct haproxy to look at the value in src (fetch method mentioned earlier) for the key associated with this request.
    • Other fetch methods are possible. For example, stick on req.hdr(X-Backend) would instruct haproxy to use the contents of the X-Backend request header as the key (of course the table type would need to be changed as appropriate as well).

So when do you use balance source and when do you use stick tables for IP-based stickiness? A good rule of thumb would be: if you’re just looking for simple persistence on a low priority application, then balance source yields fewer lines of configuration whose purpose is obvious. If your application is of any meaningful importance though, use stick tables and comment your configuration.

HTTP Cookie Persistence

But what about the other method, using an HTTP cookie? Let’s take a look at a listen block for that:

listen testsite
  mode http
  bind 0.0.0.0:80
  cookie backendServer insert

  server node01 node01:80 cookie s1
  server node02 node02:80 cookie s2
  server node03 node03:80 cookie s3

We’ve got two changes here:

  • A new cookie directive, which causes haproxy to check for the presence of an HTTP cookie of name backendServer.
    • This cookie will contain a simple text value for matching the corresponding server below.
    • On its own this isn’t enough though. By default all cookie does is obey the cookie if it’s already present, which supports applications managing the persistence cookie themselves (for example, sending a client to a particular server to perform an operation). To get haproxy to add the cookie when it’s not present we’ve added the insert argument.
    • You probably also want to use the nocache argument so that when the cookie must be set, downstream caching servers are informed (via HTTP headers) that the response should not be cached (as each client should have its own cookie value).
  • In our server directives we’ve added another pair of arguments, cookie followed by a value, which establishes for each webserver what cookie value corresponds to that particular server. For example, if backendServer contains s2 then the user will always be served by node02, while users with a cookie value of s3 will be served by node03. The cookie values are arbitrary and need only be unique within this particular load balance.
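With the nocache argument from above included, the cookie line would simply read:

  cookie backendServer insert nocache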

“Least Connections” vs “Round Robin”

OK so let’s imagine our web application is truly clustered. Which application server handles a given request doesn’t matter, since each one should be able to service each application request equally well. However all requests aren’t the same. Some requests are just quick GET requests for things like CSS or Javascript, whereas others kick off long CPU or memory intensive jobs. If we were to spread the load around based only on the number of incoming requests, we’d run the risk of overloading a particular server if it happens to get several resource intensive requests in a row.

In an effort to spread the load around a little more evenly (rather than just the requests) we can switch to using “least connections” as the load balancing metric. The idea is that most web requests are synchronous in nature, so while it’s not a 1:1 relationship, resource-intensive tasks will tend to hold their connections open longer, and thus the connection count is a good metric to select against when picking a backend server.

This is as simple as overriding the default roundrobin algorithm with leastconn using the balance directive:

listen testsite
  mode http
  bind 0.0.0.0:80
  balance leastconn

  server node01 node01:80
  server node02 node02:80
  server node03 node03:80

You’ll notice that we’re using the same directive as when we set up IP-based persistence (with balance source). Because of this, haproxy treats leastconn and source as mutually exclusive propositions; you pick one balancing algorithm per section. This makes sense when you think about it though, as the current number of open connections is transient, and establishing long term persistence based on transient metrics could lead to cluster instability later on when congestion on a particular server clears up.

Health Checks

Finally, when it comes to routing traffic you need to make sure your backends are actually able to accomplish the task. By default haproxy will continue attempting to deliver a request to a particular web server until three attempts have failed, at which point it will mark the server as “down” and stop trying to deliver web traffic to it. After being marked “down”, haproxy will attempt to connect to the web server every two seconds and two consecutive successes will cause the server to be marked as ready again.

By default, no server is actively checked and instead a “failure” will be a failed attempt to proxy a request to the given server. This isn’t usually ideal since obviously it would be helpful if the load balancer proactively found out a backend server wasn’t there anymore rather than stalling an actual client request until haproxy has to pick a different server. To do a simple TCP check on each server you need only add a check argument to each backend server definition. A simple example of this would be:

listen testsite
  mode http
  bind 0.0.0.0:80

  server node01 node01:80 check
  server node02 node02:80 check
  server node03 node03:80 check

Which will attempt to connect over TCP every two seconds. However it’s usually a good idea to test application-level availability. Let’s say you had a request that should always succeed (say GET /rss.xml) and that can be run many, many times with no deleterious effect on the backend application. We can use the option httpchk directive in our listen block to instruct haproxy to issue an HTTP-level request, which by default will be OPTIONS / (which checks for HTTP availability without triggering any actions within the application).

To implement our desired health check though we might create a listen block similar to this:

listen testsite
  mode http
  bind 0.0.0.0:80
  option httpchk GET /rss.xml

  server node01 node01:80 check rise 2 fall 2 inter 5000
  server node02 node02:80 check rise 2 fall 2 inter 5000
  server node03 node03:80 check rise 2 fall 2 inter 5000

Let’s break down the changes here:

  • The option httpchk GET /rss.xml directive not only activates HTTP-level health checks (httpchk) but goes on to specify the HTTP request method and URI to use in the health check. Since we’re using the application’s ability to generate an RSS feed to determine liveness we’ll just have haproxy periodically download a copy.
  • We’ve added some new arguments to the server directive for each backend
    • check enables health checks for this server
    • rise 2 establishes that two consecutive successful downloads of the RSS feed will result in the server being marked as alive (this is the default but made explicit here).
    • fall 2 establishes that two consecutive failures will cause the server to be marked as down.
    • inter 5000 modifies the health check interval from the default of 2000ms (2 seconds) to 5000ms (5 seconds) to account for possible generation time of the RSS feed.
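If you also want the check to insist on a particular response rather than just any HTTP response, you can pair httpchk with an http-check expect rule. A sketch:

  option httpchk GET /rss.xml
  http-check expect status 200

With this in place only a 200 counts as success (by default httpchk accepts any 2xx or 3xx response).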

A Brief Note on External Health Checks

So the above will actually work for most people, but occasionally you may need to execute fairly arbitrary instructions as part of a health check. These are referred to as “external checks.”

Essentially external checks work by haproxy periodically invoking (as whatever user haproxy is running as) a particular executable instead of using internal logic. This executable will be passed environment variables such as $HAPROXY_SERVER_NAME and $HAPROXY_SERVER_PORT and then the return status will be used to determine success or failure.

To enable external checks a few things need to happen:

global
  external-check

listen testsite
  mode http
  bind 0.0.0.0:80
  option external-check
  external-check path "/usr/bin:/bin"
  external-check command /tmp/test.sh

  server web01 web01:80 check
  server web02 web02:80 check
  server web03 web03:80 check

Let’s break this one down now:

  • We’ve added an option to the global section called external-check which enables all backend and listen sections to use external checks. Supposedly this is a security feature.
  • Inside our listen block we’ve added three health-check related directives
    • option external-check enables this section to implement an external check
    • external-check path "/usr/bin:/bin" sets the path for the executable, otherwise $PATH will be empty when the executable runs.
    • external-check command sets the actual command to run, in this case a script.

Since external checks can get involved I won’t write a production-ready version here, but I provide this as an explanation of what they are.
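That said, a minimal sketch of what /tmp/test.sh could contain (purely illustrative; the script must be executable and haproxy only looks at its exit status):

#!/bin/sh
# haproxy exports details about the server being checked, including
# HAPROXY_SERVER_ADDR, HAPROXY_SERVER_NAME and HAPROXY_SERVER_PORT.
# Exit 0 to mark the server up, non-zero to mark it down.
nc -z -w 2 "$HAPROXY_SERVER_ADDR" "$HAPROXY_SERVER_PORT"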

Redirection

I’ve explained basic application routing in nginx before, but since there isn’t enough “application routing” content relating to haproxy, I’ll include it here.

The keyword for performing any sort of HTTP redirect is simply redirect and in its most basic form looks something like this:

listen testsite
  mode http
  bind 0.0.0.0:80

  acl redirect_page path_beg /redirect
  redirect location https://google.com if redirect_page

  server web01 web01:80 check
  server web02 web02:80 check
  server web03 web03:80 check

Breaking the new stuff down:

  • We establish an ACL for matching the current request against a URI of /redirect
  • We then use redirect location
    • For this directive we directly give haproxy the value it needs to put into the Location: header.
    • We make this directive contingent upon the previous redirect_page ACL matching.
    • Optionally we could have specified a specific HTTP return code using the code argument after the URL. For example:
      • redirect location https://google.com code 301 if redirect_page

Let’s assume you want to be a little bit more terse or avoid putting hardcoded values into your URL if at all possible. The redirect keyword supports two other options:

  • prefix for modifying everything before the URI (for example the http://example.com in http://example.com/myPage.php)
  • scheme for modifying only the scheme (i.e protocol) portion of the URL.

A demonstrative example using all the above forms of redirect might look something like this:

listen testsite
  mode http
  bind 0.0.0.0:80

  acl ssl_page ssl_fc
  acl redirect_page path_beg /redirect
  acl short_url path_beg /short

  redirect scheme https code 301 unless ssl_page
  redirect location https://google.com/search code 302 if redirect_page 
  redirect prefix http://example.com/evenLongerURL if short_url

  server web01 web01:80 check
  server web02 web02:80 check
  server web03 web03:80 check

Explaining each of the redirect directives:

  • The first directive updates the scheme in the URL to https:// unless the user is already accessing the site over SSL. This means http://example.com/myPage.php will turn into https://example.com/myPage.php. Additionally, we set an HTTP status code of 301 so that this redirect is treated as permanent by the client. This causes search engines to only index the HTTPS version of the site.
  • The second one is the same redirect from before except this time we’re setting the HTTP status code to 302 so that browsers and search engines don’t permanently store this redirection. This is the default but we’ve made it explicit here.
  • The third redirect changes the prefix of the URL to the one given above. For example, a request to http://example.com/short will be automatically redirected to http://example.com/evenLongerURL/short due to haproxy preserving the URI (the /short in the original URL) and prefixing it with the one given in this directive.

SSL Termination

Since nginx has plenty of guides on terminating SSL, I didn’t cover that in its load balancer article. Similar to the redirect section above, SSL termination for haproxy will never be covered outside of load balancing, and therefore I’ve decided to cover it here.

Basic no-frills SSL termination:

listen testsite
  mode http
  bind 0.0.0.0:443 ssl crt /etc/haproxy/server.pem

  server node01 node01:80
  server node02 node02:80
  server node03 node03:80

The only relevant changes to this from the “Basic Load Balancing” example are on the bind line. The ssl argument operates by itself (i.e no value) and activates the SSL engine. The crt argument points to the file containing both the certificate and the private key. In general order doesn’t matter, but if there are intermediate certificates the conventional order is: the server certificate first, then any intermediate certificates (from the signing CA up toward the root), and then the private key.

If you want to fine tune the ciphers used for SSL or disable particular SSL/TLS versions you would use the ssl-default-bind-ciphers and ssl-default-bind-options directives respectively (both located in the global section). For example:

ssl-default-bind-ciphers ECDH+AESGCM:DH+AESGCM:ECDH+AES256:DH+AES256:ECDH+AES128:DH+AES:RSA+AESGCM:RSA+AES:!aNULL:!MD5:!DSS
ssl-default-bind-options ssl-min-ver TLSv1.0

The above enables cipher suites such as AES256 with Diffie-Hellman key exchange while also disabling cipher suites that use algorithms such as MD5 or DSS. The second line explicitly enforces a minimum protocol version of TLS v1.0.

Please keep client compatibility in mind when tuning these parameters. Better security is obviously better but it’s also important that your load balancer be able to negotiate some kind of connection, otherwise user experience will be impacted (to the point of being non-existent).

Load Balancing Non-HTTP Traffic

Of course, it’s not only web sites that need load balancing. Many other services can cause customer-facing outages should they fail and thus would benefit from attempts to maximize availability. The following is only a partial list to give you an idea of the “haproxy approach” to balancing non-HTTP TCP traffic. It’s by no means exhaustive and only represents information I wish I had when I began my load balancing adventure.

MySQL Cluster Access

In principle, load balancing MySQL/MariaDB isn’t much different from HTTP. Let’s look at a basic load balance:

listen testsite
  mode tcp
  bind 0.0.0.0:3306

  server web01 web01:3306 check
  server web02 web02:3306 check
  server web03 web03:3306 check

You’ll recognize almost all of that; all that’s really changed from our HTTP example is that we’ve switched over to tcp mode and changed the port number on the bind and backend sockets. There’s a problem here though, and it’s with the health check.

Since tcp mode will continually test the backend’s availability by opening and closing a TCP connection (every two seconds by default) without actually sending anything over the socket, MySQL will often interpret this as a connection error. This results in situations where the load balancer itself becomes blacklisted and you start getting errors such as:

joel@lb01:~$ mysql -h lb01 -u rascaldev
ERROR 1129 (HY000): Host '192.168.122.1' is blocked because of many connection errors; unblock with 'mysqladmin flush-hosts'

Ugly. Well, we can get around this by authenticating to MySQL, thus testing application-level availability as well as getting around the “connection error” issue above. Let’s modify our load balance to suit:

listen testsite
  mode tcp
  bind 0.0.0.0:3306
  option mysql-check user haproxy

  server web01 web01:3306 check
  server web02 web02:3306 check
  server web03 web03:3306 check

This configuration supposes you’ve created a passwordless user that haproxy can connect as. For security reasons, you’ll probably want to give this user access to nothing and use the host portion of the account name to restrict logins to IP addresses associated with the load balancer (for example CREATE USER 'haproxy'@'192.168.122.%';)

Once haproxy is restarted, your MySQL nodes should show haproxy connecting as the given user and then immediately quitting gracefully. For example, from the general_log in MySQL:

211 Connect haproxy@192.168.122.11 as anonymous on 
211 Quit

Obviously, you may want more complicated checks that access privileged functions (thus necessitating complex queries and passwords/SSL), and to implement that you can use the same external-check functionality mentioned above.

Redis Cluster Access

Alright, so let’s go with something haproxy doesn’t have a native check for. Let’s assume you have a Redis cluster that you want to provide load balanced access to. A decent bare bones configuration might look like this:

listen testsite
  mode tcp
  bind 0.0.0.0:6379
  option tcp-check
  tcp-check send PING\r\n
  tcp-check expect string +PONG
  tcp-check send QUIT\r\n

  server web01 web01:6379 check
  server web02 web02:6379 check
  server web03 web03:6379 check

You’ll notice we added two things:

  • An option tcp-check directive, which causes the health check defined by subsequent tcp-check keywords to be used in lieu of a simple TCP connection test.
  • We have three instances of tcp-check:
    • First we send a string consisting of PING followed by a carriage return and new line.
    • Then we instruct haproxy to expect the backend to reply with +PONG (in the redis protocol, lines that start with + indicate the operation was a success while - indicates failure). Since the expect will pass if the given string appears anywhere in the response, we don’t need to include the \r\n that redis will send in its own reply.
    • Finally we gracefully close the connection by send‘ing a QUIT command.

As you can see, you can emulate many different simple conversations for text-based protocols. The tcp-check keyword also supports binary protocols. To test a binary protocol’s functionality you can replace send with send-binary followed by hex codes representing the binary data, and when receiving, replace expect string with expect binary, also with hex codes as the argument.
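For example, the same Redis conversation expressed in binary form might look like this (the hex encodes PING\r\n and +PONG respectively):

  option tcp-check
  tcp-check send-binary 50494e470d0a
  tcp-check expect binary 2b504f4e47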

Gatekeeping

As mentioned in the nginx article, having a load balancer out front also presents you with an opportunity to protect the web app from the evil, dark, and hostile forces of the internet. It also provides you a common point to modify routing and to enhance performance through caching and rate limiting.

Caching

haproxy merely strives to be a load balancer. It leaves caching (full page or section-based) to other applications such as the backend application server or some sort of Varnish frontend. Given that goal, in general haproxy only caches two things:

  • Session information (such as SSL state)
  • Small content objects such as favicons.

Implementing a content cache isn’t too difficult: just define the cache, then tie your backend or listen section’s http-request and http-response to it. Keep in mind though that it’s intentionally limited.
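For the curious though, a minimal sketch (assuming haproxy 1.8 or later, where the native cache section exists) might look like:

cache smallobjects
  total-max-size 4
  max-object-size 10000
  max-age 60

listen testsite
  mode http
  bind 0.0.0.0:80
  http-request cache-use smallobjects
  http-response cache-store smallobjects

  server node01 node01:80

Note that total-max-size is specified in megabytes while max-object-size is in bytes, and only small, complete, cacheable responses will ever be stored.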

For SSL caching: if you deal with a high volume of concurrent users on an HTTPS website you might want to tweak haproxy’s native SSL session caching. By default, haproxy will cache 20,000 sessions for 300 seconds (five minutes) and will immediately attempt to re-use a session (including symmetric key and ciphersuite) when a client re-connects. This saves a lot of latency through less computation and fewer round trips between client and server. Since this functionality is native you only need to tweak global parameters, such as the cache size with the tune.ssl.cachesize keyword or the cache lifetime with tune.ssl.lifetime.

Returning to Stick Tables

Beyond session persistence, you can use stick tables to both secure your applications and limit the rate of traffic as it goes through the load balancer. These options aren’t like the nginx options, where you can limit resource utilization for legitimate clients and slow them down; haproxy’s stick table-based controls are more about securing the application against denial of service attacks or vulnerability scans.

Since stick tables are essentially just in-memory database tables optimized for quick lookups, you can use them for retrieving data associated with a visitor’s record. Unfortunately you can’t store arbitrary data (outside of the key value itself) but haproxy does give us some handy options to feed the stick-table directive (via the store argument) when we’re creating it. An abbreviated list would be:

  • gpc0/gpc1 are generic unsigned 32-bit counters to associate with the record.
  • gpc0_rate/gpc1_rate are read-only unsigned 32-bit integers that indicate how fast their respective counters are growing. Takes a single argument (during table definition) that indicates the rolling period being monitored.
  • conn_cnt/conn_cur are unsigned 32-bit integers indicating the total number of connections received and the number of connections currently open, respectively.
  • conn_rate similar to gpc0_rate but indicates the rate at which new connections are created. Takes a single argument (during table definition) that indicates the rolling period being monitored.
  • http_req_cnt/http_req_rate are HTTP-level analogs of the similarly named data types above. Due to request pipelining (or multiplexing with HTTP/2) these should be preferred for web applications.

There are additional data types for bandwidth counting, but these are the most useful in my experience. To be able to use this data, though, you need to enable tracking. Otherwise the data type will be reserved in the stick table but things like conn_cnt won’t be populated.

Security

OK so now that we’re storing more information in the stick table, let’s use it. Let’s start by implementing rudimentary DoS protection by rejecting HTTP requests from any client whose connection rate exceeds 100 per second. This would be an example configuration:

listen testsite
  mode http
  bind 0.0.0.0:80

  stick-table type ip size 30k expire 5m store conn_rate(1s)
  http-request track-sc0 src
  http-request deny if { src_conn_rate gt 100 }

  server web01 web01:80 check
  server web02 web02:80 check
  server web03 web03:80 check

Let’s take the important parts one at a time:

  • We create a new stick table with the stick-table keyword
    • We use type ip because we’re going to use the IP address as the record’s key.
    • We have 30,720 possible entries (via size 30k) which expire after five minutes (expire 5m).
    • In addition to the backend server assigned (aka server_id), we’re storing the connection rate measured over a rolling 1 second window (the conn_rate(1s))
  • We instruct haproxy to begin tracking the current connection by associating it with the record matching the client IP address (src).
    • To do this we associate the client IP with the “stick counter” number 0: http-request track-sc0
  • We then instruct haproxy to deny the latest HTTP request if the src_conn_rate is above 100. Since our rolling period is one second this will only happen if they’ve exceeded our limit.
    • To accomplish this we use a new option for http-request called deny based on whether or not the given ACL evaluates to TRUE.
    • In this case rather than using a predefined ACL (since this is the only place we’re going to use it) we’re using an anonymous ACL by placing the logic within curly brackets on the same keyword line.
    • By rejecting at the HTTP level, even a user in the middle of a pipeline of HTTP requests will begin having their requests denied.
    • Ideally you would also pair this with tcp-request which would stop a successful TCP connection from being formed. Denying at the HTTP level stops abusive HTTP pipelines but blocking at the TCP level is preferable due to lower resource requirements given that higher level functionality is never invoked.
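That tcp-request pairing might look like this (an illustrative line reusing the same table):

  tcp-request connection reject if { src_conn_rate gt 100 }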

OK so the above does an alright job of kicking abusive users to the curb, but once our one second window is over, they’re right back. We need a more permanent record of abusive users.

To accomplish this we need to use the aforementioned general purpose counter gpc0 to flag particular users as abusive on a more permanent basis. Doing this in the same stick table we’ve defined above would work but presents a traffic problem. If your haproxy instance is clustered, all updates to shared stick tables must be communicated to all peers in the cluster. If each and every request a user makes regardless of intent causes an update to the table (as is the case with monitoring the conn_rate) then that puts a lot of stress on the network and makes the cluster more fragile as a result.

To get around this we can structure our stick tables such that conn_rate is tracked by an unshared stick table, whereas the gpc0 value that flags abusiveness can be shared by itself. In the event that we lose the active load balancer and there’s a fail over, abusive users will remain blocked and the new load balancer will just lose the 1 second of conn_rate tracking the old load balancer had stored in memory.

Let’s look at what the configuration for something like that would look like:

listen testsite
  mode http
  bind 0.0.0.0:80

  stick-table type ip size 300k expire 60m store conn_rate(1s)
  http-request track-sc0 src

    ## Set or Retrieve abuser status. When flag_abuser is evaluated gpc0 will be incremented
  acl flag_abuser src_inc_gpc0(abuse) ge 0
  acl is_abuser src_get_gpc0(abuse) gt 0

    ## Define Abusive behavior
  acl path_too_long path_len 15:
  acl above_request_limit src_conn_rate gt 20
  acl wp_login path_beg /wp-login.php
  acl valid_auth_cookie req.cook(flash) thunder

    ## Evaluate abuse ACL's dropping the connection where appropriate.
  http-request silent-drop if path_too_long flag_abuser
  http-request silent-drop if wp_login flag_abuser
  http-request silent-drop if !valid_auth_cookie { path_beg /admin } flag_abuser
  http-request silent-drop if !valid_auth_cookie { path_beg /moderation } flag_abuser
  http-request silent-drop if above_request_limit flag_abuser

    ## Categorical denial at the TCP levels for abusers
  tcp-request connection silent-drop if is_abuser

  server web01 web01:80 check
  server web02 web02:80 check
  server web03 web03:80 check

backend abuse
  stick-table type ip size 10k expire 120m store gpc0

OK wow. There’s a whole lot more to dig into:

  • We’re creating two stick tables,
    • Both are using IP addresses for the lookup key.
    • One stick table for the high frequency connection rate tracking. We’re capable of tracking 307,200 clients, with entries expiring after an hour and connection rates being tracked within a 1 second window.
    • The other is in an otherwise empty backend section called abuse. haproxy only allows a single stick table per section but allows you to refer to other sections’ stick tables so this is the only way to create a second one. No actual configuration of backend servers is required though, it’s just an empty container for the stuff we’re putting into it.
  • We initiate the actual tracking with http-request track-sc0 src, which loads the requester’s IP address (again from src) into memory as the lookup key for this section’s stick table once layer 7 processing has started. From this point the src_* fetch methods will pull data from these tables (the abuse table is addressed explicitly by name in the ACL’s below).
  • We define two ACL’s for modifying and querying the abuse status for this user
    • The flag_abuser ACL will execute src_inc_gpc0(abuse) in order to increment the gpc0 counter in the abuse table. The ACL itself will always evaluate to TRUE due to the actual ACL logic checking to see if the value returned (the new value of gpc0) is at least zero, which it always will be even if the entry was previously non-existent (which evaluates to zero). This code will only be executed if the ACL itself must be evaluated. This becomes important behavior in a bit.
    • The is_abuser ACL is a true ACL in the sense that it checks a boolean status instead of being an indirect means of executing code. In this case it checks the current value of gpc0 and returns TRUE only if flag_abuser has been called on this IP address before.
  • We then define a handful of ACL’s describing what qualifies a user as “abusive” with our app (plus one, valid_auth_cookie, that identifies legitimately authenticated users). Real world examples would likely be more complicated. I won’t dissect each one here since the syntax should be obvious. If you’re uncertain of what an ACL is doing, please refer either to the haproxy documentation (linked below) or the “Access Control Lists” section above.
  • We then use a http-request silent-drop if pattern for each of our ACL’s.
    • Technically, we could do this inline as well instead of defining named ACL’s like I’ve done but when configurations get complicated I prefer named ACL’s as they help produce self-documenting code.
    • In each case, we’re stringing together multiple ACL’s with logical AND operations. With haproxy, if you list multiple ACL values without any sort of logical operator there’s an implicit AND operation between the ACL’s where the preceding ACL expression must return TRUE before the following ACL’s will be evaluated.
    • The first directive enforces a maximum URL length of 15 characters (including the leading slash). This is useful for attacks which involve crafting abnormally large URL’s. To determine the maximum URL length to enforce, allow your application to be used under realistic conditions for at least a month, gather all the paths from your logs, take the longest path length present, and add 10-15 characters of buffer space.
    • The second directive simply checks if the request path looks like someone trying to find a WordPress login page. If this evaluates to TRUE then flag_abuser will be called, otherwise flag_abuser is left alone.
    • The third and fourth directives complicate things a bit by introducing both negation and the mixing of anonymous ACL’s and named ACL’s. Here, if the valid_auth_cookie ACL does NOT return TRUE (i.e. the user’s cookie is invalid) then the request is checked to see if it’s for a path that’s sensitive within our application; if that DOES return TRUE then flag_abuser will be called.
    • Finally, we check the above_request_limit ACL which will return TRUE if the user has issued more than 20 requests during the rolling window for monitoring the conn_rate for this user.
    • In every directive flag_abuser is only called if ACL evaluation makes it that far, which means all of our criteria for an abusive user were met and we’re good to increment gpc0 and perform the default action for the directive.
    • The silent-drop directive has the effect of immediately dropping the TCP connection without sending any TCP reset packets or HTTP errors to the client. In effect their connection must just time out.
    • If each connection is established and then just dropped, this can leave stale connections in intermediate routers and firewalls. This in turn may create an opportunity for denial-of-service if attackers know that’s what you’re doing. For this reason you may prefer tarpit instead of silent-drop, which simulates a failure of a backend server and issues a 500 error. The question of which you would want is dependent upon what kinds of attacks you expect to happen.
  • Regardless of whether the HTTP connection is silently dropped or tarpitted, you can safely drop all future connections from abusive users. When the connections don’t complete routers and firewalls should purge them from their tables as required.
    • We’re doing this using tcp-request connection silent-drop based upon the client IP’s abuse status.

As you can imagine it can get much much more complicated than the above, especially when you’re load balancing multiple applications, but the above should give you a firm basis to start securing your applications on the load balancer itself.

HAProxy API

Oftentimes you might want to administer your load balancer with kid gloves. For instance, you may want to modify the behavior of a particular website without affecting absolutely all websites being load balanced through haproxy, or safely take a node offline by first draining any open connections from it while leaving the load balance itself online.

To accomplish this haproxy provides a simple text-based command API for interacting with the running state of the load balancer. There’s also a RESTful API available for the enterprise ALOHA implementation but I won’t be describing that here for the same reason I didn’t describe the nginx API either.

Admin Socket Configuration

There are two ways to interact with the API: over a unix domain socket or over a TCP socket.

You configure each the same way: by including a stats socket directive in the global section of your configuration file and ensuring that its privilege level is set to admin. Take this example global section:

global
  stats socket ipv4@0.0.0.0:1234 level admin
  stats socket /run/haproxy/admin.sock mode 0600 level admin

This creates two different admin gateways:

  • The first directive instructs haproxy to open a stats socket on all available IP addresses binding to port 1234 and giving connections to this port admin-level access to the load balancer.
    • You can interact with this socket by way of netcat. For example, to print the load balancing metrics over TCP:
      • echo show stat | nc localhost 1234
    • Given the lack of encryption or authentication, you probably want to either forgo this option in production setups or at least configure some strict HBAC controls (such as port knocking) so that only trusted systems can communicate with the socket.
  • The second directive creates the same admin socket using a UNIX domain socket located at /run/haproxy/admin.sock
    • In addition to level admin we can set the mode of the socket to 0600 so only the root user can communicate with it.
    • To communicate over this socket you can use the socat utility. For example:
      • echo show stat | socat /run/haproxy/admin.sock -

In the case of the TCP socket, additional security is available by enabling SSL on the socket the same as mentioned above. For example:

stats socket ipv4@0.0.0.0:1234 level admin ssl crt /etc/haproxy/server.pem

If you enable SSL you will need to use openssl s_client instead of netcat to communicate with haproxy. For example:

echo show stat | openssl s_client -ign_eof -connect localhost:1234

The -ign_eof overrides some non-intuitive behavior with openssl that breaks communication with the API gateway.

As you can probably glean from the above, the text-based API operates by way of simple commands and arguments (similar to the bash command line). To get a full list of possible commands just use the help command with no options or check haproxy’s management guide.
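
For example, to ask the running instance for its command list over the UNIX domain socket:

echo help | socat /run/haproxy/admin.sock -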

ACL Manipulation

As stated much earlier in the post, evaluating an ACL can have side effects (as with the gpc0 stick table counter before) but ultimately provides a TRUE or FALSE result that later conditionals can reference to determine their behavior via if and unless clauses. haproxy doesn’t support ACL creation through the API, but matching criteria can be added or removed as necessary. This allows conditional logic to be defined primarily in the textual configuration but tweaked via the API.

Let’s take the example of being able to put a website into “lock down” mode if we determine a credible threat is attempting to exploit the application. The lockdown mode should cause all requests to /admin (the hypothetical application’s administration area) to be rejected. Let’s assume the listen block looks like this:

listen testsite
  mode http
  bind 0.0.0.0:80

  acl site_lockdown dst_port,mul(0) 1
  http-request deny if { path_beg /admin } site_lockdown

  server web01 web01:80 check
  server web02 web02:80 check
  server web03 web03:80 check

The above has two important parts:

  • A new ACL called site_lockdown is created.
    • It takes the local port the client has connected to (which is port 80 above, but the important part is that it’s always an integer) and then multiplies that integer by zero, thus always yielding zero.
    • This result is then compared against the last argument which is hardcoded to be 1.
      • Since obviously 0 != 1, this ACL as written will always yield a FALSE result.
  • We use http-request to deny access to any URL that begins with /admin if site_lockdown is TRUE

OK, now that the ACL is in place and always returning FALSE (indicating the website is not in lockdown mode), let’s explore changing that.

First let’s enumerate the ACL’s defined on the load balancer with the show acl command:

root@lb01:~# echo show acl | socat /run/haproxy/admin.sock -
# id (file) description
0 () acl 'dst_port' file '/etc/haproxy/haproxy.cfg' line 9
1 () acl 'path_beg' file '/etc/haproxy/haproxy.cfg' line 11

OK, we can see our two ACL’s: the always-false dst_port ACL and the inline anonymous ACL for our http-request directive. Let’s take a closer look at the dst_port ACL (which has an ID of 0 above), again using show acl but this time giving the ACL index we’re interested in:

root@lb01:~# echo 'show acl #0' | socat /run/haproxy/admin.sock -
0x556df6060eb0 1

OK, above we have two-column output. The first column is the memory location where the ACL pattern is stored and the second is the matching pattern for this particular ACL. Let’s add a new matching pattern to the ACL with the add acl command, instructing haproxy to match against 0 as well:

root@lb01:~# echo 'add acl #0 0' | socat /run/haproxy/admin.sock -
root@lb01:~# echo 'show acl #0' | socat /run/haproxy/admin.sock -
0x559e4a6d2f30 1
0x559e4a718760 0

Since all patterns within an ACL are logically OR’d together, this new pattern will cause site_lockdown to start evaluating to TRUE, because dst_port,mul(0) actually does match 0. In turn this causes all directives conditional on site_lockdown being true (such as our http-request deny) to come into effect.

Once the drama subsides, you can delete the new pattern by instructing haproxy to delete all ACL entries that have a pattern of 0, like so:

root@lb01:~# echo 'del acl #0 0' | socat /run/haproxy/admin.sock -

This should result in normal website functionality being restored.

Updating Maps

As mentioned before, you can use the map converter to translate the values returned by standard fetch methods into arbitrary values of your choosing. For instance, let’s say we have the following listen block:

listen testsite
  mode http
  bind 0.0.0.0:80

  http-response add-header X-Extra %[src,ipmask(16),map(/var/tmp/mapfile.txt,"Default Value")]

  server web01 web01:80 check
  server web02 web02:80 check
  server web03 web03:80 check

The relevant part in this block is the addition of the X-Extra response header:

  • We use the %[] construct to embed the fetch method and converters’ return data.
    • This is as opposed to the {} construct used for anonymous ACL’s that we used in the previous section. There is no comparison yielding a TRUE or FALSE here (hence not an “ACL”).
  • I’ve chained two converters here. This isn’t specific to the %[] construct or the use of map per se, but chaining is often useful in conjunction with map so that values undergo some level of normalization, maximizing the odds of correctly matching something inside our map file.
  • For the map itself, haproxy reads the /var/tmp/mapfile.txt text file upon startup and loads all the map data inside it.
    • If no map key matches, the string Default Value is returned instead.

Let’s look at the contents of my mapfile.txt file:

8.8.0.0     United States
192.166.0.0 Germany
10.0.0.0    Private Network
192.168.0.0 Private Network
201.8.2.0   Brazil

But let’s imagine the haproxy instance has been started and we just now noticed the problem with that last line. Our ipmask converter reduces every address down to its first two octets, meaning the lookup will never match 201.8.2.0 because the converter will have already reduced the source address to 201.8.0.0. Let’s use the API to delete this effectively dead map entry and add the entry we meant to add.

First let’s enumerate all the maps currently running using show map with no arguments:

root@lb01:~# echo 'show map' | nc localhost 1234 
# id (file) description
0 (/var/tmp/mapfile.txt) pattern loaded from file '/var/tmp/mapfile.txt' used by map at file '/etc/haproxy/haproxy.cfg' line 9

OK so the only map available is the one we’re after and it’s been assigned an ID of 0. Let’s look at the current contents of that map:

root@lb01:~# echo 'show map #0' | nc localhost 1234 
0x5561ed38ab90 8.8.0.0 United States
0x5561ed392280 192.166.0.0 Germany
0x5561ed392300 10.0.0.0 Private Network
0x5561ed392380 192.168.0.0 Private Network
0x5561ed392400 201.8.2.0 Brazil
root@lb01:~# echo 'del map #0 #0x5561ed392400' | nc localhost 1234

root@lb01:~# echo 'show map #0' | nc localhost 1234 
0x5561ed38ab90 8.8.0.0 United States
0x5561ed392280 192.166.0.0 Germany
0x5561ed392300 10.0.0.0 Private Network
0x5561ed392380 192.168.0.0 Private Network

root@lb01:~# echo 'add map #0 201.8.0.0 Brazil' | nc localhost 1234 

root@lb01:~# echo 'show map #0' | nc localhost 1234 
0x5561ed38ab90 8.8.0.0 United States
0x5561ed392280 192.166.0.0 Germany
0x5561ed392300 10.0.0.0 Private Network
0x5561ed392380 192.168.0.0 Private Network
0x5561ed392400 201.8.0.0 Brazil

Alright, so we’ve fixed the map in-memory. We can now modify the flat-file to protect against regressions should haproxy be restarted.
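
For example, a hypothetical one-liner to make the same correction to the source file:

sed -i 's/^201\.8\.2\.0/201.8.0.0/' /var/tmp/mapfile.txt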

Managing Stick Tables

Let’s return to our previous stick table example where we had two separate stick tables, one of which contains a gpc0 counter for flagging a user as abusive:

listen testsite
  mode http
  bind 0.0.0.0:80

  stick-table type ip size 300k expire 60m store conn_rate(1s)
  http-request track-sc0 src

  ## Set or retrieve abuser status. When flag_abuser is evaluated gpc0 will be incremented
  acl flag_abuser src_inc_gpc0(abuse) ge 0
  acl is_abuser src_get_gpc0(abuse) gt 0

  ## Define abusive behavior
  acl path_too_long path_len 15:
  acl above_request_limit src_conn_rate gt 20
  acl wp_login path_beg /wp-login.php
  acl valid_auth_cookie req.cook(flash) thunder

  ## Evaluate abuse ACL's, dropping the connection where appropriate.
  http-request silent-drop if path_too_long flag_abuser
  http-request silent-drop if wp_login flag_abuser
  http-request silent-drop if !valid_auth_cookie { path_beg /admin } flag_abuser
  http-request silent-drop if !valid_auth_cookie { path_beg /moderation } flag_abuser
  http-request silent-drop if above_request_limit flag_abuser

  ## Categorical denial at the TCP level for abusers
  tcp-request connection silent-drop if is_abuser

  server web01 web01:80 check
  server web02 web02:80 check
  server web03 web03:80 check

backend abuse
  stick-table type ip size 10k expire 120m store gpc0

Let’s assume that, as administrators, we’ve determined a particular IP address is abusive even though it doesn’t meet any of our regular criteria. To flag it manually, we must set gpc0 for its entry to a non-zero number, which will cause our is_abuser ACL to begin returning TRUE.

First let’s enumerate the tables and then inspect the one we’re interested in both with the show table command:

root@lb01:~# echo show table | socat /run/haproxy/admin.sock -
# table: testsite, type: ip, size:307200, used:1
# table: abuse, type: ip, size:10240, used:1

root@lb01:~# echo show table abuse | socat /run/haproxy/admin.sock -
# table: abuse, type: ip, size:10240, used:1
0x559e0fe7a0d8: key=192.168.122.1 use=0 exp=7162942 server_id=1 gpc0=0

OK, so we’re seeing both the abuse table and the IP we want to flag (192.168.122.1) above. To manually flag the user, we can just set gpc0 to 1 using the set table command:

root@lb01:~# echo set table abuse key 192.168.122.1 data.gpc0 1 | socat /run/haproxy/admin.sock -

root@lb01:~# echo show table abuse | socat /run/haproxy/admin.sock -
# table: abuse, type: ip, size:10240, used:1
0x559e0fe7a0d8: key=192.168.122.1 use=0 exp=7193806 server_id=1 gpc0=1

And that’s it, the user has been blocked.

Please note that in our current configuration this won’t close any connection the abuser currently has open. In order to immediately reject all further communications from abusive parties, you should change the tcp-request connection silent-drop if is_abuser directive from connection to content (or add a duplicate directive that changes only that one thing) so that any data sent over an established TCP connection triggers the ACL check and closes the connection.
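
A sketch of what that duplicate directive might look like alongside the original:

  ## Refuse new connections from flagged IP's outright
  tcp-request connection silent-drop if is_abuser
  ## Also drop established connections as soon as they send more data
  tcp-request content silent-drop if is_abuser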

API Limitations

As stated before, there are many things this API cannot do. For instance, you cannot create new ACL’s or modify their fetch method/converter configuration. Similarly, you can modify map files but you can’t create new ones. You also can’t make any configuration change that requires new directives to be introduced. Use of the API should be restricted to dynamic data that changes often, where you don’t want each change to require either a process reload (as with maps) or a configuration reload (as with most other changes). Thankfully configuration reloads are mostly non-destructive/non-invasive.

Outside of what has been mentioned above, the API can do basic backend server management, which we will cover in more detail in the “Dynamic Member Management” section.

Logging, Alerts, and Monitoring

So once your application is up and running, you need to be able to both monitor the load balancers themselves and log the traffic that’s going through them. haproxy is a little peculiar in both categories, so it bears going into.

Enabling Logging

One of the most peculiar things to me is haproxy’s approach to log management. The only means of getting log data out of haproxy is through syslog and, unlike nginx, there’s no native option for either combined or common log formats. That means whatever you’re using for log extraction (to elasticsearch or what have you) has to be capable of understanding the format haproxy gives us.

The most basic approach is to enable logging globally. This is useful for monitoring the haproxy application itself (rather than logging traffic going through the load balancer) and is configured as simply as providing the log keyword in the global section of your configuration:

global
  log /dev/log local0

The above instructs haproxy:

  • To communicate (using the syslog protocol) with syslog over the /dev/log UNIX domain socket (the default syslog socket on most Linux systems)
  • To use facility local0. Obviously if you’re using local0 for something already, change that part to one of the other localX facilities.
  • In the context of haproxy running in a container, your best bet is probably to either have something like logstash running sidecar in the same pod or establish a central syslog service and replace /dev/log with ipv4@x.x.x.x, where x.x.x.x is the IPv4 address of the service (as sketched below).
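
For example, a minimal sketch assuming a hypothetical central syslog service listening at 192.168.122.50 on the standard syslog port (both values are placeholders for your own environment):

global
  log ipv4@192.168.122.50:514 local0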

For request logging, logging must be enabled in the frontend or listen block receiving the connection, the same way it was enabled in global. This means you can’t assume that a log directive in your global section will be inherited by all other relevant blocks in your configuration (which is how most other directives work). You can still point a proxy section at the global log configuration by using the global argument. For example, this would be a sparse but complete configuration:

global
  log /dev/log local0

listen testsite
  mode http
  bind 0.0.0.0:80
  log global

  server web01 web01:80 check
  server web02 web02:80 check
  server web03 web03:80 check

In the above we define the global logging mechanism as syslog with local0 as the facility. This configuration produces the following for HTTP requests that come in:

Aug 04 23:26:20 lb01 haproxy[7558]: Connect from 192.168.122.1:50240 to 192.168.122.11:80 (testsite/HTTP)

Obviously, the above is about as basic as a log gets, just letting you know the client socket and the local socket they connected to (along with the listen/frontend block associated with that socket).

In the next section we’ll cover log formats for getting more useful log information.

Log Formats

Alright, so the default haproxy request log format is boring to say the least. Let’s see if we can do better. First, let’s assume we have an HTTP web application that we’re load balancing. The lowest-effort way of logging this information might be:

listen testsite
  mode http
  bind 0.0.0.0:80
  log /dev/log local0
  option httplog

  server web01 web01:80 check
  server web02 web02:80 check
  server web03 web03:80 check

The option httplog directive is new and causes haproxy to produce logs such as:

Aug 05 20:24:30 lb01 haproxy[8565]: 192.168.122.1:55900 [05/Aug/2018:20:24:30.279] testsite testsite/web01 0/0/1/1/2 200 352 - - ---- 1/1/0/0/0 0/0 "GET / HTTP/1.1"

OK so now we’re getting more information albeit in the idiosyncratic haproxy format. This particular log entry breaks down to this:

  • Aug 05 20:24:30 lb01 haproxy[8565]: normal syslog preamble.
  • 192.168.122.1:55900 The client IP address+port they’re connecting from
  • [05/Aug/2018:20:24:30.279] Additional date information
  • testsite testsite/web01 The frontend name followed by the backend name and the actual backend application server the request was proxied to.
  • 0/0/1/1/2 Request timing metrics:
    • First 0 is the time spent receiving the HTTP request from the client
    • Second 0 is the time the request spent waiting in queue. This would be non-zero if requests were being backlogged on the load balancer to spare the application server.
    • First 1 is the number of milliseconds it took haproxy to establish a connection to web01
    • Second 1 is the number of milliseconds haproxy spent waiting for the backend server to send an HTTP response
    • The final field, the 2, is the total number of milliseconds spent on this request. In this case it’s just the 1ms for connecting to the backend server and the 1ms waiting for web01 to send a response.
  • In this case we have a 200 status code indicating success.
  • Our response from the point of haproxy to the client (including headers haproxy adds) is 352 bytes.
  • The first two - fields are only used if you use capture cookie, which I haven’t included a description of here.
  • The ---- is a placeholder for the connection state which would contain descriptive information if the connection aborted abnormally.
  • 1/1/0/0/0 Connection Metrics
    • The first field is the number of active connections to the load balancer at the time of the request
    • The second field is the number of active connections to this particular frontend or listen block.
    • Third field represents the number of backend connections that remain open to the application server at the point of logging. Since by default logging only happens after the request is completed, and there are no other requests going to this load balancer, it’s zero in my case.
    • Fourth field is the number of connections still active to the backend server
    • Finally, the fifth field is the number of retries haproxy had to make for this request. Typically zero unless something’s broken.
  • 0/0 Backend Queue Metrics
    • First field is the “server queue” metric. If this is non-zero then it’s a measure of how many requests were queued for the backend server on the load balancer.
    • If the second field is non-zero it’s the same measure but for all requests to the same pool.
  • Finally, we have the first part of the HTTP request made so we can see what they actually were doing. In this case, just accessing the main index page at /.

OK, phew. Now that we understand that, let’s see if we can coerce it to look more like Apache or nginx logs using a custom format:

listen testsite
  mode http
  bind 0.0.0.0:80
  log /dev/log local0
  option httplog
  log-format "%ci - - [%trg] \"%r\" %ST %B \"-\" \"-\""

  server web01 web01:80
  server web02 web02:80 
  server web03 web03:80

I’m not going to explain Combined Log Format here (there are other resources for that) but let’s make note of some important points:

  • The second and third fields (ident and REMOTE_USER respectively) are categorically nulled out since haproxy doesn’t have a way of logging this information. The values may exist at the HTTP level, but haproxy has no way of referring to them.
  • Similarly, the HTTP Referer and User-Agent fields aren’t included here, since logging them would require extra capture request header directives that I’m not covering in this format.
  • Please note that when a value can’t be included, a hyphen represents the null value, and string values (like Referer and User-Agent) are still quoted even though they’ll never be populated.
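
Under this custom format, a request like the earlier example should produce something along these lines. Note that this is a hand-written approximation rather than captured output, and the exact timestamp rendering of %trg may differ slightly between haproxy versions:

192.168.122.1 - - [05/Aug/2018:20:24:30 +0000] "GET / HTTP/1.1" 200 352 "-" "-"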

OK but let’s say you were load balancing MySQL database access instead of HTTP. Luckily you can still log at the TCP level with haproxy:

listen testsite
  mode tcp
  bind 0.0.0.0:3306
  log /dev/log local0
  option tcplog

  server web01 web01:3306
  server web02 web02:3306
  server web03 web03:3306

Which produces logs such as:

Aug 05 21:15:34 lb01 haproxy[8760]: 192.168.122.1:35518 [05/Aug/2018:21:15:28.375] testsite testsite/web03 1/0/5652 273 -- 1/1/0/0/0 0/0

You’ll notice immediately that it looks almost exactly like the default HTTP log format. That’s more or less what it is with a few exceptions:

  • The 1/0/5652 compound field represents “queue time”, “milliseconds to connect to the backend server”, and “millisecond lifespan of the TCP connection” respectively
  • The HTTP-specific fields such as status code, request, and cookie fields are not present

At this point you should have a basic understanding to get started on logging. Like I said earlier, haproxy logging is perhaps too complex a topic for one section, but if you understand the above you should be able to address most logging problems you encounter.

Alert Management

OK, so we now have logging set up; let’s move on to getting alerted when something changes with the state of a load balancing pool. Currently haproxy only supports email alerts natively, but external checks can be written to provide any manner of custom notification outside of email. Since email alerts are the only native option, that’s the only one I’ll concentrate on here.

Let’s look at an example configuration that includes email alerts:

mailers localsmtp
  mailer mysmtp smtpgw.example.com:25

listen testsite
  mode http
  bind 0.0.0.0:80

  email-alert mailers localsmtp
  email-alert to me@example.com
  email-alert from me@example.com

  server web01 web01:80 check
  server web02 web02:80 check
  server web03 web03:80 check

Breaking down the new parts:

  • We’ve introduced a mailers section I’ve called localsmtp
    • Optionally, we can specify multiple SMTP servers here, but I’ve only created a single one called mysmtp
  • We’ve introduced the email-alert keyword
    • The first one ties email alerts for this listen block to the localsmtp mailers section we defined up top.
    • We also set the “To” and “From” (respectively)
  • Finally, we ensure health checks are enabled on each server (the check option) since those are what trigger the alerts.

The above generates an email similar to:

Subject: [HAproxy Alert] Server testsite/web03 is DOWN, reason: Layer4 connection problem, info: "Connection refused", check duration: 0ms. 2 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue
Body:
Server testsite/web03 is DOWN, reason: Layer4 connection problem, info: "Connection refused", check duration: 0ms. 2 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue

It’s impossible to change either the subject or the body of the email (this is baked into the haproxy binary itself) but, as stated before, if you need customized alerts you can issue them from an external check script.
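
For instance, here’s a rough sketch of wiring up an external check. It assumes a hypothetical /usr/local/bin/check-notify.sh script that performs the actual health check (its exit code marks the server up or down) and fires whatever custom notification you need; haproxy tells the script which server it’s checking through environment variables such as HAPROXY_SERVER_NAME, HAPROXY_SERVER_ADDR, and HAPROXY_SERVER_PORT:

global
  ## Permit external agent checks globally
  external-check

listen testsite
  mode http
  bind 0.0.0.0:80
  option external-check
  external-check command /usr/local/bin/check-notify.sh

  server web01 web01:80 check
  server web02 web02:80 check
  server web03 web03:80 check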

Status and Basic Performance Monitoring

OK, now that we’re logging traffic and being notified when we lose a backend server, the final piece of the basic monitoring puzzle is having some sort of dashboard view so we can get a quick summary of how our load balancing pools are doing. You’ll likely have the most success with custom monitoring tools, but haproxy comes with a baked-in dashboard system for monitoring your pools.

Here is an example configuration file:

listen statspage
  bind 0.0.0.0:8080
  mode http
  stats enable
  stats uri /
  stats refresh 5s
  stats show-node

listen testsite
  mode http
  bind 0.0.0.0:80

  server web01 web01:80 check
  server web02 web02:80 check
  server web03 web03:80 check

The listen block at the bottom is fairly normal, but the listen block at the top is where the magic happens:

  • We bind like normal, and enable http mode so that our later commands can make reference to HTTP-level information.
  • The statistic page is then configured with various stats keywords:
    • stats enable signifies that a statistics page will be served through this frontend.
    • stats uri / specifies that the stats page will be located at the HTTP root.
    • stats refresh 5s is an optional parameter that causes the stats page to automatically refresh every five seconds.
    • Finally, stats show-node will print the server hostname at the top of the page so that it’s easier to keep track of which load balancer you’re looking at. This is useful when you’ve set up an haproxy cluster as mentioned later.

Some people feel the need to enable stats auth to prompt for a username and password to access the page, but the page doesn’t contain sensitive information (outside of backend server hostnames), so security is probably better served by binding the stats page to an internal IP address and implementing some sort of HBAC to restrict access to internal users only.
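
For example, a sketch that binds the stats page to a hypothetical internal address (10.0.0.5) and rejects requests from outside a hypothetical internal network (10.0.0.0/8):

listen statspage
  bind 10.0.0.5:8080
  mode http
  stats enable
  stats uri /
  ## Only internal clients may view the dashboard
  http-request deny unless { src 10.0.0.0/8 }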

Clustering

Overview

Of course, part of the benefit of a load balancer is shielding users from temporary disruptions on the backend servers. If you have to reboot the VM serving a particular application, you want everyone to be moved to the servers that aren’t rebooting. Much of this availability goes away, though, if the load balancing function itself isn’t highly available.

To enable high availability for load balancers you need to implement clustering. For a load balancer to be considered properly clustered all haproxy instances need to share at least three things:

  • Configuration Meaning configuration changes are deployed to the load balancers (via ansible, git, etc) rather than being manually maintained, which introduces the possibility of human error. If your load balancer includes external checks, you need to deploy those along with the configuration as a package rather than piecemeal.
  • State Which can be understood as both the data populating key stick tables (such as abusive users) but also the backend application state.
  • VIP Addresses External users shouldn’t be aware of planned changeovers from one load balancer to another. You may mitigate this somewhat by managing many different VIP’s and draining/migrating them one-by-one, but ultimately you’ll need to implement a VIP management solution like keepalived or ucarp to perform the automatic recovery.

The first and last are out of scope for haproxy, but haproxy does natively support state sharing with passive nodes, which is what we’ll concentrate on here. Configuration management can take many forms and I haven’t written an article on that yet. For VIP management you can read my keepalived and ucarp articles.

State Transfer

For peer state transfer to happen, you first need to tell haproxy which peers it actually has using a peers block, then share a particular stick table out to a particular set of peers. Without a stick table to share, no transfer will take place. Let’s take a look at a simple configuration:

global
  stats socket /run/haproxy/admin.sock mode 0600 level admin

peers clusterpeers
  peer lb01 lb01:1024
  peer lb02 lb02:1024

listen testsite
  mode http
  bind 0.0.0.0:80

  stick-table type ip size 30k expire 5m peers clusterpeers store gpc0
  http-request track-sc0 src
  acl is_flagged src_get_gpc0 gt 0
  http-request reject if is_flagged

  server web01 web01:80 check
  server web02 web02:80 check
  server web03 web03:80 check

Breaking this down:

  • We establish a peer group called clusterpeers
    • Each peer is given an “haproxy” name which is used during state transfer to identify where a particular update is coming from, so it’s important that it’s unique within the cluster
    • We then specify the hostname and port combination for peer communication to take place over.
    • If an haproxy instance finds a peer entry whose haproxy name matches its own hostname (or an IP address that appears on a local interface) it will assume this is its own peer entry and open the port specified in the second argument.
    • If there are no stick tables shared out, these ports will remain closed.
  • We create a basic stick table in the testsite listen block
    • It uses IP addresses for entry keys and stores only the gpc0 counter
    • Each entry expires after five minutes (5m) and the table can store 30,720 entries.
    • We share this stick table out to the peers listed in the peer group above with peers clusterpeers
  • We then associate each visiting user with a stick table entry and use an ACL called is_flagged to determine if the gpc0 for the given IP is non-zero.
  • If it is non-zero we reject the HTTP request.

Assuming both lb01 and lb02 have identical configurations, this will result in the stick table testsite being replicated to both nodes. For instance, setting the gpc0 counter on lb01 should result in it being immediately available and set on lb02:

root@lb01:~# echo set table testsite key 192.168.122.1 data.gpc0 1 | socat /run/haproxy/admin.sock -

root@lb01:~# echo show table testsite | socat /run/haproxy/admin.sock -
# table: testsite, type: ip, size:30720, used:1
0x559d8f0314e4: key=192.168.122.1 use=0 exp=297594 gpc0=1

Meanwhile, on lb02:

root@lb02:~# echo show table testsite | socat /run/haproxy/admin.sock -
# table: testsite, type: ip, size:30720, used:1
0x55ba2fe3fd54: key=192.168.122.1 use=0 exp=297394 gpc0=1

There are no native encryption or authentication mechanisms available for this peer communication, so if you need to be concerned about the security of your load balancer’s state information (for example, you’re on a public cloud provider) you’ll need to set up a point-to-point network using something like weave or IPsec.

Monitoring Cluster Status

There is no native monitoring for peer status changes, so you have to get creative. One way to monitor peer status is to create a null backend that checks the peer ports and sends email alerts on changes. An example configuration might look like this:

global
  stats socket /run/haproxy/admin.sock mode 0600 level admin

peers clusterpeers
  peer lb01 lb01:1024
  peer lb02 lb02:1024

mailers clusterpeers
  mailer gateway smtpgw.example.com:25

backend clusterpeers

  stick-table type ip size 1 expire 60m peers clusterpeers store gpc0
  server lb01 lb01:1024 check
  server lb02 lb02:1024 check

  email-alert mailers clusterpeers
  email-alert to support@example.com
  email-alert from support@example.com

listen testsite
  mode http
  bind 0.0.0.0:80

  server web01 web01:80 check
  server web02 web02:80 check
  server web03 web03:80 check

The above creates an incredibly small stick table so that the peer state transfer ports will stay open regardless of what you eventually take out of or put into the other listen/frontend/backend blocks. The backend then sends email alerts when the TCP check against a remote peer fails. I’ve yet to devise a ping/pong external check for peers, so a TCP check is as close as I’ve been able to get.

Dynamic Member Management

OK, so now we’ve reached the capstone of the article. Now that we have pretty functional intermediate-to-advanced knowledge of how haproxy works, let’s bring it together into the most comprehensive scenario that’s still realistic to learn from: dynamic autoscaling.

Let’s construct a scenario:

  • You administer a monolithic web application (i.e not SOA) where the worker nodes can be scaled up on demand with no ill effects on the fellow workers.
  • Each worker sits on its own virtual machine, and a new instance needs to be deployable by deploying a new server template in VMware.
  • When a new VM is provisioned, it should automatically come onto the network and connect to the database on its own and join the load balance.

OK, so on the “ops” side, getting onto the network automatically can be handled through DHCP. Once the new VM has its IP address, though, how do we get haproxy to begin delivering load to it?

Using Server Templates

The approach that’s both the most direct and involves the least infrastructure is to configure server templates in the haproxy configuration, then use the API to manually set IP addresses and add servers to the application’s load balancing pool, re-disabling them once worker nodes leave during the “scale down” portion of autoscaling.

Following this approach, the configuration might look something like this:

global
  stats socket ipv4@0.0.0.0:1234 level admin

listen testsite
  mode http
  bind 0.0.0.0:80
  server-template web 300 web01:80 check disabled

The above is pretty succinct and straightforward:

  • We enable the administrative API on port tcp/1234
  • We establish a server template called web
    • Upon startup this template will cause haproxy to generate 300 slots for backend pools.
    • It will fill each slot with a server named “template name” + index (where the index is the current iteration it’s on), with a server definition set to whatever follows the server count.
    • In the above each server will have a default address of web01:80, use TCP checks for health checking, and most importantly start in disabled status so that we’re only load balancing to web01 once rather than 300 times.

So when haproxy starts up it has a bunch of backend servers defined but none of them are active. The benefit of this approach is that it creates slots for our new servers to occupy, and we can use the API to modify and enable the backend server definitions.

First let’s manually enable the first slot, web1 (which defaults to pointing at web01), just because that’s a fun thing to do:

root@lb01:~# echo "set server testsite/web1 state ready" | nc localhost 1234
root@lb01:~#

At this point the pool should switch from “Unavailable” to exclusively load balancing to web01.

Let’s now manually set a new IP address on web2 and enable it to receive new connections:

root@lb01:~# echo "set server testsite/web2 addr 192.168.122.22" | nc localhost 1234
IP changed from '192.168.122.21' to '192.168.122.22' by 'stats socket command'
root@lb01:~# echo "set server testsite/web2 state ready" | nc localhost 1234
root@lb01:~#

At the time of this writing, setting the FQDN (as previous config examples have done) causes unstable pool behavior, likely due to a bug, so I’ve switched to modifying the IP address instead.

At this point the web02 application server should be part of the load balancing pool and serving requests. Let’s say your scaling logic has now determined that there isn’t as much of a need for workers and it’s time to scale down.

First you start that process by setting the backend server’s status to drain which prevents new user requests from going to it:

root@lb01:~# echo "set server testsite/web2 state drain" | nc localhost 1234
root@lb01:~#

At this point, you would start some process to monitor your application’s network usage. In the case of web servers, you may wait for a long pause in HTTP requests (indicating that clients have disconnected) or for the total data transfer rate to the load balancer to drop below a certain level (or potentially both). What you should not do (per the documentation) is wait for your TCP connection count to reach zero, as the load balancer may still maintain persistent connections (such as HTTP keep-alive) or perform health checks.

Once you’re satisfied that no users are actually accessing the web application through this VM, you can then finally set the application server’s state to maint to permanently remove it from the load balancing pool:

root@lb01:~# echo "set server testsite/web2 state maint" | nc localhost 1234
root@lb01:~#

To recap the overall approach here is:

  1. Establish enough servers in the configuration to handle more than the peak load of the application, but start them off in disabled status.
  2. When new VM’s are added to the load balance, they receive an IP address via DHCP.
  3. The VM’s startup scripts contact haproxy over the API to determine an open slot (a rough sketch of such a script follows this list).
  4. The VM then uses the API again to point an open slot at its IP address on the known port and change its status to ready.
  5. Once the worker isn’t needed anymore, the VM’s shutdown scripts call out to the API to change its status to drain so that requests stop coming in, and shutdown stalls waiting for some criteria to be met proving that requests are no longer being serviced by this application server.
  6. Once those criteria have been met, the API is called once more to set the status to maint, indicating purposeful absence (rather than DOWN, which indicates a failure of something that should be working).
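
To make steps 3 and 4 concrete, here’s a rough sketch of what such a startup script might look like. This is hypothetical glue code rather than a blessed haproxy mechanism: it assumes the testsite proxy from earlier, the TCP API on lb01:1234, and that in your build’s show servers state output column 4 is the server name and column 7 is the administrative state (non-zero while a slot is still disabled). Verify those column positions against your own version before relying on anything like this:

#!/bin/sh
## Hypothetical worker-VM startup script (sketch only).
LB=lb01; PORT=1234
MYIP=$(hostname -I | awk '{print $1}')

## Find the first slot that is still administratively disabled.
SLOT=$(echo "show servers state testsite" | nc "$LB" "$PORT" \
  | awk '$1 ~ /^[0-9]+$/ && $7 != "0" { print $4; exit }')

## Point the slot at this VM and enable it.
echo "set server testsite/$SLOT addr $MYIP" | nc "$LB" "$PORT"
echo "set server testsite/$SLOT state ready" | nc "$LB" "$PORT"

Note that this is exactly the kind of custom slot-picking code, complete with its potential race conditions, that the next section tries to eliminate.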

In doing the above, you’re able to add and remove backend web servers on the fly without modifying the haproxy configuration or potentially disrupting other applications that might also be going through the same load balancer.

Using DNS-based Service Discovery

Now the approach above is functional but there are a few issues that jump out:

  • The slot selection process requires a good deal of custom code. All autoscaling involves some amount of custom code, but maximizing your use of standard tools helps improve readability.
  • It requires API access to haproxy, when you may want to limit a particular set of VM’s to merely adding and removing iterations of themselves rather than potentially having administrative access to the entire load balancer.
  • If you have to rapidly autoscale with new VM’s potentially coming up concurrently then you run the risk of race conditions where two new VM’s may inadvertently pick the same slot and then you have two instances running over top of each other with unpredictable results.

Given the above, let’s explore using DNS for service discovery, specifically with Consul. Using DNS lets us do the above programmatically without race conditions, without undue levels of access being given to automated processes, and without allocating more than we need. Additionally, since haproxy determines membership via DNS, we can set all web applications up to load balance this way, and then the net number of changes to haproxy required for each scaling event is actually zero. Neato.

Setting up Consul is a little out of scope for this article, so we will concentrate only on the parts haproxy sees. For the time being, assume that Consul has been set up with Access Control Lists that limit the worker VMs’ Consul API access to only the webapp service, and that anonymous users can only query DNS.

A working configuration might look something like this:

global
  stats socket ipv4@0.0.0.0:1234 level admin

resolvers consul
  nameserver consul 192.168.122.15:8600
  hold valid 10s

listen testsite
  mode http
  bind 0.0.0.0:80
  server-template webapp 4 webapp.service.consul:80 check resolvers consul

OK so let’s dig into the new stuff here:

  • We now have a resolvers section for defining DNS resolution outside of the normal system DNS.
    • In our case we point it at the Consul instance where our webapp is defined by specifying the consul server’s IP address and the default port for Consul’s DNS service (8600).
    • We instruct haproxy to hold onto (i.e. cache) valid results for 10 seconds. This keeps an undue amount of traffic from going out to Consul from the load balancers.
    • You can also instruct haproxy to cache NXDOMAIN (i.e. hostname doesn’t exist yet) responses from DNS, but in general it’s best to assume things will be configured correctly and that you won’t be pointing at hostnames that don’t actually exist in Consul. I would therefore only cache invalid responses (errors, NXDOMAIN, etc) if you were running into an issue (see the snippet after this list).
  • Down in the listen block we have server-template again
    • The template is named webapp and preallocates four slots.
    • The FQDN is webapp.service.consul with a port of 80 with TCP-based health checking enabled.
    • We set the DNS resolvers for this server template to consul, which isn’t strictly needed here but is useful if you end up defining multiple resolvers sections.
    • At the time of this writing, installing 1.8.13 was required to get around initial DNS resolution failures where haproxy attempts to use the system’s DNS resolver instead of those configured via resolvers. If you must use 1.8.8, please add init-addr none to suppress DNS lookups for this server at startup.
    • Upon startup, Consul will return however many backend servers exist as A records, and haproxy will fill the four available slots with them. Upon removal from Consul they’ll stop showing up in DNS, and haproxy will therefore remove them from any slots they occupy. Records in excess of the available slots in haproxy will be silently ignored and only used if one of the active backends is removed.
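
If you did decide to cache negative responses, the hold keyword accepts other statuses besides valid. A sketch, reusing the resolvers section from above:

resolvers consul
  nameserver consul 192.168.122.15:8600
  hold valid 10s
  ## Also cache NXDOMAIN responses for five seconds
  hold nx 5s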

And there you go: your application is now set up to automatically reconcile the load balancing pool against the service’s DNS records, with no unnecessary access being given and no possibility of race conditions.

Where to Go From Here

As hard as it may be to believe, this guide, as comprehensive as it is, doesn’t cover haproxy completely. I’ve abbreviated many of the examples, and the descriptions of several keywords intentionally leave out possible options. To get a full sense of haproxy you’ll need to just dive in, do something useful, and rely heavily on this guide and the two official guides available on GitHub (Management API and Configuration) to fill in what I didn’t cover above.

So, in addition to completing your understanding of whichever keywords you end up finding most useful, where can you go? Well, no cluster would be complete without VIP management. I would suggest you look at my ucarp and keepalived articles to see how to implement that.

By default haproxy is a single-process, event-driven application (optionally multi-threaded as of 1.8). You may find it useful to benchmark performance while tweaking the process management directives or the performance tuning options I’ve completely omitted from this guide.

I’ve also intentionally glossed over SSL because usually the basic configuration listed above is enough for most users. However you may want to tweak various parameters associated with SSL or implement the SSL client authentication I mentioned before. This would be a good option for implementing some level of client authentication for API access.

Beyond that, the #haproxy freenode channel is always open and I’ve always found the people there very responsive. Whatever you do with haproxy, I hope you found this guide useful and have a productive time.

Further Reading