Load balancing an application requires some forethought. Some applications are simple and can handle their load on their own, taking traffic directly from the users, and if they go down from time to time then oh well, just bring it back soon, I guess.
Other applications need to scale more dynamically with higher availability requirements. Automatically scaling the application might include being able to spread the load across a static number of nodes or some solution that allows us to take nodes in and out as needed (often programmatically).
As you can imagine, the solution will increase in complexity in a manner that’s directly proportional to the performance requirements of the application. For this reason it’s always helpful to have an understanding of how it all fits together and judge for yourself what your availability requirements are. In this article I’m going to try my best to cover the main areas you’re likely to care about. You may care about some more than others, and for that reason I would ask that you be judicious in which sections you choose to read.
- Comparison with nginx Load Balancing
- How haproxy Works
- Basic HTTP Load Balancing
- Access Control Lists
- Routing Traffic
- SSL Termination
- Load Balancing Non-HTTP Traffic
- Gatekeeping
- HAProxy API
- Logging, Alerts, and Monitoring
- Clustering
- Dynamic Member Management
- Where To Go From Here
Comparison with nginx Load Balancing
Due to both having a “single process event-driven” model of operation, if you have experience with nginx load balancing then you may be wondering how haproxy compares and when you would use one or the other.
Features that haproxy has that nginx does not:
- Non-HTTP health checks are easier than with nginx, in that they often come pre-made as opposed to nginx, which often forces you to write your own `match` directive.
- Fewer instances where you run into paywalled features. Many of the more advanced features in nginx are incredibly useful, especially for enterprise users, but you’ll often run into Plus-only features seemingly at random.
Features that nginx has that haproxy does not:
- Full HTTP web server implementation. haproxy can only route HTTP traffic. It lacks support for things such as FastCGI or WSGI for communicating with web applications.
- haproxy does not support caching responses. This means that even dynamic content that rarely changes must be retrieved from the backend for each and every request.
- As a consequence of the above, using nginx for both load balancers and application servers makes it easier to support your overall architecture by re-using knowledge and experience from one for the other.
- Load balancing for UDP protocols such as syslog or DNS.
- In my opinion, development of the core nginx product (both Plus and OSS) seems to be proceeding at a pace that exceeds haproxy’s development.
My general rule of thumb would be to use nginx load balancing if at all possible unless you legitimately need advanced load balancing features and can’t afford the Plus version of nginx. Ultimately, given the options available nginx load balancing is just more comprehensive and personally I find its configuration much more intuitive.
How haproxy Works
Before starting anything with haproxy, it’s important to understand roughly how haproxy configuration is structured. It’s not the most intuitive system so without a reference point, a lot of this can be hard to wrap your head around initially.
Configuration Structure
Like nginx, haproxy configuration is primarily directive/keyword based, with whitespace being largely optional with the exception of the newline character (which marks the end of a directive definition). By convention though, sections are usually separated by an extra newline and non-section keywords are indented so that it’s easy to see which section they belong to. Sections aren’t explicitly ended; instead, a new section keyword implicitly indicates that the preceding section has ended. Keyword order is preserved, though in certain cases (due to how the keywords function) order won’t matter.
Let’s examine a few of the more fundamental keywords:
- `defaults`: marks the beginning of a section of default values for various parameters. Some parameters can’t be placed in a `defaults` section while others can. You can have multiple `defaults` sections, but the effect isn’t cumulative: each new `defaults` section resets the defaults for the sections that follow it.
- `listen`: marks the beginning of a section that defines the frontend connection, the backend servers, and any relevant processing that must be done on the load balancer.
- `option`: configures a parameter for the given mode that you’re operating within. Many options can be negated by prefixing the `option` keyword with “no”; for instance, `no option checkcache` disables the `checkcache` option under `http` mode.
- `acl`: defines a particular Access Control List entry. Despite its name, the actual function of the `acl` keyword is to match text from the request or response and set the ACL’s name equal to either boolean true or boolean false. This boolean value can then be used in conditional logic for subsequent keywords.
- `http-request`: performs operations either on or with the HTTP request. Once haproxy is done processing the request, all data related to it is no longer available.
- `http-response`: performs operations after the response from the backend server has begun coming back (and therefore, by necessity, after `http-request` above).
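To make the structure concrete, here’s a minimal sketch (section and server names are illustrative) showing how these keywords nest:

```
defaults
    mode http
    timeout connect 5s
    timeout client 30s
    timeout server 30s

listen example
    bind *:80
    acl is_admin path_beg /admin
    http-request deny if is_admin
    http-response set-header X-Served-By lb01
    server node01 node01:80
```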
Knowing which keywords can be used in which sections can only be determined by checking the documentation.
If the effect you’re going for requires extracting data from a request or response then haproxy provides mechanisms known as fetch methods. Some of the more fundamental fetch methods would be:
- `req.hdr`/`res.hdr` extract the given header from the HTTP request or response (respectively). If a particular header contains one or more commas, this is (per RFC) interpreted as multiple values for the given header on the same line. There’s an optional second argument for which occurrence you want returned.
  - Example usage: `req.hdr(Host)` extracts the `Host` header sent by the client; `req.hdr(X-Custom,2)` returns the second value for the `X-Custom` header in the HTTP request.
- `req.fhdr`/`res.fhdr` are identical to the above methods except commas are not treated specially. As mentioned before, this deviates from RFC but is required for things such as `User-Agent`, which will always contain a comma for historical reasons and should be considered only one value. These also take the occurrence you’re interested in as an optional second argument, but will only consider full new lines that start with the same header name as a second occurrence.
- `req.cook`/`res.cook` extract the given cookie from the HTTP request or response (respectively).
  - Example usage: `req.cook(visitingUser)` returns any cookie in the request called `visitingUser`.
- `path` returns the path in the URL.
  - For the URL `http://example.com/topDir/myPage.php` the `path` fetcher will return `/topDir/myPage.php`.
- `method` returns the HTTP verb (`POST`, `GET`, `PUT`, etc.) used in the request.
- `status` returns the status code of the HTTP response generated by the backend.
- `var` returns the variable given as an argument. For example, `var(txn.my_var)` will yield the contents of the `txn.my_var` variable.
- `url_param` returns the value of the given URL parameter.
  - Example usage: for the `GET /?myVar=block&otherVar=fit` HTTP request, `url_param(otherVar)` will return the string `fit`.
Once you’ve fetched the data, you can run converters on the fly to transform it into something else by adding, immediately after the fetcher (no space), a comma and a pre-defined converter name. This operates in a manner similar to command line pipes, where the output of one operation (the fetcher) is used as input for another operation (the converter). Some fundamental converters would be:
- `map(map_file)` opens the whitespace-delimited file specified by `map_file` (at process startup) and locates the string to ultimately return by taking the fetcher’s output, locating the first line with a match for that string in its first field, and then returning the string specified in the second field.
  - If a given fetcher returns `someData` and `map_file` contains the line `someData returnMe`, then the string the fetcher+converter returns is `returnMe`.
- `ipmask(netmask)` takes an IP address as input and returns the network address that IP address belongs to for the given `netmask`.
  - If the source IP address of a request is `192.168.34.22`, then a fetcher+converter of `src,ipmask(255.255.255.0)` would yield `192.168.34.0`. This is useful for later comparison, by being able to summarize IP’s by network rather than individually.
- `lower`/`upper` convert the string input from the fetcher to lower or upper case respectively.
- `regsub(matchingRegex,substituteText[,flag])` executes regular expression substitution on the input string. The arguments work similarly to the `sed` command on Unix/Linux. The optional third parameter `flag` is the same as well; for example, a flag of `g` causes the regex to match all occurrences and not just the first. Some examples:
  - `req.hdr(X-Header),regsub(\s,,g)` strips all whitespace from the `X-Header` request header. haproxy has special processing for `\s` that prevents the configuration parser from interpreting the backslash non-intuitively.
  - `req.hdr(X-Header),regsub(\\x2C,,g)` strips all commas from the `X-Header` request header. This is normally problematic, as literal commas in your regular expression would cause a syntax error; by matching the ASCII code for a comma (via hex) we sidestep the issue entirely. You’ll notice that hex codes need to be double-backslashed, otherwise haproxy will interpret it as you escaping the `x` (unnecessary but syntactically correct) followed by the literal characters `x2C`, which is what it would send to the PCRE library.
For fetch methods that take arguments, it’s important not to have any spaces between the arguments. The haproxy config processor is notoriously finicky and will fail with vague errors if you do. The above is just a short list of the methods I’ve found most useful; please consult the documentation for the full lowdown on all the fetch methods haproxy supports.
As mentioned above, haproxy configuration also has the notion of scoped variables. Names may contain alphanumeric characters or underscores but cannot begin with a digit. Additionally, all variables have to fall within one of five pre-defined scopes that govern how long the variable will live:
- `proc` for variables that need to last for the length of a particular haproxy process.
- `txn` for variables specific to a particular transaction (request/response pair) that can be deleted once the transaction is over.
- `sess` for variables that only need to exist for the length of a particular session (series of transactions).
- `req` for variables that only need to exist during the request processing portion of a transaction. For each transaction, variables scoped here will be garbage collected before any `http-response` directives are evaluated.
- `res` for variables that only need to exist during the response formation portion of a transaction. For each transaction, variables scoped here won’t exist when any `http-request` directives are evaluated.
As an example of variable usage, variables are often useful when taking data available within haproxy’s HTTP request processing stage and utilizing it within the HTTP response stage, by storing the data in a variable with `txn` scope. Take this snippet for example:

```
http-request set-var(txn.request_method) method
http-response set-header X-Request-Method %[var(txn.request_method)]
```
The above takes the output of the `method` fetcher (mentioned earlier) and saves it to a variable called `request_method` in the `txn` variable scope, which will survive the entire transaction. You’ll notice the introduction of the `%[]` construct. This construct will execute a fetch method and return the data as a string.
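To further illustrate, a fetcher and a converter can be chained inside `%[]` as well; the header name below is just made up for demonstration:

```
# Record the client's /24 network in a hypothetical request header
http-request set-header X-Client-Net %[src,ipmask(255.255.255.0)]
```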
OK, phew. Now that all that’s out of the way and we’re a bit more familiar with how haproxy configuration works, let’s actually start doing something useful.
Basic HTTP Load Balancing
Let’s start off simple with a regular round-robin load balance. With haproxy you can either define a `backend` instance and a `frontend` instance that ties itself to that backend, or use a `listen` section that combines the two. A simple `haproxy.cfg` configuration might simply be:

```
listen http-default
    bind *:80
    mode http
    server node01 node01:80
    server node02 node02:80
```
The above is a fully functional round robin load balance. Breaking it down:
- We create a new `listen` section called `http-default`.
- We `bind` to port `80` on all available IP addresses (the `*`).
- We turn on `http` mode for this load balance to give haproxy application-layer visibility. The default mode of `tcp` would also work but lacks the ability to do anything intelligent with HTTP.
- We then specify two backend servers with the `server` directive. The first argument is the name haproxy will use for the backend service (more on where that shows up later) and the second is the hostname and TCP port for connecting to the backend service.
Considerations for HTTP Headers and Logging
Introducing a load balancer fixes some problems but it also introduces others. One such problem is the backend server’s lack of visibility on the client connection due to said connection now being terminated on the load balancer instead of the application server.
Two of the most important consequences of this are:
- If your frontend terminates an HTTPS connection but your backend connection is HTTP, the application server won’t be aware of that.
- The backend application server itself can no longer see the IP address of each remote client connecting, as each application server’s TCP connection is now with the load balancer instead. This complicates log creation (if you do that on the application server) and prevents the application server from targeting particular IP addresses.
To work around these issues we can introduce some custom HTTP headers. In the case of the lost client IP this means setting an `X-Forwarded-For` request header (you can name it anything, but that’s the standard name) on the load balancer before the request is relayed to the backend application server. You may additionally need to create an `X-Real-IP` header (a still-common name) of the same value, depending on your application’s requirements.
For identifying the client’s connection protocol, the standard practice is to set the `X-Forwarded-Proto` header on the load balancer so the backend server can take whatever steps are required to do the right thing. One example would be adding logic to the `wp-config.php` for a WordPress website or `settings.php` for Drupal, which will allow the application to construct links back to itself that contain the proper protocol for clients to use.
Modified with our new criteria, our `haproxy.cfg` configuration file should look like this:

```
listen http-default
    bind *:80
    mode http
    option forwardfor
    reqadd X-Forwarded-Proto:\ http
    server node02 node02:80
    server node03 node03:80
```
This looks pretty much the same as the first one, except we’ve added two new keywords to this load balance:
- `option forwardfor` modifies the HTTP handling (invoked by our use of `mode http`) such that a new header named `X-Forwarded-For` is added to the request before it’s handed off to the backend. This header will be set to the visiting user’s IP address.
- `reqadd` is used to add the `X-Forwarded-Proto` header to the request before it’s sent to the backend. You’ll note that we had to escape the space between the colon and the header value; this would be the case for any tabs or spaces in the header value.
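One caveat worth noting: newer haproxy releases (the 2.x series) removed the legacy `req*` keywords, so if `reqadd` isn’t accepted by your version, the equivalent directive would be something like:

```
http-request add-header X-Forwarded-Proto http
```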
Access Control Lists
Don’t let the name “access control lists” fool you. With haproxy, ACL’s are used for evaluating any sort of conditional logic by reducing all decisions into TRUE/FALSE results that can then be embedded into other directives as arguments in order to modify their behavior. Most directives support this conditional execution via `if`, which only evaluates the given directive if the ACL result was TRUE, and `unless`, which only evaluates the directive if the result was FALSE.
The general format for an ACL entry is:
```
acl <aclName> <fetch method>,<optional converter> <pattern>
```
More complicated ACL’s will take advantage of more advanced aspects of `acl` syntax, but starting out that’s a good mental model. Each component is fairly obvious: you name the ACL, you specify a fetch method (optionally running it through a comma-separated list of converters), and then finally you have the pattern to match against the output of that fetch method + optional converter coupling.
Let’s illustrate this idea with an ACL inside of a `listen` block:

```
listen testsite
    mode http
    bind 0.0.0.0:80
    acl urlBegin path_beg /index.html
    http-response set-header X-Test-Header Index\ Page if urlBegin
    http-response set-header X-Test-Header Some\ Other\ Page unless urlBegin
    server node01 node01:80
    server node02 node02:80
    server node03 node03:80
```
In the above, we’ve added two new keywords to our repertoire:
- `acl` marks the beginning of an access control entry. Here we create a new ACL called `urlBegin` and use the `path_beg` fetch method (an ACL-only version of the `path` method mentioned above) to match the beginning of the request path against the static string `/index.html`.
- We use `http-response` to interact with haproxy’s HTTP response back to the client.
  - Specifically, we’re using `set-header` to add an `X-Test-Header` response header.
  - At the end of each `http-response` directive we add conditional logic, in the form of an `if` on the first one and an `unless` (the inverse) on the second one, so that `X-Test-Header` will return different values based on the result of the `urlBegin` ACL.
The above example probably isn’t useful on its own, but it shows you the MO with `acl` directives: you use the matching/extraction capabilities of `acl` to establish the truth or falsehood of a particular fact and embed the conditional expression in the relevant keyword’s argument list.
Let’s create a new ACL for setting the same header based on the requesting user’s subnet:
```
listen testsite
    mode http
    bind 0.0.0.0:80
    acl userNetwork src,ipmask(255.255.255.0) 192.168.122.0
    http-response set-header X-Test-Header Local\ User if userNetwork
    http-response set-header X-Test-Header Internet\ User unless userNetwork
    server node01 node01:80
    server node02 node02:80
    server node03 node03:80
```
You’ll notice it’s much the same syntax as before, except our ACL has changed to use some of the fetch methods described earlier. Of course `http-response` isn’t the only directive you can use (most directives support `if` and `unless` conditions) but I kept the two examples as similar as possible to illustrate the structure and how it all comes together.
As this article progresses, you’ll see more advanced and meaningful use of access control lists but the above should give you a solid idea of how to get everything done.
More information about haproxy Access Control Lists…
Routing Traffic
Weighted Preference and Backup Nodes
OK, so now that we know how haproxy is structured and how to do a basic load balance, let’s push it a bit further. Let’s say you knew particular backends were more powerful than others. For instance, you may have two backend servers: one on a new machine with two sockets and four cores each, while the other is a single-socket dual core. Obviously, if that’s all you have to work with, then that’s all you have.

That presents a problem though. In our current configuration we’re directing traffic equally to each node even though we already know one is usually going to be the better choice. Ideally, you’d want the dual-socket machine to handle most requests, with the single-socket machine just easing the pressure off its big brother, or potentially taking over the workload entirely if the higher-capacity backend needs to be restarted.
To get there, we need to begin shaping the routing decisions by introducing the concepts of weight and backup nodes. These both function pretty intuitively; let’s look at an example `listen` block:

```
listen testsite
    mode http
    bind 0.0.0.0:80
    server node01 node01:80
    server node02 node02:80 weight 2
    server node03 node03:80 backup
```
You’ll notice a few things here:
- The backend server `node01` is as vanilla as it gets. All it specifies is the backend host and port. Since `weight` isn’t specified it’s assigned a weight of 1 by default.
- Our `node02` backend now has a weight of `2` assigned. Since `node01`’s weight is 1, the ratio of requests served by `node02` to those serviced by `node01` is 2:1, making `node02` service twice the load given to `node01`.
- Our final node is similarly not given a weight but is marked as `backup`, meaning it won’t participate in the load balance unless both the nodes above have failed their health checks (more on that later). This configuration is referred to as `active-passive` since the backup node is just sitting there passively doing nothing. This enables you to economize by having the same backup system for different application servers.
Sticky Sessions
OK, so now we can route traffic with pre-determined preference and establish an `active-passive` configuration for our backend servers by specifying one node as the passive node. Sometimes though you need to continuously route the same user to the same backend to preserve user session data (for applications that can’t or don’t support sharing user sessions). To do this we need to make user sessions temporarily “stick” to a particular application server.
There are two generally used means of doing this:
- Using the source IP address to route the traffic to the backend node. This has the benefit of being protocol agnostic; however, for mobile clients (such as WiFi or cellular) the IP address may conceivably change during the life of the session, which will obviously break things if the user goes to the next building over and tries to resume what they were doing before.
- Using an HTTP cookie to tie the user to a backend service. This has the benefit of staying with the user no matter the change in their networking situation, but obviously is limited to the HTTP protocol and even then only clients that will accept the cookies that you try to set.
Let’s look at each option in detail.
Balance Source
Let’s look at a simple example `listen` block:

```
listen testsite
    mode http
    bind 0.0.0.0:80
    balance source
    server node01 node01:80
    server node02 node02:80
    server node03 node03:80
```
You’ll notice we’ve only added a single directive here: `balance source`. This is pretty straightforward. It changes the load balancing algorithm from round robin to one where an in-memory hash table is created, mapping a hash of the client IP address to the backend server it’s associated with. When a request from a new client comes in, the normal round-robin logic applies (meaning fudging `weight` should still produce the same workload distribution).
Introducing Stick Tables
The above approach to IP-based routing has an issue though. While it’s definitely the easiest way to get IP-based load balancing, due to the internal implementation, if anything about the load balancing pool changes (availability, weight, algorithm, etc.) then all clients are assigned to new servers. Granted, this is ideally a rare occurrence and technically results in no downtime, but your users may be annoyed by the lost session. We can take more granular control over the load balancing process by using haproxy’s stick tables.
Put simply, stick tables are in-memory databases that store client information. If you’ve configured `peer` groups (mentioned later) the stick tables can even be shared amongst them (allowing stickiness to persist even if you lose a load balancer). Stickiness isn’t their only use though, and that will be expanded upon in later sections.
So let’s look at an example configuration:
```
listen testsite
    mode http
    bind 0.0.0.0:80
    stick-table type ip size 1m expire 60m
    stick on src
    server web01 web01:80 check
    server web02 web02:80 check
    server web03 web03:80 check
```
We’ve only introduced two new keywords here:
- `stick-table`, which creates and defines the in-memory database.
  - If you don’t give it a table name, it will default to naming the stick table after the `listen`/`backend` section name. In our case above, it will default to `testsite`.
  - We specify a table type of `ip` to signify that the key we’ll be searching for on each request will be an IP address. There are many types you may set your table to (`string`, `binary`, `integer`, etc.) but using the `ip` type specifically (instead of, say, `string`) allows haproxy to make optimizations and search the table more quickly.
  - The table will store a maximum of 1,048,576 entries via `size 1m`.
    - Confusingly, size suffixes denote maximum entries. For example, `size 4g` would store a maximum of 4,294,967,296 entries and would not indicate a table of 4GB in size.
  - Each entry will expire after an hour (`60m`).
- `stick on src`, which instructs haproxy to look at the value in `src` (a fetch method mentioned earlier) for the key associated with this request.
  - Other fetch methods are possible. For example, `stick on req.hdr(X-Backend)` would instruct haproxy to use the contents of the `X-Backend` request header as the key (of course the table type would need to be changed as appropriate as well).
So when do you use `balance source` and when do you use stick tables for IP-based stickiness? A good rule of thumb: if you’re just looking for simple persistence on a low-priority application, then `balance source` yields fewer lines of configuration whose purpose is obvious. If your application is of any meaningful importance though, use stick tables and comment your code.
HTTP Cookie Persistence
But what about the other method, using an HTTP cookie? Let’s take a look at a `listen` block for that:

```
listen testsite
    mode http
    bind 0.0.0.0:80
    cookie backendServer insert
    server node01 node01:80 cookie s1
    server node02 node02:80 cookie s2
    server node03 node03:80 cookie s3
```
We’ve got two changes here:
- A new `cookie` directive, which causes haproxy to check for the presence of an HTTP cookie named `backendServer`.
  - This cookie will contain a simple text value matching one of the corresponding servers below.
  - On its own this isn’t enough though. To support applications managing the persistence cookie themselves (for example, sending users to a particular server to perform an operation), by default all `cookie` does is obey the cookie if it’s there. To get haproxy to add the cookie when it’s not present we’ve added the `insert` argument.
  - You probably also want to use the `nocache` argument (see the sketch after this list) so that when the cookie must be set, downstream caching servers are informed (via HTTP headers) that the response should not be cached (as each client should have its own cookie value).
- In our `server` directives we’ve added another argument, also called `cookie`, which establishes for each webserver what cookie value corresponds to that particular server. For example, if `backendServer` contains `s2` then the user will always be served by `node02`, while users with a cookie value of `s3` will be served by `node03`. The cookie values are arbitrary and need only be unique within this particular load balance.
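Folding in the `nocache` argument, the cookie directive would then read:

```
cookie backendServer insert nocache
```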
“Least Connections” vs “Round Robin”
OK, so let’s imagine our web application was truly clustered. The application server doesn’t matter, since each one should be able to service each application request equally well. However, all requests aren’t the same. Some requests are just quick `GET` requests for things like CSS or JavaScript, whereas others kick off long CPU- or memory-intensive jobs. If we were to spread the load around based on the number of incoming requests, we run the risk of overloading a particular server if it happens to get several resource-intensive requests in a row.

In an effort to spread the load around a little more evenly (rather than just the requests), we can switch to using “least connections” as the load balancing metric. The idea is that most web requests are synchronous in nature; therefore, while not a 1:1 relationship, resource-intensive tasks will tend to hold their connections open longer, and thus connection count is a good metric to select against when picking a backend server.
This is as simple as overriding the default `roundrobin` algorithm with `leastconn` using the `balance` directive:

```
listen testsite
    mode http
    bind 0.0.0.0:80
    balance leastconn
    server node01 node01:80
    server node02 node02:80
    server node03 node03:80
```
You’ll notice that we’re using the same directive as when we used IP-based persistence (`balance source`). Because of this, haproxy treats `leastconn` and `source` as mutually exclusive propositions, with `source` using a round robin algorithm when first selecting the user’s backend server. This makes sense when you think about it: the current number of open connections is transient, and establishing long-term persistence based on transient metrics could lead to cluster instability later on when congestion on a particular server clears up.
Health Checks
Finally, when it comes to routing traffic you need to make sure your backends can successfully accomplish the task. By default haproxy will continue attempting to deliver a request to a particular web server until three attempts have failed, at which point it will mark the server as “down” and stop trying to deliver web traffic. After being marked “down”, haproxy will attempt to connect to the web server every two seconds, and two consecutive successes will cause the server to be marked as ready again.
By default, no server is actively checked; instead a “failure” will be an attempt to proxy a request to the given server. This isn’t usually ideal, since obviously it would be helpful if the load balancer proactively found out a backend server wasn’t there anymore rather than stalling an actual client request until haproxy has to pick a different server. To do a simple TCP check on each server you need only add a `check` argument to each backend server definition. A simple example of this would be:

```
listen testsite
    mode http
    bind 0.0.0.0:80
    server node01 node01:80 check
    server node02 node02:80 check
    server node03 node03:80 check
```
This will attempt to connect over TCP every two seconds. However, it’s usually a good idea to test application-level availability. Let’s say you had a request that should always succeed (say `GET /rss.xml`) and that can be run many, many times with no deleterious effect on the backend application. We can use the `option httpchk` directive in our listen block to instruct haproxy to issue an HTTP-level request, which by default will be `OPTIONS /` (which checks for HTTP availability without triggering any actions within the application).

To implement our desired health check we might create a listen block similar to this:

```
listen testsite
    mode http
    bind 0.0.0.0:80
    option httpchk GET /rss.xml
    server node01 node01:80 check rise 2 fall 2 inter 5000
    server node02 node02:80 check rise 2 fall 2 inter 5000
    server node03 node03:80 check rise 2 fall 2 inter 5000
```
Let’s break down the changes here:
- The `option httpchk GET /rss.xml` directive not only activates HTTP-level health checks (`httpchk`) but goes on to specify the HTTP request method and URI to use in the health check. Since we’re using the application’s ability to generate an RSS feed to determine liveness, we’ll just have haproxy periodically download a copy.
- We’ve added some new arguments to the `server` directive for each backend:
  - `check` enables health checks for this server.
  - `rise 2` establishes that two consecutive successful downloads of the RSS feed will result in the server being marked as alive (this is the default but made explicit here).
  - `fall 2` establishes that two consecutive failures will cause the server to be marked as down.
  - `inter 5000` modifies the health check interval from the default of 2000ms (2 seconds) to 5000ms (5 seconds) to account for possible generation time of the RSS feed.
A Brief Note on External Health Checks
So the above will actually work for most people, but occasionally you may need to execute fairly arbitrary instructions as part of a health check. These are referred to as “external checks.”

Essentially, external checks work by haproxy periodically invoking (as whatever user haproxy is running as) a particular executable instead of using internal logic. This executable will be passed environment variables such as `$HAPROXY_SERVER_NAME` and `$HAPROXY_SERVER_PORT`, and the return status will be used to determine success or failure.
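As a minimal sketch of what such an executable might contain (the path matches the example below; the logic is just a placeholder TCP probe):

```
#!/bin/sh
# /tmp/test.sh - exit 0 (healthy) only if the backend answers on its port.
# HAPROXY_SERVER_ADDR and HAPROXY_SERVER_PORT are among the variables
# haproxy exports to external checks.
nc -z "$HAPROXY_SERVER_ADDR" "$HAPROXY_SERVER_PORT" || exit 1
exit 0
```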
To enable external checks a few things need to happen:
```
global
    external-check

listen testsite
    mode http
    bind 0.0.0.0:80
    option external-check
    external-check path "/usr/bin:/bin"
    external-check command /tmp/test.sh
    server web01 web01:80 check
    server web02 web02:80 check
    server web03 web03:80 check
```
Let’s break this one down now:
- We’ve added an option to the `global` section called `external-check`, which enables all `backend` and `listen` sections to use external checks. Supposedly this is a security feature.
- Inside our `listen` block we’ve added three health-check related directives:
  - `option external-check` enables this section to implement an external check.
  - `external-check path "/usr/bin:/bin"` sets the path for the executable, otherwise `$PATH` will be empty when the executable runs.
  - `external-check command /tmp/test.sh` sets the actual command to run, in this case a script.
Since external checks can get involved I won’t write a fully functional version here, but the above should explain what they are.
Redirection
I’ve explained basic application routing in nginx before, but since there isn’t enough “application routing” content relating to haproxy, I’ll include it here.

The keyword for performing any sort of HTTP redirect is simply `redirect`, and in its most basic form it looks something like this:

```
listen testsite
    mode http
    bind 0.0.0.0:80
    acl redirect_page path_beg /redirect
    redirect location https://google.com if redirect_page
    server web01 web01:80 check
    server web02 web02:80 check
    server web03 web03:80 check
```
Breaking the new stuff down:
- We establish an ACL for matching the current request against a URI of `/redirect`.
- We then use `redirect location`:
  - For this directive we directly give haproxy the value it needs to put into the `Location:` header.
  - We make this directive contingent upon the previous `redirect_page` ACL matching.
  - Optionally, we could have specified a specific HTTP return code using the `code` argument after the URL. For example: `redirect location https://google.com code 301 if redirect_page`
Let’s assume you want to be a little bit more terse or avoid putting hardcoded values into your URL if at all possible. The `redirect` keyword supports two other options:

- `prefix` for modifying everything before the URI (for example, the `http://example.com` in `http://example.com/myPage.php`)
- `scheme` for modifying only the scheme (i.e. protocol) portion of the URL.
A demonstrative example using all the above forms of `redirect` might look something like this:

```
listen testsite
    mode http
    bind 0.0.0.0:80
    acl ssl_page ssl_fc
    acl redirect_page path_beg /redirect
    acl short_url path_beg /short
    redirect scheme https code 301 unless ssl_page
    redirect location https://google.com/search code 302 if redirect_page
    redirect prefix http://example.com/evenLongerURL if short_url
    server web01 web01:80 check
    server web02 web02:80 check
    server web03 web03:80 check
```
Explaining each of the redirect directives:
- The first directive updates the scheme in the URL to `https://` unless the user is already accessing the site over SSL. This means `http://example.com/myPage.php` will turn into `https://example.com/myPage.php`. Additionally, we set an HTTP status code of `301` so that this redirect is made permanent by the client. This causes search engines to only index the HTTPS version of the site.
- The second one is the same redirect from before, except this time we’re setting the HTTP status code to `302` so that browsers and search engines don’t permanently store this redirection. This is the default but we’ve made it explicit here.
- The third `redirect` changes the prefix of the URL to the one given above. For example, requests to `https://example.com/short` will be automatically redirected to `https://example.com/evenLongerURL/short` due to haproxy preserving the URI (the `/short` in the original URL) and prefixing it with the one given in this directive.
SSL Termination
Since nginx has plenty of guides on terminating SSL, I didn’t cover that in my nginx load balancing article. Similar to the redirect section above, haproxy’s SSL termination is rarely covered outside of load balancing content, and therefore I’ve decided to cover it here.

Basic no-frills SSL termination:

```
listen testsite
    mode http
    bind 0.0.0.0:443 ssl crt /etc/haproxy/server.pem
    server node01 node01:80
    server node02 node02:80
    server node03 node03:80
```
The only relevant changes from the “Basic HTTP Load Balancing” example are on the `bind` line. The `ssl` argument operates by itself (i.e. no value) and activates the SSL engine. The `crt` argument points to the file containing both the certificate and the private key. The server certificate should come first, followed by any intermediate certificates (from the actual signing CA up toward the root), and then the private key.
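Assuming hypothetical file names, the bundle might be assembled like so:

```
cat server.crt intermediate.crt server.key > /etc/haproxy/server.pem
```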
If you want to fine-tune the ciphers used for SSL or disable particular SSL versions, you would use the `ssl-default-bind-ciphers` and `ssl-default-bind-options` directives respectively (both located in the `global` section). For example:

```
ssl-default-bind-ciphers ECDH+AESGCM:DH+AESGCM:ECDH+AES256:DH+AES256:ECDH+AES128:DH+AES:RSA+AESGCM:RSA+AES:!aNULL:!MD5:!DSS
ssl-default-bind-options ssl-min-ver TLSv1.0
```

The above enables cipher suites such as AES256 with Diffie-Hellman key exchange while disabling cipher suites that use algorithms such as MD5 or DSS. The second line explicitly enforces a minimum protocol version of TLS v1.0.
Please keep client compatibility in mind when tuning these parameters. Better security is obviously better but it’s also important that your load balancer be able to negotiate some kind of connection, otherwise user experience will be impacted (to the point of being non-existent).
Load Balancing Non-HTTP Traffic
Of course, it’s not only web sites that need load balancing. Many other services can cause customer-facing outages should they fail and thus would benefit from attempts to maximize availability. The following is only a partial list to give you an idea of the “haproxy approach” to balancing non-HTTP TCP traffic. It’s by no means exhaustive and only represents information I wish I had when I began my load balancing adventure.
MySQL Cluster Access
In principle, load balancing MySQL/MariaDB isn’t much different than HTTP. Let’s look at a basic load balance:
```
listen testsite
    mode tcp
    bind 0.0.0.0:3306
    server web01 web01:3306 check
    server web02 web02:3306 check
    server web03 web03:3306 check
```
You’ll recognize almost all of that; all that’s really changed from our HTTP example is that we’ve switched over to `tcp` mode and changed the port number on the `bind` and backend sockets. There’s a problem here though, and it’s with the health check.

Since `tcp` mode will continually test the backend’s availability by opening and closing a TCP connection (every two seconds by default) without actually sending anything over the socket, MySQL will often interpret this as a connection error. This results in situations where the load balancer itself becomes blacklisted and you start getting errors such as:
```
joel@lb01:~$ mysql -h lb01 -u rascaldev
ERROR 1129 (HY000): Host '192.168.122.1' is blocked because of many connection errors; unblock with 'mysqladmin flush-hosts'
```
Ugly. Well, we can get around this by authenticating to MySQL, thus testing application-level availability as well as avoiding the “connection error” issue above. Let’s modify our load balance to suit:

```
listen testsite
    mode tcp
    bind 0.0.0.0:3306
    option mysql-check user haproxy
    server web01 web01:3306 check
    server web02 web02:3306 check
    server web03 web03:3306 check
```
This configuration supposes you’ve created a passwordless user that haproxy can connect as. For security reasons, you’ll probably want to give this user access to nothing and use the HBAC portion of their username to restrict logins to IP addresses associated with the load balancer (for example `CREATE USER 'haproxy'@'192.168.122.%';`).

Once haproxy is restarted, your MySQL nodes should show it connecting as the given user and then immediately quitting gracefully. For example, from the `general_log` in MySQL:

```
211 Connect   haproxy@192.168.122.11 as anonymous on
211 Quit
```
Obviously, you may want more complicated checks that access privileged functions (thus necessitating complex queries and passwords/SSL); to implement those you can use the same `external-check` functionality mentioned above.
Redis Cluster Access
Alright, so let’s go with something haproxy doesn’t have a native check for. Let’s assume you have a Redis cluster that you want to provide load balanced access to. A decent bare bones configuration might look like this:
```
listen testsite
    mode tcp
    bind 0.0.0.0:6379
    option tcp-check
    tcp-check send PING\r\n
    tcp-check expect string +PONG
    tcp-check send QUIT\r\n
    server web01 web01:6379 check
    server web02 web02:6379 check
    server web03 web03:6379 check
```
You’ll notice we added two things:
- An `option tcp-check` directive, which causes the health check defined by subsequent `tcp-check` keywords to be used in lieu of a simple TCP connection test.
- Three instances of `tcp-check`:
  - First we `send` a string consisting of `PING` followed by a carriage return and newline.
  - Then we instruct haproxy to `expect` the backend to reply with `+PONG` (in the Redis protocol, lines that start with `+` indicate the operation was a success while `-` indicates failure). Since the `expect` will pass if the given `string` is anywhere in the response, we don’t need to include the `\r\n` that the backend will indeed send in its own response.
  - Finally, we gracefully close the connection by `send`ing a `QUIT` command.
As you can see, you can emulate many different simple conversations for text-based protocols. The `tcp-check` keyword also supports binary protocols. To test a binary protocol’s functionality you can replace `send` with `send-binary` followed by hex codes representing the binary data, and when receiving, replace `expect string` with `expect binary`, also with hex codes as arguments.
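For instance, the same Redis conversation could be expressed in hex (`50494e470d0a` is `PING\r\n` and `2b504f4e47` is `+PONG`), purely as a demonstration:

```
option tcp-check
tcp-check send-binary 50494e470d0a
tcp-check expect binary 2b504f4e47
```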
Gatekeeping
As mentioned in the nginx article, having a load balancer out front also presents you with an opportunity to protect the web app from the evil, dark, and hostile forces of the internet. It also provides you a common point to modify routing and to enhance performance through caching and rate limiting.
Caching
haproxy merely strives to be a load balancer. It leaves caching (full page or section-based) to other applications such as the backend application server or some sort of Varnish frontend. Given that goal, in general haproxy only caches two things:
- Session information (such as SSL state)
- Small content objects such as favicons.
Implementing a content cache isn’t too difficult (just define the cache, then tie your `backend` or `listen` section’s `http-response` and `http-request` to it). I won’t show it here though, since it’s intentionally so limited.
For SSL caching: if you deal with a high volume of concurrent users on an HTTPS website, you might want to tweak haproxy’s native SSL caching. By default, haproxy will cache 20,000 sessions for 300 seconds (five minutes) and will immediately attempt to re-use a session (including symmetric key and ciphersuite) when a client re-connects. This saves a lot of latency through less calculation and less communication between client and server. Since this functionality is native, you only need to tweak `global` parameters, such as the cache size with the `tune.ssl.cachesize` keyword or the cache lifetime with `tune.ssl.lifetime`.
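For example, a sketch with illustrative values (sessions and seconds, respectively):

```
global
    tune.ssl.cachesize 100000
    tune.ssl.lifetime 600
```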
Returning to Stick Tables
Beyond session persistence, you can use stick tables both to secure your applications and to limit the rate of traffic as it goes through the load balancer. These options aren’t like the nginx options, where you can limit resource utilization for legitimate clients and slow them down; haproxy’s stick table-based controls are instead geared toward securing the application against denial of service attacks or vulnerability scans.
Since stick tables are essentially just in-memory database tables optimized for quick lookups, you can use them for retrieving data associated with a visitor’s record. Unfortunately you can’t store arbitrary data (outside of the key value itself), but haproxy does give us some handy options to feed the `stick-table` directive (via the `store` argument) when we’re creating it. An abbreviated list would be:

- `gpc0`/`gpc1` are generic unsigned 32-bit integers to associate with the record.
- `gpc0_rate`/`gpc1_rate` are read-only unsigned 32-bit integers that indicate how fast their respective counters are growing. Each takes a single argument (during table definition) that indicates the rolling period being monitored.
- `conn_cnt`/`conn_cur` are unsigned 32-bit integers indicating the total number of connections received and the number of connections currently open, respectively.
- `conn_rate` is similar to `gpc0_rate` but indicates the rate at which new connections are created. Takes a single argument (during table definition) that indicates the rolling period being monitored.
- `http_req_cnt`/`http_req_rate` are HTTP-level analogs of the similarly named data types above. Due to request pipelining (or multiplexing with HTTP/2) these should be preferred for web applications.
There are additional data types for bandwidth counting, but these are the most useful in my experience. To be able to use this data, though, you need to enable tracking; otherwise the data type will be reserved in the stick table but things like `conn_cnt` won’t be populated.
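A quick sketch of what that pairing looks like inside a `listen` or `backend` section (values are illustrative):

```
# Reserve a counter and a 10-second HTTP request rate for each client IP...
stick-table type ip size 100k expire 30m store gpc0,http_req_rate(10s)
# ...and actually populate them by tracking each request's source address
http-request track-sc0 src
```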
Security
OK, so now that we’re storing more information in the stick table, let’s use it. Let’s start by implementing rudimentary DoS protection by rejecting HTTP requests in excess of 100 per second. This would be an example configuration:

```
listen testsite
    mode http
    bind 0.0.0.0:80
    stick-table type ip size 30k expire 5m store conn_rate(1s)
    http-request track-sc0 src
    http-request deny if { src_conn_rate gt 100 }
    server web01 web01:80 check
    server web02 web02:80 check
    server web03 web03:80 check
```
Let’s take the important parts one at a time:
- We create a new stick table with the `stick-table` keyword.
  - We use `type ip` because we’re going to use the IP address as the record’s key.
  - We have 30,720 possible entries (via `size 30k`) which expire after five minutes (`expire 5m`).
  - We’re storing the connection rate over a rolling one-second period (`conn_rate(1s)`) in addition to the backend server assigned (aka `server_id`).
- We instruct haproxy to begin tracking the current connection by associating it with the record matching the client IP address (`src`).
  - To do this we associate the client IP with “stick counter” number 0: `http-request track-sc0 src`.
- We then instruct haproxy to deny the latest HTTP request if `src_conn_rate` is above 100. Since our rolling period is one second, this will only happen if they’ve exceeded our limit.
  - To accomplish this we use a new option for `http-request` called `deny`, applied based on whether or not the given ACL evaluates to TRUE.
  - In this case, rather than using a predefined ACL (since this is the only place we’re going to use it), we’re using an anonymous ACL by placing the logic within curly brackets on the same keyword line.
  - By rejecting at the HTTP level, even if the user is in the middle of a pipeline of HTTP requests, their requests begin to be denied.
  - Ideally you would also pair this with `tcp-request` (see the sketch after this list), which would stop a successful TCP connection from being formed. Denying at the HTTP level stops abusive HTTP pipelines, but blocking at the TCP level is preferable due to lower resource requirements, given that higher-level functionality is never invoked.
OK, so the above does an alright job of kicking abusive users to the curb, but once our one-second window is over they’re right back. We need a more permanent record of abusive users.

To accomplish this we use the aforementioned general purpose counter `gpc0` to flag particular users as abusive on a more permanent basis. Doing this in the same stick table we’ve defined above would work, but presents a traffic problem. If your haproxy instance is clustered, all updates to shared stick tables must be communicated to all peers in the cluster. If each and every request a user makes, regardless of intent, causes an update to the table (as is the case with monitoring the `conn_rate`) then that puts a lot of stress on the network and makes the cluster more fragile as a result.
To get around this we can structure our stick tables such that `conn_rate` is tracked by an unshared stick table, whereas the `gpc0` value that flags abusiveness can be shared by itself. In the event that we lose the active load balancer and there’s a failover, abusive users will remain blocked and the new load balancer will just lose the one second of `conn_rate` tracking the old load balancer had stored in memory.
Let’s look at what the configuration for something like that would look like:
```
listen testsite
    mode http
    bind 0.0.0.0:80
    stick-table type ip size 300k expire 60m store conn_rate(1s)
    http-request track-sc0 src

    ## Set or retrieve abuser status. When flag_abuser is evaluated gpc0 will be incremented
    acl flag_abuser src_inc_gpc0(abuse) ge 0
    acl is_abuser src_get_gpc0(abuse) gt 0

    ## Define abusive behavior
    acl path_too_long path_len 15:
    acl above_request_limit src_conn_rate gt 20
    acl wp_login path_beg /wp-login.php
    acl valid_auth_cookie req.cook(flash) thunder

    ## Evaluate abuse ACL's, dropping the connection where appropriate
    http-request silent-drop if path_too_long flag_abuser
    http-request silent-drop if wp_login flag_abuser
    http-request silent-drop if !valid_auth_cookie { path_beg /admin } flag_abuser
    http-request silent-drop if !valid_auth_cookie { path_beg /moderation } flag_abuser
    http-request silent-drop if above_request_limit flag_abuser

    ## Categorical denial at the TCP level for abusers
    tcp-request connection silent-drop if is_abuser

    server web01 web01:80 check
    server web02 web02:80 check
    server web03 web03:80 check

backend abuse
    stick-table type ip size 10k expire 120m store gpc0
```
OK wow. There’s a whole lot more to dig into:
- We’re creating two stick tables:
  - Both use IP addresses for the lookup key.
  - One stick table is for the high-frequency connection rate tracking. We’re capable of tracking 307,200 clients, with entries expiring after 60 minutes and connection rates being tracked within a one-second window.
  - The other is in an otherwise empty `backend` section called `abuse`. haproxy only allows a single stick table per section but allows you to refer to other sections’ stick tables, so this is the only way to create a second one. No actual configuration of backend servers is required though; it’s just an empty container for the stuff we’re putting into it.
- We initiate the actual tracking with `http-request track-sc0 src`, which loads the requester’s IP address (again from `src`) into memory as the lookup key once layer 7 processing has started. At this point `src_*` fetch methods will pull data from these tables.
- We define two ACL’s for modifying and querying the abuse status for this user:
  - The `flag_abuser` ACL executes `src_inc_gpc0(abuse)` in order to increment the `gpc0` counter in the `abuse` table. The ACL itself will always evaluate to TRUE, because the actual ACL logic checks whether the value returned (the value of `gpc0`) is at or above zero, which it always will be, even if the entry was previously non-existent (which evaluates to zero). This code will only run if the ACL itself must be evaluated; this becomes important behavior in a bit.
  - The `is_abuser` ACL is a true ACL in the sense that it checks a boolean status instead of being an indirect means of executing code. In this case it checks the current value of `gpc0` and returns TRUE only if `flag_abuser` has been called on this IP address before.
- We then define the ACL’s that would qualify a user as “abusive” with our app (plus a `valid_auth_cookie` ACL used as an exception). Real-world examples would likely be more complicated. I won’t dissect each one here since the syntax should be obvious; if you’re uncertain of what an ACL is doing, please refer either to the haproxy documentation (linked below) or the “Access Control Lists” section above.
- We then use an `http-request silent-drop if` pattern for each of our ACL’s.
  - Technically, we could do this inline instead of defining named ACL’s like I’ve done, but when configurations get complicated I prefer named ACL’s as they help produce self-documenting code.
  - In each case, we’re stringing together multiple ACL’s with logical AND operations. With haproxy, if you list multiple ACL names without any sort of logical operator there’s an implicit AND between them, where the preceding ACL expression must return TRUE before the following ACL’s will be evaluated.
  - The first directive enforces a maximum URL length of 15 characters (including the leading slash). This is useful against attacks which involve crafting abnormally large URL’s. To determine the application’s enforced maximum URL length, allow your application to be used under realistic conditions for at least a month, gather all the paths from your logs, take the longest path length present, and add 10-15 characters of buffer space.
  - The second directive will first check if the request path looks like someone trying to find a WordPress login page. If this evaluates to TRUE then `flag_abuser` will be called; otherwise `flag_abuser` is left alone.
  - The third and fourth directives complicate things a bit by introducing both negation and the mixing of anonymous and named ACL’s. Here, if the `valid_auth_cookie` ACL does NOT return TRUE (i.e. the user’s cookie is invalid), then their request will be checked to see if it’s for a path that’s sensitive within our application; if that DOES return TRUE then `flag_abuser` will be called.
  - Finally, we check the `above_request_limit` ACL, which will return TRUE if the user has issued more than 20 requests during the rolling window monitored by `conn_rate` for this user.
  - In every directive, `flag_abuser` is only called if ACL evaluation makes it that far, which means all of our criteria for an abusive user were met and we’re good to increment `gpc0` and perform the default action for the directive.
  - The `silent-drop` directive has the effect of immediately dropping the TCP connection without sending any TCP reset packets or HTTP errors to the client. In effect, their connection must just time out.
  - If each connection is established and then just dropped, though, this can leave stale connections in intermediate routers and firewalls. This in turn may create an opportunity for denial of service if attackers know that’s what you’re doing. For this reason you may prefer `tarpit` instead of `silent-drop`, which simulates a failure of a backend server and issues a 500 error. Which you would want depends upon what kinds of attacks you expect.
- Regardless of whether the HTTP connection is silently dropped or tarpitted, you can safely drop all future connections from abusive users. When the connections don’t complete, routers and firewalls should purge them from their tables as required.
  - We do this using `tcp-request connection silent-drop` based upon the client IP’s abuse status.
As you can imagine it can get much much more complicated than the above, especially when you’re load balancing multiple applications, but the above should give you a firm basis to start securing your applications on the load balancer itself.
HAProxy API
Oftentimes you might want to administer your load balancer with kid gloves. For instance, you may want to modify the behavior of a particular website without affecting all websites being load balanced through haproxy, or safely take a node offline by first draining any open connections from it while leaving the load balance itself online.
To accomplish this haproxy provides a simple text-based command API for interacting with the running state of the load balancer. There’s also a RESTful API available for the enterprise ALOHA implementation but I won’t be describing that here for the same reason I didn’t describe the nginx API either.
Admin Socket Configuration
There are two ways to interact with the API: over a UNIX domain socket or over a TCP socket.
You configure each the same way, by including a `stats socket` directive in the `global` section of your configuration file and ensuring that its privilege level is set to `admin`. Take this example `global` section:
```
global
    stats socket ipv4@0.0.0.0:1234 level admin
    stats socket /run/haproxy/admin.sock mode 0600 level admin
```
This creates two different admin gateways:
- The first directive instructs haproxy to open a `stats` socket on all available IP addresses, binding to port `1234` and giving connections to this port admin-level access to the load balancer.
  - You can interact with this socket by way of netcat. For example, to print the load balancing metrics over TCP: `echo show stat | nc localhost 1234`
  - Given the lack of encryption or authentication, you probably want to either forgo this option in production setups or at least configure some strict HBAC controls (such as port knocking) so that only trusted systems can communicate with the socket.
- The second directive creates the same admin socket as a UNIX domain socket located at `/run/haproxy/admin.sock`.
  - In addition to `level admin`, we set the mode of the socket to `0600` so only the root user can communicate with it.
  - To communicate over this socket you can use the `socat` utility. For example: `echo show stat | socat /run/haproxy/admin.sock -`
In the case of the TCP socket, additional security can be added by enabling SSL on the socket, the same as mentioned above. For example:

```
stats socket ipv4@0.0.0.0:1234 level admin ssl crt /etc/haproxy/server.pem
```
If you enable SSL you will need to use `openssl s_client` instead of netcat to communicate with haproxy. For example:

```
echo show stat | openssl s_client -ign_eof -connect localhost:1234
```
The `-ign_eof` flag overrides some non-intuitive behavior in `openssl` that breaks communication with the API gateway.
As you can probably glean from the above, the text-based API operates by way of simple commands and arguments (similar to the `bash` command line). To get a full list of possible commands, just use the `help` command with no options or check haproxy's management guide.
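For example, over the UNIX domain socket configured earlier:

```
echo help | socat /run/haproxy/admin.sock -
```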
ACL Manipulation
As stated much earlier in the post, ACL's can trigger side effects when evaluated (as with the `gpc0` stick table counter before) but ultimately produce `TRUE` or `FALSE` results that later conditionals can reference to determine their appropriate behavior via `if` and `unless` clauses. haproxy doesn't support ACL creation through the API, but matching criteria can be added or removed as necessary. This allows conditional logic to be primarily defined in the textual configuration but tweaked via the API.
Let's take the example of being able to put a website into "lock down" mode if we determine a credible threat is attempting to exploit the application. The lockdown mode should cause all requests to `/admin` (the hypothetical application's administration area) to be rejected. Let's assume the `listen` block looks like this:
```
listen testsite
    mode http
    bind 0.0.0.0:80

    acl site_lockdown dst_port,mul(0) 1
    http-request deny if { path_beg /admin } site_lockdown

    server web01 web01:80 check
    server web02 web02:80 check
    server web03 web03:80 check
```
The above has two important parts:
- A new ACL called `site_lockdown` is created.
  - It takes the local port the client has connected to (which is port `80` above, but the important part is that it's always an integer) and multiplies that integer by zero, thus always yielding zero.
  - This result is then compared against the last argument, which is hardcoded to `1`.
  - Since obviously `0 != 1`, this ACL as written will always yield a `FALSE` result.
- We use `http-request` to deny access to any URL that begins with `/admin` if `site_lockdown` is `TRUE`.
OK, now that the ACL is in place and always returning `FALSE` (indicating the website is not in lockdown mode), let's explore changing this value at runtime.
First let's enumerate the ACL's defined on the load balancer with the `show acl` command:
```
root@lb01:~# echo show acl | socat /run/haproxy/admin.sock -
# id (file) description
0 () acl 'dst_port' file '/etc/haproxy/haproxy.cfg' line 9
1 () acl 'path_beg' file '/etc/haproxy/haproxy.cfg' line 11
```
OK, we can see our two ACL's: the always-false `dst_port` ACL and the inline anonymous ACL for our `http-request` directive. Let's take a closer look at the `dst_port` ACL, which has an ID of `0` above, again using `show acl` but this time giving the ACL index we're interested in:
```
root@lb01:~# echo 'show acl #0' | socat /run/haproxy/admin.sock -
0x556df6060eb0 1
```
Above we have two-column output. The first column is the memory location the ACL pattern is stored at, and the second is the matching pattern for this particular ACL. Let's add a new matching pattern to the ACL with the `add acl` command, instructing haproxy to also match against `0`:
```
root@lb01:~# echo 'add acl #0 0' | socat /run/haproxy/admin.sock -
root@lb01:~# echo 'show acl #0' | socat /run/haproxy/admin.sock -
0x559e4a6d2f30 1
0x559e4a718760 0
```
Since all patterns within an ACL are logically OR'd together, this new pattern will cause `site_lockdown` to start evaluating to `TRUE`, because `dst_port,mul(0)` actually does match `0`. In turn this causes all directives conditional on `site_lockdown` being true (such as our `http-request deny`) to come into effect.
Once the drama subsides, you can delete the new pattern by instructing haproxy to delete all ACL entries that have a pattern of `0`, like so:
```
root@lb01:~# echo 'del acl #0 0' | socat /run/haproxy/admin.sock -
```
This should result in normal website functionality being restored.
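Since this add/delete dance is easy to fumble during an incident, you might wrap it in a small script. Here's a minimal sketch; the script name, socket path, and ACL index are assumptions taken from the example above and would need to match your own setup:

```
#!/bin/bash
## lockdown.sh (hypothetical) - toggle the site_lockdown ACL at runtime.
## Usage: lockdown.sh on|off
SOCK=/run/haproxy/admin.sock
ACL='#0'   ## index of the site_lockdown ACL per "show acl"

case "$1" in
    on)  echo "add acl $ACL 0" | socat "$SOCK" - ;;
    off) echo "del acl $ACL 0" | socat "$SOCK" - ;;
    *)   echo "usage: $0 on|off" >&2; exit 1 ;;
esac
```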
Updating Maps
As mentioned before, you can use the `map` converter to translate values returned by the standard fetch methods into arbitrary values of your choosing. For instance, let's say we have the following `listen` block:
```
listen testsite
    mode http
    bind 0.0.0.0:80

    http-response add-header X-Extra %[src,ipmask(16),map(/var/tmp/mapfile.txt,"Default Value")]

    server web01 web01:80 check
    server web02 web02:80 check
    server web03 web03:80 check
```
The relevant part of this block is the addition of the `X-Extra` response header:
- We use the `%[]` construct to embed the fetch method and converters' return data.
  - This is as opposed to the `{}` construct used for the anonymous ACL's in the previous section. There is no comparison yielding a `TRUE` or `FALSE` here (hence this is not an "ACL").
- I've chained two converters here. This isn't specific to the `%[]` construct or the use of `map` per se, but it's often useful in conjunction with `map` so that values undergo some level of normalization, maximizing the odds of correctly matching something inside our map file.
- The `map` converter reads the `/var/tmp/mapfile.txt` text file at startup and loads all the map data inside it.
  - If no map key matches, the string `Default Value` is returned instead.
Let's look at the contents of my `mapfile.txt` file:
```
8.8.0.0 United States
192.166.0.0 Germany
10.0.0.0 Private Network
192.168.0.0 Private Network
201.8.2.0 Brazil
```
But let's imagine the haproxy instance has been started and we've just now noticed the problem with that last line. Our `ipmask(16)` converter reduces every client address down to two octets, meaning the `201.8.2.0` entry will never match because the converter will have already reduced any such address to `201.8.0.0`. Let's use the API to delete this effectively dead map entry and add the entry we meant to add.
First let's enumerate all the maps currently loaded using `show map` with no arguments:
```
root@lb01:~# echo 'show map' | nc localhost 1234
# id (file) description
0 (/var/tmp/mapfile.txt) pattern loaded from file '/var/tmp/mapfile.txt' used by map at file '/etc/haproxy/haproxy.cfg' line 9
```
OK, so the only map available is the one we're after, and it's been assigned an ID of `0`. Let's look at the current contents of that map:
```
root@lb01:~# echo 'show map #0' | nc localhost 1234
0x5561ed38ab90 8.8.0.0 United States
0x5561ed392280 192.166.0.0 Germany
0x5561ed392300 10.0.0.0 Private Network
0x5561ed392380 192.168.0.0 Private Network
0x5561ed392400 201.8.2.0 Brazil
```
```
root@lb01:~# echo 'del map #0 #0x5561ed392400' | nc localhost 1234
root@lb01:~# echo 'show map #0' | nc localhost 1234
0x5561ed38ab90 8.8.0.0 United States
0x5561ed392280 192.166.0.0 Germany
0x5561ed392300 10.0.0.0 Private Network
0x5561ed392380 192.168.0.0 Private Network
root@lb01:~# echo 'add map #0 201.8.0.0 Brazil' | nc localhost 1234
root@lb01:~# echo 'show map #0' | nc localhost 1234
0x5561ed38ab90 8.8.0.0 United States
0x5561ed392280 192.166.0.0 Germany
0x5561ed392300 10.0.0.0 Private Network
0x5561ed392380 192.168.0.0 Private Network
0x5561ed392400 201.8.0.0 Brazil
```
Alright, so we've fixed the map in-memory. We should now modify the flat file as well, to protect against regressions should `haproxy` be restarted. One way to do that is sketched below.
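Rather than editing the file by hand, you could regenerate it from the (now correct) runtime state. A minimal sketch, assuming the TCP admin socket from earlier and that stripping the first (memory address) column is the only cleanup required:

```
## Dump map #0, drop the leading pointer column, and overwrite the flat file.
echo 'show map #0' | nc localhost 1234 | sed '/^$/d' | cut -d' ' -f2- > /var/tmp/mapfile.txt
```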
Managing Stick Tables
Let's return to our previous stick table example, where we had two separate stick tables, one of which contains a `gpc0` counter for flagging a user as abusive:
```
listen testsite
    mode http
    bind 0.0.0.0:80
    stick-table type ip size 300k expire 60m store conn_rate(1s)
    http-request track-sc0 src

    ## Set or Retrieve abuser status. When flag_abuser is evaluated gpc0 will be incremented
    acl flag_abuser src_inc_gpc0(abuse) ge 0
    acl is_abuser src_get_gpc0(abuse) gt 0

    ## Define Abusive behavior
    acl path_too_long path_len 15:
    acl above_request_limit src_conn_rate gt 20
    acl wp_login path_beg /wp-login.php
    acl valid_auth_cookie req.cook(flash) thunder

    ## Evaluate abuse ACL's dropping the connection where appropriate.
    http-request silent-drop if path_too_long flag_abuser
    http-request silent-drop if wp_login flag_abuser
    http-request silent-drop if !valid_auth_cookie { path_beg /admin } flag_abuser
    http-request silent-drop if !valid_auth_cookie { path_beg /moderation } flag_abuser
    http-request silent-drop if above_request_limit flag_abuser

    ## Categorical denial at the TCP level for abusers
    tcp-request connection silent-drop if is_abuser

    server web01 web01:80 check
    server web02 web02:80 check
    server web03 web03:80 check

backend abuse
    stick-table type ip size 10k expire 120m store gpc0
```
Let's assume that, as administrators, we've determined a particular IP address is abusive even though it doesn't meet any of our regular criteria. To flag it manually, we must set `gpc0` for the entry to a non-zero number, which will cause our `is_abuser` ACL to begin returning `TRUE`.
First let's enumerate the tables and then inspect the one we're interested in, both with the `show table` command:
```
root@lb01:~# echo show table | socat /run/haproxy/admin.sock -
# table: testsite, type: ip, size:307200, used:1
# table: abuse, type: ip, size:10240, used:1

root@lb01:~# echo show table abuse | socat /run/haproxy/admin.sock -
# table: abuse, type: ip, size:10240, used:1
0x559e0fe7a0d8: key=192.168.122.1 use=0 exp=7162942 server_id=1 gpc0=0
```
OK, so we're seeing both the `abuse` table and the IP we want to flag (`192.168.122.1`) above. To manually flag the user, we can just set `gpc0` to `1` using the `set table` command:
```
root@lb01:~# echo set table abuse key 192.168.122.1 data.gpc0 1 | socat /run/haproxy/admin.sock -
root@lb01:~# echo show table abuse | socat /run/haproxy/admin.sock -
# table: abuse, type: ip, size:10240, used:1
0x559e0fe7a0d8: key=192.168.122.1 use=0 exp=7193806 server_id=1 gpc0=1
```
And that's it, the user has been blocked.
Please note that in our current configuration this won't close any connections the abuser currently has open. In order to immediately reject all further communications from abusive parties, you should change the `tcp-request connection silent-drop if is_abuser` directive from `connection` to `content` (or add a duplicate directive that changes only that one thing, as sketched below) so that any data sent over an established TCP connection also triggers the ACL check and closes the connection.
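A minimal sketch of the duplicate-directive approach, layered onto the listen block above:

```
    ## Refuse new connections from flagged IP's...
    tcp-request connection silent-drop if is_abuser
    ## ...and also drop already-established connections as soon as they send more data.
    tcp-request content silent-drop if is_abuser
```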
API Limitations
As stated before, there are limits on what this API can do. For instance, you can not create new ACL's or modify their fetch method/converter configuration. Similarly, you can modify map files but you can't create new ones. You also can't make any configuration change that requires new directives to be introduced. Use of the API should be restricted to dynamic data that changes often, where you don't want each change to result in either a process reload (as with maps) or a configuration reload (as with most other changes). Thankfully, configuration reloads are mostly non-destructive/non-invasive.
Outside of what has been mentioned above, the API can also do basic backend server management, which we'll cover in more detail in the "Dynamic Member Management" section.
Logging, Alerts, and Monitoring
So once your application is up and running you need to be able to both monitor the load balancers themselves and log the traffic that’s going through them. haproxy is a little peculiar in both those categories so it bears going into.
Enabling Logging
One of the most peculiar things to me is haproxy's approach to log management. The only means of getting log data out of haproxy is through syslog, and unlike `nginx` there's no native option for either combined or common log formats. That means whatever you're using for log extraction (to Elasticsearch or what have you) has to be capable of understanding the format haproxy gives us.
The most basic way to enable logging is globally. This is useful for monitoring the haproxy application itself (rather than logging traffic going through the load balancer) and is configured as simply as providing the `log` keyword in the `global` section of your configuration:
```
global
    log /dev/log local0
```
The above instructs haproxy:
- To communicate (using the syslog protocol) with syslog over the `/dev/log` UNIX domain socket (the default syslog socket on most Linux systems).
- To use facility `local0`. Obviously, if you're using `local0` for something already, change that part to one of the other `localX` facilities.
- In the context of haproxy running in a container, your best bet is probably to either run something like `logstash` as a sidecar in the same pod or establish a central syslog service and replace `/dev/log` with `ipv4@x.x.x.x`, where `x.x.x.x` is the IPv4 address of the service (see the example below).
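For instance, a sketch of the central-syslog variant, assuming a hypothetical syslog service listening on `192.168.122.50` at the standard syslog port of `514/udp`:

```
global
    log ipv4@192.168.122.50:514 local0
```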
For request logging, logging must be enabled in the `frontend` or `listen` block receiving the connection, the same way it was enabled in `global`. Meaning you can't assume that just because you have a `log` directive in your `global` section it will be inherited by all other relevant blocks in your configuration (which is how other directives work). You can, however, point at the global log configuration from your proxy sections using the `global` argument. For example, this would be a sparse but complete configuration:
```
global
    log /dev/log local0

listen testsite
    mode http
    bind 0.0.0.0:80
    log global
    server web01 web01:80 check
    server web02 web02:80 check
    server web03 web03:80 check
```
In the above we define the global logging mechanism as syslog with `local0` as the facility. This configuration produces entries like the following for HTTP requests that come in:
```
Aug 04 23:26:20 lb01 haproxy[7558]: Connect from 192.168.122.1:50240 to 192.168.122.11:80 (testsite/HTTP)
```
Obviously, the above is about as basic as a log gets, just letting you know the client socket and the local socket they connected to (along with the `listen`/`frontend` block associated with that socket).
In the next section we'll cover log formats for getting more useful log information.
Log Formats
Alright, so the default haproxy request log format is boring to say the least. Let's see if we can do better. First, let's assume we have an HTTP web application that we're load balancing. The least-effort way of logging more useful information might be:
```
listen testsite
    mode http
    bind 0.0.0.0:80
    log /dev/log local0
    option httplog
    server web01 web01:80 check
    server web02 web02:80 check
    server web03 web03:80 check
```
The `option httplog` directive is new and causes haproxy to produce logs such as:
```
Aug 05 20:24:30 lb01 haproxy[8565]: 192.168.122.1:55900 [05/Aug/2018:20:24:30.279] testsite testsite/web01 0/0/1/1/2 200 352 - - ---- 1/1/0/0/0 0/0 "GET / HTTP/1.1"
```
OK, so now we're getting more information, albeit in the idiosyncratic haproxy format. This particular log entry breaks down like this:
- `Aug 05 20:24:30 lb01 haproxy[8565]:` is the normal syslog preamble.
- `192.168.122.1:55900` is the client IP address and port they're connecting from.
- `[05/Aug/2018:20:24:30.279]` is additional date information.
- `testsite testsite/web01` is the `frontend` name followed by the `backend` name and the actual backend application server the request was proxied to.
- `0/0/1/1/2` contains the request timing metrics:
  - The first `0` is the time spent receiving the HTTP request from the client.
  - The second `0` is the time the request spent waiting in queue. This would be non-zero if requests were being backlogged on the load balancer to spare the application server.
  - The first `1` is the number of milliseconds it took haproxy to establish a connection to `web01`.
  - The second `1` is the number of milliseconds haproxy spent waiting for the backend server to send an HTTP response.
  - The final field, the `2`, is the total number of milliseconds spent on this request. In this case it's just the 1ms for connecting to the backend server and the 1ms waiting until `web01` sent a response.
- In this case we have a `200` status code indicating success.
- Our response from haproxy to the client (including headers haproxy adds) is 352 bytes.
- The first two `-` fields are only used if you use `capture cookie`, which I haven't described here.
- The `----` is a placeholder for the connection state, which would contain descriptive information if the connection aborted abnormally.
- `1/1/0/0/0` contains connection metrics:
  - The first field is the number of active connections to the load balancer at the time of the request.
  - The second field is the number of active connections to this particular `frontend` or `listen` block.
  - The third field is the number of backend connections that remain open to the application server at the point of logging. Since by default logging only happens after the request has completed, and there are no other requests going to this load balancer, it's zero in my case.
  - The fourth field is the number of connections still active to the backend server.
  - Finally, the fifth field is the number of retries haproxy had to make for this request. Typically zero unless something's broken.
- `0/0` contains backend queue metrics:
  - The first field is the "server queue" metric. If this is non-zero, it's a measure of how many requests were queued for the backend server on the load balancer.
  - If the second field is non-zero, it's the same measure but for all requests to the same pool.
- Finally, we have the first line of the HTTP request so we can see what they were actually doing. In this case, just accessing the main index page at `/`.
OK, phew. Now that we understand that, let's see if we can coerce it to look more like Apache or nginx logs using a custom format:
```
listen testsite
    mode http
    bind 0.0.0.0:80
    log /dev/log local0
    option httplog
    log-format "%ci - - [%trg] \"%r\" %ST %B \"-\" \"-\""
    server web01 web01:80
    server web02 web02:80
    server web03 web03:80
```
I’m not going to explain Combined Log Format here (there are other resources for that) but let’s make note of some important points:
- The second and third fields (ident and REMOTE_USER respectively) are categorically nulled out. This information is available to HTTP, but haproxy has no way of referring to it.
- Similarly, neither the HTTP Referer nor the User-Agent field can be included.
- Please note that when a value can't be included, a hyphen represents the null value, and string values (like Referer and User-Agent) are still quoted even though they'll never be available.
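With that format in place, the request from the earlier example should produce a line resembling the following (reconstructed by hand from the format string, so treat the exact timestamp rendering as approximate):

```
Aug 05 20:24:30 lb01 haproxy[8565]: 192.168.122.1 - - [05/Aug/2018:20:24:30 +0000] "GET / HTTP/1.1" 200 352 "-" "-"
```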
OK but let’s say you were load balancing MySQL database access instead of HTTP. Luckily you can still log at the TCP level with haproxy:
```
listen testsite
    mode tcp
    bind 0.0.0.0:3306
    log /dev/log local0
    option tcplog
    server web01 web01:3306
    server web02 web02:3306
    server web03 web03:3306
```
Which produces logs such as:
```
Aug 05 21:15:34 lb01 haproxy[8760]: 192.168.122.1:35518 [05/Aug/2018:21:15:28.375] testsite testsite/web03 1/0/5652 273 -- 1/1/0/0/0 0/0
```
You’ll notice immediately that it looks almost exactly like the default HTTP log format. That’s more or less what it is with a few exceptions:
- The `1/0/5652` compound field represents "queue time", "milliseconds to connect to the backend server", and "millisecond lifespan of the TCP connection" respectively.
- The HTTP-specific fields such as the status code, request, and cookie fields are not present.
At this point you should have a basic understanding to get started on logging. Like I said earlier, haproxy logging is perhaps too complex a topic, but if you understand the above you should be able to address any logging problems you have to the furthest extent possible.
Alert Management
OK, so we now have logging set up; let's move on to getting alerted when something changes with the state of a load balance pool. Currently haproxy only supports email alerts natively, but external checks can be written to implement any manner of custom notification outside of email. Since email alerts are the only native option, that's the only one I'll concentrate on here.
Let’s look at an example configuration that includes email alerts:
```
mailers localsmtp
    mailer mysmtp smtpgw.example.com:25

listen testsite
    mode http
    bind 0.0.0.0:80
    email-alert mailers localsmtp
    email-alert to me@example.com
    email-alert from me@example.com
    server web01 web01:80 check
    server web02 web02:80 check
    server web03 web03:80 check
```
Breaking down the new parts:
- We've introduced a `mailers` section I've called `localsmtp`.
  - Optionally, we could specify multiple SMTP servers here, but I've only created a single one called `mysmtp`.
- We've introduced the `email-alert` keyword:
  - The first one ties email alerts for this listen block to the `localsmtp` mailers we defined up top.
  - We also set the "To" and "From" addresses (respectively).
- Finally, we ensure health checks are enabled on each server, since those are what trigger the alerts.
The above generates an email similar to:
```
Subject: [HAproxy Alert] Server testsite/web03 is DOWN, reason: Layer4 connection problem, info: "Connection refused", check duration: 0ms. 2 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue

Body: Server testsite/web03 is DOWN, reason: Layer4 connection problem, info: "Connection refused", check duration: 0ms. 2 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue
```
It’s impossible to change either the subject or the body of the email (this is baked into the haproxy binary itself) but as stated before, if you need customized alerts you can issue them from an external check script.
Status and Basic Performance Monitoring
OK now that we’re logging traffic and being notified when we lose a backend server, the final piece of the puzzle when it comes to basic monitoring is having some sort of dashboard view so we can get a quick summary of how our load balance pools are doing. You’ll likely have the most success with custom monitoring tools, but haproxy comes with a baked-in dashboard system for monitoring your pools.
Here is an example configuration file:
```
listen statspage
    bind 0.0.0.0:8080
    mode http
    stats enable
    stats uri /
    stats refresh 5s
    stats show-node

listen testsite
    mode http
    bind 0.0.0.0:80
    server web01 web01:80 check
    server web02 web02:80 check
    server web03 web03:80 check
```
The `listen` block at the bottom is fairly normal, but the `listen` block at the top is where the magic happens:
- We `bind` like normal and enable `http` mode so that our later keywords can make reference to HTTP-level information.
- The statistics page is then configured with various `stats` keywords:
  - `stats enable` signifies that a statistics page will be served through this frontend.
  - `stats uri /` specifies that the stats page will be located at the HTTP root.
  - `stats refresh 5s` is an optional parameter that causes the stats page to automatically refresh every five seconds.
  - Finally, `stats show-node` prints the server hostname at the top of the page so that it's easier to keep track of which load balancer you're looking at. This is useful when you've set up an haproxy cluster as mentioned later.
Some people feel the need to enable `stats auth` to prompt for a username and password to access the page, but the page doesn't contain sensitive information (outside of backend server hostnames), so security is probably better served by serving the stats page on an internal IP address and implementing some sort of HBAC to restrict access to internal users only. For example, see the sketch below.
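If you'd rather have haproxy itself enforce the restriction, a minimal sketch might look like the following (the `192.168.0.0/16` management range is a hypothetical example; substitute your own internal network):

```
listen statspage
    bind 0.0.0.0:8080
    mode http
    ## Only allow the (hypothetical) internal management network to view stats.
    acl internal_net src 192.168.0.0/16
    http-request deny unless internal_net
    stats enable
    stats uri /
```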
Clustering
Overview
Of course, part of the benefit of a load balancer is shielding users from temporary disruptions to the backend servers. If you have to reboot the VM serving a particular application, you want everyone to be moved to the servers that aren't rebooting. Part of this availability goes away, though, if the load balancing function itself isn't highly available.
To enable high availability for load balancers you need to implement clustering. For a load balancer to be considered properly clustered all haproxy instances need to share at least three things:
- Configuration: Meaning configuration changes are deployed to the load balancers (via ansible, git, etc.) rather than being manually maintained, which invites human error. If your load balancer setup includes external checks, you need to deploy those along with the configuration as a package rather than piecemeal.
- State: Which can be understood as both the data populating key stick tables (such as abusive users) and the backend application state.
- VIP Addresses: External users shouldn't be aware of planned changeovers from one load balancer to another. You may mitigate this somewhat by managing many different VIP's and draining/migrating them one-by-one, but ultimately you'll need to implement a VIP management solution like `keepalived` or `ucarp` for performing the automatic recovery.
The first and last are out of scope for haproxy, but haproxy does natively support state sharing with peer nodes, which is what we'll concentrate on here. Configuration management can take many forms and I haven't written an article on that yet. For VIP management you can read my keepalived and ucarp articles.
State Transfer
For peer state transfer to happen, you first need to declare the peers haproxy actually has using a `peers` block and then share a particular stick table out to that set of peers. Without a stick table to share, no transfer will take place. Let's take a look at a simple configuration:
```
global
    stats socket /run/haproxy/admin.sock mode 0600 level admin

peers clusterpeers
    peer lb01 lb01:1024
    peer lb02 lb02:1024

listen testsite
    mode http
    bind 0.0.0.0:80

    stick-table type ip size 30k expire 5m peers clusterpeers store gpc0
    http-request track-sc0 src
    acl is_flagged src_get_gpc0 gt 0
    http-request reject if is_flagged

    server web01 web01:80 check
    server web02 web02:80 check
    server web03 web03:80 check
```
Breaking this down:
- We establish a peer group called `clusterpeers`.
  - Each peer is given an "haproxy name" which is used during state transfer to identify where a particular update is coming from, so it's important that it's unique within the cluster.
  - We then specify the hostname and port combination for peer communication to take place over.
  - If an haproxy instance finds a peer entry whose haproxy name matches its own hostname (or an IP address that appears on a local interface), it will assume this is its own peer entry and open the port specified in the second argument.
  - If there are no stick tables shared out, these ports will remain closed.
- We create a basic stick table in the `testsite` listen block:
  - It uses the `ip` address as the entry key and stores only the `gpc0` counter.
  - Each entry expires after five minutes (`5m`) and the table can store 30,720 entries.
  - We share this stick table out to the peers listed in the peer group above with `peers clusterpeers`.
- We then associate the visiting user with a stick table entry and use an ACL called `is_flagged` to determine whether `gpc0` for the given IP is non-zero.
- If it is non-zero, we reject the HTTP request.
Assuming both `lb01` and `lb02` have identical configurations, this will result in the stick table `testsite` being replicated to both nodes. For instance, setting the `gpc0` counter on `lb01` should result in it being immediately available and set on `lb02`:
```
root@lb01:~# echo set table testsite key 192.168.122.1 data.gpc0 1 | socat /run/haproxy/admin.sock -
root@lb01:~# echo show table testsite | socat /run/haproxy/admin.sock -
# table: testsite, type: ip, size:30720, used:1
0x559d8f0314e4: key=192.168.122.1 use=0 exp=297594 gpc0=1
```
Meanwhile on `lb02`:
```
root@lb02:~# echo show table testsite | socat /run/haproxy/admin.sock -
# table: testsite, type: ip, size:30720, used:1
0x55ba2fe3fd54: key=192.168.122.1 use=0 exp=297394 gpc0=1
```
There are no native encryption or authentication mechanisms available for this peer communication, so if you need to be concerned about the security of your load balancer's state information (for example, you're on a public cloud provider) you'll need to set up a point-to-point network using something like `weave` or IPsec.
Monitoring Cluster Status
There is no native monitoring for peer status changes, so you have to get creative. One way to monitor peer status is to create a null backend that health checks the peer ports, sending email alerts on changes. An example configuration might look like this:
```
global
    stats socket /run/haproxy/admin.sock mode 0600 level admin

peers clusterpeers
    peer lb01 lb01:1024
    peer lb02 lb02:1024

mailers clusterpeers
    mailer gateway smtpgw.example.com:25

backend clusterpeers
    stick-table type ip size 1 expire 60m peers clusterpeers store gpc0
    server lb01 lb01:1024 check
    server lb02 lb02:1024 check
    email-alert mailers clusterpeers
    email-alert to support@example.com
    email-alert from support@example.com

listen testsite
    mode http
    bind 0.0.0.0:80
    server web01 web01:80 check
    server web02 web02:80 check
    server web03 web03:80 check
```
The above creates an incredibly small stick table so that the peer state transfer ports stay open regardless of what you eventually take out of, or put into, the other `listen`/`frontend`/`backend` blocks. It then sends email alerts whenever the TCP check against a remote peer fails. I've yet to devise a ping/pong external check for peers, so a TCP check is as close as I've been able to get.
Dynamic Member Management
OK, so now we've reached the capstone of the article. Now that we have a pretty functional intermediate-to-advanced knowledge of how haproxy functions, let's bring it together in the most comprehensive scenario that's still realistic to learn from: dynamic autoscaling.
Let’s construct a scenario:
- You administer a monolithic web application (i.e. not SOA) where the worker nodes can be scaled up on demand with no ill effects on their fellow workers.
- Each worker sits on its own virtual machine, and a new instance needs to be deployable by deploying a new server template in VMware.
- When a new VM is provisioned, it should automatically come onto the network, connect to the database on its own, and join the load balance.
OK, so as for what's left to be done on the "ops" side: getting onto the network automatically can be done through DHCP. Once we have our IP address, though, how do we get haproxy to begin delivering load to the new node?
Using Server Templates
The approach that's both the most direct and involves the least infrastructure is to configure server templates in the haproxy configuration, use the API to set IP addresses and add new workers to the application's load balancing pool, and then re-disable those slots once worker nodes leave the load balance during the "scale down" portion of the autoscaling.
Following this approach, the configuration might look something like this:
```
global
    stats socket ipv4@0.0.0.0:1234 level admin

listen testsite
    mode http
    bind 0.0.0.0:80
    server-template web 300 web01:80 check disabled
```
The above is pretty succinct and straightforward:
- We enable the administrative API on port `tcp/1234`.
- We establish a server template called `web`:
  - Upon startup, this template will cause haproxy to generate `300` slots in the backend pool.
  - It will fill each slot with a server named "template name" + index (where the index is the current iteration it's on), with a server definition set to whatever follows the server count.
  - In the above, each server will have a default backend of `web01:80`, use TCP checks for health checks, and (most importantly) start in disabled status so that we're only load balancing to `web01` once rather than 300 times.
So when haproxy starts up it has a bunch of backend servers defined, but none of them are active. The benefit of this approach is that it creates slots for our new servers to occupy, and we can use the API to modify and enable each backend server definition.
First let's manually enable the first slot (which the template named `web1`), just because that's a fun thing to do:
root@lb01:~# echo "set server testsite/web1 state ready" | nc localhost 1234 root@lb01:~#
At this point the load balance should switch from “Unavailable” to exclusively load balancing to web01.
Let's now manually set a new IP address on `web2` and enable it to receive new connections:
root@lb01:~# echo "set server testsite/web2 addr 192.168.122.22" | nc localhost 1234 IP changed from '192.168.122.21' to '192.168.122.22' by 'stats socket command' root@lb01:~# echo "set server testsite/web2 state ready" | nc localhost 1234 root@lb01:~#
At the time of this writing, setting the FQDN (as previous config examples have done) causes unstable pool behavior, likely due to a bug, so I've switched to modifying the IP address instead.
At this point the web02 application server should be part of the load balance and serving requests. Now let's say your scaling logic has determined that there isn't as much of a need for workers and it's time to scale back down.
First you start that process by setting the backend server's status to `drain`, which prevents new user requests from going to it:
root@lb01:~# echo "set server testsite/web2 state drain" | nc localhost 1234 root@lb01:~#
At this point, you would start some process to monitor your application's network usage. In the case of web servers, you may wait for a long pause in HTTP requests (indicating that clients have disconnected) or for the total data transfer rate to the load balancer to drop below a certain level (or potentially both). What you should not do (per the documentation) is wait for your TCP connection count to reach zero, as the load balancer may still maintain persistent connections (such as HTTP keep-alive) or perform health checks. A sketch of such a watcher follows.
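For illustration, here's a minimal sketch of a drain watcher built on `show stat`; the CSV field position, socket details, and the 60-second "idle" threshold are assumptions you'd adjust for your environment:

```
#!/bin/bash
## Hypothetical drain watcher: poll bytes-out for testsite/web2 and exit once
## the counter stops moving for 60 seconds (i.e. no user traffic remains).
get_bout() {
    ## In "show stat" CSV output, field 1 is the proxy name,
    ## field 2 the server name, and field 10 the bytes-out counter.
    echo "show stat" | nc localhost 1234 | \
        awk -F, '$1 == "testsite" && $2 == "web2" {print $10}'
}

last=$(get_bout)
idle=0
while [ "$idle" -lt 60 ]; do
    sleep 10
    now=$(get_bout)
    if [ "$now" = "$last" ]; then
        idle=$((idle + 10))
    else
        idle=0
        last=$now
    fi
done
echo "testsite/web2 appears drained"
```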
Once you're satisfied that no users are actually accessing the web application through this VM, you can then finally set the application server's state to `maint` to remove it from the load balance entirely:
root@lb01:~# echo "set server testsite/web2 state maint" | nc localhost 1234 root@lb01:~#
To recap, the overall approach here is:
- Establish enough servers in the configuration to handle more than the peak load of the application, but start them off in `disabled` status.
- When new VM's are added to the load balance, they receive an IP address via DHCP.
- The VM's startup scripts contact haproxy over the API to determine an open slot (a sketch of such a script follows this list).
- The VM then uses the API again to point an open slot at its IP address on the known port and change its status to `ready`.
- Once the worker isn't needed anymore, the VM's shutdown scripts call out to the API to change its status to `drain` so that requests stop coming in, and shutdown stalls waiting for some criteria to be met proving that requests are no longer being serviced by this application server.
- Once the criteria have been met, the API is called once more to set the status to `maint`, indicating purposeful absence (rather than `DOWN`, which indicates a failure of something that should be working).
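A minimal sketch of such a startup script; the load balancer hostname, API port, interface name, and CSV field positions are assumptions you'd adapt to your own environment (note it also ignores the slot-selection race condition discussed in the next section):

```
#!/bin/bash
## Hypothetical worker startup script: claim the first disabled slot in the
## "testsite" pool and point it at this VM. Assumes the admin API is reachable
## at lb01:1234 and that this host's address lives on eth0.
MY_IP=$(ip -4 -o addr show dev eth0 | awk '{print $4}' | cut -d/ -f1)

## In "show stat" CSV output, field 2 is the server name and field 18 its
## status; servers started "disabled" report MAINT.
SLOT=$(echo "show stat" | nc lb01 1234 | \
       awk -F, '$1 == "testsite" && $18 == "MAINT" {print $2; exit}')

[ -z "$SLOT" ] && { echo "no free slots" >&2; exit 1; }

echo "set server testsite/$SLOT addr $MY_IP" | nc lb01 1234
echo "set server testsite/$SLOT state ready" | nc lb01 1234
```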
And in doing the above, you're able to add and remove backend web servers on the fly without modifying the haproxy configuration or potentially disrupting other applications that might also be going through the same load balancer.
Using DNS-based Service Discovery
Now the approach above is functional but there are a few issues that jump out:
- The slot selection process requires a good deal of custom code. All auto-scaling involves some amount of custom code, but maximizing your use of standard tools helps improve readability.
- It requires API access to haproxy, when you may want to limit a particular set of VM's to merely adding and removing iterations of themselves rather than potentially having administrative access to the entire load balancer.
- If you have to rapidly autoscale, with new VM's potentially coming up concurrently, you run the risk of race conditions where two new VM's inadvertently pick the same slot, leaving you with two instances running over top of each other with unpredictable results.
Given the above, let's explore using DNS for service discovery, specifically Consul. Using DNS has the benefit of being able to do the above programmatically without race conditions, without undue levels of access being given to automated processes, and without allocating more than we need. Additionally, since haproxy determines membership via DNS, we can set up all web applications to load balance this way, and then the net number of changes to haproxy required per scaling event is actually zero. Neato.
Setting up Consul is a little out of scope for this article, so we'll concentrate only on the parts haproxy sees. For the time being, assume that Consul has been set up to use Access Control Lists that limit the worker VMs' Consul API access to only the `webapp` service, and that anonymous users can only query DNS.
A working configuration might look something like this:
```
global
    stats socket ipv4@0.0.0.0:1234 level admin

resolvers consul
    nameserver consul 192.168.122.15:8600
    hold valid 10s

listen testsite
    mode http
    bind 0.0.0.0:80
    server-template webapp 4 webapp.service.consul:80 check resolvers consul
```
OK so let’s dig into the new stuff here:
- We now have a `resolvers` section for defining DNS resolution outside of the normal system DNS.
  - In our case we point it at the Consul instance where our webapp is defined by specifying the Consul server's IP address and the default port for Consul's DNS service (`8600`).
  - We instruct haproxy to `hold` onto (i.e. cache) valid results for 10 seconds. This stops an undue amount of traffic going out to Consul from the load balancers.
  - You can also instruct haproxy to cache NXDOMAIN (i.e. hostname doesn't exist yet) responses from DNS, but in general it's best to assume things will be configured correctly and you won't be pointing at hostnames in Consul that don't actually exist. Therefore I would only cache invalid responses (errors, NXDOMAIN, etc.) if you were running into an issue.
- Down in the listen block we have `server-template` again:
  - The template is named `webapp` with four preallocated slots.
  - The FQDN is `webapp.service.consul` with a port of `80` and TCP-based health checking enabled.
  - We set the DNS `resolvers` for these servers to `consul`, which isn't strictly speaking needed but is useful if you end up defining multiple resolvers.
  - At the time of this writing, installing `1.8.13` was required to get around initial DNS resolution failures where haproxy attempts to use the system's DNS resolver instead of those configured via `resolvers`. If you must use `1.8.8`, please add `init-addr none` to suppress DNS lookups for these servers at startup.
  - Upon startup, Consul will return however many backend servers there are as A records, and haproxy will fill the `4` available slots with them. Upon removal from Consul, servers stop showing up in DNS and haproxy will therefore remove them from any slots they occupy. Records in excess of the available slots in haproxy are silently ignored and only used if one of the active backends is removed.
And there you go. Your application is now set up to automatically reconcile the load balance pool with a single authoritative DNS record, with no unnecessary access being given and no possibility of race conditions. You can verify what haproxy will see by querying Consul's DNS interface directly, as shown below.
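For example, a quick sanity check with `dig` against the Consul server from the configuration above (the addresses returned here are purely illustrative):

```
root@lb01:~# dig +short @192.168.122.15 -p 8600 webapp.service.consul
192.168.122.21
192.168.122.22
```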
Where to Go From Here
As hard as it may be to believe, this guide, as comprehensive as it is, doesn't cover haproxy completely. I've abbreviated many of the examples, and the descriptions of several keywords intentionally leave out possible options. To get a full sense of haproxy you'll need to just dive into it, do something useful, then rely heavily on this guide and the two official guides available on GitHub (Management API and Configuration) to fill in what I didn't cover above.
So, in addition to completing your understanding of the various keywords you end up finding most useful, where can you go? Well, no cluster would be complete without VIP management. I would suggest you look at my ucarp and keepalived articles to see how to implement that.
By default haproxy is a single-process, multi-threaded application. You may find it useful to benchmark performance while tweaking the process management directives or the performance tuning options I've completely omitted from this guide.
I’ve also intentionally glossed over SSL because usually the basic configuration listed above is enough for most users. However you may want to tweak various parameters associated with SSL or implement the SSL client authentication I mentioned before. This would be a good option for implementing some level of client authentication for API access.
Beyond that, the `#haproxy` freenode channel is always open and I've always found them very responsive. Whatever you do with haproxy, though, I hope you found this guide useful and have a productive time.
Further Reading
- (rascaldev) Load Balancing For High Availability With nginx
- Load Balancing, Affinity, Persistence, Sticky Sessions: What You Need To Know
- Official haproxy Configuration Guide Official Documentation that has been rendered as HTML by Cyril Bonté
- Official haproxy Management API Guide Official Documentation that has been rendered as HTML by Cyril Bonté
- Dynamic Configuration with the HAProxy Runtime API