Load Balancing For High Availability With nginx

Load balancing an application requires some forethought. Some applications are simple and can handle their load on their own, taking traffic directly from the users, and if they go down from time to time then oh well, just bring them back up soon, I guess.

Other applications need to scale more dynamically and have higher availability requirements. Meeting those requirements might mean spreading the load across a static set of nodes or being able to take nodes in and out as needed (often programmatically).

As you can imagine, the solution will increase in complexity in a manner that’s directly proportional to the performance requirements of the application. For this reason it’s always helpful to have an understanding of how it all fits together and judge for yourself what your availability requirements are. In this article I’m going to try my best to cover the main areas you’re likely to care about. You may care about some more than others and for that reason I would ask that you be judicious in which sections you choose to read.

Basic HTTP Load Balancing

Before we get too deep into the weeds, let’s start off slow with some fundamentals. Let’s say we have a basic PHP application. As far as ensuring a consistent user session state is concerned, let’s also assume that it’s fully represented with browser cookies and database content (core WordPress functions this way, for instance). So all that’s required for a load balance is that when a request comes in, it can be serviced by any of several backends.

Assuming nginx is already installed, let’s set up a basic load balance. When the client connects to the load balancer, it will try to locate the particular server block to use for the response using the normal rules. The server block itself can make reference to an upstream block that is a sibling underneath the larger http configuration block.

Let’s take a look at a basic load balancing configuration:

upstream web_servers {
  server 10.10.10.90:80;
  server 10.10.10.91:80;
}

server {

  listen 80 default_server;
  root /var/www/html;
  server_name example.com;

  location / {
    proxy_pass http://web_servers/;
  }

}

You’ll notice that not much here is out of the ordinary. The server block looks much like any other except that in our default location block we are now specifying a proxy_pass directive. This directive causes all HTTP traffic that hits this host to be proxied, by default, to something with the name web_servers. If nginx can’t locate any internal mapping for this name it will fall back to DNS, and failing that the daemon will begin failing syntax checks.

In our example, though, at the top you’ll notice that we’ve defined what the web_servers name should map to by giving an upstream block that name. An upstream, put simply, is a configuration black box where you list possible servers/ports and nginx uses some means of picking one of the members; all your proxy_pass needs to know is that web_servers will be expanded into a server name that may vary between requests. This upstream set of servers is simply two nodes, and with no options describing what kind of load balance we want, nginx will assume that a round robin load balance is sufficient.

Keeping upstream and server in the same file is just a convention I follow; it’s not strictly speaking required. All that’s important is that the upstream is defined underneath either the http or stream (discussed later) blocks. You’ll also notice that the application protocol used on the backend is defined in the proxy_pass directive. If we wanted SSL communication to the backend servers we would need to change the URL to be https:// instead.
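For instance, a minimal sketch of that (assuming the backend servers are also listening for TLS on port 443; how strictly the proxy verifies their certificates is its own topic) might look like:

upstream web_servers {
  server 10.10.10.90:443;
  server 10.10.10.91:443;
}

server {

  listen 80 default_server;
  server_name example.com;

  location / {
    ## nginx now speaks TLS to the upstream members
    proxy_pass https://web_servers/;
  }

}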

So if you were already familiar with nginx configuration, there’s nothing particularly mind-blowing here; defining the backend servers in upstream is likely the only part that’s new to you. When a request comes in, the request is proxied to whichever server name and port is returned by the upstream called web_servers.

Considerations for HTTP Headers and Logging

Introducing a load balancer fixes some problems but it also introduces new ones. One such problem is the backend server’s loss of visibility into the client connection, since that connection is now terminated on the load balancer instead of a local web server.

Three important consequences of this are:

  • Since our upstream group name and the client-side hostname aren’t the same, we should manually set the Host header in our proxy; otherwise the backend server will receive a request for web_servers, which may or may not work. Besides potentially not routing to the proper host on the backend, this can prevent the backend application from properly constructing links back to itself in its response. In our example, it may instead attempt to link to http://web_servers, which means nothing to the client browser.
  • The application server can no longer see the protocol the client used to connect. This is important for many applications (such as the URL construction just mentioned). If your frontend terminates an HTTPS connection but your backend connection is HTTP, then unless you hard code a different value the backend may begin pointing people to HTTP.
  • The backend application server can no longer see the IP address of each remote client, as each application server’s TCP connection is now with the load balancer instead. One possible remedy for this one is to do your HTTP logging on the load balancer and ship it to something like Elastic for central monitoring (a sample log format is sketched after this list).
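If you go the load-balancer-logging route, a minimal sketch of a suitable log format (the format name is arbitrary; $upstream_addr and $request_time are standard nginx variables recording which backend served the request and how long it took) might be:

  ## Defined in the http context alongside the upstream and server blocks
log_format lb_combined '$remote_addr - $remote_user [$time_local] '
                       '"$request" $status $body_bytes_sent '
                       '"$http_referer" "$http_user_agent" '
                       'upstream=$upstream_addr request_time=$request_time';

access_log /var/log/nginx/lb_access.log lb_combined;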

To work around all three issues though, we can also introduce custom HTTP headers. In the case of the lost client IP this means setting an X-Forwarded-For header (you can pick your own but that’s the standard name) on the load balancer which is then proxied to the backend application server. You may additionally need to create an X-Real-IP header with the same value, depending on the application’s requirements.

For identifying the client’s connection protocol, the standard practice is to set the X-Forwarded-Proto header on the load balancer so that the backend server can take whatever steps are required to do the right thing. One example would be adding logic to wp-config.php for a WordPress website or settings.php for Drupal so the application can construct links back to itself that contain the proper protocol for clients to use.

As an example, we may modify our location block to look something like this:

location / {
  proxy_set_header Host $host;
  proxy_set_header X-Forwarded-For $remote_addr;
  proxy_set_header X-Real-IP $remote_addr;
  proxy_set_header X-Forwarded-Proto $scheme;
  proxy_pass http://web_servers/;
}

Breaking this down:

  • We override the Host header that we would normally use for our backend request and instead copy whatever Host value the client used into it.
  • We then set X-Forwarded-For and X-Real-IP to give the backend server some visibility on which IP is making the given request.
  • Finally, we relay the connection protocol by passing the value of the current $scheme for the request via the X-Forwarded-Proto request header.

Weighted Preference and Backup Nodes

OK so now we have a round robin load balance setup. Big whoop, what next? Well let’s say you knew particular backends were more powerful than others. For instance, you may have two backend servers: one a new machine with two sockets and four cores each, the other a single-socket dual core. Obviously, if that’s all you have to work with then that’s all that you have.

That presents a problem though. In our current configuration we’re directing traffic equally to each node even though we already know one is usually going to be the better choice. Ideally, you’d want the dual socket machine to handle most requests, with the single socket machine just easing the pressure off its big brother or potentially taking over the workload entirely if the higher capacity backend needs to be restarted.

The answer to this is to modify the upstream block to put a thumb on the scale in favor of the big brother. For instance:

upstream web_servers {
  server 10.10.10.90:80;
  server 10.10.10.91:80 weight=2;
}

By default, each member of an upstream effectively has a weight of one, so the above essentially amounts to saying “for every three requests that come in, two should be handled by the .91 node and one by the .90 node.” If you want the traffic to be skewed even more you increase the desired node’s weight even more until the ratio works out.
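For example, to skew the ratio further to four-to-one in favor of the .91 node:

upstream web_servers {
  server 10.10.10.90:80;
  server 10.10.10.91:80 weight=4;
}

Here, roughly four out of every five requests would land on the .91 node.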

Now let’s assume that you didn’t want that single socket system to ever handle traffic unless the dual socket server was just absolutely down for the count. This is called an active-passive configuration and is set up in nginx by marking the appropriate node as backup in the upstream configuration. For example:

upstream web_servers {
  server 10.10.10.90:80 backup;
  server 10.10.10.91:80;
}

In the above, traffic will now only ever reach the .90 node if the .91 node is considered failed. By default, a single failed attempt to pass a request to a node (max_fails=1) will result in that node being marked as unavailable, with nginx waiting 10 seconds (fail_timeout) before sending it traffic again. These thresholds can be tweaked, though:

upstream web_servers { 
  server 10.10.10.90:80 backup; 
  server 10.10.10.91:80 max_fails=3 fail_timeout=20s;
}

In the above, .90 is once again a passive node that will only take over the workload once .91 has failed three times within a 20-second window, with nginx then keeping .91 out of the rotation for 20 seconds before trying it again to see if it’s back up.

Sticky Sessions

Ideally, when you’re load balancing between multiple instances of an application, each node should be interchangeable with each other. This can be achieved without additional configuration if your application is stateless (such as a mostly static html website). If your application needs to keep track of user sessions, though, each node will need to share the same user session information. An explanation of how to do that is out of scope for this section (and article).

Occasionally, though, you can’t build your clusters that way. For example, maybe it’s a vendor-controlled system: you can set up multiple instances pointed at the same database, but session information can’t be shared, or maybe generated files are saved to directories that can’t be safely shared between instances. In these cases, you’ll need to ensure that users who start out on one backend node keep going to that same node.

With nginx (and most load balancing solutions) there are two main ways of doing this: setting a cookie or routing based on the client’s IP address. In general, both are about as functional, but the cookie-based approach has a few drawbacks:

  • IP hashing doesn’t expose any details out to the user. Some would consider this “cleaner.”
  • Cookie persistence is a little more complicated and at the time of this writing the sticky directive used is a Plus-only feature.
  • IP hashing persists until nginx is restarted or you migrate from one load balancer to another (such as in the case with a clustered load balancer) whereas cookies age out naturally if you don’t access the site frequently.

Theoretically IP hashing also has the benefit of maybe one day being extended to stream load balances (described in “Load Balancing Non-HTTP Traffic” below) but currently both cookies and hashing are HTTP-only.

However, cookie persistence has one benefit that IP hashing does not: you’re load balancing the actual clients and not whole IP addresses under the assumption that all requests from an IP are a single user. If the client’s source IP changes, cookie persistence keeps the session alive whereas it’s lost with IP hashing. When deciding between IP hashing and setting a cookie you just have to decide which behavior is more important to the application you’re managing. If a lot of users might share an IP (behind NAT, for instance) then using a cookie to store persistence might make more sense.

To configure IP hashing persistence, just add the ip_hash directive (no arguments) to your upstream block. In our continuing example that would look like:

upstream web_servers {
  ip_hash;
  server 10.10.10.90:80;
  server 10.10.10.91:80;
}

With that simple modification, you should notice that the load balancer is now sending you to the same node repeatedly no matter how many times you refresh.

Since the sticky directive used in cookie persistence is still Plus-only I’m going to refrain from describing it until it gets incorporated into the open source version of nginx. If you’re legitimately curious though there is documentation for it online.

“Least Connections” vs “Round Robin”

Alright, so now we can set up a load balance and tweak backend selection a bit, either to give one node preference or to ensure users stay on the same backend node. Is there more we can do, though? For instance, not all load is equal. Some requests are users just browsing from page to page, others are simple REST API calls, but others still are generating large PDFs or running long statistical analysis. Is there a way we can communicate back to the load balancer that one of the previous requests actually accounts for a good deal of work?

Well, not really. Especially since the act of communicating this back to the load balancer would be futile as it’s subject to change. We can, however, make a vague approximation of the amount of work being done by taking into account the requests that are still outstanding. For instance, if we have two backends and one has ten connections while the other has five, then the load balancer should give new traffic to the latter node. This doesn’t account for requests that are merely blocked on something, where the related server could actually take more work since it’s just waiting anyway. It’s the closest you can reasonably get in an automated way, though.

To enable this behavior, simply add the least_conn directive to the upstream group:

upstream web_servers {
  least_conn;
  server 10.10.10.90:80;
  server 10.10.10.91:80;
}

and that’s it.

Load Balancing Non-HTTP Traffic

In relatively recent history, nginx (even the open source version) has made great strides in expanding its load balancing abilities beyond just HTTP traffic. The majority of load balancing is web traffic, but as time wears on many load balancing solutions (such as F5 Networks’ “BIG-IP LTM” load balancer) are being used as edge devices. Additionally, by directing traffic through a load balancer, backend network services of all kinds can be added or removed without updating IP addresses or hostnames placed in scripts or configuration files. It also allows the backend servers to be patched and rebooted without affecting service availability.

The first thing to understand about load balancing non-HTTP traffic is that since we don’t want nginx to do anything with the application-layer protocol, we need to place both the upstream group definition and the virtual server definition inside of a stream block rather than the usual HTTP block. There are installable modules for other application protocols but since none are native I won’t cover them in this overview.

Let’s say you have a series of SSH jump boxes: jump1, jump2, and jump3. A possible load balance could be:

stream {

  upstream ssh_servers {
    least_conn;
    server jump1.rascaldev.io:22;
    server jump2.rascaldev.io:22;
    server jump3.rascaldev.io:22;
  }

  server {

    listen 22 default_server;
    proxy_pass ssh_servers;

  }

}

If you’ve been following along, the above should look pretty darn familiar. We have an upstream group that load balances based on least_conn (since SSH sessions can obviously be long lived) across our three backend SSH jump boxes.

The key differences here are that we’re now encasing everything in the stream block, we’re no longer specifying a protocol in our proxy_pass directive and listen is now configured for port 22. After reloading I can now see nginx listening for ssh connections:

root@node01:~# netstat -tlpn
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 0.0.0.0:80              0.0.0.0:*               LISTEN      7867/nginx -g daemo
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      7867/nginx -g daemo

Please remember that for SSH load balancing to work, each backend server’s host keys must be identical; otherwise clients will refuse to connect.

I’ve used SSH here but the same principles apply to any sort of TCP load balancing. UDP load balancing works almost exactly the same, except the listen directive must have the udp option attached, for instance listen 53 udp; for DNS lookups.
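As a quick sketch (assuming a pair of hypothetical internal DNS resolvers at 10.10.10.53 and 10.10.10.54), that might look like:

stream {

  upstream dns_servers {
    server 10.10.10.53:53;
    server 10.10.10.54:53;
  }

  server {
    ## The udp flag tells nginx to proxy datagrams rather than TCP connections
    listen 53 udp;
    proxy_pass dns_servers;
  }

}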

Dynamic Member Management

OK so now we’re getting into more advanced topics. At this point we have a load balancer sitting in front of our application servers and we can distribute load according to where we need it to be. Let’s take it a step further and try to ensure our nginx load balance can adjust its list of upstream servers according to outside sources. If we could dynamically adjust the members of the load balance, we could run scripts at shutdown or startup that add and remove application servers as they’re spun up, whether they’re on VM’s or containers. This lays the groundwork for later implementing what’s called autoscaling, where new worker nodes are spun up as capacity or performance thresholds are met or surpassed.

The Plus version of nginx includes an API gateway for managing things such as this, but since that’s a paid product I won’t go over that solution here.

Reconfiguring nginx through scripts…

I won’t go on at length here due to the variety of ways of following this approach, but it’s perfectly possible to programmatically regenerate server and upstream blocks from a template and, pending the results of nginx -t, issue an nginx -s reload command. Such a way of doing things does make upstream membership dynamic but presents a few problems:

  • This represents original code, which isn’t wrong per se, but is generally not a good idea since original code is by definition less vetted than generally available production code used by many different organizations, and it represents additional learning for all new hires.
  • This requires the backend servers to themselves be able to reach into the load balancer and issue commands which have the authority to reconfigure/break nearly everything else the load balancer might be doing.
  • Even if nginx -t comes back clean, that doesn’t mean whatever changes have been made to the configuration file will actually work; it just means your syntax is alright.
  • You’re putting faith in your automated changes being the only changes going on at the same time. Otherwise your scripts are effectively deploying configuration changes that have hitherto lain inert on the filesystem. Reloading the entire web server’s configuration could result in unrelated server blocks failing due to misconfiguration or service availability issues that would otherwise have been caught in time with no impact on end users.

Still, you can do it this way, but anyone who’s interested in this level of availability and scalability likely isn’t going to want to take the risks mentioned above.

Defining Members in DNS…

One way of doing this is our old friend DNS. As alluded to before when I explained how proxy_pass is evaluated, nginx natively supports the ability to query DNS for name resolution. If you specify a hostname in an upstream block then nginx will use the DNS servers listed at /etc/resolv.conf to resolve the names to IP’s.

Various solutions exist that make dynamic updates to DNS possible (including literal “Dynamic DNS”). The most popular solutions, though, are the “DNS service discovery” products such as kube-dns which comes with Kubernetes or products such as Hashicorp’s Consul service. The former is the most popular for containerized service discovery whereas the latter is more generally popular and applicable as a common service for hybrid deployments of containers and VM’s.

For example, you might use Consul’s DNS interface for your nginx configuration, and have each VM or physical machine execute a script upon startup or shutdown to gracefully join or leave the load balance as required. This would mean deploying a new application server could be as simple as deploying a new template for the application in VMware (a process that with autoscaling is even scripted).

The simplest way of sourcing DNS in your nginx configuration is by using hostnames in the configuration itself. For example:

upstream web_servers {
  server node01.rascaldev.io:80;
  server node02.rascaldev.io:80;
}

server {

  listen 80 default_server;
  root /var/www/html;
  server_name example.com;

  location / {
    proxy_pass http://web_servers/;
  }

}

The above is almost identical to one of our previous load balances, except now we can see hostnames in the upstream group. When nginx starts, and thereafter every time it reloads its configuration, it will query the operating system’s DNS, and if multiple A records are returned it will treat each individual record as its own upstream member. Meaning if node02.rascaldev.io actually returns two IP’s then my upstream block will effectively have a minimum of three members rather than just two.

This is dynamic in principle, but picking up changes still requires reloading the configuration (the same problem as automating static configuration with scripts). To get nginx to resolve hostnames at run time when the proxy_pass directive is evaluated, create a variable containing the A record in question and configure a resolver directive outside of the server block to point at the DNS server. For example:

resolver 10.10.10.5:8600 valid=5s;

server {

  listen 80;
  root /var/www/html;
  server_name example.com;

  location / {
    set $dynamic_webservers "rails001.service.dc1.consul";
    proxy_pass http://$dynamic_webservers/;
  }

}

Breaking this down:

  • The resolver directive sets the DNS server to use for hostname lookups. By default nginx will cache DNS results obtained via resolver for the full TTL value, but we’re overriding this to five seconds using the valid=5s option. This is optional, and in the case of something like Consul (which gives a TTL of zero seconds) it’s probably better to leave it unspecified in your configuration.
  • Inside our location block we create a variable called $dynamic_webservers which is merely a text string representing the FQDN of the A records that will represent the backend service. In our case the hostname rails001.service.dc1.consul will resolve to all the application backends for a particular website.
  • We then use the variable in lieu of a hostname or upstream name in the proxy_pass directive. Using a hostname stored in a variable causes nginx to resolve the IP address on a request-by-request basis rather than only during startup or configuration reload.

You’ll notice that here we also omitted the upstream group entirely. Ideally you would want to mix these two (dynamic upstream member adjustment and a specific load balancing policy) but as you might guess, combining active DNS updates with upstream groups is a Plus-only feature. For that reason, I’m not going to cover it here but it works pretty much how you would think it would work (using the variable in upstream instead of proxy_pass).

Health Checks

Active Health Checks…

By default, nginx health checks are passive. This means a user makes their request, nginx tries to relay it to a backend node, and only then discovers the node isn’t there anymore. This forces the client to wait while their request is re-routed to another backend node. Ideally, nginx should be figuring this out ASAP so that a client’s request never considers that server a candidate.

Luckily, you can specify that nginx should actively check for down servers by simply providing the health_check directive in the same location block as your proxy_pass directive. For example:

location / {
  proxy_set_header X-Forwarded-For $remote_addr;
  proxy_set_header X-Real-IP $remote_addr;
  proxy_set_header X-Forwarded-Proto $scheme;
  health_check;
  proxy_pass http://web_servers/;
}

and that’s it. Instead of waiting for a client request to come in, every five seconds nginx will request / on each server, and if the response code is 200 then the server will be considered healthy enough to receive requests.

We can tune the active health check however we’d like. For example:

location / {
  proxy_set_header X-Forwarded-For $remote_addr;
  proxy_set_header X-Real-IP $remote_addr;
  proxy_set_header X-Forwarded-Proto $scheme;
  health_check fails=3 passes=2 interval=2 uri=/nginx;
  proxy_pass http://web_servers/;
}

In the above, health_check will request /nginx from each backend server every two seconds, with three failures marking a node as failed and two consecutive successful passes resulting in it being added back into the load balance.

Additionally, there are match blocks for writing your own liveness checks for HTTP and non-HTTP load balances, but at the time of this writing that’s a Plus-only feature.

Gatekeeping

OK, so in terms of continuous availability and spreading application demand around intelligently we’re pretty well covered. Having a load balancer presents us with another opportunity, though. With a load balancer out front we now have the ability to gatekeep our application, meaning we can run interference for the application server and enforce the policies we want users to follow. Policies can include:

  • Decreasing latency by caching static assets in memory.
  • Preserving bandwidth by implementing different “Quality of Service” policies.
  • Shielding the application from unwanted traffic, such as HTTP verbs you know you’ll never use, or blocking certain portions of the application from certain geographic regions.

Let’s take a look at each one in detail.

Caching

With a load balancer out front, we’re now in a prime position to cache responses from the backend servers. As with any caching, you should consider the following before implementing a solution:

  • Does this data need to be dynamic? If so, am I able to invalidate the cache entry to re-populate it should some bit of data change?
  • Do the vast majority of requests tend to be repeated? If most requests won’t be found in the cache, implementing one can actually slow things down, since every miss adds the overhead of a cache lookup.
  • If there is sensitive information being returned by the backend, have I exempted it from caching?

Once you’ve answered these questions, you can start designing your cache behavior. Let’s start by establishing a cache for a web application that’s purely static assets (.html, .css, .js, .png, etc) that rarely change. An example virtual host configuration for that type of application would be:

proxy_cache_path /var/cache/nginx keys_zone=rascaldev:10m;

upstream upstreams {

  server node02:80;
  server node03:80;

}

server {

  location / {
    proxy_cache rascaldev;
    proxy_cache_valid 10m;
    proxy_cache_valid 404 1m;
    proxy_pass http://upstreams;
  }

}


Breaking down the above:

  • proxy_cache_path establishes both the filesystem directory where proxied content will be saved (/var/cache/nginx) and the shared memory zone (rascaldev) used for storing the entry keys for the content cached therein, along with that zone’s size (10m, or 10MB).
  • upstream defines an upstream group of servers. We’ve seen this one before obviously.
  • Inside the server block:
    • proxy_cache sets the particular cache to use for this location, identified by the name of the shared memory zone
    • The first proxy_cache_valid sets the default response TTL to 10 minutes.
    • proxy_cache_valid 404 sets the TTL for 404 responses at 1 minute. By default, 404’s aren’t cached at all, but caching them prevents you from asking the backend servers for resources they’ve just recently told you don’t exist. This one is optional, but it’s something I would do.
    • Finally, the proxy_pass actually proxies the request to the backend web server and relays the response back to the client.

If you’ve done any sort of caching with nginx before (such as the FastCGI caching I’ve previously covered) most of the above should look pretty familiar.

Let’s examine a more complex scenario. You want WordPress users with an authentication cookie (i.e. logged-in users) to bypass the cache, and you want to limit caching to only GET requests so that anonymous users submitting a web form aren’t given a stale response with the backend server never seeing the request. You also don’t want the cache to grow larger than 50MB. That kind of configuration might look like:

proxy_cache_path /var/cache/nginx keys_zone=rascaldev:10m max_size=50m;

upstream upstreams {

  server node02:80;
  server node03:80;

}

server {

    ## Try to use the cache wherever possible
  set $bypass_cache 0;

    ## Logged in users never use the cache
  if ($http_cookie ~* "wordpress_logged_in_.+"){
    set $bypass_cache 1;
  }


  location / {
    proxy_cache rascaldev;
    proxy_cache_valid 10m;
    proxy_cache_methods HEAD GET;
    proxy_ignore_headers Cache-Control Expires Set-Cookie;
    proxy_no_cache $bypass_cache;
    proxy_cache_bypass $bypass_cache;
    add_header X-Cache $upstream_cache_status;
    proxy_pass http://upstreams;
  }

}

Walking through the differences from our more basic example:

  • The proxy_cache_path directive has been modified to add max_size=50m, which caps the nginx cache at 50MB. Once this limit is reached, the least recently accessed resources are pruned until the cache is back under the limit.
  • We’re now using some logic to set the variable $bypass_cache to 1 for all users with a cookie matching the WordPress login cookie (this cookie is cleared upon logout) and to 0 for everyone else. At the bottom, this value is given to proxy_no_cache and proxy_cache_bypass to keep those requests away from the cache: proxy_no_cache prevents the response from being stored in the cache, while proxy_cache_bypass causes the response to be taken directly from the backend even if an entry already exists in the cache for the request.
  • proxy_cache_methods sets an explicit limit to only consider HEAD and GET requests as candidates for caching.
  • We’re instructing nginx to ignore any Cache-Control or Expires headers sent by the backend server. This makes nginx the determining factor for whether a request is cached and not the application (WordPress in this case). We’re ignoring the Set-Cookie header because nginx’s default behavior is to exempt any responses that involve setting a cookie. Be aware of the security implications here.
  • Finally, add_header is used to deliver the contents of the $upstream_cache_status variable to users’ browsers by way of a header called X-Cache. When you’ve implemented exemptions from caching, it’s usually advisable to send the browser this information in case you later need to troubleshoot and want to know whether the user is getting cached data.

So there you have it. Caching strategies can obviously get more complicated than that but the above should give you a decent understanding of what needs to be done on a basic level. At some point in the future, I may write an article dedicated solely to nginx’s caching ability.

Rate Limiting

OK, but protecting performance is about more than reducing unnecessary requests to the backend. You may also want to protect and shape your overall bandwidth. For instance, let’s say that you offer a mirror for some Linux distribution’s package tree. You want to donate some of the bandwidth your company isn’t really using, but at the same time you need to preserve normal operations. This would involve throttling the downloads so that you aren’t continually swamped with new requests.

Let’s take a look at a simple example:

limit_req_zone $binary_remote_addr zone=rascaldev:10m rate=4r/s;

upstream upstreams {

  server node02:80;
  server node03:80;

}

server {

  root /var/www/html;
  index index.php;

  location / {

    limit_req zone=rascaldev;
    proxy_pass http://upstreams;

  }

}

The above is pretty standard for what we’ve seen so far except for two new things:

  • limit_req_zone establishes a shared memory zone for keeping track of the number of requests. The first option is the criterion used to decide when two requests belong to the same throttling group. Here we’re using $binary_remote_addr so that all requests (regardless of method or URI) from the same source IP are grouped together and share the same “four requests a second” (rate=4r/s) quota. This zone is named rascaldev and is 10MB in size (similar to other zones we’ve seen).
  • limit_req indicates that requests to this location block are subjected to the request throttling we’ve configured above. Optionally, we can move this directive directly underneath server and requests will be throttled site-wide regardless of which location block in your configuration ends up matching (see the sketch after this list).
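That site-wide variant is a minimal change. A sketch reusing the same rascaldev zone might look like:

server {

  root /var/www/html;
  index index.php;

  ## Declared at the server level, so it applies to every location below
  limit_req zone=rascaldev;

  location / {
    proxy_pass http://upstreams;
  }

}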

If you test these configurations you’ll notice how nginx keeps requests within their quota. By default, if a user issues requests at a rate that exceeds their quota they are returned a 503 Service Unavailable error. Even though the above only allows four requests a second, the way nginx enforces this quota is by slicing each second into equal periods, one per allowed request, and processing only one request per time slice, returning 503 for all others. This means you can be limited to four requests a second, perform only three requests, and still get a 503 if two of those requests landed within the same time slice (in the case of 4r/s the time slice is 250ms).

This is undesirable for a lot of modern applications, though. Usually when a web page first loads you’re going to have several subsequent requests to load things like embedded images and style sheets. That means when the page loads the browser is immediately going to issue new requests for everything.

To get around this problem we can pass two options to the limit_req directive:

  • burst is a parameter that specifies how many requests clients are able to queue on the frontend before they start getting the aforementioned 503 errors. In our example of 4r/s, if two requests come in within the same 250ms time slice, the second request simply stalls until the next 250ms time slice is available.
  • To get around this stalling we can add the nodelay parameter. This causes queued requests to be served immediately with no waiting period. When used with burst, this gives clients an initial allowance during which they can issue whichever requests they want, whenever they want; beyond that allowance their activity is governed by the rate limit (and the 503 errors described above).

Given the above information here is a more realistic example:

limit_req_zone $request_uri zone=rascaldev:10m rate=10r/s;

upstream upstreams {

  server node02:80;
  server node03:80;

}

server {

  root /var/www/html;
  index index.php;

  location /news {
    limit_req zone=rascaldev burst=20 nodelay;
    proxy_pass http://upstreams;
  }

  location / {
    proxy_pass http://upstreams;
  }

}

The above applies the aforementioned burst and nodelay parameters to our limit_req invocation so that users can make an initial burst of requests immediately and thereafter will be limited to 10r/s.

Let’s say you wanted to be more precise about what you’re controlling. You don’t want to throttle traffic on a requests-per-minute or even requests-per-second basis; you want users to be limited to 200KB/s regardless of how many requests are being made or how often they’re being made. For these sorts of problems we have limit_rate. For example:

upstream upstreams {

  server node02:80;
  server node03:80;

}

server {

  root /var/www/html;
  index index.php;

  location /downloads {
    limit_rate 200k;
    limit_rate_after 1024k;
    proxy_pass http://upstreams;
  }

  location / {
    proxy_pass http://upstreams;
  }
}

The above looks pretty close to our last limit_req example except we’re now capping the download rate for anything inside /downloads at 200KB/s once a response grows larger than a megabyte (1024k).

For more conditional situations, you can also set the $limit_rate variable for the same effect:

server {

  root /var/www/html;
  index index.php;

  location / {

    if ($request_uri ~ ^/downloads){
      set $limit_rate 200k;
      limit_rate_after 1024k;
    }

    proxy_pass http://upstreams;

  }

}

The above should be functionally identical to the previous block, except we’ve managed to remove the duplicate proxy_pass directive. Since the limit is set according to the contents of a variable, there’s no limit to the criteria you can use to decide which requests are throttled and which aren’t.

Given that, you could even set bandwidth throttling according to key-value pairs accessible to remote processes via the nginx Plus API, but since that’s a paid feature I won’t be covering it here.

Security

So finally we come to security. Even if you only have a single application server, chances are you’ve at least toyed with the idea of putting a load balancer out front just to filter requests as they come in. The load balancer is the ideal place for that, after all.

The first opportunity for security is to limit which locations are accessible by outside parties. Just like with Apache httpd, you do this using the deny and allow directives. For example:

server {

  root /var/www/html;
  index index.php;

  location /private {

    allow 10.0.0.0/8;
    allow 192.168.0.0/16;
    deny all;

    proxy_pass http://upstreams;

  }

  location / {

    proxy_pass http://upstreams;

  }

}

The above restricts access to /private to only the private IP ranges listed. You’ll notice this looks pretty similar to Apache, and that similarity follows through: each rule is evaluated in the order it appears in the configuration file, and the first HBAC rule that matches the requester causes nginx to stop evaluating HBAC rules within that context.

Along a similar line, you can restrict the methods you’re going to relay down to a select few, based on these same HBAC controls, by using a limit_except block. For example:

server {

  root /var/www/html;
  index index.php;

  location / {

    limit_except GET HEAD POST {
      deny all;
    }

    proxy_pass http://upstreams;

  }

}

The above will block all HTTP requests other than GET, HEAD, or POST, regardless of origin or which URL they act upon. Usually you will want the denial to be categorical as I have it above (truthfully, most public web applications only need HEAD, GET, or POST) but the allow directive is also valid in this context. The preference for categorical denials is that with limit_except you’re generally blocking requests that have no legitimate purpose with your application; only rarely will you need to whitelist a particular IP range.

Another possible attack vector is the malicious use of HTTP headers. Countering this involves using add_header to send security-related headers to the client as well as proxy_set_header to blot out or sanitize request headers that may carry problematic values.
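As a rough sketch (the specific headers are illustrative: the two add_header lines send common hardening headers to the browser, and X-Original-URL stands in for any request header we assume the backend should never receive):

location / {

    ## Security-related response headers sent back to the client
  add_header X-Frame-Options "SAMEORIGIN";
  add_header X-Content-Type-Options "nosniff";

    ## Setting a request header to an empty string prevents it from being
    ## passed along to the proxied server at all
  proxy_set_header X-Original-URL "";

  proxy_pass http://upstreams;

}

A non-obvious example of this attack vector is using browser cookies for SQL injection.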

Injection attacks (SQL and otherwise) often need a certain amount of preamble before they can begin delineating their payload: escaping or encoding the character strings being sent, or the like. In the case of Shellshock, this required a preamble to escape to a shell first. For this reason, if the maximum character space is artificially lowered to just above what’s needed for any valid request, injection attempts become much harder to mount. Additionally, if your application limits the data clients must send to complete a request (such as using session variables instead of storing data in cookies), then you can set the maximum limit even lower.

Let’s take the following location block for instance:

location ~ \.php$ {

    ## Enforce Maximum URI Length
  if ( $request_uri ~ "/.{100,}" ){
    add_header Content-Type "text/plain";
    return 403 "Denied Access.";
  }

    ## Enforce Maximum Size of all cookies altogether
  if ( $http_cookie ~ ".{500,}" ){
    add_header Content-Type "text/plain";
    return 403 "Denied Access.";
  }

  proxy_pass http://upstreams;

}

Since nginx lacks any directive or function for determining a string’s length, each of the above if statements leans on regex’s ability to match repetition ranges instead.

In the above configuration sample, we severely limit two non-obvious attack vectors:

  • In the first if statement the request URI is limited to 100 characters after the first slash. This eliminates a lot of possible application attacks that rely on GET variables being used uncritically, as-is, by the backend application. In the regexp for the matching condition we’re matching any string consisting of 100 or more characters of any class (the . repeated {100,} times), so the rule only matches once the URI reaches 100 characters.
  • In the second if statement we do the same thing for the total length of cookie data, matching a series of 500 or more characters of any kind. Here we could enforce limits on particular cookies if we wanted (via $cookie_cookiename); in the above I counted all cookies together, though.

Obviously, the numbers I’ve provided are just ballpark figures. Work out how much of each type of data your application actually needs for the largest possible legitimate request and then increase that number by 15-20%. Be aware that you don’t get extra credit for setting limits just barely above what’s needed; the security benefit drops off precipitously once attackers simply can’t send almost arbitrarily large strings to the application (for instance, the maximum cookie size is 4096 characters, which in my experience most applications don’t come close to using).

A keen observer will notice that in the above I’m using $request_uri rather than something like $args or $uri. That’s because $uri undergoes a process of “normalization” (taking directives like alias and location into account and adjusting the variable to its perceived “real” value), meaning it may no longer be what the user actually sent us. In the case of $args, the issue is that many applications implement “clean URLs” which won’t populate $args but will still be used as arguments by the application, and those are present in the immutable $request_uri.

A keen observer will also note the lack of any checks or limits on POST data. The reason is threefold:

  • When a legitimate request is large, it’s usually a POST. Cookies and GET parameters are usually only intended to carry small bits of key information, with the weight of the transaction in the server’s response back to the client, whereas a typical POST request could go either way. For example, file uploads are often done via POST.
  • nginx actually seems to lack a reliable mechanism for determining the length of POST data. For instance, $request_body actually being populated seems rather hit-or-miss, $request_length includes all headers (including GET parameters and cookies), and $content_length is dependent upon the value of a request header, making it unacceptable for a security control.
  • Generally speaking, both GET and POST requests are the more obvious avenues for injection, so you stand a better chance that the developer didn’t make the mistake of trusting that user input unconditionally. In the case of GET requests we blocked overly long ones above anyway, because nginx makes it easy enough and it at least eliminates the need for hope.

If that’s not good enough for you and you want to control POST data being sent through the load balancer, you can still do that with the nginScript controls I mentioned before. Since that’s a whole other scripting language, though, I’ll leave it for another post.

I’ll close out this section by mentioning two additional security controls you might find useful:

  • Load balancer cycling is useful for partially mitigating compromised load balancers. As mentioned before, a lot of commonly used HA configurations involve the website’s domain resolving to several IP addresses that point at different load balancers.
    •  This allows you to take each load balancer and individually drain it of any long-running requests (such as downloading large files) by taking it out of the DNS load balance, but leaving the VIP active on the egressing node. Once it’s drained of all TCP connections, the VIP can be migrated to a replacement load balancer and the VIP re-added to DNS.
    • This process can be automated reliably as long as the new load balancer passes some series of smoke tests proving it to be fully functional before adding it back to DNS. Once replaced, any exploit that compromised the previous load balancer must be repeated by the attacker once they notice what’s happened.
    • This process can also reduce the amount of time between a patch being released and becoming effective in production.
    • Adding additional load balancer capacity in this configuration is as simple as adding another VIP candidate to the DNS load balance.
  • fail2ban is a tool that is often very useful. Obviously a full explanation of fail2ban would be its own post but I mention it here for completeness.
    • This allows malicious users to be identified and programmatically blocked at the firewall level before they have a chance to find a weakness to exploit.
    • The ban expiry built into fail2ban also keeps false positives (such as attacks from shared IP’s) from turning into denial of service against legitimate users.

Further Reading