Application Routing in nginx

Usually an application can handle its own routing. After all, the developers created the application, so it stands to reason that they know which URLs correspond to which components in it.

Occasionally, though, you’ll need to delve into the server configuration so that HTTP requests are properly mapped to application requests. The reasons vary but usually amount to not having sufficient control of the application itself, as with a vendor application. My hope is that no matter why you need to manage application routing, you’ll have a firm grasp on the subject after reading this.

Variables Relevant to Application Routing

It’s important to understand what variables are often useful with application routing. The code examples in this post are easier to understand if you’re familiar with the following:

  • $uri :: A mutable variable that represents the current target of the request. It can be modified by several other directives including rewrite and try_files.
  • $request_uri :: An immutable variable containing the unaltered user request, including any query string (i.e. what the user typed into the address bar of their browser, minus the scheme and host).
  • $scheme :: The protocol in use when communicating with the client. Usually either http or https.
  • $query_string :: The string of text in the URL after a question mark, non-inclusive. Also sometimes called $args.
  • $server_name :: The virtual host’s value for the server_name directive.
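
If you want to see these values for a live request, a throwaway location block can print them. This is only a debugging sketch: the /debug-vars path is a made-up name, and the block should not be left enabled on a production host.

```nginx
# Hypothetical debugging endpoint; the path name is an assumption.
location = /debug-vars {
    default_type text/plain;
    # Echo the routing-related variables back to the client as plain text.
    return 200 "uri=$uri\nrequest_uri=$request_uri\nscheme=$scheme\nquery_string=$query_string\nserver_name=$server_name\n";
}
```

Requesting /debug-vars?a=1 should show uri=/debug-vars, request_uri=/debug-vars?a=1 and query_string=a=1, which makes the difference between $uri and $request_uri easy to see.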

A Note About Variables Generated from Regular Expressions…

One of the features of regular expressions is the possibility of making back references to previously delineated groups of characters. I won’t go over the details of back references (that’s a different article) but suffice it to say that enclosing patterns in parentheses forms a match group and thus produces a back reference. These back references can also be named by prefixing the pattern with ?<variableName> for example:

(?<myVar>[a-zA-Z0-9]+)

This would match an alphanumeric string of text and store it in an nginx variable called $myVar.
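
As a minimal sketch of a named capture in context, the block below uses the pattern in a location directive and echoes the captured value. The /greet/ prefix is an invented path used purely for illustration.

```nginx
# Hypothetical location; /greet/ is an assumed path.
location ~ ^/greet/(?<myVar>[a-zA-Z0-9]+)$ {
    default_type text/plain;
    # $myVar holds whatever the named group matched.
    return 200 "captured: $myVar\n";
}
```

A request for /greet/alice42 would match, and $myVar would contain alice42 for the rest of that request.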

Directive: try_files

Syntax:

  try_files file ... uri;
  try_files file ... =code;

With no configuration options, nginx will only return files that are 100% matches for what the user gave as their request URI. This works at a basic level but often you’ll want your application to be able to handle things like requests for unknown files (to generate 404’s) or to generate pages that don’t exist on the physical filesystem but are internal resources as indicated by the user’s request URI.

The function of try_files is to configure an order of precedence for satisfying a request by testing successive locations for the file. If none of the candidates exists, nginx falls back to the last parameter given, which is either a URI to redirect to internally or a status code. An example might be:

try_files $uri $uri/ /index.php?$query_string;

This first attempts to locate a regular file matching the given URI. Failing that, it tries to find a directory of that name, and finally it resorts to invoking the site’s /index.php, passing along the query string the visitor gave.

As you can imagine, you don’t strictly need a try_files directive in your configuration, but you probably should have one, and many applications won’t function without a correctly configured one. For instance, Drupal generates “clean URLs” for its pages, with URIs like /user/2 or /node/3. Without try_files falling back to Drupal’s index.php (as shown above) to return an appropriate response, all nginx will do is check whether a file exists at /node/3 and, when it doesn’t, generate an nginx-based 404 message. No bueno.

It should also be noted that, because of the internal redirect, try_files will modify the content of $uri. Anything that needs to match what the user typed in their web browser must thereafter reference $request_uri instead, as the two won’t be identical anymore (this is what fastcgi.conf does internally).
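
The second form in the syntax listing, ending in =code, is worth a quick sketch as well. The /assets/ prefix here is an assumption; the idea is simply that static files should never be handed to the application.

```nginx
# Hypothetical static-asset location; /assets/ is an assumed prefix.
location /assets/ {
    # Serve the file if it exists on disk; otherwise answer with a 404
    # directly instead of falling through to /index.php.
    try_files $uri =404;
}
```

This keeps requests for missing images or stylesheets from invoking the application at all, which is usually what you want for purely static paths.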

Directive: return

Syntax:

  return code [text];
  return code URL;
  return URL;

This directive is fairly basic: it stops processing the request and either redirects the user elsewhere (for 30x HTTP return codes), returns a page specified by error_page, or returns some text directly to the requester.

For example, a simple redirect from one URI to another could be done in a location block:

location /oldie {
  return 301 /admin/structure;
}

The URL given to return can also include variables and full URLs, which makes it easy to set up redirects from one application page to the analogous page on an application that supplants the older one (just as an example). One way of doing something like that might be:

location /user/index {
  return 301 $scheme://$server_name/admin/people?$query_string;
}

This will redirect users to the appropriate page on the same hostname, preserving HTTPS or HTTP (via $scheme) and any query string given (via $query_string). It will potentially send the users back through a load balancer, and they may ultimately be directed to a different backend server.

Upstream documentation often recommends using return 301 in lieu of something like rewrite for redirects, due to how nginx configuration is evaluated and the possibility of redirect loops therein. However, there are some instances where you need to concisely redirect users by reformatting the URIs, for which there’s no real alternative to rewrite given that return doesn’t support regular expressions on its own.
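
The redirect examples above cover only one of the three behaviours of return. A rough sketch of the other two, returning text directly and deferring to error_page, might look like the following. The /healthz and /legacy-api/ paths and the /gone.html page are invented for the example.

```nginx
# Hypothetical paths and file names throughout.

# Return a short text body directly to the requester.
location = /healthz {
    default_type text/plain;
    return 200 "ok\n";
}

# Return only a status code and let error_page supply the body.
error_page 410 /gone.html;

location /legacy-api/ {
    return 410;
}
```

In the second case nginx answers with status 410 and, assuming /gone.html exists under the document root, serves that file as the response body.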

Directive: alias

Syntax:

  alias fullPath;

Redirecting users back through the load balancer with return may be undesirable, though, or the client software may behave unpredictably in response to a redirect. In those cases, handling the changed URL purely through an internal redirect may be what you need. An “internal redirect” is nginx-speak for any redirection that only affects nginx’s behavior without exposing anything to the client (such as via an HTTP 301 response).

The alias directive provides this ability at a pretty basic level. Its purpose is to remap the current $uri so that the file nginx ends up looking for doesn’t necessarily match what its default behavior would produce.

For example, let’s say that applications you don’t control might link to static assets (images, .webm, etc) that used to be housed in $scheme://$server_name/files but now are saved at $scheme://$server_name/sites/default/files. You could do this with a return 301 directive (as demonstrated above) but supposing you’re not sure what effect directing the client to a different URI would have on the requesting software, you may want to redirect internally rather than trusting the client to do so properly:

location /files/ {
  alias /var/www/html/sites/default/files/;
}

When combined with the normal ability to use any named back reference generated by its location directive’s regexp you can map one URI space to a radically different one:

location ~ ^/files/(?<repoName>[a-zA-Z0-9-]+)/(?<fileName>[a-zA-Z0-9-]+)\.(?<fileExtension>[a-zA-Z]+)$ {
  alias /nfs/$repoName/$fileExtension/$fileName.$fileExtension;
}

In the above, requests for /files/repository/loading.gif will translate into a final path of /nfs/repository/gif/loading.gif, while /files/repository/waiting.png will result in /nfs/repository/png/waiting.png being returned. This allows you to structure your static assets according to what makes sense at the file level and use nginx to translate between the two (albeit at the cost of additional nginx configuration).

A more complex example of the above might be:

location ~ ^/files/(?<repoName>[a-zA-Z0-9-]+)/(?<fileName>[a-zA-Z0-9-]+)\.(?<fileExtension>[a-zA-Z]+)$ {
  # Default parent directory for any repository not handled below
  set $parentDirectory "/nfs";

  if ($repoName = "repository") {
    set $parentDirectory "/var/www/html/sites/default/files";
  }

  alias $parentDirectory/$fileName.$fileExtension;
}

This maps the request to an entirely different directory tree based on the first “directory” present in the requested URI. For instance:

  • /files/repository/waiting.png returns the file /var/www/html/sites/default/files/waiting.png
  • /files/new-files/loading.gif returns the file /nfs/loading.gif
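
If the number of special cases grows, one possible alternative to stacking if blocks is the map directive, which has to live in the http context. This is a sketch of a different technique than the one used above, not a drop-in from the original configuration, and it assumes the same capture names and directories.

```nginx
# http context: derive $parentDirectory from the captured $repoName.
map $repoName $parentDirectory {
    default     /nfs;
    repository  /var/www/html/sites/default/files;
}

# server context: the location only needs the alias line now.
location ~ ^/files/(?<repoName>[a-zA-Z0-9-]+)/(?<fileName>[a-zA-Z0-9-]+)\.(?<fileExtension>[a-zA-Z]+)$ {
    alias $parentDirectory/$fileName.$fileExtension;
}
```

The map is evaluated lazily when $parentDirectory is first used, so the named capture from the location match should already be available by then. Adding another repository becomes a one-line change in the map rather than another if block.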

Directive: rewrite

Syntax:

  rewrite regex replacement [last|break|redirect|permanent];

As it stands at this point we already have a great deal of flexibility in routing requests:

  • Delegating routing to the application with try_files
  • Redirecting users’ browsers with a return 301
  • Using alias in conjunction with location back references to map static resources to vastly different file paths.

We do kind of have a problem, though.

When generating internal redirects with alias, we won’t always have the luxury of simply pointing it at a separate file. For example, the request may be for a PHP script you wish to execute, but if you put alias within a location block, nginx is going to return the PHP script as a file to the end user instead of hitting any PHP-matching location block you have specified later on. What’s more, we may not know beforehand which URLs we need to redirect and only know “well, these sorts of URLs should actually look like this.” Neither alias nor return supports regular expressions without being put into a location block, which might introduce yet more problems.

Enter the rewrite directive.

The rewrite directive allows us to apply PCRE to a user-requested URI and transform it on the fly into another URI/URL, with or without redirecting the user. Where rewrite falls short is when you need to return files outside of your normal $document_root or are afraid of running into the redirection loops mentioned in the return directive section.

The valid values for flag are:

  • break: Stops processing the rewrite rules in the current block and carries on handling the request there.
  • last: Same as break except it will jump to the location block that matches the new value of $uri.
  • redirect: Sends the user an HTTP/1.1 302 (temporary) redirect response instructing their browser to go to the new URI. The response is returned immediately, so no further rewrite rules are processed.
  • permanent: Sends the user an HTTP/1.1 301 (permanent) redirect response instructing the client that the URL is no longer considered valid and to use the redirect URL instead. Useful for getting search engines to drop a deprecated page from their index. As with redirect, the response is returned immediately.

Enough talk, let’s look at some examples:

rewrite /person/([0-9]+).* /user/$1 redirect;

This will redirect the user’s browser using an HTTP/1.1 302 response. For example, they would be redirected from /person/343 to /user/343 without the admin needing to know beforehand which user IDs someone was going to request. An alternative way of writing the above would be:

location ~ /person/(?<userID>[0-9]+).* {
  return 302 /user/$userID;
}

This avoids the issues with rewrite (since return always stops execution) but adds more lines to our configuration, with a regexp that’s more complicated than a rewrite directive would likely require (since we have to name our back references to make use of them). The pattern is workable if you only have a few redirects, but with anything more than 3-4 redirects your configuration is going to start looking unwieldy.

The documentation recommends against using rewrite if you can avoid it, but I live by the principle that a properly formed rewrite directive will never hurt you. As long as you never omit the flag (using break or last for internal redirects), redirection loops purely due to nginx should be unlikely.
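
The examples above only exercise the redirect flag. For the two internal flags, a rough sketch of the difference might look like this. The /person/ mapping follows the earlier example, while the /img/ block and its file extensions are invented for illustration.

```nginx
location /person/ {
    # "last": rewrite internally, then search for a new location,
    # so /person/343 is handled by whichever block matches /user/343.
    rewrite ^/person/([0-9]+)$ /user/$1 last;
}

# Hypothetical block; the /img/ prefix and extensions are assumptions.
location /img/ {
    # "break": rewrite internally but stay in this block,
    # so /img/logo.jpeg is answered with the file for /img/logo.jpg.
    rewrite ^/img/(.+)\.jpeg$ /img/$1.jpg break;
}
```

In both cases the client never sees a redirect; only the flag decides whether nginx re-runs location matching after the URI changes.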

Directive: fastcgi_split_path_info

Syntax:

  fastcgi_split_path_info regex;

Some web applications make use of a (Fast)CGI feature known as PATH_INFO. PATH_INFO is the embedding of internal application resources within the URL after the fully qualified path to the file. For instance in this URL:

http://example.com/update.php/status

The update.php portion is the (Fast)CGI script to execute, and /status is an identifier whose meaning is defined by the application. Not all PHP applications need this: WordPress doesn’t, and Drupal 7 didn’t. Earlier versions of Drupal and the latest (version 8) do need it in places, though.

Whether your application needs it or not will have to be something you determine on your own. I would recommend against setting up PATH_INFO if you don’t need it. The reasons being:

  • If you’re not using PATH_INFO then I would wager most people would expect that despite its name update.php is actually a directory that has a file called status in it.
  • If you’re not using PATH_INFO then adding a directive for it just adds another line to your configuration the uninitiated are going to have to look up and understand. Better to stick to only the stuff that does something useful.

In practice, setting up PATH_INFO is generally harmless, though, so it really comes down to the administrator’s preference.

To actually set it up though, you use the fastcgi_split_path_info directive which instructs nginx (via regular expressions) which part of the $uri is a path to a script, and which is the PATH_INFO portion. A common setting for PATH_INFO is:

fastcgi_split_path_info ^(.+?\.php)(|/.*)$;

This specifies that the file path will be something that ends in .php, while the PATH_INFO portion will be an optional piece that follows it and begins with a forward slash.

It should be noted that if you’re supporting PATH_INFO you may have to adjust your location blocks. Many people use location’s matching criteria to selectively run only PHP scripts through FastCGI. Often the pattern tested will involve .php being at the end of the location string. This needs to be updated so that PATH_INFO URLs also match the block.

For example a block that looks like:

location ~ \.php$ {

  include fastcgi.conf;
  fastcgi_pass unix:/run/php/php7.0-fpm.sock;

}

will need to be modified to something that looks more like this:

location ~ ^/[a-zA-Z0-9-]+\.php(/.*)?$ {

  include fastcgi.conf;
  fastcgi_split_path_info ^(.+?\.php)(|/.*)$;
  fastcgi_pass unix:/run/php/php7.0-fpm.sock;

}

This allows certain common file name characters and treats the request as a PHP script as long as the script name ends in .php, whether or not a forward slash and further path follow it. Feel free to tweak the regexp so that it catches all your possible script names, though.
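
One detail worth checking is whether the PATH_INFO value actually reaches the FastCGI backend. As far as I know, the stock fastcgi.conf shipped with nginx does not set a PATH_INFO parameter, so you may need to pass it yourself; check your distribution’s copy before relying on this. A sketch, assuming the same socket path as above:

```nginx
location ~ ^/[a-zA-Z0-9-]+\.php(/.*)?$ {
    include fastcgi.conf;

    # For a request to /update.php/status the split yields:
    #   $fastcgi_script_name = /update.php
    #   $fastcgi_path_info   = /status
    fastcgi_split_path_info ^(.+?\.php)(|/.*)$;

    # Assumption: PATH_INFO is not already set by the included file.
    fastcgi_param PATH_INFO $fastcgi_path_info;

    fastcgi_pass unix:/run/php/php7.0-fpm.sock;
}
```

The first capture group of the regexp becomes $fastcgi_script_name and the second becomes $fastcgi_path_info, which is what the extra fastcgi_param line forwards to PHP.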

Case Study: Drupal 8.x

Internal Redirects That Don’t Break The Site…

Let’s assume that for whatever reason, you wanted to redirect /user-profiles/$userID to the normal /user/$userID page. To do so, you might produce an nginx configuration similar to the following:

server {

  listen 80 default_server;
  root /var/www/html;
  index index.php;
  server_name localhost;

  set $destURI $request_uri;

  location ~ /user-profiles/(?<userID>[0-9]+) {
    set $destURI "/user/$userID";
    rewrite /user-profiles/([0-9]+) /index.php last;
  }

  location / {
    try_files $uri $uri/ /index.php?$query_string;
  }

  location ~ (\.php$|^/update\.php) {
    include fastcgi.conf;
    fastcgi_param REQUEST_URI $destURI;
    fastcgi_split_path_info ^(.+?\.php)(|/.*)$;
    fastcgi_pass unix:/run/php/php7.0-fpm.sock;
  }

}

For the most part this looks like a normal Drupal 8 configuration. We have a location block that redirects requests for PHP over to a local FastCGI instance after setting PATH_INFO.

There are a few new twists to enable our desired functionality, though:

  • Early on we’re saving $request_uri to a variable called $destURI
  • We have a new location block that matches our new /user-profiles URI root.
    • The matching criteria extracts a named variable called $userID from the numeric string that follows the forward slash.
    • We set a new value for $destURI so that the final destination seen by Drupal will be /user/$userID
    • We do a rewrite so that the request immediately drops down to the PHP location block without going through try_files again.
  • Once our /user-profiles/$userID request makes it down to our PHP block, we’re ready to have Drupal retrieve the page we’re really after.
    • The only problem is that the default for PHP-FPM’s REQUEST_URI parameter (as set in fastcgi.conf) is $request_uri, which in our case is not necessarily the URI we want Drupal to see.
    • We re-set that particular parameter after including the normal set of parameters. By setting it to $destURI we have the effect of setting it to regular ol $request_uri in most cases unless our /user-profiles location block matched this request in which case it will be some version of /user/$userID.
    • We use $destURI instead of a built-in variable because $request_uri is obviously immutable and though $uri is mutable if we try to change it (with rewrite for instance) we’ll change the nginx routing behaviour which at this late stage we don’t want.
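
While testing a setup like this, it can help to confirm which value is being handed to Drupal. One way to sketch that is a temporary response header in the PHP block. The header name X-Debug-Dest-URI is made up, and the line should be removed once the routing is verified.

```nginx
location ~ (\.php$|^/update\.php) {
    include fastcgi.conf;
    fastcgi_param REQUEST_URI $destURI;
    fastcgi_split_path_info ^(.+?\.php)(|/.*)$;
    fastcgi_pass unix:/run/php/php7.0-fpm.sock;

    # Hypothetical debugging aid: expose the routed URI to the client.
    add_header X-Debug-Dest-URI $destURI always;
}
```

With this in place, a request for /user-profiles/42 should come back with the header set to /user/42, while an ordinary request should show its original request URI.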

Further Reading

Where to go next? Learn more about nginx routing by picking your favorite application or framework (Joomla, Yii2, Django, etc.), coming up with atypical routing issues, and trying to solve them with nginx.

Outside of that you can refer to the following to become more familiar with the theory of operation: