Usually an application can handle its own routing. After all, the developers created the application, so it stands to reason that they know which URLs correspond to which components.
Occasionally, though, you’ll need to delve into the server configuration so that HTTP requests are properly mapped to application requests. The reasons vary but usually amount to not having sufficient control over the application itself, as with a vendor application. My hope is that no matter why you need to manage application routing, you’ll have a firm grasp on the subject after reading this.
Contents
- Variables Relevant to Application Routing
- Directive: try_files
- Directive: return
- Directive: alias
- Directive: rewrite
- Directive: fastcgi_split_path_info
- Case Study: Drupal 8.x
- Further Reading
Variables Relevant to Application Routing
It’s important to understand what variables are often useful with application routing. The code examples in this post are easier to understand if you’re familiar with the following:
$uri :: A mutable variable that represents the current target of the request. It can be modified by several other directives, including rewrite and try_files.
$request_uri :: An immutable variable containing the unaltered user request (i.e. what the user typed into the URL bar of their browser).
$scheme :: The protocol in use when communicating with the client. Usually either http or https.
$query_string :: The string of text in the URL after the question mark, non-inclusive. Also available as $args.
$server_name :: The virtual host’s value for the server_name directive.
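If you want to see these variables in action, a throwaway location block can echo them back. This is only a sketch: the /debug-vars path is a name I made up for illustration, and you wouldn’t want to leave something like it enabled in production.

```nginx
# Hypothetical debugging endpoint; the /debug-vars path is an assumption.
location = /debug-vars {
    # return sends the text with the default MIME type, so make it plain text
    default_type text/plain;
    return 200 "uri=$uri\nrequest_uri=$request_uri\nscheme=$scheme\nquery_string=$query_string\nserver_name=$server_name\n";
}
```

Requesting /debug-vars?foo=bar should show the query string without the question mark and the request URI with it.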
A Note About Variables Generated from Regular Expressions…
One of the features of regular expressions is the ability to make back references to previously delineated groups of characters. I won’t go over the details of back references (that’s a different article), but suffice it to say that enclosing a pattern in parentheses forms a match group and thus produces a back reference. These back references can also be named by prefixing the pattern inside the parentheses with ?<variableName>. For example:
(?<myVar>[a-zA-Z0-9]+)
would match an alphanumeric string of text and store it in an nginx variable called $myVar.
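As a minimal sketch of a named capture in use, the location below pulls part of the URI into a variable and echoes it back. The /docs/ prefix and the $docName variable are hypothetical names chosen just for this illustration.

```nginx
# Hypothetical location; /docs/ and $docName are illustrative only.
location ~ ^/docs/(?<docName>[a-zA-Z0-9]+)$ {
    default_type text/plain;
    # $docName holds whatever the named group matched
    return 200 "You asked for document: $docName\n";
}
```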
Directive: try_files
Syntax: try_files file ... uri; try_files file ... =code;
With no configuration options, nginx will only return files that are 100% matches for what the user gave as their request URI. This works at a basic level, but often you’ll want your application to handle things like requests for unknown files (to generate 404s) or to generate pages that don’t exist on the physical filesystem but are internal resources indicated by the user’s request URI.
The function of try_files is to configure an order of precedence for satisfying a request by testing successive locations for the file. If none of the earlier options produces a result, nginx falls back to the last option given: either an internal redirect to that URI or, in the =code form, the given status code. An example might be:
try_files $uri $uri/ /index.php?$query_string;
Which would first attempt to locate a regular file matching the given URI (relative to the configured root), failing that it’ll try to find a directory of that name, and finally it’ll resort to invoking the site’s /index.php and feeding it the query string the visitor gave.
As you can imagine, you don’t strictly need a try_files directive in your configuration, but you probably should have one, and most applications won’t function without a correctly configured one. For instance, Drupal generates “clean URLs” for its various pages, with URIs like /user/2 or /node/3. Without try_files defaulting to Drupal’s index.php (as shown above) for an appropriate response, all nginx will do is check whether a file exists at /node/3 and, when it doesn’t, generate an nginx-based 404 message. No bueno.
It should also be noted that, because of the internal redirect, try_files will modify the content of $uri. Anything that needs to match what the user typed in their web browser must thereafter reference $request_uri instead, as the two won’t be identical anymore (this is what fastcgi.conf does internally).
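The syntax line above also lists a =code form, which the Drupal example doesn’t use. As a small sketch for a site with only static files (an assumption on my part, not something Drupal would want), the fallback can be a status code instead of a script:

```nginx
location / {
    # Serve a matching file or directory; otherwise answer with a 404
    # directly instead of handing the request to an application.
    try_files $uri $uri/ =404;
}
```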
Directive: return
Syntax: return code [text]; return code URL; return URL;
This directive is fairly basic: it stops nginx’s processing of the request and either redirects the user elsewhere (for 30x HTTP return codes), returns a page specified by error_page, or returns some text directly to the requester.
For example, a simple redirect from one URI to another could be done in a location block:
location /oldie { return 301 /admin/structure; }
The URL given to return can also include variables and full URLs, which makes it easy to set up redirects from one application page to the analogous page on an application that supplants the older one (just as an example). One way of doing something like that might be:
location /user/index { return 301 $scheme://$server_name/admin/people?$query_string; }
This will redirect users to the appropriate page on the same hostname, preserving HTTPS or HTTP (via $scheme) and any query string given (via $query_string). It will potentially send users back through a load balancer, and they may ultimately land on a different backend server.
Upstream documentation often recommends using return 301 in lieu of something like rewrite for redirects, due to how nginx configuration is evaluated and the possibility of redirect loops therein. However, there are some instances where you need to concisely redirect users by reformatting URIs, for which there’s no real alternative to rewrite given that return doesn’t support regular expressions.
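The other forms from the syntax line look like the following sketch. All three paths, and the example.com hostname, are placeholders I picked for illustration.

```nginx
# return code [text]: answer directly with a short body
location = /healthz {
    default_type text/plain;
    return 200 "ok\n";
}

# return code: answer with a bare status code
location /private/ {
    return 403;
}

# return URL: a lone URL is shorthand for a temporary (302) redirect;
# it has to start with http://, https:// or $scheme
location = /docs {
    return https://example.com/docs/;
}
```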
Directive: alias
Syntax: alias fullPath;
Redirecting users back through the load balancer with return may be undesirable, though, or the client software may behave unpredictably in response to a redirect. In those cases, handling the changed URL purely through an internal redirect may be what you need. An “internal redirect” is nginx-speak for any redirection that only affects nginx’s behavior without exposing anything to the client (such as via an HTTP/1.0 301 response).
The alias directive provides this ability at a pretty basic level. Its purpose is to remap the current $uri such that the file nginx ends up looking for doesn’t necessarily match what its default behavior would produce.
For example, let’s say that applications you don’t control link to static assets (images, .webm, etc.) that used to be housed in $scheme://$server_name/files but are now saved at $scheme://$server_name/sites/default/files. You could do this with a return 301 directive (as demonstrated above), but supposing you’re not sure what effect directing the client to a different URI would have on the requesting software, you may want to redirect internally rather than trusting the client to do so properly:
location /files/ { alias /var/www/html/sites/default/files/; }
When combined with the ability to use any named back reference generated by its location directive’s regexp, you can map one URI space to a radically different one:
location ~ ^/files/(?<repoName>[a-zA-Z0-9-]+)/(?<fileName>[a-zA-Z0-9-]+)\.(?<fileExtension>[a-zA-Z]+)$ {
    alias /nfs/$repoName/$fileExtension/$fileName.$fileExtension;
}
In the above, requests for /files/repository/loading.gif will translate into a final path of /nfs/repository/gif/loading.gif, while /files/repository/waiting.png will result in /nfs/repository/png/waiting.png being returned. This allows you to structure your static assets according to what makes sense at the file level and use nginx to translate between the two (albeit at the cost of additional nginx configuration).
A more complex example of the above might be:
location ~ ^/files/(?<repoName>[a-zA-Z0-9-]+)/(?<fileName>[a-zA-Z0-9-]+)\.(?<fileExtension>[a-zA-Z]+)$ {
    set $parentDirectory "/nfs";
    if ($repoName = "repository") {
        set $parentDirectory "/var/www/html/sites/default/files";
    }
    alias $parentDirectory/$fileName.$fileExtension;
}
Which redirects the request to an entirely different directory tree based on the first “directory” present in the requested URI. For instance:
- /files/repository/waiting.png returns the file /var/www/html/sites/default/files/waiting.png
- /files/new-files/loading.gif returns the file /nfs/loading.gif
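For contrast, here is a sketch of nginx’s default mapping next to alias. The default comes from the root directive, which appends the full URI to the given path, whereas alias replaces the matched location prefix. The /assets/ prefix is a made-up name for this illustration; the /files/ block reuses the paths from the earlier example.

```nginx
# Default mapping with root: the whole URI is appended to the root path.
# /assets/logo.png is looked up at /var/www/html/assets/logo.png
location /assets/ {
    root /var/www/html;
}

# Remapping with alias: the /files/ prefix is replaced by the alias path.
# /files/logo.png is looked up at /var/www/html/sites/default/files/logo.png
location /files/ {
    alias /var/www/html/sites/default/files/;
}
```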
Directive: rewrite
Syntax: rewrite regex replacement [last|break|redirect|permanent];
As it stands at this point we already have a great deal of flexibility in routing requests:
- Delegating routing to the application with try_files
- Redirecting users’ browsers with a return 301
- Using alias in conjunction with location back references to map static resources to vastly different file paths.
We do kind of have a problem, though.
In the case of generating internal redirects with alias, we’ll not always have the luxury of simply pointing it at a separate file. For example, the request may be for a PHP script you wish to execute, but if you put alias within a location block, nginx is going to return the PHP script as a file to the end user instead of hitting any PHP-matching location block you have specified later on. What’s more, we may not know beforehand which URLs we need to redirect and only know “well, these sorts of URLs should actually look like this”, but neither alias nor return supports regular expressions without being put into a location block, which might introduce yet more problems.
Enter the rewrite directive.
The rewrite directive allows us to apply PCRE to a user-requested URI and transform it on the fly into another URI/URL, with or without redirecting the user. Where rewrite falls short is in instances where you need to return files outside of your normal $document_root or are afraid of running into the redirection loops mentioned in the return directive section.
The valid values for flag are:
break: Stops processing the rewrite rules in the current block and carries on handling the request in the current location.
last: Same as break, except nginx then searches for the location block that matches the new value of $uri.
redirect: Sends the user an HTTP/1.1 302 (temporary) redirect response instructing their browser to go to the new URI. Processing of rewrite rules stops and the redirect is returned to the client.
permanent: Sends the user an HTTP/1.1 301 (permanent) redirect response instructing the client that the URL is no longer considered valid and to use the redirect URL instead. Useful for getting search engines to drop a deprecated page from their index. Processing of rewrite rules stops here as well.
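The difference between last and break is easier to see side by side. In this sketch the /download/, /scripts/ and /files/ paths are hypothetical, and it assumes a separate PHP-matching location block exists elsewhere in the server block.

```nginx
location /download/ {
    # last: stop processing rewrites here and search again for a location
    # matching the rewritten URI (for example, a PHP location block).
    rewrite ^/download/(.+)\.php$ /scripts/$1.php last;

    # break: stop processing rewrites but stay in this location; the
    # rewritten URI is served as a static file relative to the root.
    rewrite ^/download/(.+)$ /files/$1 break;
}
```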
Enough talk, let’s look at some examples:
rewrite /person/([0-9]+).* /user/$1 redirect;
This will redirect the user’s browser using an HTTP/1.1 302 response. For example, they would be redirected from /person/343 to /user/343 without the admin needing to know beforehand which user IDs someone was going to request. An alternative way of writing the above would be:
location ~ /person/(?<userID>[0-9]+).* { return 302 /user/$userID; }
Which avoids the issues with rewrite (since return always stops execution) but adds more lines to our configuration, with a regexp that’s more complicated than a rewrite directive would likely require (since we have to name our back references to make use of them). This pattern is workable if you only have a few redirects, but with anything more than 3-4 redirects your configuration is going to start looking unwieldy.
The documentation recommends against using rewrite if you can avoid it, but I live by the principle that a properly formed rewrite directive will never hurt you. As long as you never omit the flag (using break for internal redirects), redirection loops caused purely by nginx should be unlikely.
Directive: fastcgi_split_path_info
Syntax: fastcgi_split_path_info regex;
Some web applications make use of a (Fast)CGI feature known as PATH_INFO. PATH_INFO is the embedding of internal application resources within the URL after the fully qualified path to the file. For instance, in this URL:
http://example.com/update.php/status
the update.php portion is the (Fast)CGI script to execute, and /status is an identifier whose meaning is up to the application. Not all PHP applications need this: WordPress doesn’t, and Drupal 7 didn’t. Earlier versions of Drupal and the latest (version 8) do need it in places, though.
Whether your application needs it is something you’ll have to determine on your own. I would recommend against setting up PATH_INFO if you don’t need it. The reasons being:
- If you’re not using PATH_INFO then I would wager most people would expect that, despite its name, update.php is actually a directory that has a file called status in it.
- If you’re not using PATH_INFO then adding a directive for it just adds another line to your configuration that the uninitiated are going to have to look up and understand. Better to stick to only the stuff that does something useful.
In practice, setting up PATH_INFO is generally harmless, though, so it’s really down to the administrator’s preference.
To actually set it up, you use the fastcgi_split_path_info directive, which instructs nginx (via a regular expression) which part of the $uri is the path to a script and which is the PATH_INFO portion. A common setting for PATH_INFO is:
fastcgi_split_path_info ^(.+?\.php)(|/.*)$;
which specifies that the file path will be something that ends in .php while the PATH_INFO portion will be an optional piece that follows it and begins with a forward slash.
It should be noted that if you’re supporting PATH_INFO you may have to adjust your location blocks. Many people use location’s matching criteria to selectively run only PHP scripts through FastCGI. Often the pattern tested will involve .php being at the end of the location string. This needs to be updated to allow PATH_INFO URLs to also match this block.
For example a block that looks like:
location ~ \.php$ {
    include fastcgi.conf;
    fastcgi_pass unix:/run/php/php7.0-fpm.sock;
}
will need to be modified to something that looks more like this:
location ~ ^/[a-zA-Z0-9-]+\.php(/|$) {
    include fastcgi.conf;
    fastcgi_split_path_info ^(.+?\.php)(|/.*)$;
    fastcgi_pass unix:/run/php/php7.0-fpm.sock;
}
Which allows certain common file name characters and will consider the request a PHP script as long as the script name ends in .php, whether or not a forward slash follows. Feel free to tweak the regexp so that it catches all your possible script names, though.
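One thing worth checking: fastcgi_split_path_info only populates nginx’s $fastcgi_script_name and $fastcgi_path_info variables. Whether PATH_INFO actually reaches PHP depends on a fastcgi_param line somewhere. The sketch below assumes your fastcgi.conf doesn’t already set PATH_INFO (distribution packages vary, so look before you add it) and passes it explicitly.

```nginx
location ~ ^/[a-zA-Z0-9-]+\.php(/|$) {
    # First capture becomes $fastcgi_script_name,
    # second capture becomes $fastcgi_path_info.
    fastcgi_split_path_info ^(.+?\.php)(|/.*)$;
    include fastcgi.conf;
    # Assumption: the included fastcgi.conf does not already define PATH_INFO.
    fastcgi_param PATH_INFO $fastcgi_path_info;
    fastcgi_pass unix:/run/php/php7.0-fpm.sock;
}
```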
Case Study: Drupal 8.x
Internal Redirects That Don’t Break The Site…
Let’s assume that, for whatever reason, you wanted to redirect /user-profiles/$userID to the normal /user/$userID page. To do so, you might produce an nginx configuration similar to the following:
server {
    listen 80 default_server;
    root /var/www/html;
    index index.php;
    server_name localhost;
    set $destURI $request_uri;

    location ~ /user-profiles/(?<userID>[0-9]+) {
        set $destURI "/user/$userID";
        rewrite /user-profiles/([0-9]+) /index.php last;
    }

    location / {
        try_files $uri $uri/ /index.php?$query_string;
    }

    location ~ (\.php$|^/update\.php) {
        include fastcgi.conf;
        fastcgi_param REQUEST_URI $destURI;
        fastcgi_split_path_info ^(.+?\.php)(|/.*)$;
        fastcgi_pass unix:/run/php/php7.0-fpm.sock;
    }
}
For the most part this looks like a normal Drupal 8 configuration. We have a location block that redirects requests for PHP over to a local FastCGI instance after setting PATH_INFO.
There are a few new twists to enable our desired functionality, though:
- Early on we’re saving $request_uri to a variable called $destURI.
- We have a new location block that matches our new /user-profiles URI root.
  - The matching criteria extract a named variable called $userID from the numeric string that follows the forward slash.
  - We set a new value for $destURI so that the final destination seen by Drupal will be /user/$userID.
  - We do a rewrite so that the request immediately drops down to the PHP location block without going through try_files again.
- Once our /user-profiles/$userID request makes it down to our PHP block, we’re ready to have Drupal retrieve the page we’re really after.
  - The only problem is that the default for PHP-FPM’s REQUEST_URI parameter (as set in fastcgi.conf) is $request_uri, which in our case isn’t necessarily the URI we want Drupal to see.
  - We re-set that particular parameter after including the normal set of parameters. By setting it to $destURI we have the effect of setting it to regular ol’ $request_uri in most cases, unless our /user-profiles location block matched this request, in which case it will be some version of /user/$userID.
  - We use $destURI instead of a built-in variable because $request_uri is immutable and, though $uri is mutable, changing it (with rewrite for instance) would change the nginx routing behaviour, which at this late stage we don’t want.
Further Reading
Where to go next? Learn more about nginx routing by picking your favorite application or framework (Joomla, Yii2, Django, etc.), coming up with atypical routing issues, and trying to solve them with nginx.
Outside of that you can refer to the following to become more familiar with the theory of operation:
- (rascaldev) Configuring nginx for PHP-FPM
- Converting rewrite rules (reference for converting Apache rewrites into nginx rewrites)
- How to Create NGINX Rewrite Rules
- Update Your Nginx Config for Drupal 8