Speed Up Your Web App With Redis

When a website serves around 25 unique visitors a day, we might be all right with some latency or overhead as long as it’s not so slow as to be completely unusable. However, the problem gets much worse as you start talking about 300+ unique visitors each day (or worse, per hour). Depending on the web application, this could also involve expensive computations that have to be performed and then redone repeatedly.

This is the crux of the issue with “scaling up” a web application. Subpar performance with a low demand application is tolerable, but as the application becomes more important, more people are going to use it, and inefficiencies that used to be tolerable stack up on top of one another into larger issues. You can ease “scaling up” operations several ways: by throwing more resources at the bottleneck, by shaping the workload within the application, by shaping it at a system level, or by lowering the transaction costs of your more CPU or I/O intensive operations. We’re going to concentrate on the last of these.

Specifically we’re going to explore speeding up backend processing by using redis to store arbitrary data structures in memory for later retrieval by similar operations in the future. When possible, caching application data structures is usually preferable to full page caching or the caching of blocks of HTML by a proxy server (such as Squid or Varnish) or FastCGI caching (as with nginx). This is because data structures are often re-used between pages/screens while varnish or nginx will treat each response as potentially unique to the given request and issue unnecessary calls to backend resources when it should already have the data needed to generate the webpage.

This article is written with the expectation that readers will skim the headers for stuff that looks interesting or relevant to them (because many will). So you should read the general sections but should only need to read the non-CLI language-specific sections that pertain to the web app you’re trying to create.

Contents

  1. What is Redis?
  2. Getting The System Ready
  3. Running the Server
  4. (Optional) Installing the client tools
  5. Redis Basics
  6. Basic Redis Operations
  7. Using Redis From Python
  8. Using Redis From Ruby
  9. Using Redis From PHP
  10. Using This Knowledge With Your Web App
  11. Further Reading

What is Redis?

Redis is an in-memory key-value store with optional disk persistence. According to their website it’s also used as an application message broker and “database” (whatever that means), but the majority of Redis deployments seem to be in-memory caches for an application’s most expensive data structures.

For this reason, I’ve assumed that beyond setting/getting data, you’re probably also interested in setting an expiration so that stale data is purged automatically.

Getting The System Ready

Running the server

Here are three possible ways to get redis installed.

1) Docker:

# docker run -d --name redis -p 6379:6379 redis

2) Ubuntu:

# apt-get install -y redis

3) RHEL/CentOS:

The redis package is available via the EPEL repository:

# yum install -y redis

(Optional) Installing the client tools

Unfortunately, unlike Ubuntu, Red Hat doesn’t separate the redis command line utilities from the server package. So even if you went with Docker above, you still need to install the redis package; you would just skip the service configuration steps.

For Ubuntu the command is simply:

# apt-get install -y redis-tools

Redis Basics

Basic Redis Operations

Regardless of what language you’re planning on using in the end, I’m using this section to explain core redis concepts in a language neutral way using the redis-cli utility installed above. Similar to the ssh command, the redis-cli command can be run ad hoc, where you give it a single operation, or you can run it without specifying an instruction, which will result in an interactive prompt being launched. If no host is given to redis-cli (via the -h parameter) then localhost is assumed. In my case I have to specify the IP address I wish to connect to:

root@2287b3975d3a:/# redis-cli -h 994.55.193.122

994.55.193.122:6379> get newKey

"temporaryValue"

In the above example, I connected to the redis instance and retrieved a pre-existing key.

Atomic Transactions

Redis allows many commands to be queued and then executed at once. This lets applications written in higher level languages perform many independent operations atomically. Client libraries usually also send the queued commands in a single batch, which reduces the number of TCP round trips required for a transaction and helps improve the scalability of applications that use redis.

To perform a transaction, you just issue a MULTI instruction (no arguments) and the server begins queuing the operations you send it until you issue an EXEC call. For example:

994.55.193.122:6379> MULTI
OK
994.55.193.122:6379> SET "firstPipeKey" "First CLI Value"
QUEUED
994.55.193.122:6379> SET "secondPipeKey" "Second CLI Value"
QUEUED
994.55.193.122:6379> SET "thirdPipeKey" "Third CLI Value"
QUEUED
994.55.193.122:6379> EXEC
1) OK
2) OK
3) OK
994.55.193.122:6379>

In the above, we could have issued a DISCARD instruction at any time before the EXEC and all queued operations would have been discarded.
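To make the round-trip point concrete, here is a minimal Python sketch of how a client library might frame the three SET commands above into a single payload. It assumes the standard Redis wire format (RESP), where each command is an array of bulk strings. The helper names encode_command and encode_transaction are made up for illustration; real client libraries do this framing for you.

```python
def encode_command(*parts: str) -> bytes:
    """Encode one command as a RESP array of bulk strings."""
    out = [f"*{len(parts)}\r\n".encode()]
    for part in parts:
        data = part.encode("utf-8")
        # Each argument is sent as $<byte length>, CRLF, the bytes, CRLF.
        out.append(b"$" + str(len(data)).encode() + b"\r\n" + data + b"\r\n")
    return b"".join(out)


def encode_transaction(commands):
    """Wrap several commands in MULTI/EXEC and join them into one payload."""
    framed = [("MULTI",), *commands, ("EXEC",)]
    return b"".join(encode_command(*cmd) for cmd in framed)


payload = encode_transaction([
    ("SET", "firstPipeKey", "First CLI Value"),
    ("SET", "secondPipeKey", "Second CLI Value"),
    ("SET", "thirdPipeKey", "Third CLI Value"),
])

# All five commands (MULTI, three SETs, EXEC) fit in one write to the socket.
print(len(payload), "bytes in a single write")
```

Sending the whole payload in one write is what saves the round trips; the MULTI/EXEC wrapper is what makes the batch atomic. The two are independent, which is why client libraries tend to let you choose either or both.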

Using Redis From Python

Without delving into Flask or Django, we’re going to use a simple interactive session that shows how you connect to redis from python generally. If you’re running a web application in one of the above frameworks, adapting the steps here should be relatively easy.

Using pip you should be able to install the redis-py module (which is the de facto standard for most people using redis from Python) by just issuing the command pip install redis.

Now that we have the module for connecting to redis we can do some of our test work. Let’s start out simple. In the Python command line interpreter:

>>> import redis
>>> # decode_responses=True returns str values instead of bytes on Python 3
>>> redisConnection = redis.Redis(host='994.55.193.122', decode_responses=True)
>>> redisConnection.set('newKey', 'oldValue')
True
>>> redisConnection.get('newKey')
'oldValue'

In the example above, I had the redis instance running on a different IP than localhost so I had to override the host= parameter when instantiating the class to point to the IP/hostname of the redis server. Similarly you can use port= to change the port number, or db= to change the database you’re connecting to. It’s worth noting that on the backend a connection pool will be created but a connection to the remote service won’t happen until you begin actually using redis in your script.

After that we just save a soon-to-be-replaced value at newKey and then retrieve it again. Nothing too spectacular. Let’s set a value with a 15 second expiration:

>>> from time import sleep
>>> redisConnection.set('newKey', 'temporaryValue', ex=15)
True
>>> redisConnection.get('newKey')
'temporaryValue'
>>> sleep(16)  # wait slightly longer than the 15 second expiration
>>> redisConnection.get('newKey')
>>>
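Setting and getting with an expiration is most of what a cache needs. The sketch below shows one common way to combine them (often called cache-aside), assuming the cached structure can be serialized as JSON. The get_or_compute helper, the InMemoryStandIn class, and the report:daily key are hypothetical names for illustration. The stand-in only mimics the get() and set(..., ex=...) calls used above so the example runs without a server; in a real application you would pass redisConnection instead.

```python
import json
import time


class InMemoryStandIn:
    """Hypothetical stand-in that mimics get() and set(..., ex=...) from redis-py."""

    def __init__(self):
        self._data = {}

    def set(self, key, value, ex=None):
        expires_at = time.monotonic() + ex if ex is not None else None
        self._data[key] = (value, expires_at)
        return True

    def get(self, key):
        value, expires_at = self._data.get(key, (None, None))
        if expires_at is not None and time.monotonic() >= expires_at:
            del self._data[key]  # purge the stale entry, as redis would
            return None
        return value


def get_or_compute(client, key, ttl_seconds, compute):
    """Return the cached structure at key, computing and caching it on a miss."""
    cached = client.get(key)
    if cached is not None:
        return json.loads(cached)
    value = compute()
    client.set(key, json.dumps(value), ex=ttl_seconds)
    return value


calls = []


def expensive_report():
    calls.append(1)  # count how often the slow path actually runs
    return {"visitors": 300, "top_pages": ["/", "/about"]}


cache = InMemoryStandIn()
first = get_or_compute(cache, "report:daily", 15, expensive_report)
second = get_or_compute(cache, "report:daily", 15, expensive_report)
print(first == second, len(calls))
```

The second call is served from the cache, so the expensive function only runs once until the entry expires.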

::Pipelines

To start a pipeline transaction, you just need to create a special object with redisConnection.pipeline() and then treat it like you would any other connection object. Let’s test it out by setting a bunch of values and retrieving a single pre-existing value:

>>> redisPipeline = redisConnection.pipeline()

>>> redisPipeline.set('firstPipeKey', 'First Python Value')
Pipeline<ConnectionPool<Connection<host=994.55.193.122,port=6379,db=0>>>

>>> redisPipeline.set('secondPipeKey', 'Second Python Value')
Pipeline<ConnectionPool<Connection<host=994.55.193.122,port=6379,db=0>>>

>>> redisPipeline.set('thirdPipeKey', 'Third Python Value')
Pipeline<ConnectionPool<Connection<host=994.55.193.122,port=6379,db=0>>>

>>> redisPipeline.get("newKey")
Pipeline<ConnectionPool<Connection<host=994.55.193.122,port=6379,db=0>>>

>>> redisPipeline.execute()
[True, True, True, 'temporaryValue']
>>>

You can see that we use our connection object to redis to create a pipeline object pointed at our redis server and database. This object is also returned with each get and set operation. Once executed, it returned a list of all the values that would have been returned if we had run the commands synchronously, ordered according to how they were issued.

Please note that the default behavior of the redis-py module is to perform pipeline operations atomically: the queued commands are wrapped in a MULTI/EXEC block when they are sent to the server. If you only want the batching and not the transaction, create the pipeline with redisConnection.pipeline(transaction=False). Calling .multi() on your pipeline object (redisPipeline.multi() above) is only needed after a .watch() call, where it switches the pipeline back into “buffered” mode so that subsequent commands are queued until .execute().
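If atomicity matters to your application, it is safest to state it explicitly rather than rely on a default. The helper below is a small sketch of that idea: cache_many is a hypothetical name, and it assumes the client object exposes the pipeline(transaction=...), set(..., ex=...) and execute() calls that redis-py provides.

```python
def cache_many(client, items, ttl_seconds, atomic=True):
    """Queue one SET per item and send them all in a single round trip."""
    # transaction=True asks redis-py to wrap the batch in MULTI/EXEC;
    # transaction=False sends the same batch without the transaction.
    pipe = client.pipeline(transaction=atomic)
    for key, value in items.items():
        pipe.set(key, value, ex=ttl_seconds)
    # execute() returns one result per queued command, in the order queued.
    return pipe.execute()
```

With the connection from the earlier examples, a call would look like cache_many(redisConnection, {'firstPipeKey': 'First Python Value'}, 15).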

Using Redis From Ruby

Similar to the above, I’m not going to show any examples within Sinatra or Rails even though redis is primarily useful for web apps. The generic examples here should be easily replicable within the context of any decent framework you could be running within.

Install the ruby gem:

root@2287b3975d3a:/# gem install redis
Fetching: redis-4.0.1.gem (100%)
Successfully installed redis-4.0.1
Parsing documentation for redis-4.0.1
Installing ri documentation for redis-4.0.1
Done installing documentation for redis after 2 seconds
1 gem installed

Sample script:

require 'redis'

redisConnection = Redis.new(host: "994.55.193.122")
redisConnection.set("newKey", "value from ruby")
redisConnection.expire("newKey", 15)

# get returns nil once the key has expired, so fall back to a message
keyValue = redisConnection.get("newKey") || "No Key With That Name"
puts "newKey Value: " + keyValue
sleep 16
keyValue = redisConnection.get("newKey") || "No Key With That Name"
puts "newKey Value: " + keyValue

The above script sets a temporary value at newKey which expires in 15 seconds. We then print the value redis is returning for that key, wait 16 seconds (to allow reclaim and avoid a race) and attempt the same retrieval. If you run the above locally, you should see a successful .get method call the first time but not the second time.

::Pipelines

Similar to the python module, the redis gem supports the use of pipelines to issue several redis commands in a single TCP trip. For example:

require 'redis'

redisConnection = Redis.new(host: "994.55.193.122")

# Queue commands on the object yielded to the block; newer gem versions require this form
redisConnection.pipelined do |pipeline|
  pipeline.set "firstPipeKey", "First Ruby Value"
  pipeline.set "secondPipeKey", "Second Ruby Value"
  pipeline.set "thirdPipeKey", "Third Ruby Value"
end

puts "firstPipeKey value: " + redisConnection.get("firstPipeKey")
puts "secondPipeKey value: " + redisConnection.get("secondPipeKey")
puts "thirdPipeKey value: " + redisConnection.get("thirdPipeKey")

If you read the python section you’ll notice this is functionally analogous to our python pipeline. It works in ruby much the same way, except that the equivalent of .execute() is implied when the block ends. The above script sets three keys at once asynchronously and then uses a synchronous .get call to retrieve the values from the server.

Using Redis From PHP

Install the PECL extension:

 root@2287b3975d3a:/# pecl install redis
 downloading redis-3.1.4.tgz ...
 Starting to download redis-3.1.4.tgz (199,559 bytes)
 ..................done: 199,559 bytes
 20 source files, building
 running: phpize
 Configuring for:
 PHP Api Version: 20151012
 Zend Module Api No: 20151012
 Zend Extension Api No: 320151012
 enable igbinary serializer support? [no] :
 building in /tmp/pear/temp/pear-build-defaultuserPAA1mj/redis-3.1.4
 running: /tmp/pear/temp/redis/configure --with-php-config=/usr/bin/php-config --enable-redis-igbinary=no
 checking for grep that handles long lines and -e... /bin/grep
 checking for egrep... /bin/grep -E
 [...snip...]

The output above is truncated to keep this article under 100 pages. Now we need to enable the extension in the appropriate php.ini file (or a file it includes). The proper .ini file to write to varies according to the install, so you’ll have to figure that out on your own. Let’s enable it for the CLI here though:

root@2287b3975d3a:/# echo "extension=redis.so" >> /etc/php/7.0/cli/conf.d/90-redis.ini

If everything went well, we should now see redis in the module listing:

root@2287b3975d3a:/# php -m | grep -i redis
redis
root@2287b3975d3a:/#

Now let’s try a basic script to manipulate redis from PHP:

<?php

$redisConnection = new Redis();
$redisConnection->connect('994.55.193.122');

$redisConnection->set('newKey', 'Value From PHP');
$redisConnection->setTimeout('newKey', 15);

$keyValue = $redisConnection->get("newKey");
print "newKey Value: ". (strlen($keyValue) ? $keyValue : "No value set for that key.") ."\n";

sleep(16);

$keyValue = $redisConnection->get("newKey");
print "newKey Value: ". (strlen($keyValue) ? $keyValue : "No value set for that key.") ."\n";

?>

As in the Ruby and Python examples (if you read those sections), the above script merely sets a new value at newKey and sets it to expire (or “timeout” in PHP’s case) in 15 seconds. Note that setTimeout is an alias for expire, and newer releases of the extension deprecate it in favor of expire. We then immediately retrieve the value from the server, then pause for 16 seconds before attempting to retrieve it again. If you get a value the first time but not the second time, everything worked as expected.

::Pipelines

Similar to Ruby and Python above, you can instruct the redis PHP extension to issue all the statements as part of a transaction. You do this by making a call to $redisConnection->multi(), which puts the connection into “Multi mode” and returns it so that further calls can be chained until exec():

<?php

$redisConnection = new Redis();
$redisConnection->connect('994.55.193.122');

$redisConnection->multi()
 ->set('firstPipeKey', 'First PHP Value')
 ->set('secondPipeKey', 'Second PHP Value')
 ->set('thirdPipeKey', 'Third PHP Value')
 ->exec();

print "firstPipeKey Value: ". $redisConnection->get("firstPipeKey") ."\n";
print "secondPipeKey Value: ". $redisConnection->get("secondPipeKey") ."\n";
print "thirdPipeKey Value: ". $redisConnection->get("thirdPipeKey") ."\n";

?>

Using This Knowledge With Your Web App

Obviously, I can’t write an article that walks you through how to best implement Redis on your web app, but I can offer some general advice:

  • Remember: Cache misses are expensive, so your number one goal when implementing any sort of caching mechanism in a software stack is to end up with many cache hits and very few misses.
    • Try to restrict your use of Redis to operations that are either taking too long or generating too much load.
      • The more you have to communicate with Redis, the busier it’ll be. If the operation isn’t particularly expensive, just do it on each request.
      • If you begin putting low traffic items into your cache you’ll reduce overall system performance by forcing your application to check for the existence of an item that will rarely be there.
  • In-memory databases like memcached and redis have historically had few security mechanisms.
    • Neither has a particularly strong sense of “user.” They each have users obviously, but the notion is not well fleshed out (Redis only gained access control lists in version 6).
    • Password protection in Redis is more appropriate for concerns about system stability than actual security (i.e. keeping honest people from accidentally using the wrong thing).
    • Redis versions before 6 don’t support any form of encryption natively, and TLS is optional in later versions. The recommendation of Redis Labs for those older versions is to use a third party tool to proxy the connection from trusted to untrusted networks. Without TLS, passwords sent to Redis are sent in the clear (again why it’s a stability mechanism and not a security one).
    • For these reasons, you should only expose Redis to systems that actually need to be able to access it.
  • Monitor your key performance metrics
    • This is valuable feedback for refactoring your code to match actual workloads
      • You may have calls to Redis where they don’t need to be or they’re missing from where they need to be.
      • You may also have to migrate to a Redis cluster or add a node to an existing cluster.
    • For ad hoc monitoring, you can use the info stats command inside of redis-cli and check the values for keyspace_hits and keyspace_misses to see what your hit ratio actually is.
  • No matter what you’re planning on using Redis for, we’ve just scratched the surface with the above. At this point though you should be able to setup a basic Redis instance and use it to implement
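For the ad hoc monitoring mentioned in the list above, the hit ratio is just hits divided by hits plus misses. Here is a small sketch that computes it from the text printed by info stats. The hit_ratio function name and the sample numbers are made up for illustration; only the keyspace_hits and keyspace_misses field names come from redis itself.

```python
def hit_ratio(info_stats_text):
    """Compute hits / (hits + misses) from the text printed by `info stats`."""
    stats = {}
    for line in info_stats_text.splitlines():
        line = line.strip()
        # Skip blank lines and section headers such as "# Stats".
        if not line or line.startswith("#") or ":" not in line:
            continue
        name, _, value = line.partition(":")
        stats[name] = value
    hits = int(stats.get("keyspace_hits", 0))
    misses = int(stats.get("keyspace_misses", 0))
    total = hits + misses
    # No lookups yet means there is no meaningful ratio to report.
    return hits / total if total else None


# Made-up sample in the same name:value layout that redis-cli prints.
sample = """# Stats
total_connections_received:12
keyspace_hits:900
keyspace_misses:100
"""
print(hit_ratio(sample))
```

A ratio that stays low over time suggests you are caching items that are rarely requested again, which is the situation the first bullet warns about.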

Further Reading