Redis + Node.js: Introduction to Caching

I think understanding and using caching is a very important aspect of writing code, so in this article, I’ll explain what caching is, and I'll help you to get started with Redis + Node.js.

What is caching?

Data goes in, data comes out. It is a simple concept that has been around for quite a while, but according to this Node.js survey, many developers don't take advantage of it.

  • Do developers think that caching makes their applications a lot more complex?
  • Is this something that is either done from the beginning or not at all?

Through this introduction we will see that:

  1. Caching can be easily integrated into your application.
  2. It doesn't have to be added everywhere; you can start experimenting with just a single resource.
  3. Even the simplest implementation can positively impact performance.

Integrating with third-party APIs

To show the benefits of caching, I created an Express application that integrates with GitHub's public API and retrieves the public repositories of an organization (more precisely, only the first 30; see the default pagination options).

const express = require('express');
const request = require('superagent');
// Fall back to 3000 so the URL used below works when PORT is not set
const PORT = process.env.PORT || 3000;

const app = express();

function respond(org, numberOfRepos) {
    return `Organization "${org}" has ${numberOfRepos} public repositories.`;
}

function getNumberOfRepos(req, res, next) {
    const org = req.query.org;
    request.get(`https://api.github.com/orgs/${org}/repos`, function (err, response) {
        // Hand errors to Express instead of throwing inside an async callback
        if (err) return next(err);

        // response.body contains an array of public repositories
        const repoNumber = response.body.length;
        res.send(respond(org, repoNumber));
    });
}

app.get('/repos', getNumberOfRepos);

app.listen(PORT, function () {
    console.log('app listening on port', PORT);
});

Start the app and make a few requests to
http://localhost:3000/repos?org=risingstack
from your browser.

Figure: response time without Redis caching

Receiving a response from GitHub and returning it through our application took a little longer than half a second.

When it comes to communicating with third-party APIs, we inherently become dependent on their reliability. Errors will happen over the network as well as in their infrastructure: application overloads, DoS attacks, network failures, not to mention request throttling and rate limits in the case of a proprietary API.

How can caching help us mitigate these problems?

We could temporarily save the first response and serve it later, without actually requesting anything from GitHub. This would result in fewer requests and therefore less chance for any of the above errors to occur.

You might object that we would be serving old data that is not necessarily accurate, but think about the data itself.

Is the list of repositories going to change frequently? Probably not, but even if it does, after some time we can just ask GitHub again for the latest data and update our cache.
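
To make the idea concrete before bringing in Redis, here is a minimal in-memory sketch of "save the response, serve it later, ask again after some time". TtlCache and getOrCompute are names I made up for this illustration, and the injectable clock is only there so that expiry can be tried out without waiting. Treat it as a sketch of the concept, not as the implementation used in the rest of this article.

```javascript
// Minimal in-memory cache with expiry (illustration only, hypothetical names).
class TtlCache {
    constructor(now = Date.now) {
        this.now = now;            // injectable clock, returns milliseconds
        this.entries = new Map();  // key -> { value, expiresAt }
    }

    set(key, value, ttlMs) {
        this.entries.set(key, { value, expiresAt: this.now() + ttlMs });
    }

    get(key) {
        const entry = this.entries.get(key);
        if (entry === undefined) return undefined;
        if (this.now() >= entry.expiresAt) {
            this.entries.delete(key); // expired: forget it so the caller asks again
            return undefined;
        }
        return entry.value;
    }
}

// Serve from the cache when possible; otherwise call the slow source once
// and remember its answer for ttlMs milliseconds.
// Note: a cached value of `undefined` is treated as a miss in this sketch.
function getOrCompute(cache, key, ttlMs, compute) {
    const cached = cache.get(key);
    if (cached !== undefined) return cached;
    const fresh = compute();
    cache.set(key, fresh, ttlMs);
    return fresh;
}
```

Within the expiry window, repeated calls to getOrCompute with the same key never reach the slow source; once the entry expires, the next call fetches fresh data and caches it again.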

Redis + Node.js: Using Redis as cache in our application

Redis can be used in many ways, but for this tutorial think of it as a key-value (hash map or dictionary) database server, which is where the name comes from: REmote DIctionary Server.

We are going to use the redis Node.js client to communicate with our Redis server.

To install the Redis server itself, see the official Quick Start guide.

From now on, we assume that you have it installed and it is running.

Let's start by adding the redis client to our dependencies:

npm install redis --save

then creating a connection to a local Redis server:

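const express = require('express');
const request = require('superagent');
const PORT = process.env.PORT || 3000;

// Note: this tutorial uses the callback-style API of the redis client (version 3 and earlier)
const redis = require('redis');
// 6379 is the default Redis port
const REDIS_PORT = process.env.REDIS_PORT || 6379;

const app = express();
const client = redis.createClient(REDIS_PORT);

// Log connection problems instead of letting them crash the process
client.on('error', function (err) {
    console.error('Redis error:', err);
});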

Caching the data

As I already pointed out, Redis can be used as simply as a hash map. To add data to it, use:

client.set('some key', 'some value');

If you want the value for 'some key' to expire after some time, use setex:

client.setex('some key', 3600, 'some value');

This works similarly to set, except that some key is removed after the duration (in seconds) specified in the second parameter. In the above example, some key will be removed from Redis after one hour.

We are going to use setex because the number of public repositories for an organization might change in the future.

const repoNumber = response.body.length;
// for this tutorial we set expiry to 5s but it could be much higher
client.setex(org, 5, repoNumber);
res.send(respond(org, repoNumber));
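
To show where these lines sit in the request handler, here is one possible sketch of the complete function. createRepoCountHandler and fetchRepos are hypothetical names: fetchRepos(org, callback) stands in for the superagent call and is assumed to yield the array of repositories, and the Redis client is passed in as an argument so the handler can be tried out without a running server. respond is the helper from the beginning of the article.

```javascript
// Copied from the article so this sketch is self-contained.
function respond(org, numberOfRepos) {
    return `Organization "${org}" has ${numberOfRepos} public repositories.`;
}

// Hypothetical factory: builds the handler with the caching step in place.
// `client` needs a setex(key, seconds, value) method, as the redis client has.
function createRepoCountHandler(client, fetchRepos, ttlSeconds) {
    return function getNumberOfRepos(req, res, next) {
        const org = req.query.org;
        fetchRepos(org, function (err, repos) {
            if (err) return next(err); // let Express handle the failure

            const repoNumber = repos.length;
            // Remember the count under the organization's name;
            // Redis removes the key again after ttlSeconds.
            client.setex(org, ttlSeconds, String(repoNumber));
            res.send(respond(org, repoNumber));
        });
    };
}
```

Nothing is written to the cache when the request to GitHub fails, so an error is never served as if it were a valid repository count.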

For this demo we are using organization names as keys, but depending on your use case, you might need a more sophisticated algorithm for generating them.
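
As one example of what "more sophisticated" could mean, the sketch below prefixes the key with a namespace and normalizes the organization name. repoCountKey and the github:repo-count: prefix are hypothetical, and lowercasing assumes that organization names can be treated as case-insensitive; adjust both to your own data.

```javascript
// Hypothetical helper: build a namespaced, normalized cache key.
function repoCountKey(org) {
    // Reject missing or empty names so we never cache under a key like "undefined".
    if (typeof org !== 'string' || org.trim() === '') {
        throw new TypeError('org must be a non-empty string');
    }
    // The prefix keeps these keys apart from anything else stored in the same Redis instance.
    return `github:repo-count:${org.trim().toLowerCase()}`;
}
```

With this helper, repoCountKey('RisingStack') and repoCountKey('risingstack') both map to the same key, github:repo-count:risingstack, so differently cased requests share one cache entry.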

Retrieving the cached data

Instead of implementing the caching logic inside the app.get callback, we are going to take advantage of Express middleware functions, so the resulting implementation can be easily reused for other resources.

Start by adding a middleware function to the existing handler:

app.get('/repos', cache, getNumberOfRepos);

cache has access to the same request object (req), response object (res), and next middleware function in the application’s request-response cycle as getNumberOfRepos does.

We are going to use this function to intercept the request, extract the organization's name and see if we can serve anything from Redis:

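function cache(req, res, next) {
    const org = req.query.org;
    client.get(org, function (err, data) {
        // Pass errors to Express instead of throwing inside an async callback
        if (err) return next(err);

        if (data != null) {
            // Cache hit: answer straight from Redis
            res.send(respond(org, data));
        } else {
            // Cache miss: continue to getNumberOfRepos
            next();
        }
    });
}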

We are using get to retrieve data from Redis:

client.get(key, function (err, data) {
});

If there is no data in the cache for the given key, we simply call next(), which passes control to the next middleware function: getNumberOfRepos.
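
Since the point of using a middleware is reuse, the same logic can be pulled into a small factory so that other routes can share it. This is only a sketch: createCacheMiddleware, keyFromRequest and render are names I chose for illustration and are not part of Express or the redis client. The client is passed in as an argument and only needs a callback-style get(key, callback) method.

```javascript
// Hypothetical factory for a reusable cache middleware.
//   client         - anything with get(key, callback), such as the redis client
//   keyFromRequest - function (req) returning the cache key for this request
//   render         - function (req, data) turning the cached value into a response body
function createCacheMiddleware(client, keyFromRequest, render) {
    return function cache(req, res, next) {
        const key = keyFromRequest(req);
        client.get(key, function (err, data) {
            if (err) return next(err); // let Express handle the failure

            if (data !== null && data !== undefined) {
                // Cache hit: respond without running the real handler
                return res.send(render(req, data));
            }
            // Cache miss: fall through to the next middleware
            next();
        });
    };
}

// Possible wiring, assuming app, client, respond and getNumberOfRepos from the article:
// app.get('/repos',
//     createCacheMiddleware(
//         client,
//         (req) => req.query.org,
//         (req, data) => respond(req.query.org, data)
//     ),
//     getNumberOfRepos);
```

Each route supplies its own key and rendering functions, while the hit-or-miss logic lives in one place.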

Results

Figure: response times with Redis caching

The initial implementation of this application took 2318ms to serve 4 requests.

Caching reduced this to 672ms, serving the same number of responses in 71% less time.
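
The 71% figure follows directly from the two totals above. As a quick check of the arithmetic (the variable names are mine):

```javascript
const withoutCacheMs = 2318; // total for 4 requests without caching
const withCacheMs = 672;     // total for 4 requests with caching

const savedMs = withoutCacheMs - withCacheMs;  // 1646
const reduction = savedMs / withoutCacheMs;    // fraction of time saved
const speedup = withoutCacheMs / withCacheMs;  // how many times faster overall

console.log(`${Math.round(reduction * 100)}% less time, ${speedup.toFixed(2)}x speedup`);
// prints "71% less time, 3.45x speedup"
```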

We made one request to the GitHub API instead of four, reducing the load on GitHub and reducing the chance of other communication errors.

During the fifth request, the cached value had already expired. We hit GitHub again (618ms) and cached the new response. As you can see, the sixth request (3ms) was already served from the cache.

Summary

Although there is a whole science behind caching, even a simple approach like this shows promising results. Similar improvements can be made by caching responses from a database server, the file system, or any other source that would otherwise be noticeably slower.

If you have any questions about this topic, let me know in the comments!