LJ Archive

Cache in Drupal 7

Janez Urevc

Issue #888, April 2068

Drupal is a powerful and very flexible, but often heavy, platform. In this article, I show how to use all of its advantages and still keep it fast.

Performance is one of the most important factors that should be considered when building Web sites. Users expect sites to be good-looking, feature-rich, loaded with multimedia and interactive. To achieve all these things, you often need to build heavy Web applications, which are not always performant by their nature. To achieve the best balance between features and performance, there are various techniques to use, and caching is one of the most popular.

It is relatively simple to build a small personal Web site or blog that will perform well. Like most other Web platforms, Drupal provides built-in mechanisms that let you achieve sufficient performance for such a site with little effort. However, things become much more interesting when you have to build a site with a few hundred thousand content items and tens of millions of visits per month. The solution will not be straightforward, but it can be done. Drupal and some of its contrib modules provide mechanisms that are powerful and flexible enough to build sites of this size successfully. I describe some of them here.

The solutions and techniques discussed in this article were tested while building a relatively large media site, which was migrated to Drupal 7 in December 2012. It currently is the biggest Drupal site in the wider Balkan region.

What Is Cache?

Cache is a frequently used technique in computing, for both hardware and software. The basic idea behind cache is to identify data that is used often and store it on a medium where it can be fetched as quickly as possible. When building Web applications, you usually want to load as much data as possible from memory, which is fast. On the other hand, most of that data also needs permanent storage, and RAM does not provide it. You therefore have to keep the data on disk, which is much slower in most cases. This leads to a typical architecture for Web applications that uses disk to store data permanently and RAM as an efficient cache medium for the most-used pieces of it.

Typical candidates for caching in Web development include the following:

  • Heavy database queries.

  • Output that is “expensive” to render.

  • Complex numerical calculations.

  • Code with substantial use of I/O (disk).

  • Data that was fetched from remote servers.
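The basic idea can be sketched in a few lines of plain PHP. This is only an illustration under simple assumptions: example_cached() is a hypothetical helper (not a PHP or Drupal function), and its store is a static array that lives in memory for a single request only.

```php
<?php
/**
 * Returns the value stored under $key, computing it only on the first call.
 *
 * example_cached() is a hypothetical helper used for illustration.
 */
function example_cached($key, $compute) {
  // The static array plays the role of the fast cache medium.
  static $store = array();

  if (!array_key_exists($key, $store)) {
    // Cache miss: run the expensive code once and remember the result.
    $store[$key] = call_user_func($compute);
  }
  return $store[$key];
}

// Stand-in for an expensive operation, such as a heavy database query.
$slow_square = function () {
  usleep(200000);
  return 12 * 12;
};

$first = example_cached('square', $slow_square);   // Runs the slow code.
$second = example_cached('square', $slow_square);  // Served from the array.
```

The second call returns immediately because the slow callback is never invoked again. The caches described in the rest of this article follow the same pattern, but they keep the data across requests.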

Two caches should be considered by every PHP+MySQL developer:

  • APC (Alternative PHP Cache) is a PHP extension that implements an op-code cache. An op-code cache removes the overhead caused by the fact that PHP compiles its scripts on every request. Because most scripts don't change often, APC saves time by keeping compiled versions of them in shared memory, where they are available to subsequent requests.

  • MySQL query cache saves query results so that they are available the next time the same query is executed. Substantial savings can be achieved this way, because Web apps repeat the same queries a lot.

Cache in Drupal 7

Cache is implemented as a very flexible and pluggable framework in Drupal 7. Here are three of the most important concepts to understand:

  • A cache item is a complete piece of data that is stored in cache. A developer addresses each cache item with a key, which must be unique within its bin.

  • A cache bin is a group of similar cache items. There are a lot of different cache bins, as Drupal allows its modules to create their own. Each cache item generally can be saved to any bin available in the system.

  • A cache back end is the technology platform that is used to store cache. Each cache bin generally can be stored in a different back end. There is support for numerous back ends in Drupal's contrib ecosystem (such as APC, Memcached, Redis and so on). It is not hard to implement support for a new back end if your project requires something that has not been supported yet.
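To give a feeling for what a back end involves, here is a minimal sketch of a class with the methods of Drupal 7's DrupalCacheInterface, as I understand that interface. ExampleArrayCache is a hypothetical name, and the class only keeps items in a PHP array for one request, so it is a teaching aid rather than something to deploy. The stand-in definitions at the top exist only so the sketch also runs outside Drupal.

```php
<?php
// Stand-ins for what Drupal 7 defines in includes/cache.inc, so that this
// sketch can also be run on its own.
if (!defined('CACHE_PERMANENT')) {
  define('CACHE_PERMANENT', 0);
}
if (!interface_exists('DrupalCacheInterface')) {
  interface DrupalCacheInterface {
    function get($cid);
    function getMultiple(&$cids);
    function set($cid, $data, $expire = CACHE_PERMANENT);
    function clear($cid = NULL, $wildcard = FALSE);
    function isEmpty();
  }
}

/**
 * Hypothetical back end that keeps cache items in a PHP array.
 */
class ExampleArrayCache implements DrupalCacheInterface {
  protected $bin;
  protected $items = array();

  function __construct($bin) {
    $this->bin = $bin;
  }

  function get($cid) {
    $cids = array($cid);
    $found = $this->getMultiple($cids);
    return isset($found[$cid]) ? $found[$cid] : FALSE;
  }

  function getMultiple(&$cids) {
    $found = array_intersect_key($this->items, array_flip($cids));
    // Leave only the keys that were not found in $cids.
    $cids = array_values(array_diff($cids, array_keys($found)));
    return $found;
  }

  function set($cid, $data, $expire = CACHE_PERMANENT) {
    $this->items[$cid] = (object) array(
      'cid' => $cid,
      'data' => $data,
      'created' => time(),
      'expire' => $expire,
    );
  }

  function clear($cid = NULL, $wildcard = FALSE) {
    if ($cid === NULL) {
      // No key given: drop everything that is not permanent.
      foreach ($this->items as $key => $item) {
        if ($item->expire != CACHE_PERMANENT) {
          unset($this->items[$key]);
        }
      }
    }
    elseif ($wildcard && $cid === '*') {
      $this->items = array();
    }
    elseif ($wildcard) {
      // Drop every item whose key starts with $cid.
      foreach (array_keys($this->items) as $key) {
        if (strpos($key, $cid) === 0) {
          unset($this->items[$key]);
        }
      }
    }
    else {
      foreach ((array) $cid as $key) {
        unset($this->items[$key]);
      }
    }
  }

  function isEmpty() {
    return empty($this->items);
  }
}
```

A real back end would replace the array with calls to its storage system. Drupal then needs to be told about the class in settings.php.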

If you develop a Drupal module and you want to utilize the advantages of Drupal's cache framework, you should do something like this:


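<?php
// Try the cache first; cache_get() returns FALSE on a miss.
if ($cache = cache_get('my_cache_item', 'cache')) {
  $data = $cache->data;
}
else {
  // Cache miss: do the heavy calculations here. The function
  // my_module_heavy_calculation() is a placeholder for your own code.
  $data = my_module_heavy_calculation();

  // Store the result so that the next request can skip the calculation.
  cache_set('my_cache_item', $data, 'cache');
}

// Use $data
?>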

The cache item in the above snippet is keyed with my_cache_item and stored in a bin called cache, which is Drupal's default bin. Check Drupal's cache_set() (dgo.to/a/cache_set) and cache_get() (dgo.to/a/cache_get) functions if you need more information.
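Cache items also can be given a lifetime and can be removed explicitly. The following sketch assumes a hypothetical module with a function example_build_report() that does the expensive work; the key example:report and the one-hour lifetime are arbitrary choices. The explicit expiry check is there because, as far as I know, some Drupal 7 back ends can return an item whose expiry time has passed but which has not been cleaned up yet.

```php
<?php
/**
 * Returns a report that is rebuilt at most once per hour.
 *
 * All example_* names are hypothetical.
 */
function example_get_report() {
  $cid = 'example:report';
  $cache = cache_get($cid, 'cache');

  // Use the cached copy only if its expiry time is still in the future.
  if ($cache && $cache->expire > REQUEST_TIME) {
    return $cache->data;
  }

  $data = example_build_report();
  // The fourth argument is the expiry time as a UNIX timestamp.
  cache_set($cid, $data, 'cache', REQUEST_TIME + 3600);
  return $data;
}

/**
 * Removes the cached report, for example after its source data changes.
 */
function example_invalidate_report() {
  cache_clear_all('example:report', 'cache');
}
```

Passing TRUE as the third argument of cache_clear_all() treats the key as a prefix, so cache_clear_all('example:', 'cache', TRUE) would remove every item whose key starts with example:.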

The following is a list of cache bins provided by Drupal core:

  • cache — default (generic) cache bin.

  • cache_block — storage for cached versions of built (rendered) blocks.

  • cache_bootstrap — bin for data required to bootstrap Drupal.

  • cache_field — storage for cached versions of Drupal's fields.

  • cache_filter — bin for the Filter module to store already filtered pieces of text.

  • cache_form — bin for the form system to store recently built forms and their storage data to be used in subsequent page requests.

  • cache_image — bin used to store information about image manipulations that are in progress.

  • cache_menu — storage for the menu system to store router information and menu link trees.

  • cache_page — bin used to store compressed pages for anonymous users.

  • cache_path — bin for path alias lookup.

  • cache_update — bin for the Update module to store information about available releases, fetched from the central server.
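Modules can add bins of their own next to the core bins listed above. With the default database back end, each bin is a table, so a module declares the table in its schema. The sketch below assumes a hypothetical module called mymodule and a bin called cache_mymodule; it reuses the definition of core's cache table and registers the bin so that it is emptied when all caches are cleared.

```php
<?php
/**
 * Implements hook_schema(). Goes into mymodule.install.
 */
function mymodule_schema() {
  // Copy the structure of core's generic cache table.
  $schema['cache_mymodule'] = drupal_get_schema_unprocessed('system', 'cache');
  $schema['cache_mymodule']['description'] = 'Cache bin for the hypothetical mymodule module.';
  return $schema;
}

/**
 * Implements hook_flush_caches(). Goes into mymodule.module.
 */
function mymodule_flush_caches() {
  // Bins listed here are emptied when Drupal clears all caches.
  return array('cache_mymodule');
}
```

Items then can be stored with cache_set($cid, $data, 'cache_mymodule') and read back with cache_get($cid, 'cache_mymodule').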

Cache Back Ends

A lot of different cache back ends can be used together with Drupal. I describe some of the most popular ones here that also have support in at least one Drupal module. Back ends that are most commonly used in the Drupal ecosystem also are very popular in general Web development. Each of them has its own advantages and disadvantages, and each will satisfy different needs.

Some modules implement more than just cache support. They often implement sessions and semaphore (locking) handlers too. That also will speed up your site, as a typical Drupal Web site uses those mechanisms frequently. Refer to each module's project page and README.txt file in order to utilize their advantages.
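As a rough sketch of what enabling such a handler looks like, the lines below point Drupal's lock system at a module-provided implementation in settings.php. The file names are the ones I know from the Memcache and Redis modules' documentation, and the paths assume the modules live under sites/all/modules; both are assumptions that should be verified against the README.txt of the version you install.

```php
<?php
// Only one lock implementation can be active at a time.

// Locking through Memcached (path is an assumption; check the README).
$conf['lock_inc'] = 'sites/all/modules/memcache/memcache-lock.inc';

// Locking through Redis (path is an assumption; check the README).
//$conf['lock_inc'] = 'sites/all/modules/redis/redis.lock.inc';
```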

Database Cache (Default)

Drupal's default cache implementation is very easy to use, as you get it “for free” when you install Drupal. It also doesn't add any complexity to your server stack, because it uses the database that Drupal already needs (usually MySQL). This back end should work fine for most smaller sites. The problem is its speed, because all cache data ends up on disk. Writes are especially slow; reads are less of a problem, as MySQL's query cache helps to boost read performance. Another disadvantage is that it adds load to a database that already is under very heavy load on high-traffic Web sites.

APC User Cache

I already mentioned APC (drupal.org/project/apc). It provides a lot of performance improvement, as PHP files generally do not change much. It is relatively simple to install and very easy to configure. It should be used on every Web server that runs PHP (production and development).

Another feature of APC that people are not always aware of is the user cache. The user cache uses APC's shared memory to store users' (developers') custom data, and it definitely can be used for cache purposes. It is very fast, because it stores data directly in PHP's memory. Assuming that you already have APC installed on your server (and you should), it also does not add further complexity to the server stack. Data in APC is not stored permanently; you will lose all your data if the power goes down. Another disadvantage is that each server maintains its own copy of the cache. Cache warm-up therefore takes longer in setups with several Web servers, and the servers' caches can get out of sync with one another.

Memcached

Memcached is an open-source, high-performance, distributed memory object caching system (memcached.org). It is very popular and is used on many well-known sites. It runs as a separate dæmon, which means that it adds another level of complexity to the server stack. It is very easy to configure and administer, so that shouldn't be a problem. Applications communicate with it via the network (TCP or UDP). You will have to install another PHP extension in order to use it with PHP applications. It stores all data in memory, which makes the data non-permanent. All Web servers will use the same Memcached pool in high-availability environments, which is another big advantage. See drupal.org/project/memcache for more information.

Redis

Redis is an open-source, advanced key-value store (redis.io). It is very similar to Memcached. It is fast, centralized and relatively easy to configure. It stores your data in memory, but it sooner or later writes it to disk, which makes it permanent. Write frequency can be configured, which gives you the power to balance between performance and the safety of your data. PHP applications can talk to it through a PHP extension (PhpRedis), just as with Memcached, or through a library written in PHP (Predis). Redis is very fast, but sooner or later it will need access to I/O, which could cause some performance overhead in environments that already are under heavy load. See also drupal.org/project/redis.

MongoDB

MongoDB is a scalable, high-performance, open-source NoSQL (document-oriented) database (www.mongodb.org). As a cache back end, it is much faster than MySQL (see Figure 1), and it stores data permanently. You will need to install a PHP extension in order to use it. It is probably the most complex to configure and administer of all the back ends I describe here. MongoDB's other advantage is its ability to store Drupal's content, which makes for a really powerful content database in addition to a powerful cache back end.

Figure 1. This performance overview of all the back ends described in this article shows that APC is the fastest and MySQL (DB) the slowest. Redis and Memcached are relatively similar when it comes to performance, but that could change if Redis were configured to write its data to disk often. MongoDB is very fast, especially when it comes to writes.

How to Configure Cache Back Ends

You will need to do some configuration in Drupal's settings.php in order to use different cache back ends. First, you need to include cache back-end implementations:

# Memcache
include_once('./includes/cache.inc');
include_once('./sites/all/modules/memcache/memcache.inc');
$conf['memcache_key_prefix'] = 'drupal';

# Redis
$conf['cache_backends'][] = 'sites/all/modules/redis/redis.autoload.inc';
$conf['redis_client_interface'] = 'PhpRedis';  // Library or extension

# APC
$conf['cache_backends'][] = 'sites/all/modules/apc/drupal_apc_cache.inc';

# Mongo
$conf['cache_backends'][] = 'sites/all/modules/mongodb/mongodb_cache/mongodb_cache.inc';

The paths in the above example should be adjusted to your specific installation. Once the cache back ends are included, you need to set the default:

$conf['cache_default_class'] = 'MemCacheDrupal';
//$conf['cache_default_class'] = 'Redis_Cache';
//$conf['cache_default_class'] = 'DrupalAPCCache';
//$conf['cache_default_class'] = 'DrupalMongoDBCache';
//$conf['cache_default_class'] = 'DrupalDatabaseCache';

In the above example, Memcached is configured as a default cache back end. The appropriate line should be uncommented in order to select a different option. Besides the default back end, where all bins will go, you also can define per-bin back ends:

// cache_form bin
$conf['cache_class_cache_form'] = 'Redis_Cache';

// cache_menu bin
$conf['cache_class_cache_menu'] = 'DrupalMongoDBCache';

// etc....

In the above example, the bin cache_form is configured to go into Redis, and the bin cache_menu is set to be stored in MongoDB.
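The back ends that run as separate dæmons also need to know where their servers are. The fragment below is a sketch of that part of settings.php, using the variable names I know from the Memcache and Redis modules; the addresses and ports are placeholders for a single local server, and the exact options should be checked against each module's README.txt.

```php
<?php
// Memcache: list the servers and map bins to server clusters.
// '127.0.0.1:11211' is a placeholder for your own Memcached instance.
$conf['memcache_servers'] = array('127.0.0.1:11211' => 'default');
$conf['memcache_bins'] = array('cache' => 'default');

// Redis: host and port of the Redis server (placeholder values).
$conf['redis_client_host'] = '127.0.0.1';
$conf['redis_client_port'] = 6379;
```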

Cache Implementations in Drupal

Drupal core and some of its modules utilize the cache framework to implement user-friendly, easy-to-configure caching.

Page and Block Cache

Page and block cache are implemented in Drupal core, and you'll get them with every Drupal installation. Page cache will save the HTML output of the entire page for anonymous users and store it to the cache_page bin. This HTML will be displayed to all anonymous users, which will bypass most of the Drupal bootstrap and the entire page-generation process. This will save a huge amount of time and server resources.

Block cache, on the other hand, will save HTML output of Drupal blocks (parts of a page). Blocks also can be cached for registered users, which will give you some performance improvement for non-anonymous traffic. Blocks are stored in the cache_block bin.
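Modules that define blocks tell Drupal how each block may be cached. The sketch below assumes a hypothetical module called mymodule with two blocks: one that looks the same for everybody, and one that depends on the current user.

```php
<?php
/**
 * Implements hook_block_info().
 */
function mymodule_block_info() {
  // Same output for every user on every page: cache a single copy.
  $blocks['latest_news'] = array(
    'info' => t('Latest news'),
    'cache' => DRUPAL_CACHE_GLOBAL,
  );
  // Output depends on the logged-in user: cache one copy per user.
  $blocks['my_account'] = array(
    'info' => t('My account summary'),
    'cache' => DRUPAL_CACHE_PER_USER,
  );
  return $blocks;
}
```

Other available modes include DRUPAL_CACHE_PER_ROLE, DRUPAL_CACHE_PER_PAGE and DRUPAL_NO_CACHE. The more specific the mode, the more copies of the block end up in the cache_block bin.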

Both are very easy to enable and configure by navigating to admin/config/development/performance.
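The same options can be pinned in settings.php, which is handy when the configuration should be identical on every server. The variable names below are the ones Drupal 7 core uses for the settings on that page, as far as I know; the values are only examples.

```php
<?php
// Cache pages for anonymous users.
$conf['cache'] = 1;
// Cache blocks.
$conf['block_cache'] = 1;
// Minimum cache lifetime in seconds (0 means no minimum).
$conf['cache_lifetime'] = 0;
// Expiration of cached pages in seconds, sent to browsers and proxies.
$conf['page_cache_maximum_age'] = 300;
```

Values set in settings.php override whatever is chosen in the administration interface.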

Figure 2. Configuration for Page and Block Cache

Views Cache

Views is the most frequently used Drupal module (drupal.org/project/views). It provides a flexible method for Drupal site designers to control how lists and tables of content, users, taxonomy terms and other data are presented. There probably aren't many serious Drupal Web sites that don't use this module.

Views implements a cache mechanism that will save the result of a view's query and its HTML output. By default, Views expires its cache on a time basis, but you can implement your own plugin to achieve totally customized behavior. The views cache can be configured for each view separately, which gives you a lot of flexibility. Cache items will be stored in the cache_views_data bin.

Usage of the views cache should be considered as an option on every Drupal Web site that uses the Views module.

Figure 3. Configuration for the Views Cache

Panels Cache

Panels is another powerful Drupal module (drupal.org/project/panels). It allows you to build complex page layouts and configure and control them in a really intuitive way. Besides its many other features, it also has a powerful cache implementation. Panels, by default, will allow you to cache almost everything and configure granularity and lifetime. Again, this can be completely customized with your own custom plugins, resulting in even more flexibility.

The panels cache is configured on the configuration page for each panel. It is usually configured on a per-pane (part of a page) basis.

Figure 4. Configuration for the Panels Cache

How to Optimize Your Drupal Page for Performance

Cache is a very powerful tool in Drupal, but it won't help you much if you don't optimize your queries and code first. Cache will just save you from running them over and over again, but you still will have to execute your slow queries and run your slow code from time to time. However, Drupal provides a lot of tools to optimize those things too.

The Devel module (drupal.org/project/devel) implements a handy query log, which will print the list of all queries that were executed during a given request in the footer of your page. This allows you to identify slow database queries and optimize them. In most cases, a minor modification of a query (or view) or an additional index on a table will help a lot.

Figure 5. Configuration of Devel's Query Log

Figure 6. Example of a Displayed Query Log (with Slow Queries)

Another handy tool is a profiler (en.wikipedia.org/wiki/Profiling_(computer_programming)). It will allow you to identify parts of your code that are slow or that use an abnormal amount of memory. When it comes to PHP, I recommend XHProf (https://github.com/facebook/xhprof), which is really easy to use. Once you install and configure it, the Devel module gives you Drupal integration for it. You just have to enable it, and Devel will run it automatically on every request and display a link to the profiling information at the bottom of the page.

Figure 7. XHProf Profiler

Once you have optimized your code, you can start using cache. Identify parts of the page that are candidates for caching. When you have cached all parts of the page, enable page cache or use a reverse proxy. A reverse proxy does a job similar to that of the page cache, but it is a separate dæmon that sits in front of the Web server and is heavily optimized for this job. The most popular reverse proxy in the Drupal community is Varnish (https://www.varnish-cache.org), which also has a contrib module that integrates it with Drupal (drupal.org/project/varnish). Another module that does a similar job is Boost (drupal.org/project/boost). Boost will save the HTML output of entire pages to ordinary HTML files, allowing the Web server to serve static files for anonymous users.
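When Drupal sits behind a reverse proxy, settings.php has to say so. The fragment below is a sketch: the first two variables are core settings, and 127.0.0.1 is a placeholder for the address of your proxy. The last two lines hand the cache_page bin over to the Varnish module; the file and class names are the ones I recall from that module's documentation, so treat them as assumptions and verify them against its README.txt.

```php
<?php
// Tell Drupal that requests arrive through a trusted reverse proxy.
$conf['reverse_proxy'] = TRUE;
$conf['reverse_proxy_addresses'] = array('127.0.0.1');

// Let the Varnish module handle the page cache bin (names are assumptions).
$conf['cache_backends'][] = 'sites/all/modules/varnish/varnish.cache.inc';
$conf['cache_class_cache_page'] = 'VarnishCache';
```

The proxy decides how long to keep a page based on the headers Drupal sends, so the page_cache_maximum_age setting matters in this setup.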

Conclusion

Drupal has been known as a relatively slow platform, which is somewhat true. But, it also is very flexible and provides a lot of room for performance optimizations. This could make it even faster than its better-optimized alternatives in the end, while still preserving its strength and flexibility.

The performance optimization tools that come with Drupal core definitely will work for small and medium Web sites. When you start dealing with larger Web sites, you'll need to know your requirements and needs and use a more customized approach. You will, however, find a lot of tools and best practices even for large projects, which makes Drupal an excellent platform for almost every type of Web project.

Janez Urevc is a Drupal developer from Slovenia, EU. He really loves the things he does, and that's why he feels that “every day is the best day of his life”. He's been dedicated to open source since high school. He graduated in Software Development from the Computer and Information Sciences department at the University of Ljubljana. His Bachelor's thesis was focused on the implementation of Agile principles and Scrum methodology in a Web development department of a large media company. Besides Drupal, he's also passionate about almost everything connected to the Web, open source, Linux and software development. He participated in the 2011 Google Summer of Code, where he worked on the Media derivatives API for Drupal.
