from June 2014
Feature:

The Automagic Speed-Up Cache

Render Cache in Drupal 8

Motivation

The granularity of cache expiration in Drupal has been a long-standing problem.

One can have the most effective cache in the world, but if it clears entirely on any content change, it is not really workable. A “page” in Drupal can contain blocks, listings, entities, regions, and many other objects. When one contained item changes, the container of that item needs to be fully rebuilt; often, that is the whole page. This problem badly needs a solution.

A page is divided into regions, blocks, listings, and content items. Only the red item needs to be re-built as only a single node has been changed; the rest can be retrieved from cache.

Why can't we just rebuild the parts that have actually changed?

Consider what would be the best case scenario here. Assume that every item listed above can be cached separately. Now if one single entity changes, the following would be our "perfect" page request:

  1. Drupal bootstraps.
  2. Drupal builds the page.
  3. Drupal notices that only the “content” region has changed and retrieves the remaining regions from cache.
  4. Drupal re-builds the content region.
  5. Drupal notices only one block in the content region has changed and retrieves the remaining blocks from cache.
  6. Drupal builds the “missing” block.
  7. The block contains a listing of entities.
  8. Drupal re-builds the listing, and entity_view() is called on these entities.
  9. Drupal retrieves all entities except the changed one from cache.

We would have a bootstrap, then we would see just one region call, one block call, one listing call, and one entity building call. Is this really possible?

Yes and no.

There are certain implementation limitations – especially around page assets – and a unified caching strategy needs to take them into account.
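The ideal request described above can be sketched in plain PHP as a recursive build in which every item checks its own cache entry before doing any work. This is only an illustration under simplifying assumptions: the function render_item(), the page tree, and the array used as a cache are hypothetical and not Drupal APIs, and assets are ignored entirely.

```php
<?php
// Hypothetical sketch: every item in the page tree has its own cache entry.
function render_item(array $item, array &$cache, array &$rebuilt) {
  $cid = $item['id'];
  if (isset($cache[$cid])) {
    // Cache hit: the cached markup is returned and the children are never visited.
    return $cache[$cid];
  }
  // Cache miss: rebuild this item, recursing into its children.
  $rebuilt[] = $cid;
  $output = '<' . $cid . '>';
  foreach ($item['children'] as $child) {
    $output .= render_item($child, $cache, $rebuilt);
  }
  $output .= '</' . $cid . '>';
  $cache[$cid] = $output;
  return $output;
}

$page = array(
  'id' => 'page',
  'children' => array(
    array('id' => 'header', 'children' => array()),
    array('id' => 'content', 'children' => array(
      array('id' => 'block-a', 'children' => array(
        array('id' => 'node-1', 'children' => array()),
        array('id' => 'node-2', 'children' => array()),
      )),
      array('id' => 'block-b', 'children' => array()),
    )),
  ),
);

// Cold cache: every item is built once.
$cache = array();
$rebuilt = array();
$first = render_item($page, $cache, $rebuilt);

// node-2 changes: drop it and every container that embeds its markup.
foreach (array('node-2', 'block-a', 'content', 'page') as $cid) {
  unset($cache[$cid]);
}

// Warm cache: only the chain from the page down to node-2 is rebuilt.
$rebuilt = array();
$second = render_item($page, $cache, $rebuilt);
```

On the second request only one region, one block, and one entity are rebuilt in addition to the page itself; the header, the second block, and the unchanged node come straight from the cache.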

State of the Art

Render Caching is the saving of HTML content in a storage cache, while retaining assets like CSS and JS files and other “out-of-band” data. It can be used to reconstruct the page content so that the page ends up in the same state it would have had without render caching active. The render cached HTML markup needs to be removed from the cache, or updated in the cache, when the objects used to generate the markup change.

So much for the theory of render caching: but how does this apply to Drupal 8?

In Drupal 8, render cache is active by default for all entities, which means that the output of every entity is cached once it has been viewed. Whenever an entity changes, or something referenced by that entity changes, the cache is cleared automatically, thanks to cache tags. Currently, only entities are render cached in that way, but work is underway to cache other items too, using a recursive render cache. Blocks, for example, are now cached via render arrays in Drupal 8, but the only cache tag on the block itself is the 'content' tag. So there is a lot in Drupal 8 that could use more cache tags and render caching, and more work is needed to ensure the render chain is not broken.

But there’s a catch: If you extend the entity rendering with custom code and you do it wrong, the render cache will break. That can be caused by something as simple as installing a non-render cache compatible contrib module.

Fortunately, as the render cache is now active by default, those problems will be found during development (not just as an afterthought), but it can still be challenging. See the sidebar for a list of the most common mistakes.

Common Mistakes

  • Relying on a global/external state (path, logged in user, state of different page asset, context) that is not contained within the rendered entity itself.
  • Adding assets with drupal_add_js (now deprecated and renamed _drupal_add_js!), or trying to add assets during the theming chain (preprocess and templates currently can't add assets); however, using #pre_render works.
  • Varying the cache object so much that there is a low cache hit ratio (e.g. saving for every user and every page).
  • Code that prematurely renders data to HTML markup, which is then stored in the global state, but can no longer be stored as #assets.
  • Contrib code that changes the output in any way without adhering to the render cache principles.

Background: How does it work?

In Drupal 7, to cache a render array all you had to do was use:

<?php
// Example object whose rendered output is expensive to compute.
$object = new stdClass();
$object->id = 42;

$build = array(
  '#cache' => array(
    'granularity' => DRUPAL_CACHE_PER_ROLE,
    'keys' => array(
      'my_module',
      'object',
      $object->id,
    ),
  ),
  '#pre_render' => array(
    // Only runs on a cache miss; must match the function name below.
    'mymodule_render_this_element',
  ),
  '#object' => $object,
);

function mymodule_render_this_element($element) {
  $element['#markup'] = t('My nice output that takes very very very long to compute: ');
  $element['#markup'] .= $element['#object']->id;
  return $element;
}
?>

The only way to clear this cache in D7 is to use prefix-based clearing; for example, clearing my_module:object:* to clear all object output from my_module.
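To make the cost of prefix-based clearing concrete, here is a small plain-PHP sketch. The array stands in for a cache bin and clear_by_prefix() is a hypothetical helper, not the Drupal cache API (in Drupal 7 itself, a wildcard clear is done with cache_clear_all() and its $wildcard argument). The cache IDs are made up for illustration.

```php
<?php
// Hypothetical stand-in for a cache bin: cache ID => cached markup.
$bin = array(
  'my_module:object:42:r.editor' => '<div>Object 42 for editors</div>',
  'my_module:object:42:r.anonymous' => '<div>Object 42 for visitors</div>',
  'my_module:object:43:r.editor' => '<div>Object 43 for editors</div>',
  'other_module:thing:1' => '<div>Unrelated</div>',
);

// Removes every entry whose cache ID starts with $prefix.
function clear_by_prefix(array &$bin, $prefix) {
  $cleared = 0;
  foreach (array_keys($bin) as $cid) {
    if (strpos($cid, $prefix) === 0) {
      unset($bin[$cid]);
      $cleared++;
    }
  }
  return $cleared;
}

// Clearing "my_module:object:*" also throws away object 43,
// even if only object 42 has changed.
$cleared = clear_by_prefix($bin, 'my_module:object:');
```

A narrower prefix such as my_module:object:42: would spare object 43, but a prefix can only ever follow the order of the cache keys; it cannot express "everything that displays this author".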

If this was so simple, why wasn’t it used more?

Probably because it was not used in core, rarely used in contrib, and practically undocumented. (I found only one really good blog post, from 2011, explaining the full process!) Also, introducing render caching late into an already existing project is quite challenging.

Variations of Cache Objects

Each cache object has a unique Cache ID, which allows the API to identify the correct cache item to retrieve. The Cache ID is computed via the keys property in the cache render array.

However, different Cache IDs might be needed for different roles or different pages, where the same object is represented in different ways – based on which "context" it was cached in. Drupal has the granularity property for the most common cases of variations.
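As a rough illustration of how keys and variations combine into a Cache ID, consider the following plain-PHP sketch. The function sketch_cid() and the exact ID format are assumptions made for this example; Drupal's real Cache ID generation differs in its details.

```php
<?php
// Hypothetical sketch: build a cache ID from fixed keys plus context variations.
function sketch_cid(array $keys, array $variations) {
  // Sort so that the same set of variations always yields the same ID.
  ksort($variations);
  $parts = $keys;
  foreach ($variations as $name => $value) {
    $parts[] = $name . '.' . $value;
  }
  return implode(':', $parts);
}

$keys = array('my_module', 'object', 42);

// One cache item per role: few variants, good hit ratio.
$editor = sketch_cid($keys, array('role' => 'editor'));
$anonymous = sketch_cid($keys, array('role' => 'anonymous'));

// One cache item per user and per path: many variants, poor hit ratio.
$personal = sketch_cid($keys, array('user' => 7, 'path' => 'user/7'));
```

Every additional variation multiplies the number of cache items stored for the same object, which is why the choice of granularity matters so much.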

For example, it might be quite common to use:

<?php
function my_module_entity_view() {
  if (arg(0) == 'user') {
    // Modify entity here for display on a user's page.
  }
}
?>

Doing so not only breaks the render cache, it also introduces a dependency on the page path, which means that now a granularity per path needs to be set or the entity will be wrong.

Instead, a different view_mode should be used for display on the user page. This removes the dependency on the path, and the view_mode is already part of the cache key in Drupal 8.

The question that remains is: How do you clear only the relevant cache objects, now that you have, potentially, thousands of variants?

Cache-Invalidation Strategies

Cache clearing – or, more properly, cache invalidation – is the biggest problem when dealing with caching in general. (There is a famous quote saying: “There are only two hard problems in computer science – cache invalidation and naming things.”)

There have been numerous modules dealing with this problem, for example: cache_actions. Likewise, many lines of custom code have been written to clear as little of the cache as necessary. The only way Drupal 7 could clear caches was using prefix-based cache clears.

One advantage of limiting render caching to entities is that it’s pretty simple to know when an entity has changed: all you need is a timestamp, and Drupal 8 now has a last-modified property for entities.

But what if you are also displaying the author of the entity and a referenced tag as part of, say, a node? Then you also need to clear the cache when either the tag description or the author name changes.

Drupal 8 uses a cache-clearing strategy (i.e., finding all objects impacted by a change and clearing them from the cache), and has a great solution to the problem of finding the relevant objects to clear: Cache Tags[2].

With cache tags, a new cache setting property has been introduced, which makes it possible to specify which tags a certain cache item should be saved with.

For our node example, the cache tags property might look like this:

<?php
$build['#cache']['tags'] = array(
  'node' => array($node->id()),
  'user' => array($node->getOwnerId()),
  'taxonomy' => array($tid),
);
?>

With that, it is possible to clear the render cache for all nodes, for only the nodes with a specific ID, or for every node written by an author whose data has changed, because all cache items carrying that user tag can be cleared together. Cache Tags were an important prerequisite for render caching and caching in Drupal 8 in general.
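The idea behind tag-based clearing can be sketched with a small in-memory store. The class TaggedCacheSketch and its methods are hypothetical and exist only for this illustration; this is not how the Drupal 8 cache backends are implemented, only a model of the behavior described above.

```php
<?php
// Hypothetical in-memory cache that remembers the tags each item was saved with.
class TaggedCacheSketch {
  private $items = array();

  public function set($cid, $data, array $tags) {
    $this->items[$cid] = array('data' => $data, 'tags' => self::flatten($tags));
  }

  public function get($cid) {
    return isset($this->items[$cid]) ? $this->items[$cid]['data'] : FALSE;
  }

  // Removes every item that carries at least one of the given tags.
  public function invalidateTags(array $tags) {
    $needles = self::flatten($tags);
    foreach ($this->items as $cid => $item) {
      if (array_intersect($needles, $item['tags'])) {
        unset($this->items[$cid]);
      }
    }
  }

  // array('node' => array(1, 2)) becomes array('node:1', 'node:2').
  private static function flatten(array $tags) {
    $flat = array();
    foreach ($tags as $namespace => $ids) {
      foreach ((array) $ids as $id) {
        $flat[] = $namespace . ':' . $id;
      }
    }
    return $flat;
  }
}

$cache = new TaggedCacheSketch();
$cache->set('node:1:teaser', '<article>One</article>', array('node' => array(1), 'user' => array(7)));
$cache->set('node:2:teaser', '<article>Two</article>', array('node' => array(2), 'user' => array(7)));
$cache->set('node:3:teaser', '<article>Three</article>', array('node' => array(3), 'user' => array(8)));

// Node 1 was edited: only its own output is dropped.
$cache->invalidateTags(array('node' => array(1)));

// User 8 changed their name: everything displaying that author is dropped.
$cache->invalidateTags(array('user' => array(8)));
```

Unlike a prefix, a tag is independent of the cache ID, so the thousands of variants of one node can all be found through the single tag they share.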

Cache objects can be cleared by tag

It is important to understand that you want to use cache keys or granularity when you have a valid variation of the same entity, and use cache tags when you want to regenerate the cache object. See the sidebar for a summary of the different properties and when to use what.

Cache Properties

Render Caching has several properties that can be set independently:

  • The cache granularity of the cache object; e.g. PER_ROLE, PER_PAGE;
  • The cache ID (cid) generated via cache keys, which contains all variations of a cache object; e.g. language, display_mode, the general granularity, etc.;
  • The cache invalidation properties; i.e., the cache tags – how this object can be removed from cache again.

Personalized Variations

The problem with variations is that with any complex site logic, you quickly end up varying per user or per page, which makes caching less effective than it could be. For example, in the Flag module, a flag link contains a personalized token to flag/unflag content, but restricting the entity caching to 'per user' for just one personalized token would be a waste.

To solve this, Drupal 8 introduces a new render array property: #post_render_cache.

Post-render cache can be used for attaching JS or CSS assets, or setting JS settings. It can also be used to alter the generated markup. For this task, a user-specified context is passed to the post-render cache callback.

That works, as an example, for the following (where the JavaScript settings are used to add some dynamic classes to entities with certain IDs):

<?php
$context = array(
  'entity_id' => $entity->id(),
  'entity_type' => $entity->entityType(),
);
 
$build['#post_render_cache']['mymodule_attach_custom_data'] = array(
  $context,
);
 
function mymodule_attach_custom_data(array $element, array $context) {
  $element['#attached']['js'][] = array(
    'type' => 'setting',
    'data' => array(
      'mySetting' => array(
        $context['entity_type'] => my_module_calculate_class($context['entity_id']),
      ),
    ),
  );
  return $element;
}
?>

When a render-cached render array is about to be rendered, all post_render_cache callbacks are called, but not necessarily on the element itself: the element passed in could also be the whole page. So it is important not to depend on the element markup having a certain structure.

To change content within the generated HTML, you would need to use a placeholder and then replace the placeholder string with the real string. Fortunately, Drupal 8 has you covered here as well, and you can use the new render array type render_cache_placeholder to replace content dynamically, as in the following example:

<?php
$output['comment_form'] = array(
  '#type' => 'render_cache_placeholder',
  '#callback' => 'mymodule_replace_placeholder',
  '#context' => $context,
);
 
function mymodule_replace_placeholder(array $context) {
  $entity = entity_load($context['entity_type'], $context['entity_id']);
  return mymodule_get_logged_in_title($entity);
}
?>

Note that the same context is used as in the first example, but this time we return a new render array element, which replaces the placeholder that was previously automatically generated.
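Conceptually, placeholder replacement boils down to a string substitution on the cached markup. The following plain-PHP sketch shows the principle using the Flag example from above; the token format, the function names, and the use of a global for the per-user token are all assumptions made for illustration and do not reflect how Drupal generates or replaces its placeholders internally.

```php
<?php
// Hypothetical sketch: a token derived from the callback and its context.
function sketch_placeholder($callback, array $context) {
  return '<!--placeholder:' . $callback . ':' . md5(serialize($context)) . '-->';
}

// Runs on every request, cached or not, so the callback must stay cheap.
function sketch_replace_placeholder($cached_markup, $callback, array $context) {
  $token = sketch_placeholder($callback, $context);
  return str_replace($token, call_user_func($callback, $context), $cached_markup);
}

// Per-request data is looked up here, at replacement time.
function sketch_flag_link(array $context) {
  return '<a href="/flag/' . $context['entity_id'] . '?token=' . $GLOBALS['sketch_user_token'] . '">Flag</a>';
}

// Built once and stored in the cache; shared by all users.
$context = array('entity_id' => 42);
$cached = '<article>Shared markup ' . sketch_placeholder('sketch_flag_link', $context) . '</article>';

// Two requests by different users reuse the same cached markup.
$GLOBALS['sketch_user_token'] = 'abc';
$for_first_user = sketch_replace_placeholder($cached, 'sketch_flag_link', $context);

$GLOBALS['sketch_user_token'] = 'xyz';
$for_second_user = sketch_replace_placeholder($cached, 'sketch_flag_link', $context);
```

The expensive part, the shared markup, is cached once for everyone, while only the small personalized fragment is computed per request.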

For all post-render callbacks, it’s important that you don't perform expensive operations, as those functions are always called, regardless of whether the item is cached or not; an expensive operation there would dramatically reduce the benefit of caching.

Do-it-Yourself Caching

To cache a render element yourself, simply use the same code as in the Drupal 7 example, but add cache tags to the $build:

<?php
$build['#cache'] += array(
  'tags' => array(
    'content' => TRUE,
  ),
);
?>

Let’s now assume that the rendering function needs to load another related object, which is part of the computation.

<?php
function mymodule_render_this_element($element) {
  $element['#markup'] = t('My nice output that takes very very very long to compute: ');
  $computed_object = mymodule_compute_related_object($element['#object']->id);
  $element['#markup'] .= $computed_object->name;
  $element['#cache']['tags'] += array(
    'my_module:object' => $computed_object->id,
  );
  return $element;
}
?>

The problem now is that whenever the name of the $computed_object changes, the cache needs to be cleared. Fortunately, cache tags can be added throughout the render caching process. Even the result of a complex computation can be properly expired:

<?php
cache_invalidate(array('my_module:object' => $object->id));
?>

Conclusion

Render Cache is not yet fulfilling our fantasy of a totally cache-layered Drupal, but it could be!

All prerequisites are there, after the hard work of many years and through two major versions. By using cache tags, Drupal 8 is more flexible than ever to expire output easily and clear only what is needed. By enabling render caching on entities by default, and adding cache tags, the awareness of caching is strengthened and it will now be a part of the development process.

Comments

Just 3 comments on the article:

  1. The #type placeholder has been removed from D8 core since this article was written.

    A new approach is discussed here: https://www.drupal.org/node/2310883

  2. The best way to go ahead with this in Drupal 7 is to use the render_cache module.

    https://www.drupal.org/node/2299547#comment-8960477 explains how this feature was implemented on www.drupal.org.

  3. render_cache-7.x-2.x is underway and will support cache tags; work is also underway to support recursive render caching, ESI, AJAX, etc. Any support, testing, etc. is much appreciated.