This article focuses on the latest optimizations in the AEM dispatcher and how best to use them. The AEM dispatcher is a caching reverse proxy server designed for use with Adobe Experience Manager. It is installed and runs as a module within existing web server software. At the time of writing, the dispatcher module is supported on Apache HTTP Server, Microsoft IIS and iPlanet.
At the most basic level, the AEM dispatcher is a reverse proxy which works by performing caching, cache flushing and cache invalidation.
See the related links for more details on dispatcher:
- How the dispatcher works and how to install it.
- Configuration options available in the dispatcher.
- Webinar on how the dispatcher works - note that some information in the presentation is based on old versions of the dispatcher.
- Gems webinar session on dispatcher features, CDN usage and security.
- Gems session on newer features in dispatcher (after v4.1.9).
Here are some ways to optimize the dispatcher cache:
- Cache almost everything - this means cache any content that would be requested more than once by users.
- Never delete the dispatcher cache on a live dispatcher - If a dispatcher is serving live content and you delete the cache, a massive flood of requests goes back to AEM.
- Prime the cache - If the cache has to be deleted, first pull the dispatcher off your load balancer, delete the cache, then run a web crawler tool to repopulate the cache before putting the dispatcher back on the load balancer.
- Cache error pages - Use the DispatcherPassError 1 directive (specific to Apache HTTP Server) to serve error pages such as 404s from the dispatcher cache.
- Gzip-compress all file types except those that are already compressed - In Apache HTTP Server, use mod_deflate, but make sure that the Vary: User-Agent header isn't set. In Microsoft IIS, use Dynamic Compression.
Apache configuration example (specifying only certain content types to avoid precompressed file types):
- Enable /serveStaleOnError in the /cache configuration - serve the old cache file when AEM instances are serving errors.
- Add /gracePeriod to the /cache configuration - define the number of seconds a stale, auto-invalidated resource may still be served from the cache after the last content publish event ("activation"). This reduces the number of requests that go back to the publish instances during a large content publishing activity such as a "Tree Activation".
- Add rules to /ignoreUrlParams - ignore querystring parameters that are not required or used by the application. This allows caching of URLs even when a querystring is present.
- Cache the Cache-Control and Last-Modified response headers - Use the /headers configuration to cache the HTTP response headers Cache-Control and Last-Modified (and/or the ETag header if you are sending it from AEM). This simplifies and optimizes caching at the CDN and browser levels. Caching these headers means that only AEM sets them, not the web server itself. Note that once you do this, your AEM application must start sending these headers.
- Cache content for as long as possible and reduce requests that go back to AEM - Optimize flush requests by enabling re-fetching flush on all flush agents, or use /enableTTL and set the Cache-Control: max-age=... header to cache files for as long as possible. See below for details on both options.
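The "Prime the cache" step in the list above can be scripted when a full crawler is not at hand. The following is a minimal Python sketch, not an official tool: the function names build_urls and prime_cache, the host name in the usage note, and the idea of feeding it a fixed list of paths are all assumptions made for illustration. It requests each path once through the dispatcher so that the responses are written to the cache.

```python
"""Sketch: warm a dispatcher cache by requesting a known list of paths."""
import urllib.error
import urllib.request
from urllib.parse import urljoin


def build_urls(base_url, paths):
    """Join each site-relative path onto the dispatcher's base URL."""
    return [urljoin(base_url, path) for path in paths]


def prime_cache(base_url, paths, fetch=None, timeout=30):
    """Request every URL once; return {url: HTTP status or error text}."""
    if fetch is None:
        # Default fetcher: a plain GET through the dispatcher.
        def fetch(url):
            with urllib.request.urlopen(url, timeout=timeout) as response:
                return response.status

    results = {}
    for url in build_urls(base_url, paths):
        try:
            results[url] = fetch(url)
        except urllib.error.HTTPError as err:
            # 4xx/5xx responses are recorded rather than aborting the run.
            results[url] = err.code
        except OSError as err:
            # Connection problems (URLError is a subclass of OSError).
            results[url] = f"failed: {err}"
    return results
```

Run it only while the dispatcher is off the load balancer, for example prime_cache("http://dispatcher1.example.internal", ["/content/site/en.html"]) with a hypothetical host and path, and check the returned statuses before putting the dispatcher back into rotation.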
As of Dispatcher version 4.1.11, /enableTTL 1 can be set in the .any file configuration. This setting makes the dispatcher respect cache expirations set in the HTTP Cache-Control response header. In other words, the dispatcher functions similarly to a CDN, where the primary form of cache invalidation is file expiry. Once you implement this and start sending Cache-Control: max-age=... for all responses from AEM, you can safely disable your dispatcher flush agents on the publish instances.
After disabling the flush agents on the publish instances, you may still want to be able to flush the dispatcher cache. In that case, you can use ACS Commons - Dispatcher Flush UI. This tool is installed on the author instance and gives users a UI where they can perform manual cache flush requests.
I. Steps to enable TTL ("Time to Live" or expiration) style invalidations:
- Modify the source code of the AEM application to send the Cache-Control and Last-Modified headers on all responses where they are not already set.
- Install Dispatcher 4.1.11 or later.
- Set /enableTTL 1 in the .any farm configuration of the site.
- Set the /headers configuration to cache the Cache-Control and Last-Modified headers.
- Restart the web server.
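To make the expiry rule concrete, here is a small Python model of the decision that TTL-style invalidation implies. It is a sketch of the concept only, not the dispatcher's actual implementation: the function names and the use of epoch seconds are assumptions, and it reads only the max-age directive discussed in this article.

```python
"""Sketch: when does a cached file expire under a Cache-Control max-age?"""
import re

# Matches "max-age=<seconds>" as its own directive (not "s-maxage").
_MAX_AGE = re.compile(r'(?:^|,)\s*max-age\s*=\s*"?(\d+)', re.IGNORECASE)


def max_age_seconds(cache_control):
    """Return the max-age value from a Cache-Control header, or None."""
    if not cache_control:
        return None
    match = _MAX_AGE.search(cache_control)
    return int(match.group(1)) if match else None


def expires_at(cached_at, cache_control):
    """Epoch second at which the cached copy expires, or None if no TTL."""
    ttl = max_age_seconds(cache_control)
    return None if ttl is None else cached_at + ttl


def is_expired(cached_at, cache_control, now):
    """True once the TTL has elapsed; False when no max-age was sent."""
    deadline = expires_at(cached_at, cache_control)
    return deadline is not None and now >= deadline
```

In this model, a response cached at second 1000 with Cache-Control: max-age=60 is served from the cache until second 1060 and is fetched from AEM again after that. A response without max-age never expires by TTL in the model, which is why the steps above ask for the header on all responses.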
II. Disable dispatcher flush agents on the publish instances:
The dispatcher now uses the Cache-Control header to control invalidation of the cache files, so dispatcher flushing from the publish instances is no longer required.
- Go to /etc/replication/agents.publish.html on each publish instance.
- Go to each flush agent's configuration and disable the agent.
III. Allow manual dispatcher flush requests from the author instance:
Now that the flush agents are disabled, you rely entirely on the Cache-Control header to control when content is refreshed on the dispatcher. You can still allow users to issue manual flushes of the dispatcher cache:
- Install ACS Commons - Dispatcher Flush UI on the author instance.
- Configure flush agents on the author instance.
- In each of the agent configurations, enable the Triggers => Ignore Default option. With this option enabled, the flush agents ignore (Un)Publish and (De)Activate actions that users perform in the AEM UI.
To optimize dispatcher flush requests, enable the re-fetching flush feature on all dispatcher flush agents.
To enable re-fetching dispatcher flush, do the following:
- Go to http://aemhost:port/crx/packmgr/index.jsp and log in as admin.
- Download the package from here.
- Upload and install the package to the package manager.
- Go to your dispatcher flush agent configuration. For example /etc/replication/agents.author/flush.html
- Click Edit
- Set the following:
  - Serialization Type = Re-fetch Dispatcher Flush
  - Extended => HTTP Method = POST
- Click Save
Note that the package installed above is just a basic example. To customize and optimize re-fetching flush you can modify the list of URIs that it sends. The code is open source and can be found here. The code adds a list of URIs to the request body as parameters telling dispatcher which paths to re-fetch. You can add more paths per your application requirements to optimize your site's caching capabilities.
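As a rough illustration of what such a request looks like, the Python sketch below builds a flush request whose body lists the URIs to re-fetch, one per line. The CQ-Action and CQ-Handle headers and the /dispatcher/invalidate.cache path follow the dispatcher's documented manual invalidation request as understood here, so verify them against your dispatcher version. The function names, host name and example paths are assumptions, and this is not the code from the package.

```python
"""Sketch: build and send a dispatcher flush that lists URIs to re-fetch."""
import urllib.request


def build_refetch_flush(handle, refetch_uris):
    """Return (headers, body) for a flush of `handle` with re-fetch URIs."""
    # One URI per line in the request body.
    body = "\n".join(refetch_uris).encode("utf-8")
    headers = {
        "CQ-Action": "Activate",
        "CQ-Handle": handle,
        "Content-Type": "text/plain",
    }
    # Content-Length is left to urllib, which derives it from the body.
    return headers, body


def send_flush(dispatcher_url, handle, refetch_uris, timeout=10):
    """POST the flush to the dispatcher and return the HTTP status."""
    headers, body = build_refetch_flush(handle, refetch_uris)
    request = urllib.request.Request(
        dispatcher_url.rstrip("/") + "/dispatcher/invalidate.cache",
        data=body,
        headers=headers,
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

For a page at /content/foo, a call such as send_flush("http://dispatcher1.example.internal", "/content/foo", ["/content/foo.html", "/content/foo.json"]) would ask the dispatcher to re-fetch both renditions. Extending the URI list is the kind of per-application customization described above.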
Normally a dispatcher flush works by deleting files:
- Touch .stat file(s)
- Delete /content/foo.*
- Delete /content/foo/_jcr_content
Because the files are deleted in the second step, the next request for a file such as /content/foo.html or /content/foo.json goes back to the publish instances. While that file is being fetched, all subsequent requests for the same file are also sent to the publish instances until the file is cached again. For slow responses or high-traffic pages such as home pages, this can flood the publish instance tier.
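The flush steps listed above can be modelled on a local directory to show exactly which cached files disappear. This Python sketch is a simplified model under stated assumptions, not the dispatcher's code: the function name plain_flush is hypothetical, it touches only a single .stat file at the docroot, and it ignores any other cache settings.

```python
"""Sketch: model a delete-style flush of one handle in a cache docroot."""
import glob
import shutil
from pathlib import Path


def plain_flush(docroot, handle):
    """Apply the three flush steps for `handle`; return removed paths."""
    docroot = Path(docroot)
    target = docroot / handle.lstrip("/")

    # Step 1: touch the .stat file (docroot level only in this model).
    (docroot / ".stat").touch()

    removed = []
    # Step 2: delete <handle>.* , i.e. every cached rendition of the page.
    pattern = glob.escape(target.name) + ".*"
    for cached in sorted(target.parent.glob(pattern)):
        if cached.is_file():
            cached.unlink()
            removed.append(cached.relative_to(docroot).as_posix())

    # Step 3: delete <handle>/_jcr_content and everything below it.
    jcr_content = target / "_jcr_content"
    if jcr_content.is_dir():
        shutil.rmtree(jcr_content)
        removed.append(jcr_content.relative_to(docroot).as_posix())
    return removed
```

After such a flush, nothing is left in the cache for the flushed page, so every request for it falls through to the publish tier until a fresh copy has been stored.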
To solve this issue, enable a feature of the dispatcher called re-fetching. This feature allows you to send a list of URIs that the dispatcher should proactively "re-fetch" and replace instead of deleting.
See 22:41-27:05 in this presentation recording for a demo of how it works and how to configure it.