One of the most important changes in AEM 6 is the set of innovations at the repository level.
Currently, there are two node storage implementations available in AEM 6: Tar storage and MongoDB storage.
By default, AEM 6 uses Tar storage with its default configuration options to store nodes and binaries. To configure the storage settings manually, use the following procedure:
Configure the node store by creating a configuration file, with the name of the configuration you want to use, in the crx-quickstart/install directory. Edit the file and set the configuration options. The following options are available for the Segment Node Store, which is the basis of AEM's Tar storage implementation:
- repository.home: Path to the repository home, under which various repository-related data is stored. By default, segment files are stored under the crx-quickstart/segmentstore directory.
- tarmk.size: Maximum size of a segment in MB. The default is 256.
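As an illustration, a minimal Segment Node Store configuration file could look like the following sketch; the file name (the Segment Node Store service PID) and the values are assumptions to verify against your AEM version:

```
# crx-quickstart/install/org.apache.jackrabbit.oak.plugins.segment.SegmentNodeStoreService.cfg
# (file name is an assumption for this sketch; values are illustrative)
repository.home=crx-quickstart/repository
tarmk.size=256
```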
The Document Node Store (the basis of AEM's MongoDB storage implementation) uses a file called org.apache.jackrabbit.oak.plugins.document.DocumentNodeStoreService.cfg.
Edit the file and set your configuration options. The following options are available:
- mongouri: The Mongo URI required to connect to the Mongo database. The default is mongodb://localhost:27017.
- db: Name of the Mongo database. By default, new AEM 6 installations use aem-author as the database name.
- cache: The cache size in MB. This is distributed among the various caches used by the DocumentNodeStore. The default is 256.
- changesSize: Size in MB of the capped collection used by Mongo to cache the diff output. The default is 256.
- customBlobStore: Boolean value indicating that a custom data store will be used. The default is false.
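Putting the options above together, a DocumentNodeStoreService.cfg that spells out the defaults from this section might look like:

```
# crx-quickstart/install/org.apache.jackrabbit.oak.plugins.document.DocumentNodeStoreService.cfg
mongouri=mongodb://localhost:27017
db=aem-author
cache=256
changesSize=256
customBlobStore=false
```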
Create a configuration file with the PID of the data store you wish to use, and edit the file to set the configuration options. For more information, see Configuring Node Stores and Data Stores.
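For example, a custom File Data Store is commonly configured with a file named after Oak's FileDataStore PID; the PID, option names, and values below are assumptions for this sketch, to be checked against the Configuring Node Stores and Data Stores documentation:

```
# crx-quickstart/install/org.apache.jackrabbit.oak.plugins.blob.datastore.FileDataStore.cfg
# (PID and option names are assumptions; values are illustrative)
path=./repository/datastore
minRecordLength=4096
```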
Red Hat Linux uses a memory management feature called Transparent Huge Pages (THP). While AEM performs fine-grained reads and writes, THP is optimized for large operations. Because of this, it is recommended that you disable THP for both Tar and MongoDB storage. To disable the feature, follow these steps:
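The exact steps depend on the kernel version; on a typical Red Hat system they look roughly like the sketch below (the sysfs paths vary between releases, so verify them against your distribution's documentation):

```shell
# Check the current THP setting; the active value is shown in brackets
cat /sys/kernel/mm/transparent_hugepage/enabled

# Disable THP until the next reboot (run as root; some older Red Hat
# kernels use /sys/kernel/mm/redhat_transparent_hugepage instead)
echo never > /sys/kernel/mm/transparent_hugepage/enabled
echo never > /sys/kernel/mm/transparent_hugepage/defrag
```

To keep THP disabled across reboots, add transparent_hugepage=never to the kernel boot parameters in your boot loader configuration.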
Because data in a tar file is never overwritten, disk usage increases even when only existing data is updated. To compensate for the growing size of the repository, AEM employs a garbage collection mechanism called Revision Cleanup. The mechanism reclaims disk space by removing obsolete data from the repository, and has three phases: estimation, compaction, and cleanup. In the past, revision cleanup was often referred to as compaction.
There are two ways of performing revision cleanup:
Offline revision cleanup is the recommended and supported way of performing revision cleanup.
For AEM 6.2 Publish instances
Offline revision cleanup is the recommended way of cleaning up revisions. It requires shutting down the instances in order to run offline revision cleanup during non-business hours.
If downtimes are not possible, customers can contact Adobe Support to evaluate additional options:
- If there is more than one publish instance, one can be taken down for offline revision cleanup while replication from the author instance to it is suspended. After a successful revision cleanup, the instance can be put back into production, and clones of the cleaned instance can replace the remaining production instances.
- If the above is still not possible, online revision cleanup can be used under the terms and conditions of the program. This type of cleanup has restricted support in AEM 6.2.
For AEM 6.2 Author instances
Offline revision cleanup is the recommended way of cleanup for author instances as well. However, in rare cases where downtime is not possible, either because maintenance windows were not foreseen or because downtime would have the same business impact as a system outage, customers should contact Adobe Support to evaluate additional options. The additional options for performing cleanup on author instances are the same as the ones described above for publish instances.
For more information about the revision cleanup process, see the Frequently Asked Questions.
Different versions of the oak-run tool need to be used depending on the Oak version of your AEM installation. Check the version requirements below before using the tool:
- For Oak versions 1.0.0 through 1.0.11 or 1.1.0 through 1.1.6, use oak-run version 1.0.11.
- For Oak versions newer than the above, use the version of oak-run that matches the Oak core of your AEM installation.
Adobe provides a tool called oak-run for performing revision cleanup. It can be downloaded at the following location:
The tool is a runnable jar that can be run manually to compact the repository. The process is called offline revision cleanup because the repository needs to be shut down in order to run the tool properly. Make sure to plan the cleanup in accordance with your maintenance window.
For tips on how to increase the performance of the cleanup process, see Increasing the Performance of Offline Revision Cleanup.
You can clear old checkpoints before the maintenance takes place (steps 2 and 3 in the procedure below). This is recommended for instances that have more than 100 checkpoints. For AEM 6.2, clean up checkpoints using the oak-run tool version 1.4.8 or higher.
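As a sketch of that checkpoint maintenance, the script below only assembles the oak-run commands to run; the jar name and repository path are illustrative assumptions:

```shell
# Sketch only: build the oak-run checkpoint commands as strings.
# The jar name and repository path are illustrative; run the printed
# commands against a stopped instance during the maintenance window.
JAR="oak-run-1.4.8.jar"
REPO="crx-quickstart/repository/segmentstore"

# Step 1: list the checkpoints currently present in the repository
echo "java -jar $JAR checkpoints $REPO"

# Step 2: remove the checkpoints that are no longer referenced
echo "java -jar $JAR checkpoints $REPO rm-unreferenced"
```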
Since version 1.0.22, the oak-run tool has introduced several features aimed at increasing the performance of the revision cleanup process and minimizing the maintenance window as much as possible.
The list includes several command line parameters, as described below:
- -Dtar.memoryMapped: Enables memory-mapped operations for the tar files, which greatly increases performance. Can be set to true or false. It is highly recommended that you enable this feature in order to speed up compaction.
- -Dupdate.limit: Defines the threshold for flushing a temporary transaction to disk. The default value is 5000000.
- -Dcompress-interval: The number of compaction map entries to keep before compressing the current map. The default is 1000000. If enough heap memory is available, increase this value for faster throughput.
- -Dcompaction-progress-log: The number of compacted nodes that will be logged. The default value is 1500000, which means that the first 1500000 compacted nodes are logged during the operation. Use this in conjunction with the next parameter.
- -Dlogback.configurationFile: Points the tool at a logback configuration file. Such a file can be used to enable logging of the nodes that are being compacted.
- -Dtar.PersistCompactionMap: Set this parameter to true to use disk space instead of heap memory for compaction map persistence. Requires oak-run version 1.4 or higher. For further details, see question 6 in the FAQ section.
- -Doak.compaction.eagerFlush: Set this parameter to true if you are running into OutOfMemoryError issues despite a large heap size allocation. Requires Oak version 1.2.17 or higher. See question 7 in the FAQ section for further details.
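A logback configuration file passed via -Dlogback.configurationFile could look like the following sketch; the appender name, log file path, and the "compaction" logger name are illustrative assumptions, so verify the logger name against your oak-run release:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <!-- Illustrative file appender that collects compaction progress messages -->
  <appender name="COMPACTION" class="ch.qos.logback.core.FileAppender">
    <file>logs/compaction.log</file>
    <encoder>
      <pattern>%d{HH:mm:ss.SSS} %msg%n</pattern>
    </encoder>
  </appender>
  <!-- The logger name "compaction" is an assumption for this sketch -->
  <logger name="compaction" level="INFO" additivity="false">
    <appender-ref ref="COMPACTION"/>
  </logger>
  <root level="WARN"/>
</configuration>
```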
Memory-mapped file operations do not work correctly on some versions of Windows. Make sure that you run the tool without the -Dtar.memoryMapped parameter on Windows platforms; otherwise, the revision cleanup will fail.
java -Dtar.memoryMapped=true -Dupdate.limit=5000000 -Dcompress-interval=10000000 -Dcompaction-progress-log=1500000 -Dlogback.configurationFile=logback.xml -Xmx8g -jar oak-run-*.jar checkpoints <repository>
Use as much heap memory as possible for faster I/O operations. It is recommended you use at least eight gigabytes for most common deployments.
Online Revision Cleanup is present in AEM 6.2 under restricted support. For more information on the conditions and terms of using the feature, please contact Adobe Customer Care.
Due to the mechanics of the garbage collection, the first run will actually add 256 MB of disk space. Subsequent runs will work as expected and start shrinking the repository size.
Follow the recommendations below to keep repository maintenance as efficient as possible:
- Run Offline Revision Cleanup whenever possible during scheduled maintenance hours.
- If you are using an external data store, run Data Store Garbage Collection after the revision cleanup has completed.
- Follow the recommendations in this knowledgebase article for tips on improving the performance of your AEM instance.
- If you are using antivirus software, it is recommended that you stop it during offline compaction, because it might slow down the compaction process.
- Do not run offline compaction on a freshly restored Amazon EBS volume. This is due to the volume's "lazy loading" which can impact compaction completion time. To compact an Amazon EBS volume that was freshly restored from a snapshot you need to first initialize it. For details on initializing Amazon EBS volumes, you can consult the official documentation.
1. When to use Offline Revision Cleanup as opposed to Online Revision Cleanup?
- Offline Revision Cleanup is the recommended and supported way of performing revision cleanup in AEM 6.2, and should be used whenever a maintenance window allows the instance to be shut down. Online Revision Cleanup is available under restricted support only; contact Adobe Customer Care for the terms and conditions of using it.
2. How frequently should Offline Revision Cleanup be performed?
- It depends on the repository growth rate. As a general rule of thumb, for average content repositories, it is recommended that you perform revision cleanup every 2 weeks for an author instance, and once per quarter for a publish instance.
3. What are the factors that determine the duration of the Offline Revision Cleanup?
- The repository size and the amount of revisions that need to be cleaned up determine the duration of the cleanup.
4. What's the worst that can happen if you do not perform revision cleanup?
- The AEM instance will run out of disk space, which will cause outages in production. It is highly recommended that you follow the monitoring best practices as mentioned in Maintenance and Monitoring.
5. What is the difference between a revision and a page version?
- Oak revision: Oak organizes all the content in a large tree hierarchy that consists of nodes and properties. Each snapshot or revision of this content tree is immutable, and changes to the tree are expressed as a sequence of new revisions. Typically, each content modification triggers a new revision. See also http://jackrabbit.apache.org/dev/ngp.html.
- Page Version: Versioning creates a "snapshot" of a page at a specific point in time. Typically, a new version is created when a page is activated. For more information, see Working with Page Versions.
6. How can the Offline Revision Cleanup task be sped up if it does not complete within 8 hours?
- If the revision task does not complete within 8 hours and the thread dumps reveal that the main hotspot is InMemoryCompactionMap.findEntry, use the following parameter with the oak-run tool versions 1.4 or higher: -Dtar.PersistCompactionMap=true. See also Performing Offline Revision Cleanup and Increasing the Performance of Offline Revision Cleanup.
7. Running Offline Revision Cleanup on large repositories causes OutOfMemoryError issues despite a large heap size allocation. How can this issue be solved?
- If the Revision Cleanup task is slow or causes OutOfMemoryError issues when running on large repositories, use the following parameter with the oak-run tool (requires Oak 1.2.17 or higher): -Doak.compaction.eagerFlush=true. For additional performance tips, also see this article.