Use data store garbage collection to remove unused files from the Data Store. CRX supports the Jackrabbit data store (see http://wiki.apache.org/jackrabbit/DataStore). By default, CRX uses the FileDataStore; the DbDataStore is also supported.

To run garbage collection:

  • Log in as administrator.
  • In the CRX console, click Repository Configuration.
  • Click Data Store Garbage Collection.
  • Select one or more of the following options:
    • Run memory garbage collection first: Runs garbage collection of the main memory (also known as heap garbage collection) first. This evicts objects that are still in main memory but are no longer referenced. Data store garbage collection only reclaims items that are no longer in main memory.
    • Delete unused items: Deletes any unused files from the Data Store. If this option is disabled, only the last modified date of the used items is updated and no files are deleted. If multiple stand-alone repositories share a data store, or if multiple distinct clusters share a data store, don't enable this option. Instead, remove old items manually or with a script (for example, delete files older than one week after running data store garbage collection on each repository). A cluster counts as one repository, so you can enable "Delete unused items" when multiple instances of the same cluster share a repository.
    • Use a persistence manager scan: Uses a low-level persistence manager scan, if the persistence manager supports it. This speeds up garbage collection but could slow down concurrent operations. If the option is disabled, a higher-level node traversal algorithm is used.
  • Click Run.

Garbage Collection in a Clustered Environment

Shared Data Store

When using a CRX cluster, Data Store garbage collection can be run from any cluster node.

Cluster Data Store

When using the cluster data store (data store configuration option ClusterDataStore), run garbage collection on all cluster nodes separately.

Multi-Repository Data Store

If multiple distinct repositories use the same data store at the same time (such as author and publish instances), don't use the "Delete unused items" option. Instead, proceed as follows:

  • Back up your data store.
  • Log in to CRX as administrator.
  • Run data store garbage collection without any of the options checked (in particular, make sure that 'Delete unused items' is _not_ checked).
  • Within five days, run it on _all_ repositories that share the data store. It is possible to set the delay to 0; however, this increases the load on the file system. Running the process concurrently on all repositories is also possible, but it increases the I/O on the file system that contains the data store. After this process finishes, the data store files that are still in use have a new last modified date.
  • If the process takes longer than five days, increase the day count used when deleting (7 by default). Adobe recommends a margin of two days.
  • To list the files older than seven days, use the following command line:
    find repository/datastore/* -mtime +7 -type f
  • To print the number of bytes that can be saved, use:
    find repository/datastore/* -mtime +7 -type f -exec ls -l {} \; | awk '{ s+=$5 } END { print s }'
  • Delete all files older than seven days using the following command line. WARNING: if data store garbage collection was not run on every repository that shares the data store within the last seven days, this command deletes files that are still needed.
    find repository/datastore/* -mtime +7 -type f -exec rm {} \;
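The manual cleanup steps above can also be scripted. The following Java sketch mirrors the three find commands: it lists regular files under the data store directory that are older than a given number of days, sums their sizes, and deletes them. DataStoreCleanup and its methods are hypothetical helpers, not part of CRX; the age check copies the whole-day rounding of find -mtime +N, and deletionAgeDays simply encodes the "five days plus a two-day margin" rule described above.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;
import java.time.Duration;
import java.time.Instant;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper class (not part of CRX) mirroring the find commands above.
class DataStoreCleanup {

    // Day count for deletion: longest garbage collection run plus a safety margin,
    // e.g. 5 + 2 = 7 as recommended above.
    static int deletionAgeDays(int longestGcRunDays, int marginDays) {
        return longestGcRunDays + marginDays;
    }

    // Same rounding as "find -mtime +N": the age in whole days
    // (remainder discarded) must be greater than N.
    static boolean isOlderThan(FileTime modified, Instant now, int days) {
        return Duration.between(modified.toInstant(), now).toDays() > days;
    }

    // Equivalent of: find <dir> -mtime +N -type f
    static List<Path> listOldFiles(Path dataStoreDir, int days) throws IOException {
        Instant now = Instant.now();
        try (Stream<Path> paths = Files.walk(dataStoreDir)) {
            return paths
                    .filter(p -> Files.isRegularFile(p, LinkOption.NOFOLLOW_LINKS))
                    .filter(p -> {
                        try {
                            return isOlderThan(Files.getLastModifiedTime(p), now, days);
                        } catch (IOException e) {
                            throw new UncheckedIOException(e);
                        }
                    })
                    .collect(Collectors.toList());
        }
    }

    // Equivalent of the "ls -l | awk" pipeline: total size in bytes.
    static long totalBytes(List<Path> files) throws IOException {
        long sum = 0;
        for (Path p : files) {
            sum += Files.size(p);
        }
        return sum;
    }

    // Equivalent of "-exec rm {} \;". Only safe after garbage collection
    // has run on every repository that shares the data store.
    static void deleteAll(List<Path> files) throws IOException {
        for (Path p : files) {
            Files.deleteIfExists(p);
        }
    }
}
```

Keeping listing and deletion as separate calls lets you review the result of listOldFiles and totalBytes (a dry run) before calling deleteAll, for example on Paths.get("repository/datastore") with deletionAgeDays(5, 2).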

Starting Garbage Collection within an Application

To run data store garbage collection from within an application, use code such as the following:

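// Jackrabbit 1.x package names; they may differ in other CRX/Jackrabbit versions.
import org.apache.jackrabbit.core.SessionImpl;
import org.apache.jackrabbit.core.data.GarbageCollector;

// "session" is an open javax.jcr.Session (for example, logged in as administrator).
SessionImpl si = (SessionImpl) session;
GarbageCollector gc = si.createDataStoreGarbageCollector();

// Optional (to implement a progress bar or other output);
// "this" must implement org.apache.jackrabbit.core.data.ScanEventListener.
gc.setScanEventListener(this);

// Scan: updates the last modified date of every item that is still in use.
gc.scan();
gc.stopScan();

// Delete: removes items that were not touched by the scan.
// This could be a separate button; skip it if multiple repositories
// use the same data store.
gc.deleteUnused();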
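The separation between scan() and deleteUnused() reflects how the collector works: the scan updates the last modified date of every item the repository still references, and the delete step removes items that were not touched since the scan started. The following Java sketch is a hypothetical in-memory model (SharedDataStoreModel is not a CRX or Jackrabbit class, and its times are a logical clock rather than real dates) that illustrates why one repository must not run the delete step on a data store it shares with other repositories.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory model of timestamp-based data store garbage collection.
// It is not the CRX or Jackrabbit implementation; times are a logical clock.
class SharedDataStoreModel {
    // item id -> last modified time
    final Map<String, Long> lastModified = new HashMap<>();

    void add(String id, long time) {
        lastModified.put(id, time);
    }

    // Like scan(): touch every item this repository still references.
    void scan(Set<String> referencedItems, long now) {
        for (String id : referencedItems) {
            lastModified.computeIfPresent(id, (key, old) -> now);
        }
    }

    // Like deleteUnused(): remove every item not touched since the scan started.
    void deleteUnused(long scanStart) {
        lastModified.values().removeIf(time -> time < scanStart);
    }
}
```

In this model, if repository A scans and then deletes on its own, an item referenced only by repository B is removed. If every repository scans first and only items older than the earliest scan start are deleted, all referenced items survive, which is what the manual multi-repository procedure achieves with file dates.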

Changes in Data Store Garbage Collection from CQ 5.5 Onwards (Applies to CRX 2.3.15 or Later)

From CQ 5.5 onwards, the CRX repository is provided as an OSGi service and is registered in the OSGi Service Registry as an MBean. This MBean is available in the JMX Console, which exposes the data store garbage collection attributes and operations.

  • The UI for running data store garbage collection is http://<host>:<port>/system/console/jmx/com.adobe.granite%3Atype%3DRepository
  • The new UI has only one option, "Delete unused items", which you can set to true or false depending on your use case. An example curl command with this option set to false is shown in [1].
  • The other options have changed:
    • "Run memory garbage collection first" is no longer part of the data store garbage collection UI; it is available at http://<host>:<port>/system/console/memoryusage.
    • "Use a persistence manager scan" has been removed.
  • The default time delay is set to 10. To change it, modify the bean property dataStoreGarbageCollectionDelay. The curl command to set it to 15 is shown in [2].

[1] curl -u admin:admin -X POST --data delete=false -H "Referer: http://<host>:<port>/system/console/jmx/com.adobe.granite%3Atype%3DRepository" http://<host>:<port>/system/console/jmx/com.adobe.granite%3Atype%3DRepository/op/runDataStoreGarbageCollection/java.lang.Boolean

[2] curl -u admin:admin -X POST --data value=15 -H "Referer: http://<host>:<port>/system/console/jmx/com.adobe.granite%3Atype%3DRepository" http://<host>:<port>/system/console/jmx/com.adobe.granite%3Atype%3DRepository/a/DataStoreGarbageCollectionDelay
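For scripting from Java instead of curl, the following sketch builds the same two requests with the JDK's java.net.http.HttpClient (Java 11 or later). RepositoryMBeanClient and its method names are hypothetical; the URL paths, the form parameters (delete, value), and the Referer header are copied from curl examples [1] and [2]. Host, port, and credentials are placeholders, just as in the curl commands.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper that builds the same POST requests as curl examples [1] and [2].
class RepositoryMBeanClient {
    static final String MBEAN_PATH =
            "/system/console/jmx/com.adobe.granite%3Atype%3DRepository";

    static HttpRequest post(String host, int port, String user, String password,
                            String suffix, String formBody) {
        String base = "http://" + host + ":" + port + MBEAN_PATH;
        String credentials = Base64.getEncoder()
                .encodeToString((user + ":" + password).getBytes(StandardCharsets.UTF_8));
        return HttpRequest.newBuilder(URI.create(base + suffix))
                .header("Authorization", "Basic " + credentials)  // curl -u user:password
                .header("Referer", base)                          // curl -H "Referer: ..."
                .header("Content-Type", "application/x-www-form-urlencoded") // curl --data
                .POST(HttpRequest.BodyPublishers.ofString(formBody))
                .build();
    }

    // Equivalent of [1]: run data store garbage collection; delete=false keeps unused items.
    static HttpRequest runGc(String host, int port, String user, String password, boolean delete) {
        return post(host, port, user, password,
                "/op/runDataStoreGarbageCollection/java.lang.Boolean", "delete=" + delete);
    }

    // Equivalent of [2]: set the DataStoreGarbageCollectionDelay attribute.
    static HttpRequest setDelay(String host, int port, String user, String password, int delay) {
        return post(host, port, user, password,
                "/a/DataStoreGarbageCollectionDelay", "value=" + delay);
    }

    // Sends a request and returns the HTTP status code.
    static int send(HttpRequest request) throws IOException, InterruptedException {
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        return response.statusCode();
    }
}
```

Building the request separately from sending it makes it easy to log or inspect exactly what will be posted, for example send(runGc(host, port, "admin", "admin", false)).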

Note:

AEM 5.6.1 (CRX 2.4.30+)

AEM 5.6.1 introduces a new, improved fast data store garbage collection, which is enabled by default (runDataStoreGarbageCollection). To run the old method of data store garbage collection, use runDataStoreClassicGarbageCollection, which runs data store garbage collection without using the optimized TarPM garbage collection.

  • If you are running "fast" garbage collection (runDataStoreGarbageCollection), do NOT stop it using stopDataStoreGarbageCollection() until you have installed 5.6.1 Hotfix 3241 (see below for the Package Share download and the readme file).

Download Hotfix 3241 from Package Share.

* cq-5.6.1-hotfix-3421-readme.txt (ReadMe for hotfix 3421)

This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported License  Twitter™ and Facebook posts are not covered under the terms of Creative Commons.
