Handling large files and running workflows in Adobe Experience Manager (AEM) Assets can consume considerable CPU, memory, and I/O resources. In particular, the size of assets, workflows, number of users, and frequency of asset ingestion can affect the overall system performance. The most resource-intensive operations include AEM asset ingestion and replication workflows. Intensive use of these workflows on a single AEM authoring instance can adversely impact authoring efficiency.
Offloading these tasks to dedicated worker instances can reduce CPU, memory, and IO overheads. In general, the idea behind offloading is to distribute tasks that consume intensive CPU/Memory/IO resources to dedicated worker instances. The following sections include recommended use cases for Assets offloading.
AEM Assets implements a native asset-specific workflow extension for offloading. It builds on the generic workflow extension that the offloading framework provides, but includes additional assets-specific features in the implementation. The goal of Assets offloading is to efficiently run the DAM Update Asset workflow on an uploaded asset. Assets offloading enables you to gain greater control of the ingestion workflows.
The DAM Update Asset Offloading workflow runs on the master (author) where the user uploads the assets. This workflow is triggered by a regular workflow launcher. Instead of processing the uploaded asset, this offloading workflow creates a new job, using the topic com/adobe/granite/workflow/offloading. The offloading workflow adds the name of the target workflow -the DAM Update Asset workflow in this case, and the path of the asset to the job's payload. After creating the offloading job, the offloading workflow on the master waits until the offloading job has run.
The job manager distributes new jobs to worker instances. When designing the distribution mechanism, it is important to take topic enablement into account. Jobs can only be assigned to instances where the job's topic is enabled. Disable the topic com/adobe/granite/workflow/offloading on the master, and enable it on the worker to ensure that the job is assigned to the worker.
The offloading framework identifies workflow offloading jobs assigned to worker instances and uses replication to physically transport them, including their payload (for example, images to be ingested), to workers.
Once a job is written on the worker, the job manager calls the job consumer responsible for the com/adobe/granite/workflow/offloading topic. The job consumer then runs the DAM Update Asset workflow on the asset.
The Sling topology groups AEM instances and enables them to be aware of each other, independent of the underlying persistence. This characteristic of the Sling topology lets you create topologies for non-clustered, clustered, and mixed scenarios. An instance can expose properties to the entire topology. The framework provides callbacks for listening to changes in the topology (instances and properties). Sling topology provides the foundation for Sling distributed jobs.
Sling distributed jobs facilitate the distribution of jobs among a set of instances that are members of the topology. Sling jobs are based on the idea of capabilities. A job is defined by its job topic. To run a job, an instance must provide a job consumer for a specific job topic. The job topic is the main driver for the distribution mechanism.
Jobs are only distributed to instances that provide a job consumer for the topic. By enabling/disabling job consumers on an instance, you can define the capabilities of an instance and influence the distribution mechanism. The available job consumers of an instance are broadcast to the whole topology.
In this context, the term distribution signifies assignment of a job to a specific instance that provides a job consumer. The assignment to an instance is stored in the repository. In other words, Sling distributed jobs can be assigned to any instance in the topology by default. However, other jobs can only be run by instances that share the same repository. This implies that these jobs can only be run by instances that are part of the same cluster. Jobs assigned to instances of a different cluster are not run.
The Granite offloading framework complements Sling job distribution to run jobs that are assigned to non-clustered instances. It does not perform any distribution (instance assignment). However, it identifies Sling jobs that were distributed to non-clustered instances, and transports them to the target instance for execution. Currently, offloading uses replication to perform this job transport. To run a job, offloading defines the input and the output, which are then combined with the job to build the job payload.
Sling distributed jobs provide the job and distribution framework. Granite offloading only takes care of the transport for the special case where jobs are distributed to non-clustered instances.
In addition to transport, the offloading framework provides an extension for the workflow engine. It allows the framework to create distributed jobs as part of a workflow and wait for their completion, before the workflow advances. It is implemented using the workflow external step API from the workflow engine. One of the extensions facilitates generic distribution of workflows. Distributing single workflow steps is not supported.
The offloading framework also comes with a user interface (UI) to visualize and control job topic enablement across the entire topology. The UI lets you conveniently configure the topic enablemenet of Sling distributed jobs. You can also set up offloading without the UI.
Every implementation is unique and, as such, there is no one-size-fits-all offloading configuration. The following sections provide guidance and best practises around asset ingestion offloading.
Asset offloading also imposes overheads on the system, including operation overheads. If you encounter issues with the asset ingestion load, Adobe recommends that you first improve the configuration without offloading. Consider the following options before moving to asset offloading:
- Scale up hardware
- Optimize workflows
- Use transient workflows
- Limit the number of cores used for workflows
If you conclude that Assets offloading is an appropriate approach for you, Adobe provides the following guidance:
- A TarMK-based deployment is recommended
- TarMK-based Assets offloading is not designed for extensive horizontal scaling
- Ensure that network performance between the author and workers is satisfactory
With AEM and Oak, there are several deployment scenarios possible. For Assets offloading, a TarMK based deployment with a shared datastore is recommended. The following diagram outlines the recommended deployment:
For details around configuring a datastore, see Configuring node stores and data stores in AEM.
Adobe recommends that you turn off automatic agent management because it does not support binary-less replication and can cause confusion when setting up a new offloading topology. Moreover, it does not automatically support the forward replication flow required by binary-less replication.
On each worker, create new forward replication agent pointing to the master. The procedure is the same as creating forward agents from master to worker. See, Creating Replication Agents For Offloading for instructions around setting up offloading replication agents.
The use of binary-less replication is recommendend to reduce the transport overhead for asset offloading. To know how to set up binary-less replication for a shared datastore, see Configuring Node Stores and Data Stores in AEM. The procedure is not different for Assets offloading, except that it involves other replication agents. Because binary-less replication only works with forward replication agents, you should also use forward replication for all offloading agents.
By default, offloading creates a content package that contains the offloading job and job payload (the original asset), and transports this single offloading package using a single replication request. Creating these offloading packages is counter productive when using binary-less replication, because binaries are serialized into the package again when creating the package. The use of these transport packages can be turned off, which causes the offloading job and payload be transported in multiple replication requests, one for each payload entry. This way, the benefit of binary-less replication can be utilized.
By default, the DAM Update Asset Offloading offloading workflow adds the workflow model to call on the worker to the job payload. Because this workflow follows the out-of-the-box DAM Update Asset model by default, this additional payload can be removed.
If the workflow model is disabled from the job payload, ensure that you distribute changes to the referenced workflow model using other tools, such as package manager.
To disable the transport of the workflow model, modify the DAM Update Asset Offloading workflow.
Workflow offloading is implemented using an external workflow on the master, that polls for the completion of the offloaded workflow on the worker. The default polling interval for the external workflow processes is five seconds. Adobe recommends that you increase the polling interval of the Assets offloading step to at least 15 seconds to reduce the offloading overhead on the master.