Too many workflows in Inbox crash AEM due to pulse.data.json calls

Issue

The AEM author instance is slow and crashing due to pile up of failed workflows and workflow inbox badge pulse.data.json calls. The workflow inbox badge is the bell icon in the upper-right of the Touch UI.

In the access.log, it is observed that many calls to pulse.data.json are occurring from the same users. 

10.43.34.55 - user 06/Jan/2017:18:17:11 +0900 "GET /mnt/overlay/granite/ui/content/shell/header/actions/pulse.data.json?_=1483664547926 HTTP/1.1" 500 1234 "https://authorhost/sites.html/content/geometrixx/en" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:50.0) Gecko/20100101 Firefox/50.0"
10.43.34.55 - user 06/Jan/2017:18:17:13 +0900 "GET /mnt/overlay/granite/ui/content/shell/header/actions/pulse.data.json?_=1483664547927 HTTP/1.1" 500 1234 "https://authorhost/sites.html/content/geometrixx/en" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:50.0) Gecko/20100101 Firefox/50.0"
10.43.34.55 - user 06/Jan/2017:18:17:15 +0900 "GET /mnt/overlay/granite/ui/content/shell/header/actions/pulse.data.json?_=1483664547928 HTTP/1.1" 500 1234 "https://authorhost/sites.html/content/geometrixx/en" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:50.0) Gecko/20100101 Firefox/50.0"
10.43.34.55 - user 06/Jan/2017:18:17:17 +0900 "GET /mnt/overlay/granite/ui/content/shell/header/actions/pulse.data.json?_=1483664547929 HTTP/1.1" 500 1234 "https://authorhost/sites.html/content/geometrixx/en" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:50.0) Gecko/20100101 Firefox/50.0"

In addition, AEM server shows high CPU utilization and thread dumps captured from AEM threads like the one below: 

"192.168.10.5 [1485896535734] GET /mnt/overlay/granite/ui/content/shell/header/actions/pulse.data.json HTTP/1.1" #1969 prio=5 os_prio=0 tid=0x0000000001a39800 nid=0xaf2 runnable [0x00007f2f5ae4f000]
   java.lang.Thread.State: RUNNABLE
 at org.apache.jackrabbit.oak.spi.state.ReadOnlyBuilder.getChildNodeNames(ReadOnlyBuilder.java:101)
 at org.apache.jackrabbit.oak.plugins.tree.impl.AbstractTree.getChildNames(AbstractTree.java:145)
 at org.apache.jackrabbit.oak.plugins.tree.impl.AbstractTree.getChildren(AbstractTree.java:322)
 at org.apache.jackrabbit.oak.security.authorization.permission.PermissionStoreImpl.loadPermissionEntries(PermissionStoreImpl.java:150)
 at org.apache.jackrabbit.oak.security.authorization.permission.PermissionStoreImpl.load(PermissionStoreImpl.java:115)
 at org.apache.jackrabbit.oak.security.authorization.permission.PermissionEntryCache.getEntries(PermissionEntryCache.java:48)
 at org.apache.jackrabbit.oak.security.authorization.permission.PermissionEntryCache.load(PermissionEntryCache.java:63)
 at org.apache.jackrabbit.oak.security.authorization.permission.PermissionEntryProviderImpl.init(PermissionEntryProviderImpl.java:109)
 at org.apache.jackrabbit.oak.security.authorization.permission.PermissionEntryProviderImpl.<init>(PermissionEntryProviderImpl.java:72)
 at org.apache.jackrabbit.oak.security.authorization.permission.CompiledPermissionImpl.<init>(CompiledPermissionImpl.java:112)
 at org.apache.jackrabbit.oak.security.authorization.permission.CompiledPermissionImpl.create(CompiledPermissionImpl.java:126)
 at org.apache.jackrabbit.oak.security.authorization.permission.PermissionProviderImpl.<init>(PermissionProviderImpl.java:71)
 at org.apache.jackrabbit.oak.security.authorization.AuthorizationConfigurationImpl.getPermissionProvider(AuthorizationConfigurationImpl.java:185)
 at org.apache.jackrabbit.oak.security.authorization.composite.CompositeAuthorizationConfiguration.getPermissionProvider(CompositeAuthorizationConfiguration.java:134)
 at org.apache.jackrabbit.oak.core.MutableRoot$1.createValue(MutableRoot.java:126)
 at org.apache.jackrabbit.oak.core.MutableRoot$1.createValue(MutableRoot.java:123)
 at org.apache.jackrabbit.oak.core.LazyValue.get(LazyValue.java:53)
 - locked <0x0000000726732148> (a org.apache.jackrabbit.oak.core.MutableRoot$1)
 at org.apache.jackrabbit.oak.core.MutableRoot$2.getExecutionContext(MutableRoot.java:311)
 at org.apache.jackrabbit.oak.query.QueryEngineImpl.executeQuery(QueryEngineImpl.java:256)
 at org.apache.jackrabbit.oak.query.QueryEngineImpl.executeQuery(QueryEngineImpl.java:233)
 at org.apache.jackrabbit.oak.plugins.identifier.IdentifierManager.resolveUUID(IdentifierManager.java:314)
 at org.apache.jackrabbit.oak.plugins.identifier.IdentifierManager.resolveUUID(IdentifierManager.java:308)
 at org.apache.jackrabbit.oak.plugins.identifier.IdentifierManager.resolveUUID(IdentifierManager.java:304)
 at org.apache.jackrabbit.oak.plugins.identifier.IdentifierManager.getTree(IdentifierManager.java:133)
 at org.apache.jackrabbit.oak.security.user.AuthorizableBaseProvider.getByContentID(AuthorizableBaseProvider.java:56)
 at org.apache.jackrabbit.oak.security.user.AuthorizableBaseProvider.getByID(AuthorizableBaseProvider.java:51)
 at org.apache.jackrabbit.oak.security.user.UserProvider.getAuthorizable(UserProvider.java:211)
 at org.apache.jackrabbit.oak.security.user.UserManagerImpl.getAuthorizable(UserManagerImpl.java:110)
 at org.apache.jackrabbit.oak.jcr.delegate.UserManagerDelegator$1.performNullable(UserManagerDelegator.java:64)
 at org.apache.jackrabbit.oak.jcr.delegate.UserManagerDelegator$1.performNullable(UserManagerDelegator.java:61)
 at org.apache.jackrabbit.oak.jcr.delegate.SessionDelegate.performNullable(SessionDelegate.java:243)
 at org.apache.jackrabbit.oak.jcr.delegate.UserManagerDelegator.getAuthorizable(UserManagerDelegator.java:61)
 at com.adobe.granite.workflow.core.WorkflowSessionFactory.isSuperUser(WorkflowSessionFactory.java:356)
 at com.adobe.granite.workflow.core.WorkflowSessionFactory.getWorkflowSession(WorkflowSessionFactory.java:389)
 at com.adobe.granite.workflow.core.WorkflowSessionFactory.getAdapter(WorkflowSessionFactory.java:515)
 at org.apache.sling.adapter.internal.AdapterManagerImpl.getAdapter(AdapterManagerImpl.java:147)
 at com.adobe.granite.workflow.core.jcr.WorkItemAdapterFactory.getAdapter(WorkItemAdapterFactory.java:128)
 at org.apache.sling.adapter.internal.AdapterManagerImpl.getAdapter(AdapterManagerImpl.java:147)
 at com.adobe.granite.workflow.core.jcr.WorkItemManager.getInboxItems(WorkItemManager.java:289)
 at com.adobe.granite.workflow.core.WorkflowSessionImpl.getActiveInboxItems(WorkflowSessionImpl.java:711)
 at com.adobe.granite.workflow.core.WorkflowSessionImpl.getActiveInboxItems(WorkflowSessionImpl.java:618)
 at org.apache.jsp.libs.cq.gui.components.shell.badge.data_json_jsp._jspService(data_json_jsp.java:609)

Environment

AEM 6.2 SP1

Cause

When a user or group has many pending inbox items, it makes the badge pulse.data.json calls slow.  The calls to pulse.data.json then time out and compound which overloads AEM.

Resolution

Install the latest Cumulative Fix Pack to fix the bug.

I. Apply the Workaround

Since it is not possible to install a fix pack on a production AEM environment in a short timeframe, as a temporary workaround, do the following:

  1. Go to CRXDe http://aem-host:port/crx/de/index.jsp and log in as admin.

  2. Create this folder structure out of sling: Folder nodes /apps/granite/ui/components/shell/clientlibs/shell/js 

  3. Click "Save All"

  4. Browse to overlay for /libs/granite/ui/components/shell/clientlibs/shell/js/badge.js and modify the code as shown below:

    Before:

    setInterval(function() {
    updateBadge(el, src, true);
    }, 2000);

    After (set to update every 5 minutes):

    setInterval(function() {
    updateBadge(el, src, true);
    }, 300000);

    Download

II. Purge Old Running Workflows and Tasks

In addition to fixing the interval of the workflow notification badge, the cause for the problem is due to too many tasks pending in the user's inbox.  To address this, you have to delete workflow inbox items and tasks that are no longer needed:

  1. Go to the Workflow Maintenance JMX object:
    http://host:port/system/console/jmx/com.adobe.granite.workflow%3Atype%3DMaintenance

  2. If you don't need actively running workflows, then run the workflow purge on them by initiating purgeActive with dryRun = false.

  3. Go to http://host:port/crx/explorer/index.jsp and log in as admin.

  4. Open Content Explorer.

  5. Browse to /etc/taskmanagement/tasks.

  6. Delete tasks by right clicking the folder node and selecting Delete Recursively.

  7. Disable Preliminary Scan and run the deletion.

  8. In addition, you have more tasks under projects in /content/projects.  Use the /projects.html to remove old projects that are no longer needed.

  9. Use CRXDe to browse /content/projects subnodes and delete any tasks which are no longer required. For example: /content/projects/geometrixx/outdoors/jcr:content/dashboard/gadgets/tasks

 Adobe

Get help faster and easier

New user?