The AEM author instance is slow and crashing due to pile up of failed workflows and workflow inbox badge pulse.data.json calls. The workflow inbox badge is the bell icon in the upper-right of the Touch UI.
In the access.log, it is observed that many calls to pulse.data.json are occurring from the same users.
10.43.34.55 - user 06/Jan/2017:18:17:11 +0900 "GET /mnt/overlay/granite/ui/content/shell/header/actions/pulse.data.json?_=1483664547926 HTTP/1.1" 500 1234 "https://authorhost/sites.html/content/geometrixx/en" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:50.0) Gecko/20100101 Firefox/50.0" 10.43.34.55 - user 06/Jan/2017:18:17:13 +0900 "GET /mnt/overlay/granite/ui/content/shell/header/actions/pulse.data.json?_=1483664547927 HTTP/1.1" 500 1234 "https://authorhost/sites.html/content/geometrixx/en" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:50.0) Gecko/20100101 Firefox/50.0" 10.43.34.55 - user 06/Jan/2017:18:17:15 +0900 "GET /mnt/overlay/granite/ui/content/shell/header/actions/pulse.data.json?_=1483664547928 HTTP/1.1" 500 1234 "https://authorhost/sites.html/content/geometrixx/en" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:50.0) Gecko/20100101 Firefox/50.0" 10.43.34.55 - user 06/Jan/2017:18:17:17 +0900 "GET /mnt/overlay/granite/ui/content/shell/header/actions/pulse.data.json?_=1483664547929 HTTP/1.1" 500 1234 "https://authorhost/sites.html/content/geometrixx/en" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:50.0) Gecko/20100101 Firefox/50.0"
In addition, AEM server shows high CPU utilization and thread dumps captured from AEM threads like the one below:
"192.168.10.5 [1485896535734] GET /mnt/overlay/granite/ui/content/shell/header/actions/pulse.data.json HTTP/1.1" #1969 prio=5 os_prio=0 tid=0x0000000001a39800 nid=0xaf2 runnable [0x00007f2f5ae4f000] java.lang.Thread.State: RUNNABLE at org.apache.jackrabbit.oak.spi.state.ReadOnlyBuilder.getChildNodeNames(ReadOnlyBuilder.java:101) at org.apache.jackrabbit.oak.plugins.tree.impl.AbstractTree.getChildNames(AbstractTree.java:145) at org.apache.jackrabbit.oak.plugins.tree.impl.AbstractTree.getChildren(AbstractTree.java:322) at org.apache.jackrabbit.oak.security.authorization.permission.PermissionStoreImpl.loadPermissionEntries(PermissionStoreImpl.java:150) at org.apache.jackrabbit.oak.security.authorization.permission.PermissionStoreImpl.load(PermissionStoreImpl.java:115) at org.apache.jackrabbit.oak.security.authorization.permission.PermissionEntryCache.getEntries(PermissionEntryCache.java:48) at org.apache.jackrabbit.oak.security.authorization.permission.PermissionEntryCache.load(PermissionEntryCache.java:63) at org.apache.jackrabbit.oak.security.authorization.permission.PermissionEntryProviderImpl.init(PermissionEntryProviderImpl.java:109) at org.apache.jackrabbit.oak.security.authorization.permission.PermissionEntryProviderImpl.<init>(PermissionEntryProviderImpl.java:72) at org.apache.jackrabbit.oak.security.authorization.permission.CompiledPermissionImpl.<init>(CompiledPermissionImpl.java:112) at org.apache.jackrabbit.oak.security.authorization.permission.CompiledPermissionImpl.create(CompiledPermissionImpl.java:126) at org.apache.jackrabbit.oak.security.authorization.permission.PermissionProviderImpl.<init>(PermissionProviderImpl.java:71) at org.apache.jackrabbit.oak.security.authorization.AuthorizationConfigurationImpl.getPermissionProvider(AuthorizationConfigurationImpl.java:185) at org.apache.jackrabbit.oak.security.authorization.composite.CompositeAuthorizationConfiguration.getPermissionProvider(CompositeAuthorizationConfiguration.java:134) at org.apache.jackrabbit.oak.core.MutableRoot$1.createValue(MutableRoot.java:126) at org.apache.jackrabbit.oak.core.MutableRoot$1.createValue(MutableRoot.java:123) at org.apache.jackrabbit.oak.core.LazyValue.get(LazyValue.java:53) - locked <0x0000000726732148> (a org.apache.jackrabbit.oak.core.MutableRoot$1) at org.apache.jackrabbit.oak.core.MutableRoot$2.getExecutionContext(MutableRoot.java:311) at org.apache.jackrabbit.oak.query.QueryEngineImpl.executeQuery(QueryEngineImpl.java:256) at org.apache.jackrabbit.oak.query.QueryEngineImpl.executeQuery(QueryEngineImpl.java:233) at org.apache.jackrabbit.oak.plugins.identifier.IdentifierManager.resolveUUID(IdentifierManager.java:314) at org.apache.jackrabbit.oak.plugins.identifier.IdentifierManager.resolveUUID(IdentifierManager.java:308) at org.apache.jackrabbit.oak.plugins.identifier.IdentifierManager.resolveUUID(IdentifierManager.java:304) at org.apache.jackrabbit.oak.plugins.identifier.IdentifierManager.getTree(IdentifierManager.java:133) at org.apache.jackrabbit.oak.security.user.AuthorizableBaseProvider.getByContentID(AuthorizableBaseProvider.java:56) at org.apache.jackrabbit.oak.security.user.AuthorizableBaseProvider.getByID(AuthorizableBaseProvider.java:51) at org.apache.jackrabbit.oak.security.user.UserProvider.getAuthorizable(UserProvider.java:211) at org.apache.jackrabbit.oak.security.user.UserManagerImpl.getAuthorizable(UserManagerImpl.java:110) at org.apache.jackrabbit.oak.jcr.delegate.UserManagerDelegator$1.performNullable(UserManagerDelegator.java:64) at org.apache.jackrabbit.oak.jcr.delegate.UserManagerDelegator$1.performNullable(UserManagerDelegator.java:61) at org.apache.jackrabbit.oak.jcr.delegate.SessionDelegate.performNullable(SessionDelegate.java:243) at org.apache.jackrabbit.oak.jcr.delegate.UserManagerDelegator.getAuthorizable(UserManagerDelegator.java:61) at com.adobe.granite.workflow.core.WorkflowSessionFactory.isSuperUser(WorkflowSessionFactory.java:356) at com.adobe.granite.workflow.core.WorkflowSessionFactory.getWorkflowSession(WorkflowSessionFactory.java:389) at com.adobe.granite.workflow.core.WorkflowSessionFactory.getAdapter(WorkflowSessionFactory.java:515) at org.apache.sling.adapter.internal.AdapterManagerImpl.getAdapter(AdapterManagerImpl.java:147) at com.adobe.granite.workflow.core.jcr.WorkItemAdapterFactory.getAdapter(WorkItemAdapterFactory.java:128) at org.apache.sling.adapter.internal.AdapterManagerImpl.getAdapter(AdapterManagerImpl.java:147) at com.adobe.granite.workflow.core.jcr.WorkItemManager.getInboxItems(WorkItemManager.java:289) at com.adobe.granite.workflow.core.WorkflowSessionImpl.getActiveInboxItems(WorkflowSessionImpl.java:711) at com.adobe.granite.workflow.core.WorkflowSessionImpl.getActiveInboxItems(WorkflowSessionImpl.java:618) at org.apache.jsp.libs.cq.gui.components.shell.badge.data_json_jsp._jspService(data_json_jsp.java:609)
When a user or group has many pending inbox items, it makes the badge pulse.data.json calls slow. The calls to pulse.data.json then time out and compound which overloads AEM.
Since it is not possible to install a fix pack on a production AEM environment in a short timeframe, as a temporary workaround, do the following:
In addition to fixing the interval of the workflow notification badge, the cause for the problem is due to too many tasks pending in the user's inbox. To address this, you have to delete workflow inbox items and tasks that are no longer needed: