AEM6.2 is entirely stuck after installing a hotfix for deploying application bundles. After capturing thread dumps and opening them in a thread dump analyzer (like http://fastthread.io or IBM Thread Analyzer), the thread analyzer reports a deadlock.
The reported deadlock is between two threads like the ones below:
Thread [1] calls org.apache.sling.discovery.oak.OakDiscoveryService.unbindTopologyEventListener
Thread [2] calls org.apache.sling.discovery.oak.pinger.OakViewChecker.discoveryLiteCheck
Sometimes the deadlock can be a circular deadlock with a third thread involved.
[1]
"LeaseFailureHandler-Thread" daemon prio=5 tid=0x7f34 nid=0xffffffff in Object.wait() java.lang.Thread.State: WAITING (on object monitor) at sun.misc.Unsafe.park(Native Method) - waiting to lock <0x575a7644> (a java.util.concurrent.locks.ReentrantLock$NonfairSync) owned by "discovery.connectors.common.runner.e8dd34be-8886-4e5d-891f-5509b4dea0f0.discoveryLiteChec k" tid=0x4,330 at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186) at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834) at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:867) at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1197) at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:214) at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:290) at org.apache.sling.discovery.oak.OakDiscoveryService.unbindTopologyEventListener(OakDiscoveryService.java:368) ... at org.apache.felix.framework.ServiceRegistrationImpl.unregister(ServiceRegistrationImpl.java:144) at org.apache.sling.resourceresolver.impl.ResourceResolverFactoryActivator.unregisterFactory(ResourceResolverFactoryActivator.java:611) at org.apache.sling.resourceresolver.impl.ResourceResolverFactoryActivator.unregisterFactory(ResourceResolverFactoryActivator.java:602) at org.apache.sling.resourceresolver.impl.ResourceResolverFactoryActivator.checkFactoryPreconditions(ResourceResolverFactoryActivator.java:674) at org.apache.sling.resourceresolver.impl.ResourceResolverFactoryActivator.access$100(ResourceResolverFactoryActivator.java:79) at org.apache.sling.resourceresolver.impl.ResourceResolverFactoryActivator$1.providerRemoved(ResourceResolverFactoryActivator.java:500) at org.apache.sling.resourceresolver.impl.providers.ResourceProviderTracker.unregister(ResourceProviderTracker.java:224) - locked <0x35cd7613> (a java.util.HashMap) at org.apache.sling.resourceresolver.impl.providers.ResourceProviderTracker.access$100(ResourceProviderTracker.java:58) at org.apache.sling.resourceresolver.impl.providers.ResourceProviderTracker$1.removedService(ResourceProviderTracker.java:109) ... at org.apache.felix.framework.ServiceRegistrationImpl.unregister(ServiceRegistrationImpl.java:144) at org.apache.sling.jcr.base.AbstractSlingRepositoryManager.unregisterService(AbstractSlingRepositoryManager.java:262) at org.apache.sling.jcr.base.AbstractSlingRepositoryManager.stop(AbstractSlingRepositoryManager.java:389) ... at org.apache.felix.framework.BundleImpl.stop(BundleImpl.java:1038) at org.apache.felix.framework.BundleImpl.stop(BundleImpl.java:1024) at org.apache.jackrabbit.oak.plugins.document.DocumentNodeStoreService$1.handleLeaseFailure(DocumentNodeStoreService.java:413) at org.apache.jackrabbit.oak.plugins.document.ClusterNodeInfo$1.run(ClusterNodeInfo.java:696) at java.lang.Thread.run(Thread.java:745) Locked ownable synchronizers: - locked <0x4402f4fd> (a java.util.concurrent.locks.ReentrantLock$FairSync) - locked <0x1e2230ed> (a java.util.concurrent.locks.ReentrantLock$FairSync) - locked <0x56ba270f> (a java.util.concurrent.locks.ReentrantLock$FairSync)
"discovery.connectors.common.runner.e8dd34be-8886-4e5d-891f-5509b4dea0f0.discoveryLiteCheck" daemon prio=5 tid=0x10ea nid=0xffffffff waiting for monitor entry java.lang.Thread.State: BLOCKED at org.apache.sling.resourceresolver.impl.providers.ResourceProviderTracker.getResourceProviderStorage(ResourceProviderTracker.java:364) - waiting to lock <0x35cd7613> (a java.util.HashMap) owned by "LeaseFailureHandler-Thread" tid=0x32,564 at org.apache.sling.resourceresolver.impl.ResourceResolverImpl.createControl(ResourceResolverImpl.java:154) at org.apache.sling.resourceresolver.impl.ResourceResolverImpl.<init>(ResourceResolverImpl.java:116) at org.apache.sling.resourceresolver.impl.ResourceResolverImpl.<init>(ResourceResolverImpl.java:110) at org.apache.sling.resourceresolver.impl.CommonResourceResolverFactoryImpl.getResourceResolverInternal(CommonResourceResolverFactoryImpl.java:257) at org.apache.sling.resourceresolver.impl.CommonResourceResolverFactoryImpl.getAdministrativeResourceResolver(CommonResourceResolverFactoryImpl.java:140) at org.apache.sling.resourceresolver.impl.ResourceResolverFactoryImpl.getAdministrativeResourceResolver(ResourceResolverFactoryImpl.java:107) at org.apache.sling.discovery.oak.cluster.OakClusterViewService.getResourceResolver(OakClusterViewService.java:103) at org.apache.sling.discovery.oak.cluster.OakClusterViewService.getLocalClusterView(OakClusterViewService.java:110) at org.apache.sling.discovery.base.commons.BaseDiscoveryService.getTopology(BaseDiscoveryService.java:77) at org.apache.sling.discovery.oak.OakDiscoveryService.checkForTopologyChange(OakDiscoveryService.java:657) at org.apache.sling.discovery.oak.pinger.OakViewChecker.discoveryLiteCheck(OakViewChecker.java:232) - locked <0x740a9729> (a java.lang.Object) at org.apache.sling.discovery.oak.pinger.OakViewChecker.access$000(OakViewChecker.java:64) at org.apache.sling.discovery.oak.pinger.OakViewChecker$1.run(OakViewChecker.java:208) at org.apache.sling.discovery.base.commons.PeriodicBackgroundJob.safelyRun(PeriodicBackgroundJob.java:86) at org.apache.sling.discovery.base.commons.PeriodicBackgroundJob.run(PeriodicBackgroundJob.java:77) at java.lang.Thread.run(Thread.java:745) Locked ownable synchronizers: - locked <0x575a7644> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
Known Apache Sling bug: SLING-5622. This issue applies only to pre-SP1 installations of AEM 6.2. If SP1 has already been applied successfully, then you would not experience this issue.
This issue occurs only prior to installing SP1. To fix this issue, contact AEM Customer Care and request hotfix 11473. The fix is already included in Service Pack 1.