Deadlock between service OakDiscoveryService.unbindTopologyEventListener and OakViewChecker.discoveryLiteCheck

Issue

AEM6.2 is entirely stuck after installing a hotfix for deploying application bundles.  After capturing thread dumps and opening them in a thread dump analyzer (like http://fastthread.io or IBM Thread Analyzer), the thread analyzer reports a deadlock.  

The reported deadlock is between two threads like the ones below:

Thread [1] calls org.apache.sling.discovery.oak.OakDiscoveryService.unbindTopologyEventListener

Thread [2] calls org.apache.sling.discovery.oak.pinger.OakViewChecker.discoveryLiteCheck

Sometimes the deadlock can be a circular deadlock with a third thread involved.

[1]

"LeaseFailureHandler-Thread" daemon prio=5 tid=0x7f34 nid=0xffffffff in Object.wait()
   java.lang.Thread.State: WAITING (on object monitor)
        at sun.misc.Unsafe.park(Native Method)
        - waiting to lock <0x575a7644> (a java.util.concurrent.locks.ReentrantLock$NonfairSync) owned by "discovery.connectors.common.runner.e8dd34be-8886-4e5d-891f-5509b4dea0f0.discoveryLiteChec
k" tid=0x4,330
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:867)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1197)
        at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:214)
        at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:290)
        at org.apache.sling.discovery.oak.OakDiscoveryService.unbindTopologyEventListener(OakDiscoveryService.java:368)
...
        at org.apache.felix.framework.ServiceRegistrationImpl.unregister(ServiceRegistrationImpl.java:144)
        at org.apache.sling.resourceresolver.impl.ResourceResolverFactoryActivator.unregisterFactory(ResourceResolverFactoryActivator.java:611)
        at org.apache.sling.resourceresolver.impl.ResourceResolverFactoryActivator.unregisterFactory(ResourceResolverFactoryActivator.java:602)
        at org.apache.sling.resourceresolver.impl.ResourceResolverFactoryActivator.checkFactoryPreconditions(ResourceResolverFactoryActivator.java:674)
        at org.apache.sling.resourceresolver.impl.ResourceResolverFactoryActivator.access$100(ResourceResolverFactoryActivator.java:79)
        at org.apache.sling.resourceresolver.impl.ResourceResolverFactoryActivator$1.providerRemoved(ResourceResolverFactoryActivator.java:500)
        at org.apache.sling.resourceresolver.impl.providers.ResourceProviderTracker.unregister(ResourceProviderTracker.java:224)
        - locked <0x35cd7613> (a java.util.HashMap)
        at org.apache.sling.resourceresolver.impl.providers.ResourceProviderTracker.access$100(ResourceProviderTracker.java:58)
        at org.apache.sling.resourceresolver.impl.providers.ResourceProviderTracker$1.removedService(ResourceProviderTracker.java:109)
...
        at org.apache.felix.framework.ServiceRegistrationImpl.unregister(ServiceRegistrationImpl.java:144)
        at org.apache.sling.jcr.base.AbstractSlingRepositoryManager.unregisterService(AbstractSlingRepositoryManager.java:262)
        at org.apache.sling.jcr.base.AbstractSlingRepositoryManager.stop(AbstractSlingRepositoryManager.java:389)
...
        at org.apache.felix.framework.BundleImpl.stop(BundleImpl.java:1038)
        at org.apache.felix.framework.BundleImpl.stop(BundleImpl.java:1024)
        at org.apache.jackrabbit.oak.plugins.document.DocumentNodeStoreService$1.handleLeaseFailure(DocumentNodeStoreService.java:413)
        at org.apache.jackrabbit.oak.plugins.document.ClusterNodeInfo$1.run(ClusterNodeInfo.java:696)
        at java.lang.Thread.run(Thread.java:745)


   Locked ownable synchronizers:
        - locked <0x4402f4fd> (a java.util.concurrent.locks.ReentrantLock$FairSync)
        - locked <0x1e2230ed> (a java.util.concurrent.locks.ReentrantLock$FairSync)
        - locked <0x56ba270f> (a java.util.concurrent.locks.ReentrantLock$FairSync)

[2]

"discovery.connectors.common.runner.e8dd34be-8886-4e5d-891f-5509b4dea0f0.discoveryLiteCheck" daemon prio=5 tid=0x10ea nid=0xffffffff waiting for monitor entry
   java.lang.Thread.State: BLOCKED
        at org.apache.sling.resourceresolver.impl.providers.ResourceProviderTracker.getResourceProviderStorage(ResourceProviderTracker.java:364)
        - waiting to lock <0x35cd7613> (a java.util.HashMap) owned by "LeaseFailureHandler-Thread" tid=0x32,564
        at org.apache.sling.resourceresolver.impl.ResourceResolverImpl.createControl(ResourceResolverImpl.java:154)
        at org.apache.sling.resourceresolver.impl.ResourceResolverImpl.<init>(ResourceResolverImpl.java:116)
        at org.apache.sling.resourceresolver.impl.ResourceResolverImpl.<init>(ResourceResolverImpl.java:110)
        at org.apache.sling.resourceresolver.impl.CommonResourceResolverFactoryImpl.getResourceResolverInternal(CommonResourceResolverFactoryImpl.java:257)
        at org.apache.sling.resourceresolver.impl.CommonResourceResolverFactoryImpl.getAdministrativeResourceResolver(CommonResourceResolverFactoryImpl.java:140)
        at org.apache.sling.resourceresolver.impl.ResourceResolverFactoryImpl.getAdministrativeResourceResolver(ResourceResolverFactoryImpl.java:107)
        at org.apache.sling.discovery.oak.cluster.OakClusterViewService.getResourceResolver(OakClusterViewService.java:103)
        at org.apache.sling.discovery.oak.cluster.OakClusterViewService.getLocalClusterView(OakClusterViewService.java:110)
        at org.apache.sling.discovery.base.commons.BaseDiscoveryService.getTopology(BaseDiscoveryService.java:77)
        at org.apache.sling.discovery.oak.OakDiscoveryService.checkForTopologyChange(OakDiscoveryService.java:657)
        at org.apache.sling.discovery.oak.pinger.OakViewChecker.discoveryLiteCheck(OakViewChecker.java:232)
        - locked <0x740a9729> (a java.lang.Object)
        at org.apache.sling.discovery.oak.pinger.OakViewChecker.access$000(OakViewChecker.java:64)
        at org.apache.sling.discovery.oak.pinger.OakViewChecker$1.run(OakViewChecker.java:208)
        at org.apache.sling.discovery.base.commons.PeriodicBackgroundJob.safelyRun(PeriodicBackgroundJob.java:86)
        at org.apache.sling.discovery.base.commons.PeriodicBackgroundJob.run(PeriodicBackgroundJob.java:77)
        at java.lang.Thread.run(Thread.java:745)

   Locked ownable synchronizers:
        - locked <0x575a7644> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)

Cause

Known Apache Sling bug: SLING-5622.  This issue applies only to pre-SP1 installations of AEM 6.2.  If SP1 has already been applied successfully, then you would not experience this issue.

Resolution

This issue occurs only prior to installing SP1.  To fix this issue, contact AEM Customer Care and request hotfix 11473.  The fix is already included in Service Pack 1.