Take thread dumps by following the steps in this article
Issue
Stopping the AEM Java process gracefully takes too long (over 10 minutes).
Environment
Cause
Many things can cause AEM shutdown to take a long time. When you stop the AEM Java process, it runs a Java shutdown hook that shuts down the Apache Felix OSGi container that AEM runs in. During the shutdown of the OSGi container, the system stops all OSGi bundles and components. As part of that process, various services finish write operations, close open file handles, and wait until all active HTTP requests have received a response.
The most common causes of slow shutdowns are:
- The deactivate method for an OSGi component takes a long time to execute
- Long-running requests are still in progress when the system is shut down
Resolution
To fix a slow shutdown issue, you need to analyze thread dumps to find out which threads are delaying the shutdown.
Follow these steps:
-
Take thread dumps by following the steps in this article.
-
Open the thread dumps in a thread dump analyzer such as TDA, http://fastthread.io/, or IBM Thread Analyzer.
-
Search the thread dumps for HTTP request threads in RUNNABLE state with names like the threads below:
- Threads starting with "qtp" in the name:
"qtp1926827727-86864" #86864 prio=5 os_prio=0 tid=0x00007f320894a800 nid=0x79f0 runnable [0x00007f31d7109000]
   java.lang.Thread.State: RUNNABLE
- Or threads with IP and request line in the name:
"10.25.10.11 [1457551498445] GET /content/dam/test.jpg HTTP/1.1" #38626 prio=5 os_prio=0 tid=0x00007fe5c854c800 nid=0x7f9c runnable [0x00007fe55f7f3000]
   java.lang.Thread.State: RUNNABLE
-
In addition to request threads, search for the thread named "FelixStartLevel". That thread handles starting and stopping all the OSGi bundles and components, so its stack trace gives some indication of what is delaying shutdown.
"FelixStartLevel" #18 daemon prio=5 os_prio=0 tid=0x00007f32ad8c2800 nid=0x6992 runnable [0x00007f32946d7000]
   java.lang.Thread.State: RUNNABLE
-
Look for patterns in the stack trace of the "FelixStartLevel" thread across thread dumps. Check whether it is stuck stopping a bundle or deactivating a particular OSGi component in many of the thread dumps. A tool such as "grep" can help with this. For example, if the SlingServletResolver OSGi component appears to be deactivating across multiple thread dumps, the command below counts how many thread dumps have a FelixStartLevel thread with SlingServletResolver in its stack trace:
grep -A 50 FelixStartLevel jstack.* | grep SlingServletResolver | awk '{print $1 }' | uniq | wc -l
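When you do not yet know which component to search for, a broader variant of the command above can help. The sketch below is not part of any AEM tooling; the function name `felix_frames` is made up for illustration, and it assumes the dumps are plain jstack output in which a blank line ends each thread's stack. It counts, for every stack frame of the FelixStartLevel thread, how many dumps contain that frame:

```shell
#!/bin/sh
# felix_frames FILE...
# Prints "<number of dumps> <stack frame>" for the FelixStartLevel thread,
# most frequent frames first. Hypothetical helper, not an AEM tool.
felix_frames() {
    for f in "$@"; do
        awk '
            /^"FelixStartLevel"/ { in_thread = 1; next }  # thread header line
            in_thread && /^[ \t]*$/ { exit }              # blank line ends this thread
            in_thread && /^[ \t]*at / {                   # keep only stack frame lines
                sub(/^[ \t]*at /, "")
                print
            }
        ' "$f" | sort -u    # count each frame at most once per dump
    done | sort | uniq -c | sort -rn
}

# Example: felix_frames jstack.*
```

A frame that appears in most or all of the dumps, such as a component's deactivate method, is a likely candidate for what is holding up the shutdown.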
-
Once you figure out what is delaying shutdown, determine whether it is an application issue. If it is related to AEM product code, contact AEM Customer Care.
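The analysis in the steps above needs several thread dumps taken while the shutdown is in progress. As a minimal sketch, assuming the JDK's jstack tool is on the PATH and is run as the OS user that owns the AEM process, a series of dumps could be captured as follows. The function name `take_dumps`, the `jstack.<pid>.<n>.txt` file naming, and the PID and directory in the example are illustrative assumptions:

```shell
#!/bin/sh
# take_dumps PID COUNT INTERVAL_SECONDS OUTDIR
# Writes COUNT thread dumps, INTERVAL_SECONDS apart, into OUTDIR.
# Hypothetical helper; stops early if jstack fails (for example, once
# the Java process has exited).
take_dumps() {
    pid=$1 count=$2 interval=$3 outdir=$4
    mkdir -p "$outdir" || return 1
    i=1
    while [ "$i" -le "$count" ]; do
        # File names are chosen to match the jstack.* pattern used with grep
        out="$outdir/jstack.$pid.$i.txt"
        if ! jstack -l "$pid" > "$out"; then
            echo "jstack failed for PID $pid; the process may have exited" >&2
            break
        fi
        i=$((i + 1))
        if [ "$i" -le "$count" ]; then sleep "$interval"; fi
    done
}

# Example (hypothetical PID and directory): 30 dumps, 10 seconds apart
# take_dumps 12345 30 10 /tmp/aem-shutdown-dumps
```

Start the capture right after issuing the stop command so that the dumps cover the whole shutdown period.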
Note: See this article for details on thread dump analysis.
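If no thread dump analyzer is at hand, the search for request threads described in the steps above can be approximated on the command line. The sketch below assumes plain jstack output and the two thread-name patterns shown earlier ("qtp" threads and threads named after an IP address and request line); the function name `request_threads` is made up for illustration:

```shell
#!/bin/sh
# request_threads FILE...
# Prints "<file>: <thread name>" for every request-style thread that is
# in RUNNABLE state. Hypothetical helper, not an AEM tool.
request_threads() {
    awk '
        # Thread header: remember the name only if it looks like a request thread
        /^"/ {
            name = ""
            if ($0 ~ /^"qtp[0-9]+-[0-9]+"/ ||
                $0 ~ /^"[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+ \[[0-9]+\] [A-Z]+ /) {
                name = $0
                sub(/^"/, "", name)     # drop the opening quote
                sub(/".*$/, "", name)   # drop the closing quote and the rest
            }
        }
        # State may be on the header line or on the line after it
        name != "" && /java\.lang\.Thread\.State: RUNNABLE/ {
            print FILENAME ": " name
            name = ""
        }
    ' "$@"
}

# Example: request_threads jstack.*
```

A thread that shows up in dump after dump is worth a closer look. Some RUNNABLE "qtp" threads may be pool threads that are not serving a request, so check each stack trace before drawing conclusions.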