CRX/CQ process uses 100% of the CPU, the system doesn't respond, or the system is slow
FAQ
What are the typical situations causing high CPU consumption?
Certain maintenance activities can cause higher CPU usage than usual, for example: tar compaction, datastore garbage collection, online backup, tree activation, and deployment of an application update that causes caches to be flushed.
What can be a reason for 0% CPU consumption?
A Java-level deadlock can cause this situation. In this case, take a few thread dumps, raise a support ticket, and restart the AEM instance.
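Once a thread dump has been saved to a file, a quick check like the following can show whether the JVM itself reported a deadlock. This is a minimal sketch: the function name has_deadlock is hypothetical, and it relies on the "Java-level deadlock" marker that HotSpot writes into a thread dump when it detects one.

```shell
# Succeeds (exit status 0) if the thread dump file contains a deadlock
# report written by the JVM, fails otherwise.
# has_deadlock is an illustrative helper name, not part of any AEM tooling.
has_deadlock() {
    # $1: path to a file holding one or more thread dumps
    grep -q 'Java-level deadlock' "$1"
}

# Example use (the file name is the one used elsewhere in this article):
#   has_deadlock threadDumpNode1.txt && echo "deadlock reported"
```

A missing marker does not rule out other kinds of blocking, so the thread dumps should still be attached to the support ticket.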
Are there performance tuning tips available?
Solutions
Video gem - technical deep dive session
Video gem session available at http://dev.day.com/content/ddc/en/gems/cq-aem-5-6-troubleshooting.html
Adobe Experience Manager 5.6 or later, and CRX 2.3 or later
Use http://localhost:4502/system/console/profiler for at least a few minutes during the period of slowness or high CPU usage. The output helps you determine which JVM threads are consuming most CPU cycles, and their associated packages and classes.
Up to CRX 2.2
Use the simple CPU profiling tool that is included in CRX 2.0.x. To start it, open
http://localhost:7402/crx/diagnostic/prof.jsp
CRX 1.x
To help analyze the problem, create a few full thread dumps, which can then be analyzed.
Creating Full Thread Dumps
Get the Process ID
To get the process id of your Java process, use
jps -l
If this doesn't work (path not set, JDK not installed, or older Java version), use
ps -el | grep java
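When several Java processes run on the same host, the jps -l listing can be filtered down to the relevant one. The sketch below assumes the instance was started from a jar whose path contains a recognizable string such as "quickstart"; both that pattern and the function name pids_matching are assumptions, so adjust them to your installation.

```shell
# Read `jps -l` output on stdin and print the PID of every process whose
# main class or jar path contains the given text.
# pids_matching is an illustrative helper name.
pids_matching() {
    awk -v pat="$1" '{
        name = $0
        sub(/^[0-9]+[ \t]+/, "", name)   # drop the leading PID column
        if (index(name, pat) > 0) print $1
    }'
}

# Example use:
#   jps -l | pids_matching quickstart
```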
Full Thread Dumps
To analyze a performance problem or a blocked process, create about ten full thread dumps, roughly one second apart. If the problem could be related to clustering, create at least ten full thread dumps on each cluster node. If possible, the thread dumps should be created at roughly the same time (it doesn't need to be exact).
A full thread dump starts with information like the following example:
2015-07-22 10:26:30
Full thread dump Java HotSpot(TM) 64-Bit Server VM (24.65-b04):
"Thread-76273" daemon prio=3 tid=0x111061 nid=0x111061 running [0x111061]
... stack and locked object MUST be present
If your thread dump doesn't look like the example above, a proper investigation will not be possible.
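A rough way to check a saved file against the format above is to count the dump headers, stack frames, and locked-object lines it contains. This is only a sketch: dump_summary is a hypothetical helper, and the patterns assume the HotSpot layout shown in the example.

```shell
# Print how many thread dump headers, stack frame lines ("at ...") and
# locked-object lines ("- locked <...>") a file contains.
# dump_summary is an illustrative helper name.
dump_summary() {
    # grep -c exits non-zero when the count is 0, hence the "|| true".
    dumps=$(grep -c '^Full thread dump' "$1" || true)
    frames=$(grep -c '^[[:space:]]*at ' "$1" || true)
    locked=$(grep -c '^[[:space:]]*- locked <' "$1" || true)
    echo "dumps=$dumps frames=$frames locked=$locked"
}

# Example use:
#   dump_summary threadDumpNode1.txt
```

If any of the three counts is zero, the file is probably not in the expected format.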
You can use the tool provided on Package Share, as described on the page. It provides a thread dump tool that lets you take multiple thread dumps in the format shown above.
Alternatively, if installed, use jstack. This command prints a thread dump to standard output:
jstack <pid>
This command appends a full thread dump to a file:
jstack <pid> >> threadDumpNode1.txt
On some systems you may have to use: sudo -u aem jstack -J-d64 -l <pid>
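The recommendation of about ten dumps roughly one second apart can be scripted along these lines. This is a sketch under assumptions: collect_thread_dumps and the DUMP_CMD variable are hypothetical names, and DUMP_CMD exists only so that a different command (for example a sudo wrapper around jstack) can be substituted.

```shell
# Append COUNT thread dumps of PID to OUTFILE, DELAY seconds apart.
# Defaults: 10 dumps, 1 second apart, taken with "jstack -l".
# collect_thread_dumps and DUMP_CMD are illustrative names.
collect_thread_dumps() {
    pid=$1
    outfile=$2
    count=${3:-10}
    delay=${4:-1}
    i=1
    while [ "$i" -le "$count" ]; do
        # Stop at the first failure so an empty file is noticed early.
        ${DUMP_CMD:-jstack} -l "$pid" >> "$outfile" || return 1
        i=$((i + 1))
        if [ "$i" -le "$count" ]; then
            sleep "$delay"
        fi
    done
}

# Example use:
#   collect_thread_dumps <pid> threadDumpNode1.txt
```

On a cluster, running the same call on each node at about the same time yields one file per node.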
If jstack doesn't work, use kill -QUIT. This command makes the JVM print a thread dump to its standard output, which is usually captured in a log file:
kill -QUIT <pid>
If no thread dumps appear in the standard output after that last command, try adding this to the Java parameters:
-XX:+UnlockDiagnosticVMOptions -XX:+LogVMOutput -XX:LogFile=jvm.log
Note: If the steps above for obtaining thread dumps do not work in your environment, see this article.
Check CPU usage
To analyze the problem, it is important to know whether CRX/CQ is running in an endless loop or merely sleeping. To find out, type
top
This command lists the processes, sorted by CPU usage. If the top process is a Java process and its PID matches CRX/CQ, then the process is running at full speed.
If you are unsure how to interpret the results, run the following statement and then include the file top.txt in your problem report:
top -l5 -s5 > top.txt
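To relate CPU usage to a specific thread in a thread dump, the thread ID reported by the operating system can be converted to the hexadecimal nid value that the dump shows. The sketch below assumes a Linux host, where top -H -p <pid> lists per-thread CPU usage with decimal thread IDs; the function name tid_to_nid is hypothetical.

```shell
# Convert a decimal native thread ID (as listed by `top -H` on Linux)
# to the nid=0x... form that appears in HotSpot thread dumps.
# tid_to_nid is an illustrative helper name.
tid_to_nid() {
    printf 'nid=0x%x\n' "$1"
}

# Example use: find the dump entry for thread 12345
#   grep -w "$(tid_to_nid 12345)" threadDumpNode1.txt
```

On other platforms the thread ID shown by the operating system tools may not map to nid in the same way.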
Check session count
In many cases the problem is that the number of open sessions is too large, which at some point slows down processing. To find out if this is the case, run
jps -l (to get the process id of the Java process)
jmap -histo <pid> | grep CRXSession (to get the number of open sessions)
If this is, in fact, the problem (the number is higher than a few hundred sessions), then it needs to be analyzed. Possibly a session pool is used (depending on the version of CRX/CQ, there could be a hot fix for the given problem), or an internal (possibly application-level) cache references sessions. To analyze where those sessions are opened, see the 'Analyze Unclosed Sessions' page.
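The histogram lines can be reduced to just the instance count and class name, which makes the number of sessions easier to read or to log over time. This sketch assumes the usual jmap -histo column layout (rank, instances, bytes, class name); crx_session_counts is a hypothetical helper name.

```shell
# Read `jmap -histo` output on stdin and print "<instances> <class name>"
# for every class whose name contains CRXSession.
# crx_session_counts is an illustrative helper name.
crx_session_counts() {
    awk '$4 ~ /CRXSession/ { print $2, $4 }'
}

# Example use:
#   jmap -histo <pid> | crx_session_counts
```

Running it a few times some minutes apart shows whether the count keeps growing, which would point to sessions that are never closed.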
Do not kill the process
Never kill the CRX process, even when stopping takes too long. If you must kill a process that is not responding, create a full thread dump first and log a bug.
If you do kill the CRX process, the next time you start it up the Tar PM can create backup_.tar files.
Support Tools
Use the Thread Dump Collection and Analysis tool to take thread dumps from a running CQ instance for troubleshooting the following:
- performance
- lock contention
- deadlock
- other thread-related issues