Objective

This article describes the most common critical AEM issues and how to analyze them.

AEM Sites Performance Issues

Symptoms of a performance issue

  1. Slow loading of pages  
  2. Slow creation or editing of pages
  3. AEM response times are slow
  4. AEM is not responding for some requests
  5. The request.log on AEM shows slow response times

What causes performance issues

  1. Thread contention caused by long-running requests such as slow searches, write-heavy background jobs, or moving whole branches of site content
  2. High CPU utilization
  3. Expensive requests such as expensive searches or inefficient application code, components, etc.
  4. Lack of proper maintenance
  5. Insufficient dispatcher caching
  6. Lack of CDN
  7. Lack of browser caching
  8. Too many scripts loaded on the page, or scripts loaded at the top of the page
  9. CSS loaded throughout the page instead of in the HTML head
  10. Insufficient server sizing or incorrect architecture
  11. Memory issues (see below)

How to analyze the performance issue

1. Capture a series of thread dumps and analyze them

2. Check at the OS level whether the AEM Java process is causing high CPU utilization

    Linux: use the top command to check CPU utilization.

    Windows: use the Windows Task Manager.

    If AEM is causing high CPU utilization then run the out-of-the-box profiling tool for a few minutes and analyze the result.

3. Analyze the request.log file for any slow requests

4. Review your system maintenance procedures and ensure that you are doing proper maintenance on AEM including the following:

  • Revision Clean Up (MongoMK and Database DocumentNodeStores only) - daily or more frequently
  • Offline Tar Compaction (TarMK only) - bi-weekly
  • Data Store Garbage Collection (Systems with FileDataStore or S3 DataStore only) - weekly
  • Workflow Purge - weekly
  • Version Purge - weekly
  • AuditLog Purge - weekly

5. Review caching strategies implemented at the AEM dispatcher level.  The best place to start is to gain an understanding of when and how the dispatcher caches files and invalidates cached files.

6. Check if you are using a CDN

7. See if you are leveraging browser caching - check for the Cache-Control header

8. Use client-side site analysis tools such as the "Audits" feature in Google Chrome browser "Developer Tools" panel.  These tools will give you recommendations on client-side performance improvements.
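
Step 3 above (analyzing request.log for slow requests) can be scripted. The following is a minimal Python sketch, not an official tool. It assumes the default request.log layout, in which each request is written as a "->" line and its response as a "<-" line sharing the same bracketed request ID, with the duration in milliseconds at the end of the response line. The function name slow_requests and the 1000 ms threshold are illustrative choices; adjust the patterns if your log format has been customized.

```python
import re
from typing import Iterable, List, Tuple

# Assumed default layout (two lines per request, joined by the ID in brackets):
#   09/Mar/2009:11:15:34 +0100 [379] -> GET /content/site/en.html HTTP/1.1
#   09/Mar/2009:11:15:35 +0100 [379] <- 200 text/html 1250ms
REQUEST = re.compile(r"\[(\d+)\] -> (\S+) (\S+)")
RESPONSE = re.compile(r"\[(\d+)\] <- (\d{3}) .*?(\d+)ms\s*$")


def slow_requests(lines: Iterable[str],
                  threshold_ms: int = 1000) -> List[Tuple[int, str, str, int]]:
    """Return (duration_ms, method, path, status) tuples, slowest first,
    for every response that took at least threshold_ms."""
    pending = {}  # request ID -> (method, path), waiting for its response line
    slow = []
    for line in lines:
        match = REQUEST.search(line)
        if match:
            pending[match.group(1)] = (match.group(2), match.group(3))
            continue
        match = RESPONSE.search(line)
        if match:
            # The request line may be missing if the log was rotated mid-request.
            method, path = pending.pop(match.group(1), ("?", "?"))
            duration = int(match.group(3))
            if duration >= threshold_ms:
                slow.append((duration, method, path, int(match.group(2))))
    return sorted(slow, reverse=True)


if __name__ == "__main__":
    import sys
    # Usage: python slow_requests.py /path/to/request.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        for duration, method, path, status in slow_requests(log)[:20]:
            print(f"{duration:>8} ms  {status}  {method} {path}")
```

Requests that appear repeatedly near the top of this list are good candidates to correlate with the thread dumps captured in step 1.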

Solutions to common performance issues

AEM Assets Performance Issues

Symptoms of an Assets performance issue

  • Slow file uploads to /assets.html or /damadmin UI
  • Thumbnails are taking too long to be generated
  • Assets operations such as move, delete, edit, and metadata update taking too long

What causes issues with Assets performance

  • Lack of proper maintenance
  • Latest fix packs not applied
  • Optimizations not applied
  • Inadequate server sizing for the user load

How to analyze the Assets performance issue

Solutions to common Assets performance issues

Memory Issues

Symptoms of a memory issue

  • AEM crashes randomly and in the logs OutOfMemoryError is observed
  • AEM gets slower over time and eventually crashes
  • AEM is unresponsive

Diagnosing a memory issue

  • Search the log files for OutOfMemoryError. If you find any matches, you have a memory issue.
  • Review the http://aem-host:port/system/console/memoryusage screen
    If the "Old Generation" (JDK 7 and earlier) or "Tenured Generation" (JDK 8 or later) usage is high, this could be a sign of a heap memory utilization issue.  Click "Run Garbage Collector" to ask the JVM to run a full heap garbage collection.  If heap utilization stays high after requesting GC, there is likely an issue.  On an AEM instance with Oak Tar storage, tenured usage above 3GB might indicate a problem.  High heap utilization on a system with Mongo storage could be due to the in-memory cache configuration.
  • Take thread dumps and top output and perform thread analysis.  Check if the threads causing high CPU utilization are native JVM Garbage Collection threads.  If the threads using the most CPU time are the "VM Thread" or any garbage collection threads, there is likely a memory issue.

What causes memory issues

  • Java application memory leak
  • Java Finalizer pile up due to incorrect use of finalize in custom code
  • Insufficient max heap configuration

How to analyze the cause of your memory issue

See this article for details on how to capture a heap dump.

The best way to identify the cause of a memory issue is to analyze a heap dump.  

Once you have captured a heap dump file, open it in Eclipse MAT or the IBM Memory Analyzer tool.  In Eclipse MAT, run the Leak Suspects report and open the "Thread Details" view to see potential causes for the memory issue.

Solutions to common memory issues

  • Optimize your application code to use less memory if you notice long garbage collection pauses.  Most garbage collection issues are best solved by optimizing the application rather than tuning the JVM.
  • If you have already optimized your application and still experience long GC pauses then focus on tuning the JVM.

AEM Indexing Issues

Symptoms of indexing issues

The following are signs of an issue with AEM/Oak indexing:

  • Search results are outdated by more than 10 minutes
  • There are missing search results
  • Errors are returned either in the UI or logs during search via site UI, Query Builder search, or JCR query execution

Diagnosing an indexing issue

  • To see if asynchronous indexing is slow or failing, do the following:

1. Open these URLs on your AEM instance to view stats about the Async indexer

http://aemhost:port/system/console/jmx/org.apache.jackrabbit.oak%3Aname%3Dasync%2Ctype%3DIndexStats

http://aemhost:port/system/console/jmx/org.apache.jackrabbit.oak%3Aname%3Dfulltext-async%2Ctype%3DIndexStats - This URL only applies to AEM 6.2 and later

2. On each of those pages, check these fields:

FailingSince - This indicates when indexing first started failing.

LastError - This is the stack trace showing what is causing indexing to fail.  If this is empty then indexing isn't failing.

LastErrorTime - This indicates the last time indexing threw the error.

LastIndexedTime - If the date and time in this field are more than 5 minutes old then indexing is running too slowly.
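
The checks on these fields can be expressed as a small Python sketch. It assumes you have already copied the field values from the IndexStats page and parsed LastIndexedTime into a datetime; the function assess_index_stats and its parameters are hypothetical names introduced for illustration and are not part of AEM or Oak.

```python
from datetime import datetime, timedelta
from typing import List, Optional


def assess_index_stats(last_indexed_time: datetime,
                       last_error: str,
                       failing_since: Optional[str],
                       now: datetime,
                       max_lag: timedelta = timedelta(minutes=5)) -> List[str]:
    """Apply the rules described above and return a list of findings.
    An empty list means neither rule was triggered."""
    findings = []
    # A non-empty LastError means indexing is failing.
    if last_error.strip():
        since = failing_since or "an unknown time"
        findings.append(f"indexing is failing (since {since})")
    # LastIndexedTime older than 5 minutes means indexing is too slow.
    lag = now - last_indexed_time
    if lag > max_lag:
        minutes = int(lag.total_seconds() // 60)
        findings.append(f"indexing is running too slowly (last indexed {minutes} minutes ago)")
    return findings
```

Run the same check against both the async and the fulltext-async pages, since one indexing lane can lag or fail while the other is healthy.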

What causes issues with indexing

  • Improper maintenance or failure to perform maintenance such as Revision Garbage Collection, Workflow Purge, Audit Purge, Version Purge, etc.
  • Corrupt or missing segments in Tar storage
  • Revision Corruption in a clustered environment (DocumentNodeStore - Mongo or Database)
  • An issue with the cluster topology in a clustered environment

How to analyze what is causing indexing issues
