Ensuring High Service Availability on AEM Instances

Best Practices for Avoiding Production Outages

Common Issues

Author/Publish instance is very slow or has high CPU usage

High memory usage on AEM instances

  • Check the memory usage at [1]
  • Generate heap dumps by following the article at [2] and share them with AEM Support for further analysis

[1] http://<host>:<port>/system/console/memoryusage
[2] https://helpx.adobe.com/experience-manager/kb/AnalyzeMemoryProblems.html
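
The memory check at [1] can also be approximated from code. The following is a minimal sketch, not an Adobe-provided tool: it reads heap usage through the standard java.lang.management API and flags when usage crosses a threshold. The class name HeapUsageCheck and the 85% threshold are assumptions chosen for illustration. The code reports on the JVM it runs in, so it would have to run inside the AEM process (for example, in an OSGi bundle) to reflect AEM's heap.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

final class HeapUsageCheck {
    // Hypothetical warning threshold; tune it for your own environment.
    static final double WARN_PERCENT = 85.0;

    /** Returns used heap as a percentage of max heap, or -1 if max is undefined. */
    static double usedPercent(long usedBytes, long maxBytes) {
        if (maxBytes <= 0) {
            return -1.0; // MemoryUsage.getMax() returns -1 when no limit is defined
        }
        return 100.0 * usedBytes / maxBytes;
    }

    /** True when heap usage is at or above the (assumed) warning threshold. */
    static boolean shouldCaptureHeapDump(long usedBytes, long maxBytes) {
        return usedPercent(usedBytes, maxBytes) >= WARN_PERCENT;
    }

    public static void main(String[] args) {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        MemoryUsage heap = memory.getHeapMemoryUsage();
        double percent = usedPercent(heap.getUsed(), heap.getMax());
        System.out.printf("Heap used: %d of %d bytes (%.1f%%)%n",
                heap.getUsed(), heap.getMax(), percent);
        if (shouldCaptureHeapDump(heap.getUsed(), heap.getMax())) {
            System.out.println("Heap usage is high; consider generating a heap dump as described in [2].");
        }
    }
}
```

A single reading is only a snapshot. Sampling the value several times over a period gives a better picture of whether memory is actually growing.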

High CPU usage after dispatcher cache clear

  • You can control cache invalidation with the "/invalidate" and "/statfileslevel" settings:
    • If /invalidate denies everything and no /statfileslevel is set -> Only the activated pages are deleted
    • If /invalidate allows everything and /statfileslevel is set -> Only pages in the folder whose stat file was updated are invalidated
    • If /invalidate allows everything and no /statfileslevel is set -> All pages under the docroot are invalidated, wherever they are located
  • After code deployments, try to recache the pages. Immediate recaching ensures that Dispatcher retrieves and caches the page only once, instead of once for each of the simultaneous client requests.
  • Refer to the Optimizing Dispatcher Cache article for more in-depth insights.
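
To make the effect of /statfileslevel more concrete, the sketch below models which .stat files would be touched when a single page is invalidated. It is a simplified model based on an assumption about Dispatcher behavior: the docroot counts as level 0, and .stat files are touched from the docroot down to the folder of the invalidated file or the configured level, whichever is shallower. The class and method names are hypothetical, and the code does not call Dispatcher itself.

```java
import java.util.ArrayList;
import java.util.List;

final class StatFileLevels {
    /**
     * Lists the .stat files assumed to be touched when contentPath is invalidated.
     * contentPath is relative to the docroot, e.g. "/content/site/en/page.html".
     */
    static List<String> touchedStatFiles(String contentPath, int statFilesLevel) {
        String[] parts = contentPath.split("/");
        // parts[0] is empty (leading slash) and the last element is the file name,
        // so the number of folders below the docroot is parts.length - 2.
        int folderDepth = Math.max(parts.length - 2, 0);
        int deepest = Math.min(folderDepth, statFilesLevel);

        List<String> result = new ArrayList<>();
        result.add("/.stat"); // level 0: the docroot itself
        StringBuilder folder = new StringBuilder();
        for (int level = 1; level <= deepest; level++) {
            folder.append('/').append(parts[level]);
            result.add(folder + "/.stat");
        }
        return result;
    }

    public static void main(String[] args) {
        // Example path and level are illustrative only.
        for (String stat : touchedStatFiles("/content/site/en/page.html", 2)) {
            System.out.println(stat);
        }
    }
}
```

Under this model, invalidating /content/site/en/page.html with a level of 2 touches /.stat, /content/.stat and /content/site/.stat. A deeper level narrows invalidation to smaller parts of the cache, which reduces the number of pages that must be re-rendered after an activation.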

Observed SegmentNotFound Exceptions in the logs

  • Follow the steps in the Resolving Segmentnotfound article.
  • If no good revision is found, try to find the corrupted nodes using the script mentioned in Part B of the above article.
  • If corruption is found under any folder other than /apps, contact the AEM Support team for further guidance.
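
Before following the steps above, it can help to know which segments the logs report as missing. The sketch below collects the distinct segment IDs from log lines. It assumes the message has the shape "SegmentNotFoundException: Segment <uuid> not found"; if your logs use different wording, adjust the pattern. The class name and the command-line usage are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SegmentNotFoundScan {
    // Assumed message shape; adjust if your log format differs.
    private static final Pattern SEGMENT = Pattern.compile(
            "SegmentNotFoundException: Segment ([0-9a-fA-F-]{36}) not found");

    /** Returns the distinct segment IDs reported as missing, in first-seen order. */
    static Set<String> missingSegments(List<String> logLines) {
        Set<String> ids = new LinkedHashSet<>();
        for (String line : logLines) {
            Matcher matcher = SEGMENT.matcher(line);
            if (matcher.find()) {
                ids.add(matcher.group(1));
            }
        }
        return ids;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("Usage: java SegmentNotFoundScan <path-to-log-file>");
            return;
        }
        // Reads the whole file into memory; split very large logs first.
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        for (String id : missingSegments(lines)) {
            System.out.println(id);
        }
    }
}
```

A short list of repeated IDs suggests localized corruption, while many distinct IDs may point to a broader problem with the repository.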

RCA for an AEM outage that was resolved by a restart

Share the following data with the AEM Support team to support root cause analysis (RCA):

  • Log files during the outage
  • Thread dumps taken during the outage
  • Heap dumps taken during the outage, if available
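
Thread dumps are usually captured with a JDK tool such as jstack, run against the AEM process while the problem is occurring. As an illustration of what a thread dump contains, the sketch below builds a simple one through the standard ThreadMXBean API. The class name is hypothetical, and the code only sees the threads of the JVM it runs in, so it is a sketch of the idea rather than a replacement for an external dump of the AEM process.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

final class ThreadDumpSketch {
    /** Builds a plain-text dump: name, state and stack trace of every live thread. */
    static String dump() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        StringBuilder out = new StringBuilder();
        // false, false: skip locked monitor and synchronizer details to keep it cheap.
        for (ThreadInfo info : threads.dumpAllThreads(false, false)) {
            out.append('"').append(info.getThreadName()).append("\" ")
               .append(info.getThreadState()).append('\n');
            for (StackTraceElement frame : info.getStackTrace()) {
                out.append("    at ").append(frame).append('\n');
            }
            out.append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(dump());
    }
}
```

Several dumps taken a few seconds apart are more useful than a single one, because they show which threads stay stuck in the same place.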

Session leak in AEM

Check for and analyze JCR session leaks in your AEM instance.

Detailed Guide on Troubleshooting Critical Issues

