AEM Communities User Synchronization stops working and users no longer get synchronized between publish instances.
Cause
There can be many causes for user synchronization to fail. The most common are:
- Misconfiguration
- An error on saving when the user is on the publish instance
- System failure to save the user package due to an error or permission issue (Author or Publish instance)
- Sling jobs getting stuck due to missing user package
Resolution
I. Follow the documentation troubleshooting guide
II. Update the VaultDistributionPackageBuilderFactory (only applies to AEM 6.2)
Note: This only applies to AEM 6.2.
Update the socialpubsync-vlt Vault Package Builder Factory to address these items:
- Store the user synchronization packages on the file system on non-clustered instances to increase stability and performance.
- Include rep:policy nodes and prevent .tokens and rep:cache nodes from being synchronized with the user.
- Avoid the error "cannot retrieve packages" [1].
- Go to the OSGi console at http://host:port/system/console/configMgr.
- Find the Apache Sling Distribution Packaging - Vault Package Builder Factory with Name field socialpubsync-vlt.
- In the type drop-down list, select file packages instead of jcr packages. Only perform this step on non-clustered instances.
- In the Package Filters field, add these values:
  - /home/users|-.*/rep:cache
  - /home/users|-.*/.tokens
  - /home/users|-.*/rep:policy
  Note: Adding rep:cache here avoids the error below:
  [... POST /libs/sling/distribution/services/importers/socialpubsync HTTP/1.1] org.apache.jackrabbit.vault.packaging.impl.ZipVaultPackage Error during install.
  javax.jcr.nodetype.ConstraintViolationException: OakConstraint0034: Attempt to create or change the system maintained cache.
- In the "The digest algorithm to calculate the package checksum" drop-down list, select md5.
- Click Save.
If you are running AEM 6.2, install Cumulative Fix Pack 3 on all Author and Publish instances, or contact AEM Customer Care to request the hotfix for NPR-13034. If you install neither, the above configuration has no effect.
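To sanity-check which user paths the three Package Filters entries are meant to exclude, the sketch below applies them to sample paths. It is a simplified model, not the real filter engine: it assumes each entry has the form root|-regex, that the regex must match the full path, and that a match means "excluded". The function name is_excluded and the sample paths are hypothetical, and the actual Vault filter semantics may differ (for example, in how descendants of an excluded node are handled).

```shell
# Filters copied from the Package Filters step, one per line.
FILTERS='/home/users|-.*/rep:cache
/home/users|-.*/.tokens
/home/users|-.*/rep:policy'

# is_excluded PATH
# Returns 0 if PATH lies under a filter root and fully matches that
# filter's exclude regex; returns 1 otherwise. Simplified model only.
is_excluded() {
  node_path=$1
  while IFS= read -r filter; do
    root=${filter%%|*}     # text before the first "|"
    regex=${filter#*|-}    # text after "|-"
    case $node_path in
      "$root"/*) ;;        # path is below the filter root, keep checking
      *) continue ;;       # outside the root, this filter does not apply
    esac
    if printf '%s\n' "$node_path" | grep -Eq "^${regex}\$"; then
      return 0
    fi
  done <<EOF
$FILTERS
EOF
  return 1
}
```

For example, `is_excluded /home/users/a/alice/rep:cache && echo excluded` prints "excluded", while a profile node under the same user does not match any filter.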
III. Ensure that sling:Folder nodes are distributed (only applies to AEM 6.2)
Note: This only applies to AEM 6.2.
There is a problem with the default configuration in User Sync where it doesn't distribute the sling:Folder nodes such as social/relationships/following.
- Go to http://aem-host:port/system/console/configMgr/com.adobe.cq.social.sync.impl.UserSyncListenerImpl and log in as admin.
- Add sling:Folder to the Node Types.
IV. Clear out user packages
Depending on the version of Sling Distribution and AEM Social Communities you have, user packages might be created under /etc/packages/sling (older versions - AEM 6.1 with no hotfixes) or /var/sling/distribution/packages (newer versions - AEM 6.1 with AEM Social Communities FP4 or later).
- Go to http://host:port/crx/explorer/index.jsp
- Log in as admin.
- Browse to /etc/packages and delete the sling subfolder.
- Browse to /var/sling/distribution and delete the packages subfolder.
- Click Save.
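If you prefer the command line to CRX Explorer, the two deletions above could be scripted. The sketch below only prints the curl commands so you can review them before running anything. It assumes the Sling POST servlet's :operation=delete is available on the instance and that the admin user may delete these paths. The function name print_delete_commands is hypothetical, and host:port is a placeholder as elsewhere in this article.

```shell
# print_delete_commands BASE_URL
# Prints (does not run) one curl command per package folder named above.
# curl prompts for the admin password because none is given with -u.
print_delete_commands() {
  base_url=$1
  for node_path in /etc/packages/sling /var/sling/distribution/packages; do
    printf 'curl -u admin -F ":operation=delete" "%s%s"\n' "$base_url" "$node_path"
  done
}
```

Running `print_delete_commands http://host:port` prints two commands, one per folder. A folder that does not exist on your version simply produces an error response when its command is run.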
If you already have the type field set to file packages in your Vault Package Builder Factory configuration, then you have to clear the packages from the temp folder:
- Log in to your AEM server.
- Go to the temp directory used by AEM's Java process (this may be set by the JVM parameter -Djava.io.tmpdir).
- Delete all packages from that folder. Here is an example command for Linux: for i in dstrpck*; do rm -- "$i"; done
- Repeat for all AEM instances (publish instances and the author instance).
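A slightly more defensive variant of the cleanup command is sketched below. It takes the directory as an argument instead of relying on the current working directory, touches only regular files whose names start with dstrpck, and prints each file it removes. The function name clean_dist_packages is hypothetical. You still have to confirm which directory the AEM JVM actually uses as java.io.tmpdir, and the -maxdepth option assumes GNU or BSD find.

```shell
# clean_dist_packages DIR
# Deletes files named dstrpck* directly inside DIR and lists what was removed.
clean_dist_packages() {
  tmp_dir=$1
  if [ ! -d "$tmp_dir" ]; then
    echo "not a directory: $tmp_dir" >&2
    return 1
  fi
  # -maxdepth 1 keeps find out of subdirectories;
  # -type f skips anything that is not a regular file.
  find "$tmp_dir" -maxdepth 1 -type f -name 'dstrpck*' -print -exec rm -f -- {} +
}
```

For example, `clean_dist_packages /tmp` would apply if the JVM uses /tmp, which is an assumption to verify on each instance.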
V. Unblock the distribution queue
Since you have cleared out the packages, delete all the stuck Sling jobs that reference them. If old jobs under /var/eventing/jobs/unassigned are not processing due to some error, then they could cause User Sync to fail. Delete those on each AEM node to unblock the synchronization queue:
- Go to http://host:port/crx/explorer/index.jsp
- Log in as admin.
- Open Content Explorer.
- Browse to /var/eventing/jobs.
- Right-click the first subfolder of /var/eventing/jobs/unassigned.
- Find the child node that starts with org.apache.sling.distribution.
- Right-click that node and select Delete Recursively.
- Uncheck the Preliminary Scan box.
- Click Delete.
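Before deleting anything, it can help to list the candidate job nodes. The sketch below builds a query URL for the AEM QueryBuilder JSON servlet using its path and nodename predicates. It assumes that servlet is reachable at /bin/querybuilder.json on your instance. The function name stuck_jobs_query_url is hypothetical, and the query is only a rough equivalent of browsing the tree by hand as described above.

```shell
# stuck_jobs_query_url BASE_URL
# Prints a QueryBuilder URL that searches below /var/eventing/jobs/unassigned
# for nodes whose name starts with org.apache.sling.distribution.
stuck_jobs_query_url() {
  base_url=$1
  printf '%s/bin/querybuilder.json?path=/var/eventing/jobs/unassigned&nodename=org.apache.sling.distribution*&p.limit=-1\n' "$base_url"
}
```

One possible use is `curl -u admin "$(stuck_jobs_query_url http://host:port)"`, which returns the matching node paths as JSON without modifying the repository.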
VI. Further Analysis
If none of the above steps fixes the issue with user synchronization, enable debug-level logging for these Java packages (Author and Publish instances):
- Go to http://aem-host:port/system/console/slinglog
- Click Add new logger.
- Set the following values:
  - Log Level -> Debug
  - Log File -> logs/usersync.log
  - Loggers
    - org.apache.sling.distribution
    - org.apache.sling.event
    - com.adobe.cq.social.sync
- Click Save.
- Contact AEM Customer Care for assistance. Include a description of the issue and attach the log files.
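Before sending the logs, a quick scan for the two error signatures quoted in this article can show whether you are hitting one of the known problems. The sketch below counts matching lines in a log file. The function name scan_usersync_log is hypothetical, and the signature list covers only the errors mentioned here, not every possible failure.

```shell
# scan_usersync_log LOG_FILE
# Prints, for each known error signature, how many lines of LOG_FILE contain it.
scan_usersync_log() {
  log_file=$1
  if [ ! -f "$log_file" ]; then
    echo "log file not found: $log_file" >&2
    return 1
  fi
  for signature in 'cannot retrieve packages' 'OakConstraint0034'; do
    # -c counts matching lines; -F treats the signature as a literal string.
    # "|| true" keeps a zero-match result from aborting under "set -e".
    count=$(grep -cF -- "$signature" "$log_file" || true)
    printf '%s: %s\n' "$signature" "$count"
  done
}
```

Point it at the usersync.log file created by the logger configured above, for example `scan_usersync_log logs/usersync.log` when run from the directory that contains the logs folder.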
[1] Error in User Sync after applying 6.2 Cumulative Fix Pack 5 or later
22.08.2017 12:38:16.044 *ERROR* [sling-default-655-scheduledEventTriggerorg.apache.sling.distribution.agent.impl.TriggerAgentRequestHandler@3b05483d] org.apache.sling.distribution.agent.impl.SimpleDistributionAgent [agent][socialpubsync] cannot retrieve packages
org.apache.sling.distribution.common.DistributionException: java.lang.NullPointerException
    at org.apache.sling.distribution.packaging.impl.FileDistributionPackageBuilder.readPackageInternal(FileDistributionPackageBuilder.java:127)
    at org.apache.sling.distribution.packaging.impl.AbstractDistributionPackageBuilder.readPackage(AbstractDistributionPackageBuilder.java:111)
    at org.apache.sling.distribution.serialization.impl.vlt.VaultDistributionPackageBuilderFactory.readPackage(VaultDistributionPackageBuilderFactory.java:243)
    at org.apache.sling.distribution.transport.impl.SimpleHttpDistributionTransport.retrievePackage(SimpleHttpDistributionTransport.java:156)
    at org.apache.sling.distribution.packaging.impl.exporter.RemoteDistributionPackageExporter.exportPackages(RemoteDistributionPackageExporter.java:82)
    at org.apache.sling.distribution.agent.impl.SimpleDistributionAgent.exportPackages(SimpleDistributionAgent.java:214)
    at org.apache.sling.distribution.agent.impl.SimpleDistributionAgent.execute(SimpleDistributionAgent.java:182)
    at org.apache.sling.distribution.agent.impl.TriggerAgentRequestHandler.handle(TriggerAgentRequestHandler.java:71)
    at org.apache.sling.distribution.trigger.impl.ScheduledDistributionTrigger$ScheduledDistribution.run(ScheduledDistributionTrigger.java:134)
    at org.apache.sling.commons.scheduler.impl.QuartzJobExecutor.execute(QuartzJobExecutor.java:118)
    at org.quartz.core.JobRunShell.run(JobRunShell.java:202)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NullPointerException: null