Symptoms
WARN org.apache.jackrabbit.core.query.lucene.DocId$UUIDDocId - Unknown parent node with id ...
ERROR ... failed to read bundle: ...
Cause
These problems can have multiple causes. Possible causes (depending on the CRX version) are:
- Concurrent modification using the same session: https://issues.apache.org/jira/browse/JCR-2456
- Moved node disappears after refresh: https://issues.apache.org/jira/browse/JCR-1883
The most common cause is concurrent modification using the same session. In this case, the CRX error.log file most likely contains at least one ConcurrentModificationException.
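To check quickly whether concurrent modification is a likely cause, you can count the matching lines in the log. The following is a minimal shell sketch; the function name count_cme is hypothetical, and the location of error.log depends on your installation.

```shell
#!/bin/sh
# Count the log lines that mention ConcurrentModificationException.
# count_cme is a hypothetical helper name, not part of CRX.
# Usage: count_cme <path-to-error.log>
count_cme() {
    log_file="$1"
    if [ ! -r "$log_file" ]; then
        echo "cannot read $log_file" >&2
        return 1
    fi
    # grep -c prints the number of matching lines. It exits with status 1
    # when the count is 0, so "|| true" keeps the status at 0 for a clean log.
    grep -c "ConcurrentModificationException" "$log_file" || true
}
```

For example, count_cme /path/to/error.log prints the number of matching lines. A result above 0 suggests that a session is being shared between threads.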
Analysis, Resolution
These symptoms indicate that the index (and possibly also the workspace) might be inconsistent.
Usually, the impact of an inconsistency is low. The main nuisance is that it fills the log file with error messages.
Types of inconsistencies
- Orphaned child
- Error message: NodeState CHILD_UUID references inexistent parent uuid PARENT_UUID
- Consistency check message: ConsistencyCheck: Not repairable: Node CHILD_UUID has unknown parent: PARENT_UUID (ConsistencyCheck.java, line 116)
- Current solution: Use CRX Console (see article)
- Parent referencing inexistent child
- Error message: NodeState PARENT_UUID references inexistent child {}CHILD_NAME with id CHILD_UUID
- Current solution: The consistency check and fix should repair such issues
- Child referencing invalid parent
- Error message: ChildNode has invalid parent uuid: INVALID_PARENT_UUID (instead of VALID_PARENT_UUID)
- Current solution: Use CRX Console (see article)
- Parent not referencing existing child
- Error message: javax.jcr.ItemNotFoundException: failed to build path of CHILD_UUID: PARENT_UUID has no child entry for CHILD_UUID
- Current solution: Use CRX Console (see article)
- Node exists
- Error message: javax.jcr.ItemExistsException: <node_path> at org.apache.jackrabbit.core.NodeImpl.internalAddChildNode(NodeImpl.java:766)
- Current solution: The consistency check and fix should repair such issues
- Search index inconsistency
- Error message: WARN NodeIteratorImpl: Exception retrieving Node with UUID: 003171fe-e2e8-457b-a3af-f74eed12c1b9: javax.jcr.ItemNotFoundException: 003171fe-e2e8-457b-a3af-f74eed12c1b9
- Current solution: Re-create search index or run search index consistency check and fix
- Search Inconsistency
- Error message: "Caused by: javax.jcr.ItemNotFoundException: 59192bc8-ce8b-4ed7-af8e-018f6aa2d496" where "org.apache.jackrabbit.core.query.lucene.RowIteratorImpl$RowImpl.getNode" can be seen in the stack trace.
- Current solution: See the section Search Index Consistency Check and Fix
- Version-related inconsistency
- Error message stack shows that it is related to org.apache.jackrabbit.core.version.InternalXAVersionManager code
- Current solution: check this article
- File not found
- Error message: ERROR TarPersistenceManager: Failed to read bundle: [uuid]: java.io.IOException: File not found: nnnnn (TarPersistenceManager.java, line 1194)
java.io.IOException: File not found: nnnnn
- Current solution: Check whether you can find that file in a backup (data_nnnnn.tar), or remove the index_*.tar files so that they are rebuilt at restart.
Repairing your search index
Search index inconsistencies can happen because of the following:
- Unclean shutdown of the Java process during a write operation, for example a kill -9 on Linux or Unix, or ending the process from Task Manager on Windows.
- Older bugs in CRX that have been fixed in the latest CQ and CRX hot fixes (for CRX2.2, 2.3 and 2.4). To avoid these issues, keep your CRX hot fixes up to date. If you think you are not on the latest hot fix, submit a support ticket to Adobe support to request it.
There are two ways to fix search index inconsistencies:
- Run a search index consistency check and fix
- Re-create your search index
Search Index Consistency Check and Fix
You can run an index consistency check on startup. In the workspace.xml add two parameters in the <SearchIndex class="..."> element:
<param name="enableConsistencyCheck" value="true"/>
<param name="forceConsistencyCheck" value="true"/>
A third parameter controls whether errors should be repaired or if they should be logged only:
<param name="autoRepair" value="false"/> <!-- default is true -->
Example
<SearchIndex class="com.day.crx.query.lucene.LuceneHandler">
  <param name="path" value="${wsp.home}/index"/>
  <param name="enableConsistencyCheck" value="true"/>
  <param name="forceConsistencyCheck" value="true"/>
  <param name="autoRepair" value="true"/>
</SearchIndex>
Re-creating your Search Index
Re-creating your search index fixes all search index inconsistencies. However, it takes considerably longer than the search index consistency check and fix. Plan a rebuild carefully, and if you cannot afford much downtime, consider running a check and fix instead.
Planning your Index Rebuild
Because the time required to rebuild your index depends on many factors (such as the total number of nodes, asset file sizes, and asset file types), test the rebuild in a non-production environment first. Use that test to estimate how long an outage window you need in production. Small repositories in the 2-10 GB range can take anywhere from 30 minutes to 6 hours, and larger ones can take anywhere from 6 hours to 2 days.
How to Rebuild the Index
To rebuild your search index, do the following:
- Stop CRX (or CQ)
- Back up and delete the following directories in your repository: crx-quickstart/repository/workspaces/crx.default/index/ and crx-quickstart/repository/repository/index/
- Start CRX (or CQ). This initiates the rebuild. To view progress, monitor logs/crx/error.log in CRX2.2 or logs/error.log in CRX2.3. Indexing is done when the CRX instance becomes accessible.
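The backup-and-delete step can be sketched as a small shell function. This is an illustration, not an official tool: the function name move_index_dirs and the backup location are assumptions, the two directory paths are the default ones listed above, and CRX must already be stopped. Pass a new, empty backup directory.

```shell
#!/bin/sh
# Move the two search index directories out of the repository so that
# CRX rebuilds them on the next start. Run only while CRX is stopped.
# move_index_dirs is a hypothetical helper name, not part of CRX.
# Usage: move_index_dirs <crx-quickstart-dir> <new-backup-dir>
move_index_dirs() {
    quickstart="$1"
    backup="$2"
    mkdir -p "$backup" || return 1
    for rel in repository/workspaces/crx.default/index repository/repository/index; do
        src="$quickstart/$rel"
        if [ -d "$src" ]; then
            # Flatten the relative path into one directory name for the backup.
            dest="$backup/$(echo "$rel" | tr '/' '_')"
            mv "$src" "$dest" || return 1
            echo "moved $src -> $dest"
        else
            echo "skipped $src (not found)"
        fi
    done
}
```

Moving the directories, rather than copying and then deleting them, backs them up and removes them in one step. If the rebuild fails, the old index can be moved back while CRX is stopped.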
Workspace Persistence Manager consistency check and fix
If the search index check and fix does not help and reports errors that cannot be repaired, the workspace data is likely inconsistent.
Persistence managers can check repository consistency and fix problems at startup. To enable consistency checking and automatically fix problems, add the following parameters to repository.xml and workspace.xml within each <PersistenceManager class="..."> element, and restart CRX.
In a default CQ5.2.X installation these files can be found under crx-quickstart/repository/workspaces/*/workspace.xml and crx-quickstart/server/runtime/0/_crx/WEB-INF/repository.xml.
In CQ5.3, repository.xml can be found under crx-quickstart/repository/repository.xml instead.
Example
<PersistenceManager class="com.day.crx.persistence.tar.TarPersistenceManager">
  <param name="consistencyCheck" value="true" />
  <param name="consistencyFix" value="true" />
</PersistenceManager>
Either check and fix (index or persistence manager) is performed on the next startup. After the consistency check has finished, disable the relevant settings. Otherwise, the consistency check runs every time CRX starts.
If only consistencyCheck is enabled, then a file inconsistentBundleIds.txt is created in the workspace directory.
If only consistencyFix is enabled, then the file inconsistentBundleIds.txt is read, and those nodes are fixed. The file is then deleted if everything could be fixed.
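Because the settings must be disabled again after the check, it can help to list the configuration files in which they are still enabled. The following shell sketch is a plain text match, not an XML parser: the function name list_enabled_checks is hypothetical, it assumes the name and value attributes appear in the order used in the examples in this article, and it also reports parameters that are inside XML comments.

```shell
#!/bin/sh
# List workspace.xml and repository.xml files below a directory that still
# contain a consistency check or fix parameter set to "true".
# list_enabled_checks is a hypothetical helper name, not part of CRX.
# Usage: list_enabled_checks <repository-dir>
list_enabled_checks() {
    repo_dir="$1"
    # grep -l prints only the names of the files that match.
    # find reports a failure when grep matches nothing, so "|| true"
    # turns "no file matches" into an empty result instead of an error.
    find "$repo_dir" -type f \( -name workspace.xml -o -name repository.xml \) \
        -exec grep -lE 'name="(consistencyCheck|consistencyFix|enableConsistencyCheck|forceConsistencyCheck)"[[:space:]]+value="true"' {} + \
        || true
}
```

An empty result means that no file below the given directory matched. Files that are stored elsewhere, such as a repository.xml under crx-quickstart/server, need a separate call.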
Fixing inconsistencies in a cluster
When running a consistency fix in a clustered environment, only run it on one cluster node. Don't start other cluster nodes while the consistency fix is running. After it is finished, the other cluster nodes can be started.
Fixing inconsistencies quickly
The consistency check scans all nodes and therefore is slow. To reduce the maintenance time, consider running the consistency check on a copy of the repository, and then just fix the nodes that are corrupt. Use the following steps to do this:
- Copy the repository using the Online Backup feature
- Run the consistency check and fix (as described above) on the copy of the repository, possibly on a different machine
- In the CRX log file, find the UUIDs of the corrupt nodes
- In the original repository, run the consistency check and fix on just those nodes
To limit the consistency check and fix to a list of nodes, add the UUIDs of those nodes, separated by commas, to the configuration option "consistencyCheckUUIDs":
Example
<PersistenceManager class="com.day.crx.persistence.tar.TarPersistenceManager">
  <param name="consistencyCheck" value="true" />
  <param name="consistencyCheckUUIDs" value="ea9cb12f-8a8f-4820-88b1-6d1c496a07cd,741c905c-cfb0-422f-acd4-e0a9cbde46c6" />
  <param name="consistencyFix" value="true" />
</PersistenceManager>
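Collecting the UUIDs from the log of the repository copy can be sketched in shell as follows. The function name uuids_for_check is hypothetical. The sketch assumes that the relevant log lines contain the text "ConsistencyCheck" (as in the consistency check message shown earlier) and it collects every UUID on those lines, so review the result before using it. Pass a different filter text as the second argument if your log lines look different.

```shell
#!/bin/sh
# Print the distinct UUIDs found on matching log lines as one
# comma-separated value, suitable for consistencyCheckUUIDs.
# uuids_for_check is a hypothetical helper name, not part of CRX.
# Usage: uuids_for_check <path-to-error.log> [filter-text]
uuids_for_check() {
    log_file="$1"
    pattern="${2:-ConsistencyCheck}"
    # 1. keep only the lines that contain the filter text
    # 2. extract every UUID-shaped token on those lines
    # 3. remove duplicates
    # 4. join the remaining UUIDs with commas
    grep -F -- "$pattern" "$log_file" \
        | grep -oE '[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}' \
        | sort -u \
        | paste -sd, -
}
```

The printed value can be pasted into the value attribute of the consistencyCheckUUIDs parameter in the original repository.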
Preventing repository corruptions
As of CRX 2.2, you can add this JVM parameter to the CRX/CQ5 startup to prevent corruptions:
-Dorg.apache.jackrabbit.core.state.validatehierarchy=true
If you are using the quickstart CQSE server then you can add the parameter to the JVM_OPTS variable, for example:
crx-quickstart\server\server.bat (Windows):
set JVM_OPTS=-XX:MaxPermSize=256m -Dorg.apache.jackrabbit.core.state.validatehierarchy=true
or crx-quickstart/server/start (Mac & Linux):
export JVM_OPTS="-XX:MaxPermSize=256m -Dorg.apache.jackrabbit.core.state.validatehierarchy=true"
Using the validatehierarchy system property slightly degrades the performance of session save operations in the repository. Before you enable it, run load tests to measure the impact, especially if your application is write-heavy.
Make sure to disable tar optimization while a consistency check is planned, so that tar optimization and the consistency check do not run in parallel.
Affected versions
1.X, 2.X