Issue

You see one of these messages in your error.log:

*WARN * BLOBInDataStore: getSize for a586f73ec77fba3135021af01c7f09d972eb3e01 failed (BLOBInDataStore.java, line 95)
org.apache.jackrabbit.core.data.DataStoreException: Record not found: a586f73ec77fba3135021af01c7f09d972eb3e01

org.apache.jackrabbit.core.data.DataStoreException: Failed to read record modified date: /opt/author/crx-quickstart/repository/shared/repository/datastore/6f/b6/14/6fb61425b9187a97f9404cee290cc255177c702e

Caused by: java.io.FileNotFoundException: /opt/author/crx-quickstart/repository/shared/repository/datastore/6f/b6/14/6fb61425b9187a97f9404cee290cc255177c702e
at org.apache.jackrabbit.core.data.LazyFileInputStream.<init>(LazyFileInputStream.java:63)
at org.apache.jackrabbit.core.data.FileDataRecord.getStream(FileDataRecord.java:57)
... 114 more

Solution

This error means that files are missing from your AEM datastore directory.  See this documentation for more details about what the datastore is and how it works. Datastore files can be lost through a failure of datastore garbage collection, a disk space outage, or disk or network share instability. They can also be lost when a user mistakenly deletes files from the server.

To recover the missing files, follow the steps below.

1. Install the Latest CRX hot fix

Before you continue: if you are running CQ5.4 or earlier and your CQ instance is able to start up, download and install the latest CRX hot fix.

You can find the latest hot fixes on Adobe Package Share.

2. Fix "File not found:" Errors

If tar files are missing, as described in this article, address that issue first.  Missing tar files typically produce errors such as "File not found:" in your error.log.

For example, here are some errors that were seen during datastore GC due to a missing tar file:

27.02.2012 13:55:01 *WARN * TarSet: File not found: 4650 for entry 6c18a73a-aba1-44b8-93c2-90fb49cac5e5 pos:4650/148692992 length: 669 from /opt/author/crx-quickstart/repository/workspaces/crx.default/copy [4651, 4652, 4653, 4654, 4655, 4656, 4657, 4658, 4659, 4660, 4661, 4662, 4663].
27.02.2012 13:55:38 *INFO * FileDataStore: Deleting old file /opt/author/crx-quickstart/repository/shared/repository/datastore/6f/b6/14/6fb61425b9187a97f9404cee290cc255177c702e modified: 2012-02-23 10:06:58.0 length: 6555 (FileDataStore.java, line 315)
28.02.2012 10:49:46 *WARN * BLOBInDataStore: getSize for 6fb61425b9187a97f9404cee290cc255177c702e failed (BLOBInDataStore.java, line 95)
org.apache.jackrabbit.core.data.DataStoreException: Failed to read record modified date: /opt/author/crx-quickstart/repository/shared/repository/datastore/6f/b6/14/6fb61425b9187a97f9404cee290cc255177c702e
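
To check quickly whether your logs contain this kind of warning, a small shell helper such as the following may be useful. It is only a sketch: the function name find_missing_tar_warnings is hypothetical, and it assumes that your logs are named error.log* and that the warning text matches the TarSet example above.

```shell
# Hypothetical helper: print each distinct TarSet "File not found:" warning
# found in the error logs of the given directory (assumed to contain
# files named error.log, error.log.1, and so on).
find_missing_tar_warnings() {
    log_dir=$1
    # -h drops the file name prefix so that sort -u can collapse duplicates
    grep -h "TarSet: File not found:" "$log_dir"/error.log* 2>/dev/null | sort -u
}
```

For example, find_missing_tar_warnings /opt/author/crx-quickstart/logs prints nothing when no such warning has been logged.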

3. Run a DataStore Consistency Check

The only way to get a complete list of all missing files in the datastore is to run a DataStore Consistency Check.  If your AEM instance is able to start up, then follow these steps on the running instance:

  1. Go to CRX Explorer and log in as admin.
    - http://<host>:<port>/crx (CQ5.4/CRX2.2 and earlier)
    - http://<host>:<port>/crx/explorer/index.jsp (CQ5.5/CRX2.3 and later versions)
  2. Click Repository Configuration.
  3. Click Check Repository.
  4. Select Data Store Consistency.
  5. Click Run.
  6. Keep your browser open for the duration of the process. The process prints a message to the screen only when it finds an error, and it does not write anything to the log file.  Each error message reports the path of the node that references the datastore record and the record id of the missing file.
  7. When the process completes, copy the output to a text file consistency_check_output.txt and go on to the next step.

4. Create a list of paths of the missing files

The next step in recovering the missing files is to compile a complete list of the paths of the files that are missing.

1. Search the consistency check output for all occurrences of the error "Record not found:".

a. In Linux or Unix: Use this command to output the list of missing files to a file missing_ds_files.txt:

grep "Record not found:" consistency_check_output.txt | grep -Eo "[0-9a-f]{40}" | awk '{ print substr($1, 1, 2) "/" substr($1, 3, 2) "/" substr($1, 5, 2) "/" $1 }' | sort -u > missing_ds_files.txt

** If your AEM instance is not starting up due to the "Record not found" errors, then search your log files under crx-quickstart/logs for all occurrences of the error "Record not found:" instead:

grep "Record not found:" error.log* | grep -Eo "[0-9a-f]{40}" | awk '{ print substr($1, 1, 2) "/" substr($1, 3, 2) "/" substr($1, 5, 2) "/" $1 }' | sort -u > missing_ds_files.txt

If the command worked properly, then the contents of missing_ds_files.txt would look similar to the following:

12/92/04/129204a6dd0ce2cd5ca19c721b6f52ee2b3630e2
9f/d8/38/9fd8386d20cf55e7e0024e18d0c7d4e8400454ee
7a/13/15/7a1315788f45dafd6630454f04183601682a9f80
28/37/d2/2837d24aed3ff223cd40e90222226c4ef2e2a0c6
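
As an optional sanity check before using the list, the following sketch flags any line that does not have the expected shape. The function name check_list_format is hypothetical, and the check assumes 40-character lowercase hexadecimal record ids, as in the examples in this article.

```shell
# Hypothetical helper: print every malformed line of a missing-files list
# and return a non-zero status if at least one line is malformed.
check_list_format() {
    awk '
        BEGIN { bad = 0 }
        # A valid line is aa/bb/cc/<40 hex chars>, where aa, bb and cc
        # repeat the first six characters of the record id.
        {
            n = split($0, part, "/")
            ok = (n == 4 &&
                  length(part[4]) == 40 && part[4] ~ /^[0-9a-f]+$/ &&
                  length(part[1]) == 2 && length(part[2]) == 2 &&
                  length(part[3]) == 2 &&
                  (part[1] part[2] part[3]) == substr(part[4], 1, 6))
            if (!ok) { print "malformed: " $0; bad = 1 }
        }
        END { exit bad }
    ' "$1"
}
```

Running check_list_format missing_ds_files.txt prints nothing and returns 0 when every line is well formed.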

  b. In Windows:  Use a text editor such as Textpad or Notepad++ to find all occurrences of "Record not found:".  Then extract the record ids and construct the file paths in a new text file, either with a macro, with a script, or by copying them manually.

DataStore file paths are constructed from the record id in this format:
{characters 1-2 of record id}/{characters 3-4 of record id}/{characters 5-6 of record id}/{record id}

For example, in this error:

error.log: org.apache.jackrabbit.core.data.DataStoreException: Record not found: 129204a6dd0ce2cd5ca19c721b6f52ee2b3630e2

the record id is 129204a6dd0ce2cd5ca19c721b6f52ee2b3630e2 and the file path is 12/92/04/129204a6dd0ce2cd5ca19c721b6f52ee2b3630e2
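
The same construction can be sketched as a small POSIX shell function. The name record_path is hypothetical, and the function assumes a 40-character lowercase hexadecimal record id like the ones shown in this article.

```shell
# Hypothetical helper: turn a record id into its relative datastore path.
record_path() {
    id=$1
    # Reject empty input and anything containing non-hex characters
    case $id in
        ''|*[!0-9a-f]*) echo "not a record id: $id" >&2; return 1 ;;
    esac
    # Reject ids that are not exactly 40 characters long
    [ "${#id}" -eq 40 ] || { echo "not a record id: $id" >&2; return 1; }
    # cut -c uses 1-based, inclusive character positions
    a=$(printf '%s\n' "$id" | cut -c1-2)
    b=$(printf '%s\n' "$id" | cut -c3-4)
    c=$(printf '%s\n' "$id" | cut -c5-6)
    printf '%s/%s/%s/%s\n' "$a" "$b" "$c" "$id"
}
```

For the record id in the example above, record_path 129204a6dd0ce2cd5ca19c721b6f52ee2b3630e2 prints 12/92/04/129204a6dd0ce2cd5ca19c721b6f52ee2b3630e2.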

5. Recover the missing files

Now use the output from the last step to hunt down the same files in other AEM (CQ) instances in your environment.  Because each datastore file is named after a hash of its content, a file with the same name on another instance has the same content, so you can copy it from there.
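
To see which of the missing files a given working instance can supply, you could run something like the following sketch on that instance. The function name list_available and both of its arguments are hypothetical; pass the list produced in step 4 and the path of the local datastore directory.

```shell
# Hypothetical helper: print the entries of a missing-files list that exist
# in the datastore directory of the instance you are logged in to.
list_available() {
    list_file=$1      # for example, a copy of missing_ds_files.txt
    datastore_dir=$2  # path of the local datastore directory (assumption)
    # The "|| [ -n ... ]" part also handles a last line without a newline
    while IFS= read -r rel_path || [ -n "$rel_path" ]; do
        [ -n "$rel_path" ] || continue   # skip blank lines
        if [ -f "$datastore_dir/$rel_path" ]; then
            printf '%s\n' "$rel_path"
        fi
    done < "$list_file"
}
```

Comparing its output with the full list shows which files still have to be found on another instance or in a backup.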

If you cannot find some of the files in other instances, then search your backups and restore them from there if possible.

In Linux, you could log in to each of the working AEM instances and use a command like rsync to copy over any of the missing files that exist in them.  For example, copy missing_ds_files.txt to the datastore directory of a working instance and run the following from that directory, with the server that is missing the files as the destination:

rsync -avR --files-from=missing_ds_files.txt . user@hostname-of-server-missing-files:/path/to/crx-quickstart/repository/repository/datastore/

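This command runs rsync to copy every file listed in missing_ds_files.txt that exists on the working instance. rsync reports an error for each listed file that it cannot find locally and continues with the rest.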
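
After copying, you may want to confirm that the recovered files are intact. The sketch below assumes that each record id is the SHA-1 digest of the file content, which is how Jackrabbit's FileDataStore names its 40-character records; treat that as an assumption to confirm for your version. It also assumes that sha1sum (GNU coreutils) is installed. The function name verify_recovered is hypothetical.

```shell
# Hypothetical helper: report recovered files whose SHA-1 digest does not
# match their file name. Returns non-zero if any mismatch is found.
verify_recovered() {
    list_file=$1      # for example, missing_ds_files.txt
    datastore_dir=$2  # datastore directory on the repaired server
    rc=0
    while IFS= read -r rel_path || [ -n "$rel_path" ]; do
        file=$datastore_dir/$rel_path
        [ -f "$file" ] || continue      # still missing; nothing to verify
        expected=${rel_path##*/}        # the file name is the record id
        actual=$(sha1sum < "$file" | cut -d' ' -f1)
        if [ "$actual" != "$expected" ]; then
            echo "MISMATCH $rel_path"
            rc=1
        fi
    done < "$list_file"
    return $rc
}
```

A file reported as MISMATCH was probably truncated or corrupted during the copy and should be copied again from a good source.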

6. Clean up unrecoverable file references

If you were not able to recover some of the files from backup or from other AEM (CQ) instances, then clean up or fix the bad datastore references.  Rerun the DataStore Consistency Check as described in step 3 to get a current list of missing files.

Review each of the listed node paths that reference missing datastore files.  Review any missing DAM assets or files uploaded to pages with your users, and have them reupload the ones they need.  Any they don't need can safely be deleted via the AEM user interface.  Anything missing from under /var/audit or /var/eventing can also safely be deleted.  For any files you are not sure about, go here and contact the AEM support team for assistance.

Applies to

CRX 2.2, 2.3

This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported License. Twitter™ and Facebook posts are not covered under the terms of Creative Commons.
