AEM 6.1, 6.2, 6.3, 6.4

How to use a custom Tika configuration to disable full-text search based on a file's MIME type in AEM

Adobe recommends disabling full-text search for binary files through the Tika configuration of the index. This recommendation is part of the Asset Performance Tuning Helpx article.

Some common file types to consider: MP4, PDF, ZIP.
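
Tika configuration entries use full MIME type strings rather than file extensions. As a rough illustration, the helper below maps the three file types listed above to their standard registered MIME types. The function name mime_for_extension is made up for this sketch, and a given instance may detect additional or vendor-specific types for the same files.

```shell
#!/bin/sh
# Hypothetical helper: print the standard MIME type for a file extension.
# Only the three types mentioned in this article are covered.
mime_for_extension() {
    case "$1" in
        mp4) echo "video/mp4" ;;
        pdf) echo "application/pdf" ;;
        zip) echo "application/zip" ;;
        *)
            # Anything else is reported on stderr and signalled as a failure.
            echo "unknown extension: $1" >&2
            return 1
            ;;
    esac
}
```

For example, mime_for_extension pdf prints application/pdf, which is the form of value to add in the steps that follow.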

Method 1

1. Install the package provided. 

2. In CRXDE Lite, browse to the locations below:


3. Add the MIME type of the files for which full-text search should be disabled:


4. Click Save All.

5. In CRXDE Lite, set the Boolean property refresh=true on these nodes and save:

6. Wait for the changes to take effect, then test by searching for assets of the MIME type you added.
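
Steps 5 and 6 can also be done over HTTP instead of through the CRXDE Lite UI. The sketch below is one possible approach, not an Adobe-provided script. It assumes a local author instance at http://localhost:4502 with admin:admin credentials, that the Sling POST servlet accepts writes to the index nodes, and that assets live under /content/dam. The function names are hypothetical, and the node paths are passed in as arguments because they depend on the installed package.

```shell
#!/bin/sh
# Assumed defaults for a local author instance; override via the environment.
AEM_URL=${AEM_URL:-http://localhost:4502}
AEM_CREDENTIALS=${AEM_CREDENTIALS:-admin:admin}

# Step 5: set the Boolean property refresh=true on one node.
# Uses the Sling POST servlet; @TypeHint forces the Boolean property type.
set_refresh_flag() {
    node_path=$1
    curl -sf -u "$AEM_CREDENTIALS" \
        -F "refresh=true" \
        -F "refresh@TypeHint=Boolean" \
        "${AEM_URL}${node_path}"
}

# Step 6: build a QueryBuilder URL that runs a full-text search over assets.
# The search term is assumed to be a single word that needs no URL encoding.
fulltext_query_url() {
    term=$1
    printf '%s/bin/querybuilder.json?type=dam:Asset&path=/content/dam&fulltext=%s&p.limit=10\n' \
        "$AEM_URL" "$term"
}
```

To test, call set_refresh_flag once for each node from step 5, then run curl -u "$AEM_CREDENTIALS" "$(fulltext_query_url someword)", where someword is a term that occurs only inside a file of the disabled MIME type.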


Method 2

1. In the AEM Web Console, search for 'oak-lucene' and note the bundle number.

2. Shut down the AEM instance.

3. Browse to the /crx-quickstart/launchpad/felix/bundlexxx directory, where xxx is the bundle number noted in step 1.

4. cd to the subdirectory with versionX.Y in the name (e.g. felix/bundle102/version0.2):
cd version*

5. Extract the tika-config.xml file from the jar file:
jar -xvf bundle.jar org/apache/jackrabbit/oak/plugins/index/lucene/tika-config.xml

6. Edit the tika-config.xml file:

vi org/apache/jackrabbit/oak/plugins/index/lucene/tika-config.xml

For example, add the MIME type of the files for which full-text search should be disabled:


7. Write the modified file back into bundle.jar:
jar -uvf bundle.jar org/apache/jackrabbit/oak/plugins/index/lucene/tika-config.xml

8. Restart the AEM instance and test by searching for assets of the MIME type you added.
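
The edit in step 6 can be scripted instead of made by hand in vi. The sketch below assumes the extracted tika-config.xml already contains a <parser> element for Tika's org.apache.tika.parser.EmptyParser class holding a list of <mime> entries, with its closing </parser> tag on its own line. Check the extracted file first, because the layout can differ between Oak versions. The function name add_disabled_mime is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: add a <mime> entry to the EmptyParser block of a
# tika-config.xml file. Does nothing if the entry is already present.
add_disabled_mime() {
    config_file=$1
    mime_type=$2

    # -F treats the pattern as a literal string, so "." and "+" are safe.
    if grep -qF "<mime>${mime_type}</mime>" "$config_file"; then
        return 0
    fi

    # Insert the new entry just before the first </parser> that follows
    # the line mentioning EmptyParser, then replace the original file.
    awk -v mime="$mime_type" '
        /EmptyParser/ { in_empty = 1 }
        in_empty && /<\/parser>/ && !inserted {
            print "      <mime>" mime "</mime>"
            inserted = 1
            in_empty = 0
        }
        { print }
    ' "$config_file" > "${config_file}.tmp" && mv "${config_file}.tmp" "$config_file"
}
```

Between steps 5 and 7, this would be called as add_disabled_mime org/apache/jackrabbit/oak/plugins/index/lucene/tika-config.xml application/zip. Keeping a backup copy of bundle.jar before step 7 makes it easy to roll back.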