Search returns partial results from a large document

Issue

You have many large documents (PDFs, Word documents, and so on) uploaded to your instance. CQ indexes these documents using Lucene. However, the search does not return all the documents that contain your search terms.

Solution

Increase maxFieldLength (crx: /config/repository/search/fulltexthandler.xml, contentbus: /config/repository/search/fulltexthandler.xml). Then, reindex.
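The following toy model is a sketch of why raising the limit and reindexing helps. It is not CQ or Lucene code. It assumes a simple lowercase word tokenizer in place of a real Lucene analyzer. The names `tokenize`, `build_index`, and `search` are hypothetical. Only the first `max_field_length` terms of each document reach the index, so a word that appears only at the end of a long document cannot be found until the index is rebuilt with a larger limit.

```python
import re


def tokenize(text):
    """Split text into lowercase word terms (a simplification of a real analyzer)."""
    return re.findall(r"\w+", text.lower())


def build_index(docs, max_field_length):
    """Map each term to the set of document IDs that contain it.

    Only the first max_field_length terms of each document are indexed,
    mimicking the effect of maxFieldLength (hypothetical model)."""
    index = {}
    for doc_id, text in docs.items():
        for term in tokenize(text)[:max_field_length]:
            index.setdefault(term, set()).add(doc_id)
    return index


def search(index, term):
    """Return the IDs of documents whose indexed terms include the given term."""
    return index.get(term.lower(), set())


# A "large" document whose only occurrence of "appendix" is its 51st term.
docs = {
    "small.txt": "quarterly report summary",
    "large.pdf": " ".join(["filler"] * 50) + " appendix",
}

# With a limit of 20 terms, "appendix" is never indexed.
small_limit = build_index(docs, max_field_length=20)
print(search(small_limit, "appendix"))

# Rebuilding ("reindexing") with a higher limit makes the term searchable.
larger_limit = build_index(docs, max_field_length=100)
print(search(larger_limit, "appendix"))
```

The first search prints an empty set and the second finds `large.pdf`. Raising the limit alone is not enough in this model either: the old index still lacks the truncated terms until it is rebuilt.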

Additional information

Documents are only partially indexed because maxFieldLength for your Lucene indexer is not large enough to cover the whole document. Here is the definition:

The attribute maxFieldLength sets the maximum number of terms Lucene indexes per document. For typical website content, this value is usually sufficient. If you have large text files with more than 20,000 words, you might need to increase the value. To check whether the limit affects you, search for words that occur only at the end of your larger documents. If those searches do not find the documents, the documents were truncated during indexing. Increasing the value increases memory consumption.
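To pick test words for that check, a small helper can list the terms that appear only after the first N terms of a document. This is a sketch under several assumptions. The document's plain text must already be extracted (for example, from the PDF). N should match your configured maxFieldLength. Whitespace/word tokenization only approximates how Lucene counts terms. The function name `tail_only_terms` is hypothetical.

```python
import re


def tokenize(text):
    """Split text into lowercase word terms (approximates a Lucene analyzer)."""
    return re.findall(r"\w+", text.lower())


def tail_only_terms(text, max_field_length):
    """Return terms that occur only after the first max_field_length terms.

    Searching for one of these terms is a quick test: if the document is not
    found, it was probably truncated at index time."""
    terms = tokenize(text)
    head = set(terms[:max_field_length])
    tail = terms[max_field_length:]
    # Keep first-occurrence order and drop duplicates.
    return list(dict.fromkeys(t for t in tail if t not in head))


sample = "alpha beta gamma alpha delta epsilon beta"
print(tail_only_terms(sample, 3))  # ['delta', 'epsilon']
```

Terms that also appear early in the document are excluded because a search for them would succeed even when the document is truncated.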