Symptoms of index directory growth

A growing index directory is usually due to one of the following:

Case 1: Content is added or modified continuously in the workspace. Each such operation typically creates a new segment (a folder inside the index folder) containing the new documents.

Case 2: The segments created above need to be merged. Merging runs asynchronously and is triggered by modifications to the repository. Once started, a segment merge runs until it completes, even when no further changes are made.

How to distinguish the above two cases

  • In Case 1, new index folders and Tar PM files are created at a constant rate.
  • If, at the same time, new index folders/segments are eventually merged into bigger ones, that indicates Case 2. Most index merges are quick because the resulting segment is small. Every once in a while, however, larger segments are merged together, which takes longer and consumes more disk space. An index merge runs in the background, so you may notice activity even though no changes are being made to the repository. On the file system, a single folder/segment keeps growing; that folder is the target segment of the merge.
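One way to observe this on the file system is to take two snapshots of the segment folder sizes a few minutes apart and compare them. The following is a minimal sketch, not part of the product; the function names and the index directory path are assumptions made for illustration.

```python
import os


def segment_sizes(index_dir):
    """Return {segment_folder_name: total_bytes} for each sub-folder of index_dir."""
    sizes = {}
    for entry in os.scandir(index_dir):
        if entry.is_dir():
            total = 0
            for root, _dirs, files in os.walk(entry.path):
                for name in files:
                    total += os.path.getsize(os.path.join(root, name))
            sizes[entry.name] = total
    return sizes


def diff_snapshots(before, after):
    """Compare two snapshots: which segments appeared, vanished, or grew."""
    return {
        "new": sorted(set(after) - set(before)),
        "removed": sorted(set(before) - set(after)),
        "grown": sorted(n for n in after if n in before and after[n] > before[n]),
    }
```

Many "new" folders with nothing removed suggests Case 1. A single entry under "grown", with other folders later showing up as "removed", suggests that a merge (Case 2) is in progress.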

Note:

  • If the repository was shut down before a segment merge completed, the repository restarts the segment merge from scratch on the next start of the instance.
  • Index merges are logged at INFO level in the CRX error.log, as in [1]. Grep the logs for "IndexMerger" and sort by the document count to find out how many documents were merged.

    [1] *INFO * IndexMerger: merged 250 documents in 724 ms into _21f. (IndexMerger.java,.....
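The grep-and-sort step can also be scripted. The sketch below assumes that every merge line follows the wording of sample [1]; the regular expression and the function name are illustrative and may need adjusting to the actual log layout.

```python
import re

# Pattern derived from the sample line [1]; adjust it if your log format differs.
MERGE_RE = re.compile(r"IndexMerger: merged (\d+) documents in (\d+) ms into (\w+)")


def parse_merges(lines):
    """Return (documents, milliseconds, target_segment) tuples, largest merge first."""
    merges = []
    for line in lines:
        match = MERGE_RE.search(line)
        if match:
            merges.append((int(match.group(1)), int(match.group(2)), match.group(3)))
    return sorted(merges, reverse=True)
```

Typical usage would be to open error.log, pass the file object to parse_merges, and look at the first few entries to see the largest merges and their target segments.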

Disk space considerations

Merging index segments may temporarily use up to three times the initial index size. For example:

  1. Assume there are 10 index segments, each 1 GB in size.
  2. The index merger creates a new segment from the contents of the 10 existing segments. The resulting segment will be up to 10 GB. It may be smaller, because the merge process does not copy nodes that were marked as deleted in the 10 source segments.
  3. In the next stage, the new segment is copied into a compound file format to reduce the number of file handles needed to access the index. This again requires about the same amount of disk space, in this example another 10 GB.

Together, the three items above add up to approximately 30 GB of disk space: 10 GB for the source segments, 10 GB for the merged segment, and 10 GB for the compound file. In a final step, the old index segments and the non-compound files are deleted, and disk usage shrinks back to about 10 GB.
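The arithmetic of this example can be written down as a small estimator. This is only a sketch of the worst-case reasoning above; the function name and the deleted_fraction parameter (the share of nodes marked as deleted and therefore not copied) are assumptions introduced for illustration.

```python
def merge_disk_usage(segment_sizes_gb, deleted_fraction=0.0):
    """Estimate (peak_gb, final_gb) for merging the given segments.

    deleted_fraction is a hypothetical knob: the share of index content that
    belongs to deleted nodes and is not copied into the merged segment.
    """
    source = sum(segment_sizes_gb)              # existing segments stay on disk during the merge
    merged = source * (1.0 - deleted_fraction)  # new, non-compound segment
    compound = merged                           # copy in compound file format
    peak = source + merged + compound           # all three exist at the same time
    final = compound                            # sources and non-compound files are deleted
    return peak, final


peak, final = merge_disk_usage([1] * 10)
print(peak, final)
```

With ten 1 GB segments and no deleted nodes, the estimate matches the example: a peak of about 30 GB and about 10 GB after cleanup.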

Controlling index merges

  • By default, the maximum number of nodes in segments that will be merged is Integer.MAX_VALUE.
  • To limit the temporary disk usage during segment merges, use the configuration parameter "maxMergeDocs" inside the SearchIndex element in the workspace.xml file. This parameter sets the maximum number of documents that are merged together into a single segment.
  • The preferred value of maxMergeDocs depends on the repository. Analyze the existing index segment files (e.g., using Luke) to find out how many documents are in each index segment.
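As an illustration of where the parameter goes, the sketch below adds or updates a maxMergeDocs param inside the SearchIndex element of a workspace.xml document held in a string. It assumes the usual Jackrabbit-style param elements with name and value attributes. The sample XML, the class attribute, the function name, and the value 100000 are placeholders, not recommendations, and your installation may use a different SearchIndex class.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily shortened workspace.xml used only for this example.
SAMPLE = """<Workspace name="crx.default">
  <SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
    <param name="path" value="${wsp.home}/index"/>
  </SearchIndex>
</Workspace>"""


def set_max_merge_docs(workspace_xml, value):
    """Return the XML text with maxMergeDocs set inside the SearchIndex element."""
    root = ET.fromstring(workspace_xml)
    search = root if root.tag == "SearchIndex" else root.find(".//SearchIndex")
    if search is None:
        raise ValueError("no SearchIndex element found")
    for param in search.findall("param"):
        if param.get("name") == "maxMergeDocs":
            param.set("value", str(value))  # update the existing setting
            break
    else:
        # No existing setting: append a new param element.
        ET.SubElement(search, "param", {"name": "maxMergeDocs", "value": str(value)})
    return ET.tostring(root, encoding="unicode")


print(set_max_merge_docs(SAMPLE, 100000))
```

ElementTree does not preserve comments or a DOCTYPE declaration, so for a real workspace.xml it is often safer to make the same change by hand and to back up the file first.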

Reducing index size

Reduce the number of nodes by removing nodes that are no longer required, such as completed workflow instances and audit log entries. The following articles may help:

  • https://helpx.adobe.com/cq/kb/howtopurgewf.html
  • https://helpx.adobe.com/cq/kb/how-to-optimize-lucene-index-to-gain-efficiency.html
  • https://helpx.adobe.com/communique/kb/Stopwordlist.html

Additionally, re-indexing the workspace might reduce the index size, because the index does not free disk space immediately when a node is deleted. The space is freed only when the index segment that contained the node is merged.
