You can reduce the time required to search a long PDF by embedding an index of the words in the document. Acrobat can search the index much faster than it can search the document. The embedded index is included in distributed or shared copies of the PDF. Users search PDFs with embedded indexes exactly as they search those without embedded indexes; no extra steps are required.
You can define a specific group of PDFs as a catalog and create a unified index for that entire collection of documents. When users search the cataloged PDFs for specific information, the index makes the search process much faster.
When you distribute the collection on a CD, you can include the index with the PDFs.
You can catalog documents written in Roman, Chinese, Japanese, or Korean characters. The items you can catalog include the document text, comments, bookmarks, form fields, tags, object and document metadata, attachments, document information, digital signatures, image EXIF (Exchangeable image file format) metadata, and custom document properties.
Begin by creating a folder to contain the PDFs you want to index. All PDFs should be complete in both content and electronic features, such as links, bookmarks, and form fields. If the files to be indexed include scanned documents, make sure that the text is searchable. Break long documents into smaller, chapter-sized files to improve search performance. You can also add information to a file’s document properties to improve the file’s searchability.
Before you index a document collection, it’s essential that you set up the document structure on the disk drive or network server volume and verify cross-platform filenames. Filenames may become truncated and hard to retrieve in a cross-platform search. To prevent this problem, consider these guidelines:
Rename files, folders, and indexes using the MS-DOS file-naming convention (eight characters or fewer followed by a three-character filename extension), particularly if you plan to deliver the document collection and index on an ISO 9660-formatted CD-ROM disc.
Remove extended characters, such as accented characters and non-English characters, from file and folder names. (The font used by the Catalog feature does not support character codes 133 through 159.)
Don’t use deeply nested folders or path names that exceed 256 characters for indexes that will be searched by Mac OS users.
If you use Mac OS with an OS/2 LAN server, configure IBM® LAN Server Macintosh (LSM) to enforce MS-DOS file-naming conventions, or index only FAT (File Allocation Table) volumes. (HPFS [High Performance File System] volumes may contain long unretrievable filenames.)
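For a large collection, a short script can flag names that break these guidelines before you build the index. The following Python sketch checks relative paths against three rules: the 8.3 naming rule, the ban on extended characters, and the 256-character path limit. It makes two assumptions for illustration. The function name `check_path` is hypothetical. The allowed character set (letters, digits, underscore, hyphen) is deliberately conservative, because MS-DOS permits a few more characters. The sketch does not reproduce whatever checks Acrobat itself performs.

```python
import re
from pathlib import PurePosixPath

# Eight characters or fewer, optionally followed by a dot and one to three
# characters. The allowed character set is a conservative assumption.
DOS_83 = re.compile(r"[A-Za-z0-9_-]{1,8}(\.[A-Za-z0-9_-]{1,3})?")
MAX_PATH = 256  # limit for indexes searched by Mac OS users


def check_path(path: str) -> list[str]:
    """Return the guideline problems for one relative path, e.g. 'manuals/ch01.pdf'."""
    problems = []
    if len(path) > MAX_PATH:
        problems.append(f"path is {len(path)} characters (limit {MAX_PATH})")
    for part in PurePosixPath(path).parts:
        if not DOS_83.fullmatch(part):
            problems.append(f"{part!r} is not an 8.3 name")
        # Rejecting every non-ASCII character also covers the unsupported
        # character codes 133 through 159.
        if not part.isascii():
            problems.append(f"{part!r} contains extended characters")
    return problems


if __name__ == "__main__":
    for p in ["manuals/ch01.pdf", "Manuals/Chapter One Overview.pdf", "docs/résumé.pdf"]:
        print(p, "->", check_path(p) or "OK")
```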
If the document structure includes subfolders that you don’t want indexed, you can exclude them during the indexing process.
To make a PDF easier to search, you can add file information, called metadata, to the document properties. (You can see the properties for the currently open PDF by choosing File > Properties, and clicking the Description tab.)
(Windows) You can also view and edit document properties from the desktop. Right-click the document in Windows Explorer, choose Properties, and click the PDF tab. Any information you type or edit in this dialog box also appears in the Description tab of the Document Properties dialog box when you open the file.
Use a good descriptive title in the Title field. The filename of the document should appear in the Search Results dialog box.
Always use the same option (field) for similar information. For example, don’t add an important term to the Subject option for some documents and to the Keywords option for others.
Use a single, consistent term for the same information. For example, don’t use “biology” for some documents and “life sciences” for others.
Use the Author option to identify the group responsible for the document. For example, the author of a hiring policy document might be the Human Resources department.
If you use document part numbers, add them as keywords. For example, adding “doc#=m234” to Keywords could indicate a specific document in a series of several hundred documents on a particular subject.
Use the Subject or Keywords option, either alone or together, to categorize documents by type. For example, you might use “status report” as a Subject entry and “monthly” or “weekly” as a Keywords entry for a single document.
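To audit consistency across many documents, you could collect each PDF’s Subject and Keywords values into a dictionary and check them with a few lines of Python. Extracting those values is outside this sketch. Several details are hypothetical: the preferred-term table `PREFERRED`, the comma-separated keyword format, case-insensitive comparison, and the helper names. The checks mirror the guidelines above: use one term per concept, keep each kind of information in one field, and store part numbers as `doc#=` keywords.

```python
from collections import defaultdict
from typing import Optional

# Hypothetical controlled vocabulary: non-preferred term -> preferred term.
PREFERRED = {"life sciences": "biology"}
FIELDS = ("Subject", "Keywords")


def terms(value: str) -> list[str]:
    """Split a Subject or Keywords value on commas or semicolons; lowercase for comparison."""
    return [t.strip().lower() for t in value.replace(";", ",").split(",") if t.strip()]


def audit(docs: dict[str, dict[str, str]]) -> list[str]:
    """Flag non-preferred terms and terms used in both Subject and Keywords."""
    issues = []
    fields_used = defaultdict(set)  # term -> fields it appears in, across all documents
    for name, props in sorted(docs.items()):
        for field in FIELDS:
            for term in terms(props.get(field, "")):
                fields_used[term].add(field)
                if term in PREFERRED:
                    issues.append(f"{name}: use {PREFERRED[term]!r} instead of {term!r} in {field}")
    for term, fields in sorted(fields_used.items()):
        if len(fields) > 1:
            issues.append(f"{term!r} is used in both the Subject and Keywords fields")
    return issues


def part_number(props: dict[str, str]) -> Optional[str]:
    """Return the value of a 'doc#=' keyword such as 'm234', or None."""
    for term in terms(props.get("Keywords", "")):
        if term.startswith("doc#="):
            return term[len("doc#="):]
    return None


if __name__ == "__main__":
    docs = {
        "a.pdf": {"Subject": "status report", "Keywords": "monthly, doc#=m234"},
        "b.pdf": {"Subject": "life sciences", "Keywords": "weekly"},
        "c.pdf": {"Subject": "budget", "Keywords": "status report"},
    }
    for issue in audit(docs):
        print(issue)
    print(part_number(docs["a.pdf"]))
```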
If you already have specialized training in Adobe PDF, you can define custom data fields, such as Document Type, Document Number, and Document Identifier, when you create the index. This is recommended only for advanced users and is not covered in Acrobat Complete Help.
When you build a new index, Acrobat creates a file with the .pdx extension and a new support folder, which contains one or more files with .idx extensions. The IDX files contain the index entries. All of these files must be available to users who want to search the index.
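Before you copy or distribute an index, you can confirm that the PDX file and its IDX files travel together. This Python sketch makes three assumptions. The PDX file sits directly in the folder you pass in. The support folder is a subfolder of that folder. Extensions may be in any case. The function name and the support folder name used in the demo are hypothetical.

```python
import tempfile
from pathlib import Path


def check_index_bundle(folder: str) -> list[str]:
    """Report missing pieces of an index stored in *folder*.

    Assumes the .pdx file is directly in *folder* and the .idx files are in a
    support folder beneath it. Extensions are matched case-insensitively.
    """
    root = Path(folder)
    problems = []
    pdx = [p for p in root.iterdir() if p.is_file() and p.suffix.lower() == ".pdx"]
    idx = [p for p in root.rglob("*")
           if p.is_file() and p.suffix.lower() == ".idx" and p.parent != root]
    if not pdx:
        problems.append("no .pdx index definition file found")
    if not idx:
        problems.append("no .idx files found in a support folder")
    return problems


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(check_index_bundle(d))       # reports both problems
        Path(d, "catalog.pdx").write_text("")
        support = Path(d, "support")       # hypothetical support folder name
        support.mkdir()
        (support / "part1.idx").write_text("")
        print(check_index_bundle(d))       # prints []
```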
If you don’t see the Document Processing panel, see the instructions for adding panels in Task panes.
Under Include These Directories, click Add, select a folder containing some or all of the PDF files to be indexed, and click OK. To add more folders, repeat this step.
Any folder nested under an included folder will also be included in the indexing process. You can add folders from multiple servers or disk drives, as long as you do not plan to move the index or any items in the document collection.
If you stop the indexing, you cannot resume the same indexing session, but you don’t have to redo the work. The options and folder selections remain intact. You can click Open Index, select the partially finished index, and revise it.
If long path names are truncated in the Include These Directories And Exclude These Subdirectories options, hold the pointer over each ellipsis (...) until a tool tip appears, displaying the complete path of the included or excluded folder.
Do Not Include Numbers
Select this option to exclude all numbers that appear in the document text from the index. Excluding numbers can significantly reduce the size of an index, making searches faster.
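A toy inverted index, which maps each word to the documents containing it, shows why excluding numbers shrinks an index. This Python sketch models the idea only and does not reproduce Acrobat’s index format. It treats any token made entirely of digits as a number, which is an assumption about what the option excludes.

```python
import re
from collections import defaultdict


def build_index(docs: dict[str, str], exclude_numbers: bool = False) -> dict[str, set[str]]:
    """Toy inverted index: map each lowercase word to the documents that contain it."""
    index = defaultdict(set)
    for name, text in docs.items():
        for word in re.findall(r"\w+", text.lower()):
            if exclude_numbers and word.isdigit():
                continue  # model of the Do Not Include Numbers option
            index[word].add(name)
    return dict(index)


if __name__ == "__main__":
    docs = {
        "q1.pdf": "Revenue rose 12 percent to 4500 units in 2023",
        "q2.pdf": "Revenue fell 3 percent in 2024",
    }
    full = build_index(docs)
    lean = build_index(docs, exclude_numbers=True)
    print(len(full), len(lean))  # prints 12 7
```

In this example, 5 of the 12 distinct entries are numbers. Documents heavy in dates, quantities, or part numbers would show a larger share.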
Add IDs To Adobe PDF v1.0 Files
Select this option if your collection includes PDFs created before Acrobat 2.0, which did not automatically add identification numbers. ID numbers are needed when long Mac OS filenames are shortened as they are translated into MS-DOS filenames. Acrobat 2.0 and later versions automatically add identifiers.
Do Not Warn For Changed Documents When Searching
When this option is not selected, a message appears when you search documents that have changed since the most recent index build.
Use the Custom Properties option to include custom document properties in the index; only custom properties that already exist in the PDFs you index are indexed. Type the property, choose a type from the Type menu, and then click Add. When users search the resulting index, these properties appear as search options in the additional criteria pop-up menus of the Search PDF window. For example, if you enter the custom property Document Name and choose the String type from the Type menu, a user searching the index can search within that property by selecting Document Name from the Use These Additional Criteria menu.
If the Convert Document Information option is selected in PDFMaker, custom fields that you create in a Microsoft Office application transfer to any PDFs you create from that application.
Use this option to include custom XMP fields in the index. The indexed XMP fields appear in the additional criteria pop-up menus, so users can search them in the selected indexes.
Use the Stop Words option to exclude specific words (500 maximum) from the index. Type the word, click Add, and repeat as needed. Excluding words can make the index 10% to 15% smaller. A stop word can contain up to 128 characters and is case sensitive.
To prevent users from searching for phrases that contain these words, list the words that aren’t indexed in the ReadMe file for the index.
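Stop words are case sensitive and capped at 500 entries of up to 128 characters each, so it can help to prepare the list outside Acrobat. This Python sketch does three things:

1. It adds Capitalized and UPPERCASE variants of each word. This rests on the assumption that a lowercase entry does not exclude the capitalized form.
2. It checks the documented limits.
3. It formats the list for the ReadMe file.

The function names are hypothetical. You still type the final words into Acrobat yourself.

```python
MAX_STOP_WORDS = 500   # documented maximum number of stop words
MAX_WORD_LENGTH = 128  # documented maximum length of one stop word


def with_case_variants(words: list[str]) -> list[str]:
    """Add Capitalized and UPPERCASE forms (stop words are case sensitive).

    Order is preserved and duplicates are dropped.
    """
    result = []
    for w in words:
        for variant in (w, w.capitalize(), w.upper()):
            if variant not in result:
                result.append(variant)
    return result


def validate_stop_words(words: list[str]) -> list[str]:
    """Check a stop-word list against the documented count and length limits."""
    problems = []
    if len(words) > MAX_STOP_WORDS:
        problems.append(f"{len(words)} stop words exceeds the maximum of {MAX_STOP_WORDS}")
    for w in words:
        if len(w) > MAX_WORD_LENGTH:
            problems.append(f"stop word starting {w[:16]!r} exceeds {MAX_WORD_LENGTH} characters")
    return problems


def readme_section(words: list[str]) -> str:
    """Format the stop words as a plain-text section for the ReadMe file."""
    lines = ["The following words are not indexed.",
             "Avoid searching for phrases that contain them:"]
    lines += [f"  {w}" for w in sorted(set(words), key=lambda w: (w.lower(), w))]
    return "\n".join(lines)


if __name__ == "__main__":
    stop_words = with_case_variants(["the", "a", "of"])
    print(validate_stop_words(stop_words) or "within limits")
    print(readme_section(stop_words))
```

Adding case variants triples the list, so the 500-entry limit is reached sooner.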
Use the Tags option to make specific leaf-element tag nodes searchable in documents that have a tagged logical structure.
The Custom Properties, Stop Words, and Tags settings apply to the current index only. To apply these settings globally to any index you create, you can change the default settings for custom fields, stop words, and tags in the Catalog panel of the Preferences dialog box.
It is often a good idea to create a separate ReadMe file and put it in the folder with the index. This ReadMe file can give people details about your index, such as:
The kind of documents indexed.
The search options supported.
The person to contact or a phone number to call with questions.
A list of numbers or words that are excluded from the index.
A list of the folders containing documents included in a LAN-based index, or a list of the documents included in a disk-based index. You might also include a brief description of the contents of each folder or document.
A list of the values for each document if you assign Document Info field values.
If a catalog has an especially large number of documents, consider including a table that shows the values assigned to each document. The table can be part of your ReadMe file or a separate document. While you are developing the index, you can use the table to maintain consistency.
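A plain-text table is easy to paste into a ReadMe file. This Python sketch renders one row per document from a dictionary of document properties. The column set and function name are assumptions. Blank cells make missing values easy to spot while you develop the index.

```python
COLUMNS = ("Title", "Subject", "Keywords", "Author")  # assumed column set


def metadata_table(docs: dict[str, dict[str, str]]) -> str:
    """Render one row per document, padding each column to its widest cell."""
    header = ("File",) + COLUMNS
    rows = [header] + [
        (name,) + tuple(props.get(col, "") for col in COLUMNS)
        for name, props in sorted(docs.items())
    ]
    widths = [max(len(row[i]) for row in rows) for i in range(len(header))]
    return "\n".join(
        "  ".join(cell.ljust(width) for cell, width in zip(row, widths)).rstrip()
        for row in rows
    )


if __name__ == "__main__":
    print(metadata_table({
        "hr01.pdf": {"Title": "Hiring policy", "Author": "Human Resources",
                     "Subject": "policy", "Keywords": "hiring"},
        "st02.pdf": {"Title": "Status report, May", "Subject": "status report",
                     "Keywords": "monthly"},
    }))
```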
In the Index Definition dialog box, make any changes you want, and then click the function you want Acrobat to perform:
Creates a new IDX file with the existing information, and updates it by adding new entries and marking changed or outdated entries as invalid. If you make a large number of changes, or use this option repeatedly instead of creating a new index, search times may increase.
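A toy model makes it easier to see why repeated updates can slow searches. In this Python sketch, an update marks a document’s old entries invalid and appends new ones. Every search therefore still scans the invalid entries until the index is created anew. This is a simplification, not Acrobat’s IDX format, and the class and method names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    word: str
    doc: str
    valid: bool = True


@dataclass
class ToyIndex:
    entries: list[Entry] = field(default_factory=list)

    def update(self, doc: str, words: set[str]) -> None:
        """Incremental update: invalidate the document's old entries, append new ones."""
        for entry in self.entries:
            if entry.doc == doc:
                entry.valid = False
        self.entries.extend(Entry(w, doc) for w in sorted(words))

    def search(self, word: str) -> tuple[set[str], int]:
        """Return matching documents and the number of entries scanned."""
        hits = {e.doc for e in self.entries if e.valid and e.word == word}
        return hits, len(self.entries)

    def rebuild(self) -> None:
        """Model of creating a new index: keep only the valid entries."""
        self.entries = [e for e in self.entries if e.valid]


if __name__ == "__main__":
    index = ToyIndex()
    for _ in range(4):  # the same document changes and is updated four times
        index.update("hiring.pdf", {"hiring", "policy"})
    print(index.search("policy"))  # ({'hiring.pdf'}, 8)
    index.rebuild()
    print(index.search("policy"))  # ({'hiring.pdf'}, 2)
```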
You can set preferences for indexing that apply globally to all subsequent indexes you build. You can override some of these preferences for an individual index by selecting new options during the index-building process.
In the Preferences dialog box under Categories, select Catalog. Many of the options are identical to those described for the index-building process.
The Force ISO 9660 Compatibility On Folders option is useful when you don’t want to change long PDF filenames to MS-DOS filenames as you prepare documents for indexing. However, you must still use MS-DOS file-naming conventions for folder names (eight characters or fewer), even though filenames don’t need to follow them.
Use the Catalog feature and a catalog batch PDX file (.bpdx) to schedule when and how often to automatically build, rebuild, update, and purge an index. A BPDX file is a text file that contains a list of platform-dependent catalog index file paths and flags. You use a scheduling application, such as Windows Task Scheduler, to open the BPDX file in Acrobat. Acrobat then re-creates the index according to the flags in the BPDX file.
For more information on scheduling an indexing update, search for BPDX at www.adobe.com/support.
To use BPDX files, in the Preferences dialog box under Catalog, select Allow Catalog Batch Files (.bpdx) To Be Run.
You can develop and test an indexed document collection on a local hard drive and then move the finished document collection to a network server or disk. An index definition contains relative paths between the index definition file (PDX) and the folders containing the indexed documents. If these relative paths are unchanged, you don’t have to rebuild the index after moving the indexed document collection. If the PDX file and the folders containing the indexed documents are in the same folder, you can maintain the relative path simply by moving that folder.
If the relative path changes, you must create a new index after you move the indexed document collection. However, you can still use the original PDX file. To use the original PDX file, first move the indexed documents. Then copy the PDX file to the folder where you want to create the new index, and edit the include and exclude lists of directories and subdirectories, as necessary.
If the index resides on a drive or server volume separate from any part of the collection it applies to, moving either the collection or the index breaks the index. If you intend to move a document collection either to another network location or onto a CD, create and build the index in the same location as the collection.
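Whether a move preserves the index depends on whether every relative path from the PDX file to an indexed folder stays the same. This Python sketch compares those relative paths before and after a move. It uses the standard `ntpath` module so that Windows-style paths behave the same on any platform. The paths and function names are hypothetical. `ntpath.relpath` raises `ValueError` when two paths are on different drives; when something has moved, the sketch treats that case as requiring a new index.

```python
import ntpath  # Windows path rules; importable on any platform

Layout = tuple[str, list[str]]  # (PDX file path, indexed folder paths)


def relative_layout(pdx_path: str, folders: list[str]) -> list[str]:
    """Relative path from the PDX file's folder to each indexed folder.

    ntpath.relpath raises ValueError when a folder is on another drive.
    """
    base = ntpath.dirname(pdx_path)
    return [ntpath.relpath(folder, base) for folder in folders]


def needs_rebuild(before: Layout, after: Layout) -> bool:
    """True if moving the collection or index changed any PDX-to-folder relative path."""
    if before == after:
        return False  # nothing moved
    try:
        return relative_layout(*before) != relative_layout(*after)
    except ValueError:
        return True  # index and documents on separate drives: a move breaks the index


if __name__ == "__main__":
    dev = (r"C:\Proj\catalog.pdx", [r"C:\Proj\Docs", r"C:\Proj\Specs"])
    cd = (r"E:\Proj\catalog.pdx", [r"E:\Proj\Docs", r"E:\Proj\Specs"])
    print(relative_layout(*dev))   # ['Docs', 'Specs']
    print(needs_rebuild(dev, cd))  # False: the whole folder moved together
```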