Filtering Documents To Be Indexed

Use

If you want to index only part of the content of the assigned data sources, you can filter the documents to be indexed by document ID or MIME type. Documents are then filtered by their file name or MIME type, and not by their content.

You can use patterns to identify sets of document IDs or MIME types that you want to include in or exclude from the index (white-list or black-list principle).

Procedure

When you create an index, you enter one or more of the following custom properties:

Action	Custom Property	Value
Include document IDs	includeIdPattern	Comma-separated list of patterns for document IDs (see below)
Include MIME types	includeMimePattern	Comma-separated list of patterns for MIME types (see below)
Exclude document IDs	excludeIdPattern	Comma-separated list of patterns for document IDs (see below)
Exclude MIME types	excludeMimePattern	Comma-separated list of patterns for MIME types (see below)

You can define a pattern as follows:

Exact ID or exact MIME type
Prefix*: All documents whose ID or MIME type starts with Prefix
*Suffix: All documents whose ID or MIME type ends with Suffix

Example

Comma-separated list of patterns for document IDs: *.doc,test.txt,today*

This example defines all documents whose ID ends with .doc, is exactly test.txt, or starts with today.

Example

Comma-separated list of patterns for MIME types: application/pdf,text*

This example defines all documents that have the MIME type application/pdf or whose MIME type starts with text, for example, text/html.
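The pattern rules above can be illustrated with a minimal sketch. It is not the system's implementation: the function names matches and matches_any are hypothetical, and trimming whitespace around list entries is an assumption. It only shows how the exact, Prefix*, and *Suffix forms could be evaluated against an ID or MIME type.

```python
def matches(pattern: str, value: str) -> bool:
    """Return True if value fits a single pattern (hypothetical helper)."""
    if pattern.endswith("*"):
        # Prefix*: the value must start with the text before the asterisk
        return value.startswith(pattern[:-1])
    if pattern.startswith("*"):
        # *Suffix: the value must end with the text after the asterisk
        return value.endswith(pattern[1:])
    # No asterisk: exact ID or exact MIME type
    return value == pattern


def matches_any(pattern_list: str, value: str) -> bool:
    """Return True if value fits at least one pattern in a comma-separated list."""
    # Assumption: surrounding whitespace and empty entries are ignored
    patterns = [p.strip() for p in pattern_list.split(",") if p.strip()]
    return any(matches(p, value) for p in patterns)


id_patterns = "*.doc,test.txt,today*"
mime_patterns = "application/pdf,text*"

print(matches_any(id_patterns, "report.doc"))
print(matches_any(id_patterns, "mytest.txt"))
print(matches_any(mime_patterns, "text/html"))
```

With the two example lists, report.doc and text/html fit a pattern, while mytest.txt does not, because test.txt is an exact ID and not a suffix pattern.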

Combining More Than One Filter

You can combine all four filters. Note the following rules:

If no filter is set, the system indexes the content.
If an exclude filter fits, the system does not index the content.
If there is no exclude filter that fits and no include filter is set, the system indexes the content.
If there is no exclude filter that fits, at least one include filter is set, and there is an include filter that fits, the system indexes the content.
Otherwise the system does not index the content.
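The rules for combining filters can be sketched as a single decision function. This is one reading of the rules above and not the system's actual code: the names fits and should_index are hypothetical, the keyword arguments merely mirror the four custom properties, and "an include filter that fits" is interpreted as "at least one of the include filters fits".

```python
from typing import Optional


def fits(pattern_list: Optional[str], value: str) -> bool:
    """True if value fits any pattern in a comma-separated list (hypothetical helper)."""
    if not pattern_list:
        return False  # a filter that is not set never fits
    for pattern in (p.strip() for p in pattern_list.split(",")):
        if not pattern:
            continue
        if pattern.endswith("*"):
            if value.startswith(pattern[:-1]):
                return True
        elif pattern.startswith("*"):
            if value.endswith(pattern[1:]):
                return True
        elif value == pattern:
            return True
    return False


def should_index(doc_id: str, mime_type: str,
                 include_id: Optional[str] = None,
                 include_mime: Optional[str] = None,
                 exclude_id: Optional[str] = None,
                 exclude_mime: Optional[str] = None) -> bool:
    """Apply the combination rules: exclude filters take precedence over include filters."""
    # An exclude filter that fits always prevents indexing
    if fits(exclude_id, doc_id) or fits(exclude_mime, mime_type):
        return False
    # No include filter set: index the content
    if not include_id and not include_mime:
        return True
    # At least one include filter is set: one of them must fit
    return fits(include_id, doc_id) or fits(include_mime, mime_type)


print(should_index("a.doc", "application/msword"))
print(should_index("a.doc", "application/msword",
                   include_id="*.doc", exclude_mime="application*"))
print(should_index("a.pdf", "application/pdf", include_id="*.doc"))
```

In the second call the document fits an include filter but is still not indexed, because a fitting exclude filter takes precedence.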

Ignoring Uppercase and Lowercase

By default, the system distinguishes between uppercase and lowercase for IDs and MIME types. To define a pattern that fits both uppercase and lowercase, add the suffix (ci) (case-insensitive) to the definition of the pattern.

Example

*.doc(ci),test.txt,today*
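The following sketch shows how the (ci) suffix could be interpreted. It assumes that the suffix applies only to the single pattern it is attached to, so in the example above only *.doc would be case-insensitive. The function name matches and the constant CI_SUFFIX are hypothetical and not part of the product.

```python
CI_SUFFIX = "(ci)"


def matches(pattern: str, value: str) -> bool:
    """Return True if value fits one pattern, honoring an optional (ci) suffix."""
    if pattern.endswith(CI_SUFFIX):
        # Strip the suffix and compare both sides in lowercase
        pattern = pattern[:-len(CI_SUFFIX)].lower()
        value = value.lower()
    if pattern.endswith("*"):
        return value.startswith(pattern[:-1])  # Prefix*
    if pattern.startswith("*"):
        return value.endswith(pattern[1:])     # *Suffix
    return value == pattern                    # exact match


for pattern in "*.doc(ci),test.txt,today*".split(","):
    for candidate in ("REPORT.DOC", "Test.txt"):
        print(pattern, candidate, matches(pattern, candidate))
```

Under this assumption, REPORT.DOC fits *.doc(ci), whereas Test.txt does not fit test.txt, because that pattern has no suffix and remains case-sensitive.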