
Resource Filters

Use

You use resource filters to control the scope of the results of a crawling process.

 

Integration

You specify resource filters as result filters or scope filters when you configure crawler parameter sets (see Crawlers and Crawler Parameters).

 

Features

Resource filters that are used as scope filters can be defined using the following parameters:

Parameters of a Scope Filter

Parameter Mandatory Description

Name

Yes

Name of the resource filter.

Case Sensitive

No

Specifies whether the system distinguishes between uppercase and lowercase when matching the filter patterns.

This parameter affects documents, folders, and HTML pages.

Access Path Mode

Yes

Specifies whether resources that match the Access Path Patterns parameter are included in the results or filtered out.

exclude: Resources whose access paths match the specified patterns are not included in the results.

include: Only resources whose access paths match the specified patterns are included in the results.

This parameter affects documents, folders, and HTML pages.

Access Path Patterns

No

Comma-separated list of access paths.

You can use placeholders in this entry (see Using Placeholders).

Do not specify file names here.

Example: /mydocuments

URL (Content Link) Mode

Yes

Specifies whether documents whose URL matches the URL Regular Expression parameter are included in the results or filtered out.

The comparison uses the URL stored in the document's Content Link property, which is the URL the crawler uses to call the document.

exclude: Documents whose URL matches the expression are not included in the results.

include: Only documents whose URL matches the expression are included in the results.

This parameter applies to documents and HTML pages.

URL Regular Expression

No

Specifies a regular expression that is compared with the URL stored in the Content Link property of a document.

The regular expression must match the complete URL, not just part of it.
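As an illustration of how URL (Content Link) Mode and URL Regular Expression interact, the following is a minimal Python sketch, not the crawler's implementation. It assumes that "match the complete URL" behaves like Python's re.fullmatch, and that an empty expression filters nothing. The function name passes_url_filter and the example URL are hypothetical, and Python's regex syntax may differ in details from the syntax the crawler accepts.

```python
import re
from typing import Optional


def passes_url_filter(content_link: str, url_regex: Optional[str], mode: str) -> bool:
    """Hypothetical model of URL (Content Link) Mode + URL Regular Expression.

    Returns True if the document stays in the results.
    """
    if not url_regex:
        # Assumption: with no expression configured, the URL check filters nothing.
        return True
    # The expression must cover the complete URL, so fullmatch() is used
    # instead of search(): a fragment such as "/docs/.*" does not match.
    matched = re.fullmatch(url_regex, content_link) is not None
    if mode == "include":
        return matched        # keep only documents whose URL matches
    if mode == "exclude":
        return not matched    # drop documents whose URL matches
    raise ValueError(f"unknown mode: {mode!r}")


url = "http://intranet.example.com/docs/report.html"
print(passes_url_filter(url, r"http://intranet\.example\.com/docs/.*", "include"))  # True
print(passes_url_filter(url, r"/docs/.*", "include"))  # False: covers only part of the URL
```

The second call shows why a partial expression such as /docs/.* has no effect on its own: it must be widened (for example with a leading .*) to cover the whole URL.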

 

Resource filters that are used as result filters can be defined using the following parameters:

Parameters of a Result Filter

Parameter Mandatory Description

Name

Yes

Name of the resource filter.

Include Documents/Web-Pages

No

Specifies whether documents or HTML pages are crawled.

Include Folders

No

Specifies whether folders are crawled.

Include Links

No

Specifies whether links in a hierarchical structure are crawled.

Note that this parameter cannot be used for crawling Web repositories.

Case Sensitive

No

Specifies whether the system distinguishes between uppercase and lowercase when matching the filter patterns.

This parameter affects documents, folders, and HTML pages.

Item ID Mode

Yes

Specifies whether resources that match the Item ID Patterns parameter are included in the results or filtered out.

exclude: Resources whose file names match the specified patterns are not included in the results.

include: Only resources whose file names match the specified patterns are included in the results.

In both modes, this parameter applies only to documents and HTML pages.

Item ID Patterns

No

Comma-separated list of file name patterns.

You can use the placeholders * and ? here.

Example: *.zip, file??.xml

Mime Type Mode

Yes

Specifies whether resources that match the Mime Type Patterns parameter are included in the results or filtered out.

exclude: Resources whose MIME type matches the specified patterns are not included in the results.

include: Only resources whose MIME type matches the specified patterns are included in the results.

In both modes, this parameter applies only to documents and HTML pages.

Mime Type Patterns

No

Comma-separated list of MIME type patterns.

You can use placeholders in this entry (see Using Placeholders).

Example: text/*

Minimum Content Size

No

Specifies the minimum size of documents to be crawled.

Note that determining the size of each document takes additional time, which can slow down crawling.

This parameter is only applied to documents and HTML pages.

Maximum Content Size

No

Specifies the maximum size of documents to be crawled.

Note that determining the size of each document takes additional time, which can slow down crawling.

Enter 0 for unlimited size.

This parameter is only applied to documents and HTML pages.

Maximum Age of Last Modification

No

Specifies the maximum number of days since a document was last modified. Documents that were last changed longer ago than this are not crawled.

Enter 0 for no time limit.

This parameter is only applied to documents and HTML pages.
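The MIME type, size, and age parameters of a result filter can be modeled as independent checks. The following Python sketch is a hypothetical illustration under stated assumptions, not the crawler's code: MIME patterns are matched case-insensitively with "*" as a wildcard (via fnmatch), an empty pattern list has no effect, and a minimum size of 0 means no lower limit. This document does not state the unit for the size parameters, so the sketch simply compares plain integers. All function names are invented for illustration.

```python
import fnmatch
from datetime import datetime, timedelta
from typing import List


def passes_mime_filter(mime_type: str, patterns: List[str], mode: str) -> bool:
    """Hypothetical Mime Type Mode + Mime Type Patterns check."""
    if not patterns:
        # Assumption: an empty pattern list leaves the MIME check without effect.
        return True
    # Compare case-insensitively; "*" acts as a wildcard, e.g. "text/*".
    matched = any(fnmatch.fnmatchcase(mime_type.lower(), p.strip().lower())
                  for p in patterns)
    if mode == "include":
        return matched
    if mode == "exclude":
        return not matched
    raise ValueError(f"unknown mode: {mode!r}")


def passes_size_filter(size: int, minimum: int = 0, maximum: int = 0) -> bool:
    """Hypothetical Minimum/Maximum Content Size check (unit not specified)."""
    # A maximum of 0 means unlimited size; a minimum of 0 is assumed to mean no lower limit.
    return size >= minimum and (maximum == 0 or size <= maximum)


def passes_age_filter(last_modified: datetime, max_age_days: int, now: datetime) -> bool:
    """Hypothetical Maximum Age of Last Modification check."""
    # A maximum age of 0 means no time limit.
    if max_age_days == 0:
        return True
    return now - last_modified <= timedelta(days=max_age_days)


now = datetime(2024, 1, 31)
print(passes_mime_filter("text/html", ["text/*"], "include"))   # True
print(passes_size_filter(500, minimum=100, maximum=400))        # False
print(passes_age_filter(datetime(2024, 1, 1), 7, now))          # False
```

Passing now explicitly, rather than reading the clock inside the function, keeps the age check deterministic and easy to test.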

 

Activities

To create a resource filter, choose Content Management → Global Services → Resource Filters. Then select the resource filter in each crawler parameter set that requires it.

 

Examples

Example 1

To exclude all files that end in .HTML_banner when indexing a repository, configure a result filter with the following settings:

 

Name = myresourcefilter
Include Documents/Web-Pages = active
Include Folders = active
Include Links = active
Item ID Mode = exclude
Item ID Patterns = *.HTML_banner
Mime Type Mode = exclude

 

Then select this result filter in a crawler parameter set.
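The Item ID settings in this example can be sketched in Python to show how the comma-separated patterns, the * and ? placeholders, and the exclude mode combine. This is a hypothetical model, not the crawler's code: it uses fnmatch.fnmatchcase, which also treats [...] as a character class (a feature the crawler's placeholders may not support), and it assumes that Case Sensitive defaults to off, since the example does not set it. The function name and sample file names are invented for illustration.

```python
import fnmatch
from typing import List


def passes_item_id_filter(file_name: str, item_id_patterns: str, mode: str,
                          case_sensitive: bool = False) -> bool:
    """Hypothetical Item ID Mode + Item ID Patterns check.

    item_id_patterns is the comma-separated string entered in the filter,
    e.g. "*.zip, file??.xml". case_sensitive=False is an assumed default.
    """
    patterns: List[str] = [p.strip() for p in item_id_patterns.split(",") if p.strip()]
    if not case_sensitive:
        file_name = file_name.lower()
        patterns = [p.lower() for p in patterns]
    # "*" matches any run of characters, "?" exactly one character.
    matched = any(fnmatch.fnmatchcase(file_name, p) for p in patterns)
    if mode == "include":
        return matched
    if mode == "exclude":
        return not matched
    raise ValueError(f"unknown mode: {mode!r}")


names = ["news.HTML_banner", "index.html", "promo.html_banner"]
kept = [n for n in names if passes_item_id_filter(n, "*.HTML_banner", "exclude")]
print(kept)  # ['index.html']
```

With case_sensitive=True, promo.html_banner would no longer match *.HTML_banner and would stay in the results, which is why the Case Sensitive setting matters for extension-based patterns like this one.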

 

Example 2

To exclude all documents in folders named _vti_cnf from indexing, configure a scope filter with the following settings:

Access Path Mode = exclude
Access Path Patterns = **/_vti_cnf
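The following Python sketch shows one way the scope filter in this example could be evaluated. The placeholder semantics are an assumption, since the rules are defined in Using Placeholders and not in this document: ** is taken to match any characters including "/", * any characters within one path segment, and ? a single non-"/" character. The sketch also assumes absolute paths starting with "/" and treats a resource as matched when its own path or any ancestor folder matches, so documents below an excluded folder are excluded too. The function names are hypothetical.

```python
import re
from typing import List


def access_path_pattern_to_regex(pattern: str, case_sensitive: bool = False) -> "re.Pattern[str]":
    """Translate an access path pattern into a regex (assumed placeholder semantics)."""
    parts: List[str] = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            parts.append(".*")      # assumed: any characters, across "/" separators
            i += 2
        elif pattern[i] == "*":
            parts.append("[^/]*")   # assumed: any characters within one path segment
            i += 1
        elif pattern[i] == "?":
            parts.append("[^/]")    # assumed: exactly one character other than "/"
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    flags = 0 if case_sensitive else re.IGNORECASE
    return re.compile("".join(parts), flags)


def is_in_scope(path: str, patterns: str, mode: str, case_sensitive: bool = False) -> bool:
    """Hypothetical scope check: matched if the path or any ancestor folder matches."""
    regexes = [access_path_pattern_to_regex(p.strip(), case_sensitive)
               for p in patterns.split(",") if p.strip()]
    segments = path.rstrip("/").split("/")
    # "/site/_vti_cnf/page.htm" -> "/site", "/site/_vti_cnf", "/site/_vti_cnf/page.htm"
    prefixes = ["/".join(segments[:k]) for k in range(2, len(segments) + 1)]
    matched = any(rx.fullmatch(p) for rx in regexes for p in prefixes)
    if mode == "include":
        return matched
    if mode == "exclude":
        return not matched
    raise ValueError(f"unknown mode: {mode!r}")


for p in ["/site/_vti_cnf/page.htm", "/site/docs/page.htm"]:
    print(p, is_in_scope(p, "**/_vti_cnf", "exclude"))
# /site/_vti_cnf/page.htm False
# /site/docs/page.htm True
```

Under these assumptions, a folder such as /site/my_vti_cnf stays in scope, because the pattern requires a "/" directly before _vti_cnf.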