Crawler Service

Use

The crawler service allows crawlers to collect resources located in internal or external repositories, for example, for indexing purposes. A crawler returns the resources and the hierarchical or net-like structures of the respective repositories.

Services and applications that need repositories to be crawled (for example, the index management service) request a crawler from the crawler service.

Integration

The crawler service is a prerequisite for the following services.

It is a generic service that can be used by any CM service or application.

Features

You can use parameters to influence the behavior of a particular crawler (see Crawlers and Crawler Parameters). In the configuration of the crawler service, you specify the set of crawler parameters that is used by default for index management tasks. You can also specify how many crawlers can run in parallel.
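The fallback to a default parameter set can be pictured with the following minimal sketch. It is only an illustration, not the service's actual implementation: the names PARAMETER_SETS and resolve_parameter_set, and the option names inside each set, are hypothetical.

```python
from typing import Optional

# Hypothetical crawler parameter sets, keyed by name.
# The option names and values are placeholders for illustration only.
PARAMETER_SETS = {
    "standard": {"option_a": 1, "option_b": True},
    "custom": {"option_a": 5, "option_b": False},
}


def resolve_parameter_set(requested: Optional[str], default: str = "standard") -> dict:
    """Return the requested parameter set, or the default if none is requested."""
    name = requested if requested else default
    if name not in PARAMETER_SETS:
        raise KeyError(f"Unknown crawler parameter set: {name}")
    return PARAMETER_SETS[name]
```

For example, resolve_parameter_set(None) returns the "standard" set, while resolve_parameter_set("custom") returns the set that was explicitly requested.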

Crawler Service Parameters

Parameter	Required	Description
Maximum Number of Parallel Running Crawlers	No	Maximum number of crawlers that run in parallel. Up to this number, the system starts and uses a separate crawler for each data source of an index. Limiting the number restricts the load that parallel crawlers generate on the portal node to which the index service task queue reader scheduler task is assigned, on the database, and on the back-end systems being crawled. No entry or 0 means no restriction.
Default Crawler Parameters	Yes	Set of crawler parameters that is used by default if no other set is defined. The default setting is the standard set, which you can also use for delta crawling.
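The effect of Maximum Number of Parallel Running Crawlers can be sketched as follows. This is a minimal illustration under stated assumptions, not the service's implementation: the functions effective_worker_count and crawl_all are hypothetical, and a thread pool merely stands in for however the system actually schedules crawlers. The sketch shows one crawler per data source, capped by the configured limit, with no entry (None) or 0 meaning no restriction.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Optional, Sequence


def effective_worker_count(max_parallel: Optional[int], data_source_count: int) -> int:
    """Return how many crawlers may run at the same time."""
    if max_parallel is not None and max_parallel < 0:
        raise ValueError("max_parallel must not be negative")
    if data_source_count <= 0:
        return 0
    if not max_parallel:
        # No entry (None) or 0: no restriction, one crawler per data source.
        return data_source_count
    return min(max_parallel, data_source_count)


def crawl_all(
    data_sources: Sequence[str],
    crawl_one: Callable[[str], object],
    max_parallel: Optional[int] = None,
) -> list:
    """Crawl every data source, running at most the allowed number in parallel."""
    workers = effective_worker_count(max_parallel, len(data_sources))
    if workers == 0:
        return []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in the order of data_sources.
        return list(pool.map(crawl_one, data_sources))
```

With five data sources and a limit of 2, only two crawlers run at any time and the remaining data sources wait. With a limit of 0, all five are crawled concurrently, which generates the highest load on the systems involved.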

Activities

The crawler service is preconfigured and activated in the standard KM configuration. Normally, you do not need to change its configuration. To call up the configuration, choose Content Management → Global Services → Crawler Service.