
Crawler Monitor

Use

You can use the crawler monitor to monitor and control the activity of crawlers.

 

Integration

In a load-balanced environment, crawlers are executed on the system to which the task queue reader of the index service is assigned. The system ID of this system is displayed in the detailed view of a crawling task.

 

Features

Each crawler carries out a crawling task on the server. The crawler monitor displays a list of these crawling tasks.

You can call up information about the crawling task while it is running or after it has finished.

You can switch between three different views: Overview, Provided, and Statistics (see below).

You can display all active, all suspended, and all previous crawling tasks. You can also restrict the display to crawling tasks from the last hour, the previous day, or the last week.

You can sort the list of crawling tasks according to different criteria. In the sort field in the upper right-hand corner, select the sort criterion that you want. The arrow to the right of the sort field shows whether the list is sorted in ascending or descending order. You can reverse the sort order by clicking this arrow.

 

Note

The crawler monitor always displays the most recent run of a crawling task. For information about previous runs, see the application log.

 

Overview View

This view displays the most important current information for the crawling tasks.

The view contains the following columns:

Task: Name of the crawling task. The name consists of the index ID and the repository name. If more than one data source is assigned to an index, a crawling task is generated for each data source. To call up detailed information about a crawling task, click the name.

Starting Point: Information about the data source that the crawling task is processing. To open the data source, click the link.

Status: Current status of the crawling task. The possible states are listed below; a sketch of this lifecycle follows the table.

     Inactive: Process is not yet active.

     Starting: Process is starting.

     Running: Process is running.

     Suspending: Process is being interrupted.

     Suspended: Process was stopped manually and can be continued by choosing Resume.

     Continuing: Process is being continued.

     Postprocessing: The objects are undergoing postprocessing.

     Stopping: Process has completed its activities and is now stopping.

     Completed: Process has been completed.

     Failed: Process has failed.

     Stopped: Process was stopped manually.

     Waiting: Process is ready to start and is waiting until the number of processes running concurrently falls below the specified value. You specify this number in the configuration of the crawler service.

Elapsed Time: Time that has elapsed since the crawler started, in hours, minutes, and seconds (including intentional interruptions). Note that a short time can pass before the crawler starts.

Delivered: Number of documents and folders that are transferred to TREX or other applications for further processing.

Incremental: Specifies whether the task is an incremental update.

Errors: Number of errors that have occurred.

Processing Average (ms): Average processing time per object in milliseconds. This value covers the time between retrieving an object and passing it on; the time required for database operations is not included.
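The status values above describe the lifecycle of a crawling task, from Inactive or Waiting through to Completed, Failed, or Stopped. The following Java sketch is for illustration only and is not part of the Knowledge Management crawler API; the enum and method names are assumptions. It shows one way to model the listed states and to distinguish tasks that have reached a final state from tasks that can still be continued, for example when evaluating exported monitor data in a custom report.

/**
 * Illustrative model of the crawler states listed above.
 * Hypothetical names; not the actual Knowledge Management API.
 */
public enum CrawlerStatus {
    INACTIVE, STARTING, RUNNING, SUSPENDING, SUSPENDED,
    CONTINUING, POSTPROCESSING, STOPPING, COMPLETED,
    FAILED, STOPPED, WAITING;

    /** Final states: the task makes no further progress on its own. */
    public boolean isFinal() {
        return this == COMPLETED || this == FAILED || this == STOPPED;
    }

    /** Suspended and failed tasks can be continued with Resume (see Activities below). */
    public boolean isResumable() {
        return this == SUSPENDED || this == FAILED;
    }
}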

 

Provided View

This view displays the current information for the objects delivered.

The view contains the following columns:

Task: Name of the crawling task (for the description, see the Overview View section).

Status: Current status of the crawling task (for the description, see the Overview View section).

Processed: Number of documents that the crawler has processed. This value does not have to match the number of documents and folders provided, since no filter has been applied at this point (see the example after this table).

Provided: Number of documents that the crawler has processed and made available to TREX or other applications.

New: Number of new documents in an incremental update.

Changed: Number of changed documents in an incremental update.

Deleted: Number of deleted documents in an incremental update.
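As noted for the Processed column, the number of processed documents can be higher than the number provided, because documents may still be rejected by a filter after processing. The following minimal Java sketch merely restates this relationship; the class and field names are assumptions made for this illustration and are not part of the crawler monitor.

/**
 * Illustrative relationship between the counters in the Provided view.
 * Hypothetical names; not the actual Knowledge Management API.
 */
public final class ProvidedCounters {
    public final int processed; // documents the crawler has processed
    public final int provided;  // documents actually handed on to TREX or other applications

    public ProvidedCounters(int processed, int provided) {
        this.processed = processed;
        this.provided = provided;
    }

    /** Documents that were processed but not provided, for example because a filter rejected them. */
    public int notProvided() {
        return processed - provided;
    }
}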

 

Statistics View

This view displays the current statistics for the crawling tasks.

The view contains the following columns:

Task: Name of the crawling task (for the description, see the Overview View section).

Status: Current status of the crawling task (for the description, see the Overview View section).

Delivered: Number of documents and folders delivered.

Processing Errors: Number of errors that occurred during processing.

Retrieving Errors: Number of errors that occurred while retrieving objects.

Providing Errors: Number of errors that occurred while transferring objects.

Bad Links: Number of incorrect links.

Filtered: Number of documents that have been filtered out.

Retrieving Time: Total time spent retrieving objects, in hours, minutes, and seconds.

Providing Time: Total time spent transferring objects, in hours, minutes, and seconds.

Retrieving Average (ms): Average time for retrieving a document, in milliseconds (see the sketch after this table).

Providing Average (ms): Average time for passing on a processed document, in milliseconds.
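The two average columns are derived values. The documentation does not spell out the exact formula, but a plausible reading is the total phase time divided by the number of objects handled in that phase, expressed in milliseconds. The following Java sketch shows that reading; the helper class, its methods, and the sample numbers are assumptions for illustration only.

import java.time.Duration;

/**
 * Illustrative calculation of the average columns in the Statistics view.
 * Hypothetical helper; not part of the Knowledge Management API.
 */
public final class CrawlerAverages {

    /** Average time per object in milliseconds, assuming average = total phase time / object count. */
    public static long averageMillis(Duration totalPhaseTime, long objectCount) {
        if (objectCount == 0) {
            return 0L; // avoid division by zero for tasks that have not handled any objects yet
        }
        return totalPhaseTime.toMillis() / objectCount;
    }

    public static void main(String[] args) {
        // Example: a Retrieving Time of 1 hour 30 minutes for 27 000 delivered documents
        Duration retrievingTime = Duration.ofHours(1).plusMinutes(30);
        System.out.println(averageMillis(retrievingTime, 27_000) + " ms per document"); // 200 ms
    }
}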

 

Note that crawlers that are used by the content exchange service or the subscription service are not visible in the crawler monitor at all times. If a restart of the portal has interrupted the content exchange service's crawling tasks, they are continued at the next time that is entered in the corresponding scheduler tasks.

If a restart of the portal has interrupted the subscription service's crawling tasks, they are restarted at the next time that is entered in the corresponding scheduler tasks.

 

Detailed Information About the Crawling Task

When you click the name of a crawling task, further information about the crawling task selected appears in a new window. This information is presented in various groups. Here you can also call up the log files for the crawler, if available.

To refresh the display, choose Refresh. You can also arrange for the window to be updated automatically. To do this, choose your required interval from the selection box Automatic Update.

To display information about documents that the selected crawler is currently accessing, choose On in the Display Documents selection list.

 

Note

If the document display does not change within a few minutes and after repeated refreshes, check the data sources that the crawler is accessing. For example, a Web server may have slowed down or frozen due to a high load.

 

Activities

To call up the Crawler Monitor, choose System Administration → Monitoring → Knowledge Management → Crawler Monitor.

 

You can use the following functions:

Suspend: Halts the selected crawling tasks. Each crawler notes the position at which it was halted and can continue from that position later.

Resume: Continues suspended and failed crawling tasks from the position at which they terminated or were suspended. This function does not take changes to the resource filter configuration into account; to apply changed filters, use Resume with New Filters.

Resume with New Filters: If you have changed the configuration of resource filters while a crawling process was running, you can suspend the affected crawling task and use this function to continue it with the changed resource filters. The new resource filters are not applied retrospectively to documents that have already been crawled.

Stop: Stops the selected crawling tasks. You cannot continue stopped crawling tasks. However, you can restart them using the Reindex function or the Incremental Update function in Index Administration.

Recrawl Errors: Recrawls documents that caused errors in previous crawling runs. Such errors include:

     Timeouts when accessing a document on a Web server

     Authentication problems when accessing a document

Once you have removed the cause of these errors, for example by restarting the Web server or correcting the user assignment, you can choose this function. The system does not perform a complete or incremental crawling run; it recrawls only the documents with errors.

Delete: Removes the selected crawling tasks from the list. The crawler must be stopped first. Note that deleting a large number of documents can take several minutes. You can restart a deleted crawling task by choosing the Reindex function in Index Administration.

 

The chosen functions are started after a short time delay.
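The functions above are only meaningful for tasks in particular states, for example Suspend for running tasks, Resume for suspended or failed tasks, and Delete only once a task is stopped. The following Java sketch summarizes these rules as described in the table; it reuses the hypothetical CrawlerStatus enum from the earlier sketch, and all names are assumptions for illustration rather than the actual crawler service API.

import java.util.EnumSet;
import java.util.Set;

/**
 * Illustrative mapping of crawler monitor functions to the task states they apply to,
 * based on the descriptions in the table above. Hypothetical names only.
 */
public final class CrawlerActions {

    public enum Action { SUSPEND, RESUME, RESUME_WITH_NEW_FILTERS, STOP, RECRAWL_ERRORS, DELETE }

    /** States in which each function is described as applicable. */
    public static Set<CrawlerStatus> applicableStates(Action action) {
        switch (action) {
            case SUSPEND:
                // Suspend halts a task that is currently crawling.
                return EnumSet.of(CrawlerStatus.RUNNING);
            case RESUME:
                // Resume continues suspended and failed tasks from where they stopped.
                return EnumSet.of(CrawlerStatus.SUSPENDED, CrawlerStatus.FAILED);
            case RESUME_WITH_NEW_FILTERS:
                // Described for tasks that were suspended after the filter configuration changed.
                return EnumSet.of(CrawlerStatus.SUSPENDED);
            case STOP:
                // Assumption: Stop targets tasks that are still running or suspended.
                return EnumSet.of(CrawlerStatus.RUNNING, CrawlerStatus.SUSPENDED);
            case RECRAWL_ERRORS:
                // Assumption: applies once a run has finished and left errors behind.
                return EnumSet.of(CrawlerStatus.COMPLETED, CrawlerStatus.FAILED, CrawlerStatus.STOPPED);
            case DELETE:
                // The table states that the crawler must be stopped first; other final states are an assumption.
                return EnumSet.of(CrawlerStatus.STOPPED, CrawlerStatus.COMPLETED, CrawlerStatus.FAILED);
            default:
                return EnumSet.noneOf(CrawlerStatus.class);
        }
    }
}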

 

 
