com.sapportals.wcm.service.xcrawler

Interface IXCrawlerService


public interface IXCrawlerService

Global service for crawling repositories.

Changes between major releases 7.0 and 7.X:
Added methods:

  • 'public void clearSurvivesRestart(String)'
  • 'public IXCrawlerParameters createCrawlerParameters(int, int, int, boolean, boolean, boolean, boolean, boolean, boolean, boolean, boolean, boolean, long, IXCrawlerParameters$ModificationCheckMode, boolean, boolean, boolean, boolean, boolean, IResourceFilter[], IResourceFilter[], long, int, String, IXCrawlerParameters$LogLevel, long)'
  • 'public IXCrawlerParameters createCrawlerParameters(int, int, int, boolean, boolean, boolean, boolean, boolean, boolean, boolean, boolean, long, IXCrawlerParameters$ModificationCheckMode, boolean, boolean, boolean, boolean, boolean, IResourceFilter[], IResourceFilter[], long, int, String, IXCrawlerParameters$LogLevel, long)'
  • 'public RID[] getFailedResourcesOfCrawler(String, int)'
  • 'public RID[] getFailedResourcesOfEvents(String, int)'
  • 'public int getNumberOfFailedResourcesOfEvents(String)'
  • 'public boolean getSurvivesRestart(String)'
  • 'public void reportDeletedResource(String, RID, int)'
  • 'public void reportFailedResource(String, RID, RID, int)'

Copyright (c) SAP AG 2004


    Field Summary
    static int MAX_TASK_DISPLAY_NAME_LENGTH
               
    static int MAX_TASK_ID_LENGTH
               
    static int MAX_USER_DATA_LENGTH
               
     
    Method Summary
     void clearSurvivesRestart(String taskID)
              Set the survive restart flag of a crawler task to false, i.e. the task will not be resumed after a restart of CM.
     IXCrawlerParameters createCrawlerParameters(int maxDepth, int retrieverCount, int providerCount, boolean useETag, boolean useChecksum, boolean useACL, boolean followLinks, boolean followRedirects, boolean crawlVersions, boolean crawlVariants, boolean crawlHidden, boolean crawlSystem, long requestDelayInMilliseconds, IXCrawlerParameters.ModificationCheckMode modificationCheckMode, boolean findAllDocsInDepth, boolean respectRobots, boolean respectNoIndex, boolean respectNoFollow, boolean test, IResourceFilter[] scopeFilters, IResourceFilter[] resultFilters, long maxLogFileSizeInBytes, int maxBacklogFiles, String logFilePath, IXCrawlerParameters.LogLevel maxLogLevel, long documentTimeoutInSeconds)
              Create crawler parameters.
     IXCrawlerParameters createCrawlerParameters(int maxDepth, int retrieverCount, int providerCount, boolean useETag, boolean useChecksum, boolean followLinks, boolean followRedirects, boolean crawlVersions, boolean crawlVariants, boolean crawlHidden, boolean crawlSystem, long requestDelayInMilliseconds, IXCrawlerParameters.ModificationCheckMode modificationCheckMode, boolean findAllDocsInDepth, boolean respectRobots, boolean respectNoIndex, boolean respectNoFollow, boolean test, IResourceFilter[] scopeFilters, IResourceFilter[] resultFilters, long maxLogFileSizeInBytes, int maxBacklogFiles, String logFilePath, IXCrawlerParameters.LogLevel maxLogLevel, long documentTimeoutInSeconds)
              Create crawler parameters.
     IXCrawlerParameters createCrawlerParameters(int maxDepth, int retrieverCount, int providerCount, boolean useETag, boolean useChecksum, boolean followLinks, boolean followRedirects, boolean crawlVersions, boolean crawlHidden, boolean crawlSystem, long requestDelayInMilliseconds, IXCrawlerParameters.ModificationCheckMode modificationCheckMode, boolean findAllDocsInDepth, boolean respectRobots, boolean respectNoIndex, boolean test, IResourceFilter[] scopeFilters, IResourceFilter[] resultFilters, long maxLogFileSizeInBytes, int maxBacklogFiles, String logFilePath, IXCrawlerParameters.LogLevel maxLogLevel, long documentTimeoutInSeconds)
              Create crawler parameters.
     IXCrawlerParameters createCrawlerParameters(int maxDepth, int retrieverCount, int providerCount, boolean useETag, boolean useChecksum, boolean followLinks, boolean followRedirects, boolean crawlVersions, boolean crawlHidden, boolean crawlSystem, long requestDelayInMilliseconds, IXCrawlerParameters.ModificationCheckMode modificationCheckMode, boolean findAllDocsInDepth, boolean respectRobots, boolean test, IResourceFilter[] scopeFilters, IResourceFilter[] resultFilters, long maxLogFileSizeInBytes, int maxBacklogFiles, String logFilePath, IXCrawlerParameters.LogLevel maxLogLevel, long documentTimeoutInSeconds)
              Create crawler parameters.
     IXCrawlerParameters createCrawlerParameters(int maxDepth, int retrieverCount, int providerCount, boolean useETag, boolean useChecksum, boolean followLinks, boolean crawlVersions, boolean crawlHidden, boolean crawlSystem, long requestDelayInMilliseconds, IXCrawlerParameters.ModificationCheckMode modificationCheckMode, boolean findAllDocsInDepth, boolean respectRobots, boolean test, IResourceFilter[] scopeFilters, IResourceFilter[] resultFilters, long maxLogFileSizeInBytes, int maxBacklogFiles, String logFilePath, IXCrawlerParameters.LogLevel maxLogLevel, long documentTimeoutInSeconds)
              Create crawler parameters.
     IXCrawlerParameters createCrawlerParameters(String parameterName)
              Create crawler parameters from a configurable in the configuration plugin /cm/services/xcrawlers.
     void deleteCrawlerTask(String taskID)
              Delete a crawler task.
     String[] getCrawlerParameterNames()
              Get the names of the available crawler parameters.
     IXCrawlerTaskSummary[] getCrawlerTaskSummaries()
              Get the state summaries of all crawler tasks.
     IXCrawlerTaskSummary getCrawlerTaskSummary(String taskID)
              Get the state summary of a crawler task.
     String getDefaultCrawlerParameterName()
              Get the name of the default crawler parameters.
     RID[] getFailedResourcesOfCrawler(String taskID, int max)
               Get resources that caused errors in the last (or current) run of a crawler task.
    Added in 7.X
     RID[] getFailedResourcesOfEvents(String taskID, int max)
               Get resources that have been reported as failed by calls of reportFailedResource().
    Added in 7.X
     int getNumberOfFailedResourcesOfEvents(String taskID)
               Get the number of resources that have been reported as failed by calls of reportFailedResource().
    Added in 7.X
     boolean getSurvivesRestart(String taskID)
              Get the survive restart flag of a crawler task.
     boolean isFiltered(IResource resource, IXCrawlerParameters parameters, RID crawlStartPath)
               Check whether a resource would be filtered out during a crawl with specific crawler parameters.
     boolean isRunning(String taskID)
               Check whether a crawler task is running for the specified taskID.
     boolean isScheduled(String taskID)
               Check whether a crawler task is scheduled for the specified taskID
    (and will run once any running or suspended crawler tasks for the
    same taskID are finished).
     boolean isSuspended(String taskID)
               Check whether a crawler task is suspended for the specified taskID.
     void recrawlErrors(String taskID)
              Restart a crawler task by crawling only the documents that failed during the last crawl.
     void reloadResourceFilters(String taskID)
              Reload the current version of the resource filters for a crawler.
     void reportDeletedResource(String taskID, RID rid, int startResourceListIndex)
               Report the deletion of a resource in the scope of a crawler.
    The application that uses a crawler may use this method to report the deletion of a resource that was reported to it via RF event.
     void reportFailedResource(String taskID, RID crawlStartPath, RID rid, int startResourceListIndex)
               Report a problem with the processing of a resource in the scope of a crawler.
    The application that uses a crawler may use this method to report a problem with the processing of a resource that was reported to it via RF event.
     void resumeCrawlerTask(String taskID)
              Resume a crawler task.
     void runCrawlerTask(String taskID, String taskDisplayName, IRidList[] startResources, IXCrawlerParameters[] parameters, String resultReceiverFactoryClassName, String userDataForFactory, boolean survivesRestart, boolean delta, ISystem node, boolean deleteAfterCompletion)
              Run a crawler task.
     void stopCrawlerTask(String taskID)
              Stop a crawler task.
     void stopCrawlerTaskAsync(String taskID)
              Stop a crawler task.
     void suspendCrawlerTask(String taskID)
              Suspend a crawler task.
     

    Field Detail

    MAX_TASK_ID_LENGTH

    static final int MAX_TASK_ID_LENGTH
    See Also:
    Constant Field Values

    MAX_TASK_DISPLAY_NAME_LENGTH

    static final int MAX_TASK_DISPLAY_NAME_LENGTH
    See Also:
    Constant Field Values

    MAX_USER_DATA_LENGTH

    static final int MAX_USER_DATA_LENGTH
    See Also:
    Constant Field Values
    Method Detail

    getCrawlerParameterNames

    String[] getCrawlerParameterNames()
                                      throws XCrawlerException
    Get the names of the available crawler parameters.
    These are the configurables in the configuration plugin /cm/services/xcrawlers of the config class XCrawler.

    Returns:
    the names of the available crawler parameters
    Throws:
    XCrawlerException

    getDefaultCrawlerParameterName

    String getDefaultCrawlerParameterName()
                                          throws XCrawlerException
    Get the name of the default crawler parameters.

    Returns:
    the name of the default crawler parameters
    Throws:
    XCrawlerException

    createCrawlerParameters

    IXCrawlerParameters createCrawlerParameters(String parameterName)
                                                throws XCrawlerException
    Create crawler parameters from a configurable in the configuration plugin /cm/services/xcrawlers.

    Parameters:
    parameterName - name of the configurable
    Returns:
    the created crawler parameters
    Throws:
    XCrawlerException

    createCrawlerParameters

    IXCrawlerParameters createCrawlerParameters(int maxDepth,
                                                int retrieverCount,
                                                int providerCount,
                                                boolean useETag,
                                                boolean useChecksum,
                                                boolean followLinks,
                                                boolean crawlVersions,
                                                boolean crawlHidden,
                                                boolean crawlSystem,
                                                long requestDelayInMilliseconds,
                                                IXCrawlerParameters.ModificationCheckMode modificationCheckMode,
                                                boolean findAllDocsInDepth,
                                                boolean respectRobots,
                                                boolean test,
                                                IResourceFilter[] scopeFilters,
                                                IResourceFilter[] resultFilters,
                                                long maxLogFileSizeInBytes,
                                                int maxBacklogFiles,
                                                String logFilePath,
                                                IXCrawlerParameters.LogLevel maxLogLevel,
                                                long documentTimeoutInSeconds)
                                                throws XCrawlerException
    Create crawler parameters.
    Old version

    Parameters:
    maxDepth - maximum depth of the crawl (0 is unlimited)
    retrieverCount - number of threads which retrieve the resources from the repositories
    providerCount - number of threads which provide the found resources to the result receivers
    useETag - true if the ETag of a resource should be used to detect modification
    useChecksum - true if the checksum of the resource content should be used to detect modification
    followLinks - true if links should be followed during the crawl
    crawlVersions - true if versions of resources should be included in the crawl
    crawlHidden - true if hidden resources should be included in the crawl
    crawlSystem - true if system resources should be included in the crawl
    requestDelayInMilliseconds - number of milliseconds between two consecutive resource retrievals (to limit repository load)
    modificationCheckMode - mode of resource modification detection (ETag AND checksum, ETag OR checksum)
    findAllDocsInDepth - true if resources should be found on the shortest possible path
    respectRobots - true if robot rules of web servers should be respected
    test - true if no resources should be provided to the result receiver
    scopeFilters - resource filters narrowing the scope of the crawl
    resultFilters - resource filters which are applied to the result of the crawl but do not narrow the scope
    maxLogFileSizeInBytes - maximum size of the crawler log file in bytes (0 is unlimited)
    maxBacklogFiles - maximum number of old crawler log files
    logFilePath - path to the crawler log file (if null the current system path is used)
    maxLogLevel - maximum log level
    documentTimeoutInSeconds - the document retrieval timeout in seconds
    Returns:
    the created crawler parameters
    Throws:
    XCrawlerException
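The interplay of useETag, useChecksum, and modificationCheckMode can be illustrated with a small, self-contained sketch. The ModCheckMode enum and isModified helper below are hypothetical stand-ins for illustration, not the actual IXCrawlerParameters API:

```java
// Illustrative stand-in for IXCrawlerParameters.ModificationCheckMode:
// a resource counts as modified depending on how the two checks combine.
enum ModCheckMode { AND, OR }

public class ModificationCheck {
    // Hypothetical helper: decide whether a resource changed, given the
    // outcome of the ETag comparison and the content-checksum comparison.
    static boolean isModified(boolean etagChanged, boolean checksumChanged,
                              ModCheckMode mode) {
        switch (mode) {
            case AND: return etagChanged && checksumChanged; // both must differ
            case OR:  return etagChanged || checksumChanged; // either suffices
            default:  throw new IllegalStateException();
        }
    }

    public static void main(String[] args) {
        // ETag changed but content checksum identical:
        System.out.println(isModified(true, false, ModCheckMode.AND)); // false
        System.out.println(isModified(true, false, ModCheckMode.OR));  // true
    }
}
```

Under this reading, AND mode avoids re-crawling resources whose ETag changed for non-content reasons, while OR mode is the more conservative choice.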

    createCrawlerParameters

    IXCrawlerParameters createCrawlerParameters(int maxDepth,
                                                int retrieverCount,
                                                int providerCount,
                                                boolean useETag,
                                                boolean useChecksum,
                                                boolean followLinks,
                                                boolean followRedirects,
                                                boolean crawlVersions,
                                                boolean crawlHidden,
                                                boolean crawlSystem,
                                                long requestDelayInMilliseconds,
                                                IXCrawlerParameters.ModificationCheckMode modificationCheckMode,
                                                boolean findAllDocsInDepth,
                                                boolean respectRobots,
                                                boolean test,
                                                IResourceFilter[] scopeFilters,
                                                IResourceFilter[] resultFilters,
                                                long maxLogFileSizeInBytes,
                                                int maxBacklogFiles,
                                                String logFilePath,
                                                IXCrawlerParameters.LogLevel maxLogLevel,
                                                long documentTimeoutInSeconds)
                                                throws XCrawlerException
    Create crawler parameters.
    Old version

    Parameters:
    maxDepth - maximum depth of the crawl (0 is unlimited)
    retrieverCount - number of threads which retrieve the resources from the repositories
    providerCount - number of threads which provide the found resources to the result receivers
    useETag - true if the ETag of a resource should be used to detect modification
    useChecksum - true if the checksum of the resource content should be used to detect modification
    followLinks - true if links should be followed during the crawl
    followRedirects - true if redirects in Web-RMs should be followed during the crawl
    crawlVersions - true if versions of resources should be included in the crawl
    crawlHidden - true if hidden resources should be included in the crawl
    crawlSystem - true if system resources should be included in the crawl
    requestDelayInMilliseconds - number of milliseconds between two consecutive resource retrievals (to limit repository load)
    modificationCheckMode - mode of resource modification detection (ETag AND checksum, ETag OR checksum)
    findAllDocsInDepth - true if resources should be found on the shortest possible path
    respectRobots - true if robot rules of web servers should be respected
    test - true if no resources should be provided to the result receiver
    scopeFilters - resource filters narrowing the scope of the crawl
    resultFilters - resource filters which are applied to the result of the crawl but do not narrow the scope
    maxLogFileSizeInBytes - maximum size of the crawler log file in bytes (0 is unlimited)
    maxBacklogFiles - maximum number of old crawler log files
    logFilePath - path to the crawler log file (if null the current system path is used)
    maxLogLevel - maximum log level
    documentTimeoutInSeconds - the document retrieval timeout in seconds
    Returns:
    the created crawler parameters
    Throws:
    XCrawlerException

    createCrawlerParameters

    IXCrawlerParameters createCrawlerParameters(int maxDepth,
                                                int retrieverCount,
                                                int providerCount,
                                                boolean useETag,
                                                boolean useChecksum,
                                                boolean followLinks,
                                                boolean followRedirects,
                                                boolean crawlVersions,
                                                boolean crawlHidden,
                                                boolean crawlSystem,
                                                long requestDelayInMilliseconds,
                                                IXCrawlerParameters.ModificationCheckMode modificationCheckMode,
                                                boolean findAllDocsInDepth,
                                                boolean respectRobots,
                                                boolean respectNoIndex,
                                                boolean test,
                                                IResourceFilter[] scopeFilters,
                                                IResourceFilter[] resultFilters,
                                                long maxLogFileSizeInBytes,
                                                int maxBacklogFiles,
                                                String logFilePath,
                                                IXCrawlerParameters.LogLevel maxLogLevel,
                                                long documentTimeoutInSeconds)
                                                throws XCrawlerException
    Create crawler parameters.
    Old version

    Parameters:
    maxDepth - maximum depth of the crawl (0 is unlimited)
    retrieverCount - number of threads which retrieve the resources from the repositories
    providerCount - number of threads which provide the found resources to the result receivers
    useETag - true if the ETag of a resource should be used to detect modification
    useChecksum - true if the checksum of the resource content should be used to detect modification
    followLinks - true if links should be followed during the crawl
    followRedirects - true if redirects in Web-RMs should be followed during the crawl
    crawlVersions - true if versions of resources should be included in the crawl
    crawlHidden - true if hidden resources should be included in the crawl
    crawlSystem - true if system resources should be included in the crawl
    requestDelayInMilliseconds - number of milliseconds between two consecutive resource retrievals (to limit repository load)
    modificationCheckMode - mode of resource modification detection (ETag AND checksum, ETag OR checksum)
    findAllDocsInDepth - true if resources should be found on the shortest possible path
    respectRobots - true if robot rules of web servers should be respected
    respectNoIndex - true if the index-content property should be respected
    test - true if no resources should be provided to the result receiver
    scopeFilters - resource filters narrowing the scope of the crawl
    resultFilters - resource filters which are applied to the result of the crawl but do not narrow the scope
    maxLogFileSizeInBytes - maximum size of the crawler log file in bytes (0 is unlimited)
    maxBacklogFiles - maximum number of old crawler log files
    logFilePath - path to the crawler log file (if null the current system path is used)
    maxLogLevel - maximum log level
    documentTimeoutInSeconds - the document retrieval timeout in seconds
    Returns:
    the created crawler parameters
    Throws:
    XCrawlerException

    createCrawlerParameters

    IXCrawlerParameters createCrawlerParameters(int maxDepth,
                                                int retrieverCount,
                                                int providerCount,
                                                boolean useETag,
                                                boolean useChecksum,
                                                boolean followLinks,
                                                boolean followRedirects,
                                                boolean crawlVersions,
                                                boolean crawlVariants,
                                                boolean crawlHidden,
                                                boolean crawlSystem,
                                                long requestDelayInMilliseconds,
                                                IXCrawlerParameters.ModificationCheckMode modificationCheckMode,
                                                boolean findAllDocsInDepth,
                                                boolean respectRobots,
                                                boolean respectNoIndex,
                                                boolean respectNoFollow,
                                                boolean test,
                                                IResourceFilter[] scopeFilters,
                                                IResourceFilter[] resultFilters,
                                                long maxLogFileSizeInBytes,
                                                int maxBacklogFiles,
                                                String logFilePath,
                                                IXCrawlerParameters.LogLevel maxLogLevel,
                                                long documentTimeoutInSeconds)
                                                throws XCrawlerException
    Create crawler parameters.
    Old version

    Parameters:
    maxDepth - maximum depth of the crawl (0 is unlimited)
    retrieverCount - number of threads which retrieve the resources from the repositories
    providerCount - number of threads which provide the found resources to the result receivers
    useETag - true if the ETag of a resource should be used to detect modification
    useChecksum - true if the checksum of the resource content should be used to detect modification
    followLinks - true if links should be followed during the crawl
    followRedirects - true if redirects in Web-RMs should be followed during the crawl
    crawlVersions - true if versions of resources should be included in the crawl
    crawlVariants - true if variants of resources should be included in the crawl
    crawlHidden - true if hidden resources should be included in the crawl
    crawlSystem - true if system resources should be included in the crawl
    requestDelayInMilliseconds - number of milliseconds between two consecutive resource retrievals (to limit repository load)
    modificationCheckMode - mode of resource modification detection (ETag AND checksum, ETag OR checksum)
    findAllDocsInDepth - true if resources should be found on the shortest possible path
    respectRobots - true if robot rules of web servers should be respected
    respectNoIndex - true if the index-content property should be respected
    respectNoFollow - true if the follow-links property should be respected
    test - true if no resources should be provided to the result receiver
    scopeFilters - resource filters narrowing the scope of the crawl
    resultFilters - resource filters which are applied to the result of the crawl but do not narrow the scope
    maxLogFileSizeInBytes - maximum size of the crawler log file in bytes (0 is unlimited)
    maxBacklogFiles - maximum number of old crawler log files
    logFilePath - path to the crawler log file (if null the current system path is used)
    maxLogLevel - maximum log level
    documentTimeoutInSeconds - the document retrieval timeout in seconds
    Returns:
    the created crawler parameters
    Throws:
    XCrawlerException

    createCrawlerParameters

    IXCrawlerParameters createCrawlerParameters(int maxDepth,
                                                int retrieverCount,
                                                int providerCount,
                                                boolean useETag,
                                                boolean useChecksum,
                                                boolean useACL,
                                                boolean followLinks,
                                                boolean followRedirects,
                                                boolean crawlVersions,
                                                boolean crawlVariants,
                                                boolean crawlHidden,
                                                boolean crawlSystem,
                                                long requestDelayInMilliseconds,
                                                IXCrawlerParameters.ModificationCheckMode modificationCheckMode,
                                                boolean findAllDocsInDepth,
                                                boolean respectRobots,
                                                boolean respectNoIndex,
                                                boolean respectNoFollow,
                                                boolean test,
                                                IResourceFilter[] scopeFilters,
                                                IResourceFilter[] resultFilters,
                                                long maxLogFileSizeInBytes,
                                                int maxBacklogFiles,
                                                String logFilePath,
                                                IXCrawlerParameters.LogLevel maxLogLevel,
                                                long documentTimeoutInSeconds)
                                                throws XCrawlerException
    Create crawler parameters.
    Current version containing all possible settings.
    Added in 7.X

    Parameters:
    maxDepth - maximum depth of the crawl (0 is unlimited)
    retrieverCount - number of threads which retrieve the resources from the repositories
    providerCount - number of threads which provide the found resources to the result receivers
    useETag - true if the ETag of a resource should be used to detect modification
    useChecksum - true if the checksum of the resource content should be used to detect modification
    useACL - true if the ACL version number of the resource should be used to detect modification
    followLinks - true if links should be followed during the crawl
    followRedirects - true if redirects in Web-RMs should be followed during the crawl
    crawlVersions - true if versions of resources should be included in the crawl
    crawlVariants - true if variants of resources should be included in the crawl
    crawlHidden - true if hidden resources should be included in the crawl
    crawlSystem - true if system resources should be included in the crawl
    requestDelayInMilliseconds - number of milliseconds between two consecutive resource retrievals (to limit repository load)
    modificationCheckMode - mode of resource modification detection (ETag AND checksum, ETag OR checksum)
    findAllDocsInDepth - true if resources should be found on the shortest possible path
    respectRobots - true if robot rules of web servers should be respected
    respectNoIndex - true if the index-content property should be respected
    respectNoFollow - true if the follow-links property should be respected
    test - true if no resources should be provided to the result receiver
    scopeFilters - resource filters narrowing the scope of the crawl
    resultFilters - resource filters which are applied to the result of the crawl but do not narrow the scope
    maxLogFileSizeInBytes - maximum size of the crawler log file in bytes (0 is unlimited)
    maxBacklogFiles - maximum number of old crawler log files
    logFilePath - path to the crawler log file (if null the current system path is used)
    maxLogLevel - maximum log level
    documentTimeoutInSeconds - the document retrieval timeout in seconds
    Returns:
    the created crawler parameters
    Throws:
    XCrawlerException

    runCrawlerTask

    void runCrawlerTask(String taskID,
                        String taskDisplayName,
                        IRidList[] startResources,
                        IXCrawlerParameters[] parameters,
                        String resultReceiverFactoryClassName,
                        String userDataForFactory,
                        boolean survivesRestart,
                        boolean delta,
                        ISystem node,
                        boolean deleteAfterCompletion)
                        throws XCrawlerException
    Run a crawler task.
    Multiple lists of start resources can be specified. Each list has its own crawler parameters. The number of crawler parameters must match the number of lists of start resources.
    The crawler task is started asynchronously.
    Tasks with the same ID are started sequentially.

    Parameters:
    taskID - ID of the new task (maximum length is MAX_TASK_ID_LENGTH)
    taskDisplayName - display name of the new task (maximum length is MAX_TASK_DISPLAY_NAME_LENGTH, may be null)
    startResources - lists of start resources
    parameters - crawler parameters for the lists of start resources
    resultReceiverFactoryClassName - class which creates result receivers; the name of the class is persisted in the database and reused via reflection when the crawler task is resumed; the class must implement IXCrawlerResultReceiverFactory
    userDataForFactory - this string is passed to the createResultReceiver() method of the resultReceiverFactory; here the result receiving application can store any data up to MAX_USER_DATA_LENGTH characters in length (may be null)
    survivesRestart - if true the crawler can be resumed even after a restart of CM
    delta - true if an incremental update should be performed
    node - cluster node on which the task should be executed
    deleteAfterCompletion - true if the crawler should be deleted after it is complete
    Throws:
    XCrawlerException
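The resultReceiverFactoryClassName mechanism described above (only the factory class name and the user-data string are persisted; the factory is re-instantiated via reflection when the task is resumed) follows a common Java pattern. The sketch below uses hypothetical stand-in interfaces, not the real IXCrawlerResultReceiverFactory:

```java
// Hypothetical stand-ins for the KM interfaces; only the
// persist-class-name-and-reflect pattern itself is being illustrated.
interface ResultReceiver {
    void receive(String rid);
}

interface ResultReceiverFactory {
    ResultReceiver createResultReceiver(String userData);
}

class LoggingReceiverFactory implements ResultReceiverFactory {
    public ResultReceiver createResultReceiver(String userData) {
        return rid -> System.out.println(userData + ": " + rid);
    }
}

public class FactoryByReflection {
    // On resume, only two strings survive from the database: the factory
    // class name and the user data. Reflection rebuilds the receiver.
    static ResultReceiver restore(String factoryClassName, String userData)
            throws ReflectiveOperationException {
        ResultReceiverFactory factory = (ResultReceiverFactory)
            Class.forName(factoryClassName).getDeclaredConstructor().newInstance();
        return factory.createResultReceiver(userData);
    }

    public static void main(String[] args) throws Exception {
        ResultReceiver r = restore("LoggingReceiverFactory", "task42");
        r.receive("/documents/a.txt"); // prints "task42: /documents/a.txt"
    }
}
```

This is also why the factory class needs a no-argument constructor in practice: reflection has nothing else to pass it.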

    suspendCrawlerTask

    void suspendCrawlerTask(String taskID)
                            throws XCrawlerException
    Suspend a crawler task.
    The task must be running.

    Parameters:
    taskID - ID of the task
    Throws:
    XCrawlerException

    resumeCrawlerTask

    void resumeCrawlerTask(String taskID)
                           throws XCrawlerException
    Resume a crawler task.
    The task must be suspended.

    Parameters:
    taskID - ID of the task
    Throws:
    XCrawlerException

    stopCrawlerTask

    void stopCrawlerTask(String taskID)
                         throws XCrawlerException
    Stop a crawler task.
    The method returns after the task is stopped.

    Parameters:
    taskID - ID of the task
    Throws:
    XCrawlerException

    stopCrawlerTaskAsync

    void stopCrawlerTaskAsync(String taskID)
                              throws XCrawlerException
    Stop a crawler task.
    The method returns immediately.

    Parameters:
    taskID - ID of the task
    Throws:
    XCrawlerException
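    As an illustrative sketch of how the lifecycle methods above fit together. This is not from the documentation itself: the way an IXCrawlerService instance is obtained is environment-specific and not shown, and all variable names are hypothetical.

    ```java
    // Sketch only: suspend a running task, do some work, then resume it.
    // "crawlerService" is assumed to be an IXCrawlerService obtained from
    // the KM runtime (lookup mechanism not shown here).
    void pauseForMaintenance(IXCrawlerService crawlerService, String taskID)
            throws XCrawlerException {
        if (crawlerService.isRunning(taskID)) {
            crawlerService.suspendCrawlerTask(taskID);  // task must be running
        }
        // ... perform maintenance work while the task is suspended ...
        if (crawlerService.isSuspended(taskID)) {
            crawlerService.resumeCrawlerTask(taskID);   // task must be suspended
        }
        // To terminate instead: stopCrawlerTask(taskID) blocks until the task
        // is stopped, while stopCrawlerTaskAsync(taskID) returns immediately.
    }
    ```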

    recrawlErrors

    void recrawlErrors(String taskID)
                       throws XCrawlerException
    Restart a crawler task by crawling only the documents that failed during the last crawl.
    The task must be down, done, failed, or stopped.

    Parameters:
    taskID - ID of the task
    Throws:
    XCrawlerException

    deleteCrawlerTask

    void deleteCrawlerTask(String taskID)
                           throws XCrawlerException
    Delete a crawler task.
    The task is stopped before deletion and is then deleted from the database.

    Parameters:
    taskID - ID of the task
    Throws:
    XCrawlerException

    getCrawlerTaskSummaries

    IXCrawlerTaskSummary[] getCrawlerTaskSummaries()
                                                   throws XCrawlerException
    Get the state summaries of all crawler tasks.

    Returns:
    the state summaries of all crawler tasks
    Throws:
    XCrawlerException

    getCrawlerTaskSummary

    IXCrawlerTaskSummary getCrawlerTaskSummary(String taskID)
                                               throws XCrawlerException
    Get the state summary of a crawler task.

    Parameters:
    taskID - ID of the task
    Returns:
    the state summary of a crawler task (or null if no summary exists for this task)
    Throws:
    XCrawlerException

    isRunning

    boolean isRunning(String taskID)
                      throws XCrawlerException
    Check whether a crawler task is running for the specified taskID.

    Parameters:
    taskID - ID of the task
    Returns:
    true if a crawler task is running for the specified taskID
    Throws:
    XCrawlerException

    isSuspended

    boolean isSuspended(String taskID)
                        throws XCrawlerException
    Check whether a crawler task is suspended for the specified taskID.

    Parameters:
    taskID - ID of the task
    Returns:
    true if a crawler task is suspended for the specified taskID
    Throws:
    XCrawlerException

    isScheduled

    boolean isScheduled(String taskID)
                        throws XCrawlerException
    Check whether a crawler task is scheduled for the specified taskID
    (it will run once any running or suspended crawler tasks for the
    same taskID have finished).

    Parameters:
    taskID - ID of the task
    Returns:
    true if a crawler task is scheduled for the specified taskID
    Throws:
    XCrawlerException

    isFiltered

    boolean isFiltered(IResource resource,
                       IXCrawlerParameters parameters,
                       RID crawlStartPath)
                       throws XCrawlerException
    Check whether a resource would be filtered out during a crawl with specific crawler parameters.

    Parameters:
    resource - the resource
    parameters - the crawler parameters
    crawlStartPath - path of the related datasource that is attached to the index (for depth calculation)
    Returns:
    true if the resource would be filtered (i.e. NOT passed to the result receiver)
    Throws:
    XCrawlerException
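    A minimal sketch of a pre-flight filter check using the method above. The resource, parameters, and start path variables are hypothetical placeholders, and obtaining the service instance is not shown:

    ```java
    // Sketch only: test whether a resource would be excluded from a crawl
    // without actually running the crawl.
    boolean excluded = crawlerService.isFiltered(resource, parameters, crawlStartPath);
    if (excluded) {
        // the resource would NOT be passed to the result receiver
    }
    ```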

    reloadResourceFilters

    void reloadResourceFilters(String taskID)
                               throws XCrawlerException
    Reload the current version of the resource filters for a crawler.
    This works only for suspended crawlers; the new filters apply after the next resume.
    It also works only for crawlers whose crawler parameters have been created from a configuration (via createCrawlerParameters(String parameterName)).

    Throws:
    XCrawlerException

    getSurvivesRestart

    boolean getSurvivesRestart(String taskID)
                               throws XCrawlerException
    Get the survive restart flag of a crawler task. Used to determine whether the given crawler can survive a CM downtime.
    Added in 7.X

    Parameters:
    taskID - ID of the task
    Returns:
    true if crawler task survives the restart of CM
    Throws:
    XCrawlerException

    clearSurvivesRestart

    void clearSurvivesRestart(String taskID)
                              throws XCrawlerException
    Set the survive restart flag of a crawler task to false, i.e. the task will NOT be resumed after a CM downtime.
    Added in 7.X

    Parameters:
    taskID - ID of the task
    Throws:
    XCrawlerException

    reportDeletedResource

    void reportDeletedResource(String taskID,
                               RID rid,
                               int startResourceListIndex)
                               throws XCrawlerException
    Report the deletion of a resource in the scope of a crawler.
    The application that uses a crawler may use this method to report the deletion of a resource that was reported to it via an RF event.
    Before the start of the next delta crawl, the crawler will remove this resource from its database. The result receiver will not be informed about the deletion again. If the resource has been re-created, the result receiver will be informed about that.
    Added in 7.X

    Parameters:
    taskID - ID of the crawler task
    rid - RID of the deleted resource
    startResourceListIndex - index (starting at 0) of the start resource list (passed in run()) the reported RID belongs to
    Throws:
    XCrawlerException

    reportFailedResource

    void reportFailedResource(String taskID,
                              RID crawlStartPath,
                              RID rid,
                              int startResourceListIndex)
                              throws XCrawlerException
    Report a problem with the processing of a resource in the scope of a crawler.
    The application that uses a crawler may use this method to report a problem with the processing of a resource that was reported to it via an RF event.
    Before the start of the next delta crawl, the crawler will remove this resource from its database. This will result in the resource being reported anew if it still exists. The resource will be included in the next recrawl of errors.

    Parameters:
    taskID - ID of the crawler task
    crawlStartPath - path of the related datasource that is attached to the index (for depth calculation)
    rid - RID of the failed resource
    startResourceListIndex - index (starting at 0) of the start resource list (passed in run()) the reported RID belongs to
    Throws:
    XCrawlerException
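    The two report methods above can be sketched as follows. All variable names are hypothetical and the service lookup is not shown:

    ```java
    // Sketch only: feed RF-event outcomes back to the crawler so its database
    // stays consistent before the next delta crawl.
    // Index 0 refers to the first start resource list passed to run().
    crawlerService.reportDeletedResource(taskID, deletedRid, 0);
    crawlerService.reportFailedResource(taskID, crawlStartPath, failedRid, 0);
    ```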

    getFailedResourcesOfCrawler

    RID[] getFailedResourcesOfCrawler(String taskID,
                                      int max)
                                      throws XCrawlerException
    Get the resources that caused errors in the last (or current) run of a crawler task.
    Added in 7.X

    Parameters:
    taskID - ID of the crawler task
    max - the maximum number of results (will be truncated to 100)
    Returns:
    the RIDs of the resources
    Throws:
    XCrawlerException

    getNumberOfFailedResourcesOfEvents

    int getNumberOfFailedResourcesOfEvents(String taskID)
                                           throws XCrawlerException
    Get the number of resources that have been reported as failed by calls of reportFailedResource().
    Added in 7.X

    Parameters:
    taskID - ID of the crawler task
    Returns:
    the number of resources
    Throws:
    XCrawlerException

    getFailedResourcesOfEvents

    RID[] getFailedResourcesOfEvents(String taskID,
                                     int max)
                                     throws XCrawlerException
    Get the resources that have been reported as failed by calls of reportFailedResource().
    Added in 7.X

    Parameters:
    taskID - ID of the crawler task
    max - the maximum number of results (will be truncated to 100)
    Returns:
    the RIDs of the resources
    Throws:
    XCrawlerException
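    A sketch combining the failure queries above with recrawlErrors(). Variable names are hypothetical; obtaining the service instance is not shown:

    ```java
    // Sketch only: inspect failed resources, then retry them.
    int reportedFailures  = crawlerService.getNumberOfFailedResourcesOfEvents(taskID);
    RID[] crawlFailures   = crawlerService.getFailedResourcesOfCrawler(taskID, 100);
    RID[] eventFailures   = crawlerService.getFailedResourcesOfEvents(taskID, 100);
    // recrawlErrors() requires the task to be down, done, failed, or stopped.
    if (crawlFailures.length > 0 && !crawlerService.isRunning(taskID)) {
        crawlerService.recrawlErrors(taskID);
    }
    ```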
    Access Rights

    This class can be accessed from:

    SC                   DC                               Public Part   ACH
    [sap.com] KMC-CM     [sap.com] tc/km/frwk             api           EP-KM-CM
    [sap.com] KMC-WPC    [sap.com] tc/kmc/wpc/wpcfacade   api           EP-PIN-WPC-WCM

    Copyright 2014 SAP AG Complete Copyright Notice