com.sapportals.wcm.service.xcrawler

Interface IXCrawlerParameters


public interface IXCrawlerParameters

Parameters determining the behavior of a crawl.
Changes between major releases 7.0 and 7.X:

Added methods:

  • 'public boolean getRespectNoFollow()'
  • 'public boolean getUseACL()'

Deprecated methods:

  • 'public boolean getFindAllDocsInDepth()'
  • 'public long getSleepDistanceInMilliseconds()'
  • 'public long getSleepDurationInMilliseconds()'

Copyright (c) SAP AG 2003


    Nested Class Summary
    static class IXCrawlerParameters.LogLevel
              Log levels for crawler log files
    static class IXCrawlerParameters.ModificationCheckMode
              Modes for checking whether a resource was modified
     
    Method Summary
     String getConfigurableName()
              Get the name of the configurable the CrawlerParameters have been created from (may be null).
     boolean getCrawlHidden()
              Check whether hidden resources are included in the crawl.
     boolean getCrawlSystem()
              Check whether system resources are included in the crawl.
     boolean getCrawlVariants()
              Check whether variants of resources are included in the crawl.
     boolean getCrawlVersions()
              Check whether versions of resources are included in the crawl.
     String getDescription()
              Get the description of the parameter set.
     long getDocumentTimeoutInSeconds()
              Get the document timeout in seconds.
     int getErrorCacheCapacity()
              Get the capacity of the cache for the error-set.
     IPropertyName getExcludedHrefPropertyName()
              Get the name of the property which holds the HREFs of a resource from a web-repository which are restricted by robot-rules.
     int getFilteredCacheCapacity()
              Get the capacity of the cache for the filtered-set.
     boolean getFindAllDocsInDepth()
              Deprecated. No longer used (always returns true).
     int getFinishedCacheCapacity()
              Get the capacity of the cache for the finished-set.
     boolean getFollowLinks()
              Check whether links are followed.
     boolean getFollowRedirects()
              Check whether redirects on web-sites are followed.
     int getFoundCacheCapacity()
              Get the capacity of the cache for the found-set.
     IPropertyName getHrefPropertyName()
              Get the name of the property which holds the HREFs of a resource from a web-repository.
     String getLogFilePath()
              Get the path to the crawler log file.
     int getMaxBacklogFiles()
              Get the maximum number of old crawler log files.
     int getMaxDepth()
              Get the maximum depth of the crawl process (0 is unlimited).
     long getMaxLogFileSizeInBytes()
              Get the maximum size of the crawler log file in bytes.
     IXCrawlerParameters.LogLevel getMaxLogLevel()
              Get the maximum log level.
     IXCrawlerParameters.ModificationCheckMode getModificationCheckMode()
              Get the mode for checking whether a resource was modified.
     int getOldCacheCapacity()
              Get the capacity of the cache for the old-set.
     int getPostprocessedCacheCapacity()
              Get the capacity of the cache for the postprocessed-set.
     int getPostprocessingCacheCapacity()
              Get the capacity of the cache for the postprocessing-set.
     int getProviderCount()
              Get the number of provider threads.
     int getProvidingCacheCapacity()
              Get the capacity of the cache for the providing-set.
     long getRequestDelayInMilliseconds()
              Get the number of milliseconds every crawler thread waits after retrieving a resource from a repository to reduce the load on the underlying persistence (e.g. database) or channel (e.g. network).
     boolean getRespectNoFollow()
              Check whether the http://sapportals.com/xmlns/cm/follow-links property should be respected. Added in 7.X.
     boolean getRespectNoIndex()
              Check whether the http://sapportals.com/xmlns/cm/index-content property should be respected.
     boolean getRespectRobots()
              Check whether the robot-rules of web-servers are respected.
     IResourceFilter[] getResultFilters()
              Get the resource filters which are applied to the result of the crawl but do not narrow the scope.
     int getRetrieverCount()
              Get the number of retriever threads.
     int getRetrievingCacheCapacity()
              Get the capacity of the cache for the retrieving-set.
     IResourceFilter[] getScopeFilters()
              Get the resource filters which narrow the scope of the crawl.
     long getSleepDistanceInMilliseconds()
              Deprecated. No longer used (always returns 0).
     long getSleepDurationInMilliseconds()
              Deprecated. No longer used (always returns 0).
     boolean getTest()
              Check whether the crawler runs in test mode (results are not passed to the result receivers).
     int getTodoCacheCapacity()
              Get the capacity of the cache for the todo-set.
     boolean getUseACL()
              Check whether the ACL version number is used to determine whether a resource has changed.
     boolean getUseChecksum()
              Check whether a checksum is used to determine whether a resource has changed.
     boolean getUseETag()
              Check whether the ETag is used to determine whether a resource has changed.
     

    Method Detail

    getConfigurableName

    String getConfigurableName()
    Get the name of the configurable the CrawlerParameters have been created from (may be null).

    Returns:
    the name of the configurable the CrawlerParameters have been created from (may be null)

    getDescription

    String getDescription()
    Get the description of the parameter set.

    Returns:
    the description of the parameter set

    getMaxDepth

    int getMaxDepth()
    Get the maximum depth of the crawl process (0 is unlimited).

    Returns:
    the maximum depth of the crawl process
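
Because 0 means "unlimited" rather than "no levels at all", code that consumes this value needs to treat it as a special case. The following is a minimal sketch of such a check. The class DepthLimit and the method allows are hypothetical helpers, not part of the SAP API, and the inclusive comparison is an assumption, since the source does not say how the crawler counts depth.

```java
// Hypothetical helper; not part of com.sapportals.wcm.service.xcrawler.
final class DepthLimit {
    private DepthLimit() {}

    /**
     * @param maxDepth value as returned by getMaxDepth(); 0 means unlimited
     * @param depth    depth of the candidate resource (counting convention assumed)
     * @return true if a resource at the given depth may still be crawled
     */
    static boolean allows(int maxDepth, int depth) {
        // 0 disables the limit entirely; otherwise compare inclusively.
        return maxDepth == 0 || depth <= maxDepth;
    }
}
```

With this helper, allows(0, 1000) is true, allows(3, 3) is true, and allows(3, 4) is false.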

    getRetrieverCount

    int getRetrieverCount()
    Get the number of retriever threads.

    Returns:
    the number of retriever threads

    getProviderCount

    int getProviderCount()
    Get the number of provider threads.

    Returns:
    the number of provider threads

    getUseChecksum

    boolean getUseChecksum()
    Check whether a checksum is used to determine whether a resource has changed.

    Returns:
    true if a checksum is used to determine whether a resource has changed

    getUseETag

    boolean getUseETag()
    Check whether the ETag is used to determine whether a resource has changed.

    Returns:
    true if the ETag is used to determine whether a resource has changed

    getUseACL

    boolean getUseACL()
    Check whether the ACL version number is used to determine whether a resource has changed.
    Added in 7.X

    Returns:
    true if the ACL version number is used to determine whether a resource has changed
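
To illustrate how the three flags getUseChecksum(), getUseETag() and getUseACL() could feed into a change check, here is a minimal sketch. ResourceState and ChangeDetector are hypothetical types invented for this example. The rule "changed if any enabled indicator differs" is an assumption, and the sketch ignores getModificationCheckMode(), whose semantics are not described here.

```java
import java.util.Objects;

// Hypothetical snapshot of the indicators stored for a resource.
final class ResourceState {
    final String etag;
    final String checksum;
    final long aclVersion;

    ResourceState(String etag, String checksum, long aclVersion) {
        this.etag = etag;
        this.checksum = checksum;
        this.aclVersion = aclVersion;
    }
}

// Hypothetical helper; not part of the SAP API.
final class ChangeDetector {
    private ChangeDetector() {}

    // Assumed rule: a resource counts as changed if any enabled indicator differs.
    static boolean hasChanged(ResourceState before, ResourceState after,
                              boolean useETag, boolean useChecksum, boolean useACL) {
        if (useETag && !Objects.equals(before.etag, after.etag)) {
            return true;
        }
        if (useChecksum && !Objects.equals(before.checksum, after.checksum)) {
            return true;
        }
        return useACL && before.aclVersion != after.aclVersion;
    }
}
```

In this sketch, a resource whose ACL version number changed while its ETag and checksum stayed the same is reported as changed only when useACL is true.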

    getFollowLinks

    boolean getFollowLinks()
    Check whether links are followed.

    Returns:
    true if links are followed

    getFollowRedirects

    boolean getFollowRedirects()
    Check whether redirects on web-sites are followed.

    Returns:
    true if redirects on web-sites are followed

    getCrawlVersions

    boolean getCrawlVersions()
    Check whether versions of resources are included in the crawl.

    Returns:
    true if versions of resources are included in the crawl

    getCrawlVariants

    boolean getCrawlVariants()
    Check whether variants of resources are included in the crawl.

    Returns:
    true if variants of resources are included in the crawl

    getCrawlHidden

    boolean getCrawlHidden()
    Check whether hidden resources are included in the crawl.

    Returns:
    true if hidden resources are included in the crawl

    getCrawlSystem

    boolean getCrawlSystem()
    Check whether system resources are included in the crawl.

    Returns:
    true if system resources are included in the crawl

    getModificationCheckMode

    IXCrawlerParameters.ModificationCheckMode getModificationCheckMode()
    Get the mode for checking whether a resource was modified.

    Returns:
    the mode for checking whether a resource was modified

    getRequestDelayInMilliseconds

    long getRequestDelayInMilliseconds()
    Get the number of milliseconds every crawler thread waits after retrieving a resource from a repository to reduce the load on the underlying persistence (e.g. database) or channel (e.g. network).

    Returns:
    the number of milliseconds every crawler thread waits after retrieving a resource from a repository
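
The throttling described above can be pictured as "retrieve, then pause". The sketch below shows that pattern in isolation. ThrottledRetrieval and retrieveThenPause are hypothetical names, and the Supplier stands in for the actual repository access, which is not shown in this documentation.

```java
import java.util.function.Supplier;

// Hypothetical helper; not part of the SAP API.
final class ThrottledRetrieval {
    private ThrottledRetrieval() {}

    /**
     * Runs one retrieval and then sleeps for the configured delay.
     *
     * @param retrieval          stand-in for fetching a resource from a repository
     * @param requestDelayMillis value as returned by getRequestDelayInMilliseconds()
     */
    static <T> T retrieveThenPause(Supplier<T> retrieval, long requestDelayMillis) {
        T resource = retrieval.get();
        if (requestDelayMillis > 0) {
            try {
                // Pause after the retrieval to reduce load on the database or network.
                Thread.sleep(requestDelayMillis);
            } catch (InterruptedException e) {
                // Preserve the interrupt so the calling thread can shut down cleanly.
                Thread.currentThread().interrupt();
            }
        }
        return resource;
    }
}
```

Since every crawler thread waits individually, the effective request rate in such a design would scale with the number of threads; that is an inference from the description, not a documented guarantee.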

    getFindAllDocsInDepth

    boolean getFindAllDocsInDepth()
    Deprecated. No longer used (always returns true).

    Check whether resources are found on the shortest possible path (there may be multiple paths in a web-repository).

    Returns:
    true if resources are found on the shortest possible path

    getRespectRobots

    boolean getRespectRobots()
    Check whether the robot-rules of web-servers are respected.

    Returns:
    true if the robot-rules of web-servers are respected.

    getRespectNoIndex

    boolean getRespectNoIndex()
    Check whether the http://sapportals.com/xmlns/cm/index-content property should be respected.

    Returns:
    true if the index-content property is respected.

    getRespectNoFollow

    boolean getRespectNoFollow()
    Check whether the http://sapportals.com/xmlns/cm/follow-links property should be respected.
    Added in 7.X

    Returns:
    true if the follow-links property is respected.

    getTest

    boolean getTest()
    Check whether the crawler runs in test mode (results are not passed to the result receivers).

    Returns:
    true if the crawler runs in test mode

    getScopeFilters

    IResourceFilter[] getScopeFilters()
    Get the resource filters which narrow the scope of the crawl.

    Returns:
    the resource filters which narrow the scope of the crawl

    getResultFilters

    IResourceFilter[] getResultFilters()
    Get the resource filters which are applied to the result of the crawl but do not narrow the scope.

    Returns:
    the resource filters which are applied to the result of the crawl but do not narrow the scope
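
The difference between scope filters and result filters is easiest to see on a small example. The sketch below uses java.util.function.Predicate on path strings as a stand-in for IResourceFilter, whose methods are not shown here, and a map as a stand-in for a repository tree. FilterSketch and its crawl method are hypothetical, and the interpretation (a scope filter stops traversal, a result filter only hides the resource from the output) is an assumption based on the descriptions above.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical illustration; not the crawler's actual implementation.
final class FilterSketch {
    private FilterSketch() {}

    /** Walks a tree (no cycle handling) and returns the resources that pass both filters. */
    static List<String> crawl(String start,
                              Map<String, List<String>> children,
                              Predicate<String> scopeFilter,
                              Predicate<String> resultFilter) {
        List<String> results = new ArrayList<>();
        Deque<String> todo = new ArrayDeque<>();
        todo.add(start);
        while (!todo.isEmpty()) {
            String resource = todo.poll();
            if (!scopeFilter.test(resource)) {
                continue; // out of scope: neither reported nor traversed further
            }
            if (resultFilter.test(resource)) {
                results.add(resource); // reported to the caller
            }
            // Children are visited even if the resource itself was not reported.
            todo.addAll(children.getOrDefault(resource, Collections.emptyList()));
        }
        return results;
    }
}
```

For a tree where "/" contains "/a" and "/tmp", each holding one .txt file, a scope filter that rejects paths starting with "/tmp" and a result filter that accepts only paths ending in ".txt" yield just the file below "/a": the folder "/a" is traversed but not reported, while "/tmp" is never entered.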

    getHrefPropertyName

    IPropertyName getHrefPropertyName()
    Get the name of the property which holds the HREFs of a resource from a web-repository.

    Returns:
    the name of the property which holds the HREFs of a resource from a web-repository

    getExcludedHrefPropertyName

    IPropertyName getExcludedHrefPropertyName()
    Get the name of the property which holds the HREFs of a resource from a web-repository which are restricted by robot-rules.

    Returns:
    the name of the property which holds the HREFs of a resource from a web-repository which are restricted by robot-rules

    getTodoCacheCapacity

    int getTodoCacheCapacity()
    Get the capacity of the cache for the todo-set.

    Returns:
    the capacity of the cache for the todo-set

    getRetrievingCacheCapacity

    int getRetrievingCacheCapacity()
    Get the capacity of the cache for the retrieving-set.

    Returns:
    the capacity of the cache for the retrieving-set

    getFoundCacheCapacity

    int getFoundCacheCapacity()
    Get the capacity of the cache for the found-set.

    Returns:
    the capacity of the cache for the found-set

    getProvidingCacheCapacity

    int getProvidingCacheCapacity()
    Get the capacity of the cache for the providing-set.

    Returns:
    the capacity of the cache for the providing-set

    getFinishedCacheCapacity

    int getFinishedCacheCapacity()
    Get the capacity of the cache for the finished-set.

    Returns:
    the capacity of the cache for the finished-set

    getOldCacheCapacity

    int getOldCacheCapacity()
    Get the capacity of the cache for the old-set.

    Returns:
    the capacity of the cache for the old-set

    getPostprocessingCacheCapacity

    int getPostprocessingCacheCapacity()
    Get the capacity of the cache for the postprocessing-set.

    Returns:
    the capacity of the cache for the postprocessing-set

    getPostprocessedCacheCapacity

    int getPostprocessedCacheCapacity()
    Get the capacity of the cache for the postprocessed-set.

    Returns:
    the capacity of the cache for the postprocessed-set

    getErrorCacheCapacity

    int getErrorCacheCapacity()
    Get the capacity of the cache for the error-set.

    Returns:
    the capacity of the cache for the error-set

    getFilteredCacheCapacity

    int getFilteredCacheCapacity()
    Get the capacity of the cache for the filtered-set.

    Returns:
    the capacity of the cache for the filtered-set

    getSleepDistanceInMilliseconds

    long getSleepDistanceInMilliseconds()
    Deprecated. No longer used (always returns 0).

    Get the number of milliseconds between two sleep-periods of a crawler-thread.

    Returns:
    the number of milliseconds between two sleep-periods of a crawler-thread

    getSleepDurationInMilliseconds

    long getSleepDurationInMilliseconds()
    Deprecated. No longer used (always returns 0).

    Get the duration of a sleep-period of a crawler-thread in milliseconds.

    Returns:
    the duration of a sleep-period of a crawler-thread in milliseconds

    getMaxLogFileSizeInBytes

    long getMaxLogFileSizeInBytes()
    Get the maximum size of the crawler log file in bytes.

    Returns:
    the maximum size of the crawler log file in bytes

    getMaxBacklogFiles

    int getMaxBacklogFiles()
    Get the maximum number of old crawler log files.

    Returns:
    the maximum number of old crawler log files

    getLogFilePath

    String getLogFilePath()
    Get the path to the crawler log file.

    Returns:
    the path to the crawler log file (may return null)
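
Since the path may be null, callers should guard against it before using the value. A minimal sketch follows. LogPathSource is a hypothetical stand-in that declares only the one method used, LogPathSketch is likewise invented for this example, and reading null as "no log file configured" is an assumption.

```java
import java.util.Optional;

// Hypothetical stand-in declaring only the method used in this example.
interface LogPathSource {
    String getLogFilePath();
}

// Hypothetical helper; not part of the SAP API.
final class LogPathSketch {
    private LogPathSketch() {}

    static String describeLogTarget(LogPathSource params) {
        // Wrap the nullable return value instead of dereferencing it directly.
        return Optional.ofNullable(params.getLogFilePath())
                .map(path -> "logging to " + path)
                .orElse("no log file configured");
    }
}
```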

    getMaxLogLevel

    IXCrawlerParameters.LogLevel getMaxLogLevel()
    Get the maximum log level.

    Returns:
    the maximum log level

    getDocumentTimeoutInSeconds

    long getDocumentTimeoutInSeconds()
    Get the document timeout in seconds.

    Returns:
    the document timeout in seconds

    Access Rights

    This class can be accessed from:

    SC                 DC                              Public Part  ACH
    [sap.com] KMC-CM   [sap.com] tc/km/frwk            api          EP-KM-CM
    [sap.com] KMC-WPC  [sap.com] tc/kmc/wpc/wpcfacade  api          EP-PIN-WPC-WCM


    Copyright 2014 SAP AG