com.sapportals.wcm.service.xcrawler
Interface IXCrawlerParameters
- public interface IXCrawlerParameters
Parameters determining the behaviour of a crawl.
Copyright (c) SAP AG 2003
Method Summary
String getConfigurableName()
- Get the name of the configurable the CrawlerParameters have been created from (may be null).
boolean getCrawlHidden()
- Check, whether hidden resources are included in the crawl.
boolean getCrawlSystem()
- Check, whether system resources are included in the crawl.
boolean getCrawlVersions()
- Check, whether versions of resources are included in the crawl.
String getDescription()
- Get the description of the parameter set.
long getDocumentTimeoutInSeconds()
- Get the document timeout in seconds.
int getErrorCacheCapacity()
- Get the capacity of the cache for the error-set.
IPropertyName getExcludedHrefPropertyName()
- Get the name of the property which holds the HREFs of a resource from a web-repository which are restricted by robot-rules.
int getFilteredCacheCapacity()
- Get the capacity of the cache for the filtered-set.
boolean getFindAllDocsInDepth()
- Check, whether resources are found on the shortest possible path (there may be multiple paths in a web-repository).
int getFinishedCacheCapacity()
- Get the capacity of the cache for the finished-set.
boolean getFollowLinks()
- Check, whether links are followed.
boolean getFollowRedirects()
- Check, whether redirects on web-sites are followed.
int getFoundCacheCapacity()
- Get the capacity of the cache for the found-set.
IPropertyName getHrefPropertyName()
- Get the name of the property which holds the HREFs of a resource from a web-repository.
String getLogFilePath()
- Get the path to the crawler log file.
int getMaxBacklogFiles()
- Get the maximum number of old crawler log files.
int getMaxDepth()
- Get the maximum depth of the crawl process (0 is unlimited).
long getMaxLogFileSizeInBytes()
- Get the maximum size of the crawler log file in bytes.
IXCrawlerParameters.LogLevel getMaxLogLevel()
- Get the maximum log level.
IXCrawlerParameters.ModificationCheckMode getModificationCheckMode()
- Get the mode for checking whether a resource was modified.
int getOldCacheCapacity()
- Get the capacity of the cache for the old-set.
int getPostprocessedCacheCapacity()
- Get the capacity of the cache for the postprocessed-set.
int getPostprocessingCacheCapacity()
- Get the capacity of the cache for the postprocessing-set.
int getProviderCount()
- Get the number of provider threads.
int getProvidingCacheCapacity()
- Get the capacity of the cache for the providing-set.
long getRequestDelayInMilliseconds()
- Get the number of milliseconds every crawler thread waits after retrieving a resource from a repository to reduce the load on the underlying persistency (e.g. database) or channel (e.g. network).
boolean getRespectNoIndex()
- Check, whether the http://sapportals.com/xmlns/cm/index-content property should be respected.
boolean getRespectRobots()
- Check, whether the robot-rules of web-servers are respected.
IResourceFilter[] getResultFilters()
- Get the resource filters which are applied to the result of the crawl but do not narrow the scope.
int getRetrieverCount()
- Get the number of retriever threads.
int getRetrievingCacheCapacity()
- Get the capacity of the cache for the retrieving-set.
IResourceFilter[] getScopeFilters()
- Get the resource filters which narrow the scope of the crawl.
long getSleepDistanceInMilliseconds()
- Get the number of milliseconds between two sleep-periods of a crawler-thread.
long getSleepDurationInMilliseconds()
- Get the duration of a sleep-period of a crawler-thread in milliseconds.
boolean getTest()
- Check, whether the crawler runs in test-mode (no passing of results to the result receivers).
int getTodoCacheCapacity()
- Get the capacity of the cache for the todo-set.
boolean getUseChecksum()
- Check, whether a checksum is used to determine whether a resource has changed.
boolean getUseETag()
- Check, whether the ETag is used to determine whether a resource has changed.
getConfigurableName
public String getConfigurableName()
- Get the name of the configurable the CrawlerParameters have been created from (may be null).
- Returns:
- the name of the configurable the CrawlerParameters have been created from (may be null)
getDescription
public String getDescription()
- Get the description of the parameter set.
- Returns:
- the description of the parameter set
getMaxDepth
public int getMaxDepth()
- Get the maximum depth of the crawl process (0 is unlimited).
- Returns:
- the maximum depth of the crawl process
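The special value 0 is easy to get wrong in calling code, so a small illustration may help. This is only a sketch: it assumes that depth is counted upwards from the start resource and that the limit is inclusive, neither of which is stated here. DepthLimit and withinDepth are hypothetical names and not part of the xcrawler API.

```java
final class DepthLimit {
    private DepthLimit() {}

    /**
     * Returns true if a resource found at the given depth may still be crawled.
     * maxDepth is the value returned by getMaxDepth().
     */
    static boolean withinDepth(int depth, int maxDepth) {
        // 0 means "unlimited" according to the getMaxDepth() description.
        if (maxDepth == 0) {
            return true;
        }
        // Assumption: the limit is inclusive.
        return depth <= maxDepth;
    }
}
```

For example, `DepthLimit.withinDepth(100, 0)` is true because the depth is unlimited, while `DepthLimit.withinDepth(3, 2)` is false.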
getRetrieverCount
public int getRetrieverCount()
- Get the number of retriever threads.
- Returns:
- the number of retriever threads
getProviderCount
public int getProviderCount()
- Get the number of provider threads.
- Returns:
- the number of provider threads
getUseChecksum
public boolean getUseChecksum()
- Check, whether a checksum is used to determine whether a resource has changed.
- Returns:
- true iff a checksum is used to determine whether a resource has changed
getUseETag
public boolean getUseETag()
- Check, whether the ETag is used to determine whether a resource has changed.
- Returns:
- true iff the ETag is used to determine whether a resource has changed
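How the checksum and ETag flags combine is not specified on this page (see also getModificationCheckMode()). The following is therefore only one plausible reading, written as a self-contained sketch: prefer the ETag when it is enabled and available, fall back to the checksum, and otherwise treat the resource as changed. ChangeDetection, hasChanged and all of their parameters are hypothetical.

```java
final class ChangeDetection {
    private ChangeDetection() {}

    /**
     * Decides whether a resource should be treated as changed.
     * useETag and useChecksum correspond to getUseETag() and getUseChecksum();
     * the old and new values are whatever the caller stored and just retrieved.
     */
    static boolean hasChanged(boolean useETag, boolean useChecksum,
                              String oldETag, String newETag,
                              String oldChecksum, String newChecksum) {
        // Assumption: the ETag is consulted first when both values are known.
        if (useETag && oldETag != null && newETag != null) {
            return !oldETag.equals(newETag);
        }
        // Assumption: the checksum is the fallback.
        if (useChecksum && oldChecksum != null && newChecksum != null) {
            return !oldChecksum.equals(newChecksum);
        }
        // No usable evidence: err on the side of re-processing the resource.
        return true;
    }
}
```

Treating "no evidence" as "changed" is a conservative design choice: it costs extra work but never hides a modification.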
getFollowLinks
public boolean getFollowLinks()
- Check, whether links are followed.
- Returns:
- true iff links are followed
getFollowRedirects
public boolean getFollowRedirects()
- Check, whether redirects on web-sites are followed.
- Returns:
- true iff redirects on web-sites are followed
getCrawlVersions
public boolean getCrawlVersions()
- Check, whether versions of resources are included in the crawl.
- Returns:
- true iff versions of resources are included in the crawl
getCrawlHidden
public boolean getCrawlHidden()
- Check, whether hidden resources are included in the crawl.
- Returns:
- true iff hidden resources are included in the crawl
getCrawlSystem
public boolean getCrawlSystem()
- Check, whether system resources are included in the crawl.
- Returns:
- true iff system resources are included in the crawl
getModificationCheckMode
public IXCrawlerParameters.ModificationCheckMode getModificationCheckMode()
- Get the mode for checking whether a resource was modified.
- Returns:
- the mode for checking whether a resource was modified
getRequestDelayInMilliseconds
public long getRequestDelayInMilliseconds()
- Get the number of milliseconds every crawler thread waits after retrieving a resource from a repository to reduce the load on the underlying persistency (e.g. database) or channel (e.g. network).
- Returns:
- the number of milliseconds every crawler thread waits after retrieving a resource from a repository
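To illustrate the idea of waiting after each retrieval, here is a minimal throttling sketch. It assumes a non-positive delay means "do not wait", which this page does not state. RequestThrottle and pauseAfterRetrieval are hypothetical names.

```java
final class RequestThrottle {
    private RequestThrottle() {}

    /**
     * Pauses the calling thread for delayMillis, the value returned by
     * getRequestDelayInMilliseconds().
     * Returns false if the thread was interrupted while waiting.
     */
    static boolean pauseAfterRetrieval(long delayMillis) {
        // Assumption: zero or negative values disable the delay.
        if (delayMillis <= 0) {
            return true;
        }
        try {
            Thread.sleep(delayMillis);
            return true;
        } catch (InterruptedException e) {
            // Restore the interrupt flag so the caller can react to it.
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

A retriever loop would call `RequestThrottle.pauseAfterRetrieval(delay)` once after each resource it fetches and stop working when the method returns false.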
getFindAllDocsInDepth
public boolean getFindAllDocsInDepth()
- Check, whether resources are found on the shortest possible path (there may be multiple paths in a web-repository).
- Returns:
- true iff resources are found on the shortest possible path
getRespectRobots
public boolean getRespectRobots()
- Check, whether the robot-rules of web-servers are respected.
- Returns:
- true iff the robot-rules of web-servers are respected.
getRespectNoIndex
public boolean getRespectNoIndex()
- Check, whether the http://sapportals.com/xmlns/cm/index-content property should be respected.
- Returns:
- true iff the index-content property is respected
getTest
public boolean getTest()
- Check, whether the crawler runs in test-mode (no passing of results to the result receivers).
- Returns:
- true iff the crawler runs in test-mode
getScopeFilters
public IResourceFilter[] getScopeFilters()
- Get the resource filters which narrow the scope of the crawl.
- Returns:
- the resource filters which narrow the scope of the crawl
getResultFilters
public IResourceFilter[] getResultFilters()
- Get the resource filters which are applied to the result of the crawl but do not narrow the scope.
- Returns:
- the resource filters which are applied to the result of the crawl but do not narrow the scope
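The difference between scope filters and result filters can be shown on a tiny in-memory tree. The sketch below does not use IResourceFilter, whose signature is not shown on this page; it substitutes java.util.function.Predicate over path strings as a stand-in. FilterSketch, crawl and the map-based repository are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

final class FilterSketch {
    private FilterSketch() {}

    /** Breadth-first walk over a map of parent path to child paths. */
    static List<String> crawl(String start,
                              Map<String, List<String>> children,
                              Predicate<String> scopeFilter,
                              Predicate<String> resultFilter) {
        List<String> results = new ArrayList<>();
        Deque<String> todo = new ArrayDeque<>();
        Set<String> seen = new HashSet<>();
        todo.add(start);
        seen.add(start);
        while (!todo.isEmpty()) {
            String current = todo.poll();
            // Scope filter: a rejected resource is neither reported nor expanded.
            if (!scopeFilter.test(current)) {
                continue;
            }
            // Result filter: a rejected resource is not reported,
            // but its children are still visited.
            if (resultFilter.test(current)) {
                results.add(current);
            }
            for (String child : children.getOrDefault(current, Collections.<String>emptyList())) {
                if (seen.add(child)) {
                    todo.add(child);
                }
            }
        }
        return results;
    }
}
```

With a scope filter that rejects everything under /tmp and a result filter that accepts only .txt files, folders such as /docs are still traversed but never reported, whereas nothing below /tmp is visited at all.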
getHrefPropertyName
public IPropertyName getHrefPropertyName()
- Get the name of the property which holds the HREFs of a resource from a web-repository.
- Returns:
- the name of the property which holds the HREFs of a resource from a web-repository
getExcludedHrefPropertyName
public IPropertyName getExcludedHrefPropertyName()
- Get the name of the property which holds the HREFs of a resource from a web-repository which are restricted by robot-rules.
- Returns:
- the name of the property which holds the HREFs of a resource from a web-repository which are restricted by robot-rules
getTodoCacheCapacity
public int getTodoCacheCapacity()
- Get the capacity of the cache for the todo-set.
- Returns:
- the capacity of the cache for the todo-set
getRetrievingCacheCapacity
public int getRetrievingCacheCapacity()
- Get the capacity of the cache for the retrieving-set.
- Returns:
- the capacity of the cache for the retrieving-set
getFoundCacheCapacity
public int getFoundCacheCapacity()
- Get the capacity of the cache for the found-set.
- Returns:
- the capacity of the cache for the found-set
getProvidingCacheCapacity
public int getProvidingCacheCapacity()
- Get the capacity of the cache for the providing-set.
- Returns:
- the capacity of the cache for the providing-set
getFinishedCacheCapacity
public int getFinishedCacheCapacity()
- Get the capacity of the cache for the finished-set.
- Returns:
- the capacity of the cache for the finished-set
getOldCacheCapacity
public int getOldCacheCapacity()
- Get the capacity of the cache for the old-set.
- Returns:
- the capacity of the cache for the old-set
getPostprocessingCacheCapacity
public int getPostprocessingCacheCapacity()
- Get the capacity of the cache for the postprocessing-set.
- Returns:
- the capacity of the cache for the postprocessing-set
getPostprocessedCacheCapacity
public int getPostprocessedCacheCapacity()
- Get the capacity of the cache for the postprocessed-set.
- Returns:
- the capacity of the cache for the postprocessed-set
getErrorCacheCapacity
public int getErrorCacheCapacity()
- Get the capacity of the cache for the error-set.
- Returns:
- the capacity of the cache for the error-set
getFilteredCacheCapacity
public int getFilteredCacheCapacity()
- Get the capacity of the cache for the filtered-set.
- Returns:
- the capacity of the cache for the filtered-set
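This page does not describe how the crawler implements the caches behind these capacity values. As a general illustration of a capacity-bounded cache, the sketch below uses the standard LinkedHashMap eviction hook with least-recently-used ordering. BoundedCache is a hypothetical class, and the LRU policy is an assumption made for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class BoundedCache<K, V> extends LinkedHashMap<K, V> {
    private static final long serialVersionUID = 1L;

    private final int capacity;

    /** capacity would come from one of the get...CacheCapacity() methods. */
    BoundedCache(int capacity) {
        // accessOrder = true keeps entries ordered from least to most recently used.
        super(16, 0.75f, true);
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called after each insertion; evict once the capacity is exceeded.
        return size() > capacity;
    }
}
```

A cache created with `new BoundedCache<String, String>(2)` never holds more than two entries; inserting a third evicts the entry that was used least recently.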
getSleepDistanceInMilliseconds
public long getSleepDistanceInMilliseconds()
- Get the number of milliseconds between two sleep-periods of a crawler-thread.
- Returns:
- the number of milliseconds between two sleep-periods of a crawler-thread
getSleepDurationInMilliseconds
public long getSleepDurationInMilliseconds()
- Get the duration of a sleep-period of a crawler-thread in milliseconds.
- Returns:
- the duration of a sleep-period of a crawler-thread in milliseconds
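The two sleep values can be read together as a work/rest cycle. The sketch below assumes that the "distance" is the working time between the end of one sleep-period and the start of the next, and that non-positive values disable sleeping; neither is confirmed here. SleepSchedule and its methods are hypothetical.

```java
final class SleepSchedule {
    private SleepSchedule() {}

    /** True once the thread has worked for at least distanceMillis since its last sleep. */
    static boolean isSleepDue(long millisSinceLastSleep, long distanceMillis) {
        return distanceMillis > 0 && millisSinceLastSleep >= distanceMillis;
    }

    /** Approximate share of wall-clock time a thread spends working rather than sleeping. */
    static double activeFraction(long distanceMillis, long durationMillis) {
        // Assumption: non-positive values mean the thread never sleeps.
        if (distanceMillis <= 0 || durationMillis <= 0) {
            return 1.0;
        }
        return (double) distanceMillis / (distanceMillis + durationMillis);
    }
}
```

Under these assumptions, a distance of 9000 ms with a duration of 1000 ms means a thread works for roughly nine tenths of the time.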
getMaxLogFileSizeInBytes
public long getMaxLogFileSizeInBytes()
- Get the maximum size of the crawler log file in bytes.
- Returns:
- the maximum size of the crawler log file in bytes
getMaxBacklogFiles
public int getMaxBacklogFiles()
- Get the maximum number of old crawler log files.
- Returns:
- the maximum number of old crawler log files
getLogFilePath
public String getLogFilePath()
- Get the path to the crawler log file.
- Returns:
- the path to the crawler log file (may return null)
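Because the path may be null, callers need a guard before opening the file. A minimal sketch using Optional follows; it additionally treats a blank string as "no log file", which is an assumption. LogPathSketch and resolveLogFile are hypothetical names.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

final class LogPathSketch {
    private LogPathSketch() {}

    /** Wraps the value of getLogFilePath() so that a missing path is explicit. */
    static Optional<Path> resolveLogFile(String logFilePath) {
        // Assumption: a blank string is handled like null.
        if (logFilePath == null || logFilePath.trim().isEmpty()) {
            return Optional.empty();
        }
        return Optional.of(Paths.get(logFilePath));
    }
}
```

Calling code can then write `LogPathSketch.resolveLogFile(path).ifPresent(p -> ...)` and skip file logging when no path is configured.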
getMaxLogLevel
public IXCrawlerParameters.LogLevel getMaxLogLevel()
- Get the maximum log level.
- Returns:
- the maximum log level
getDocumentTimeoutInSeconds
public long getDocumentTimeoutInSeconds()
- Get the document timeout in seconds.
- Returns:
- the document timeout in seconds
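What exactly the document timeout bounds is not explained on this page. Assuming it limits how long the retrieval of a single document may take, a caller-side equivalent could look like the sketch below, which uses the standard Future.get(timeout, unit) call. DocumentTimeoutSketch, retrieveWithTimeout and the fallback parameter are hypothetical, and a value of 0 is not given any special meaning here.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class DocumentTimeoutSketch {
    private DocumentTimeoutSketch() {}

    /** Runs the retrieval and returns fallback if it fails or exceeds timeoutSeconds. */
    static <T> T retrieveWithTimeout(Callable<T> retrieval, long timeoutSeconds, T fallback) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            Future<T> future = executor.submit(retrieval);
            try {
                return future.get(timeoutSeconds, TimeUnit.SECONDS);
            } catch (TimeoutException e) {
                // Give up on this document and interrupt the worker.
                future.cancel(true);
                return fallback;
            } catch (ExecutionException e) {
                // The retrieval itself failed.
                return fallback;
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return fallback;
            }
        } finally {
            executor.shutdownNow();
        }
    }
}
```

Creating one executor per call keeps the example short; real code would more likely reuse a shared thread pool.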
Copyright 2006 SAP AG. All rights reserved.