TREX Components

Purpose

TREX includes the following central components:

·        Java client

·        ABAP client

·        Web server with TREX extension

·        Queue server

·        Preprocessor

·        Index server

·        Name server

Features

The graphic below shows the individual components and how they communicate.

[Graphic: TREX components and their communication paths]

Java Client and ABAP Client

TREX provides several interfaces that can be used to integrate TREX functions into an application. The Java client is an interface that Java applications can use to access TREX. ABAP applications use the ABAP client to access TREX.

The Java client is integrated into Content Management. This means that the TREX functions are available in Content Management and in the portal. The ABAP client is part of the SAP R/3 system.

Web Server with TREX Extension

Content Management (more precisely, the Java client) accesses the TREX functions using a Web server. Communication between Content Management and the Web server takes place using HTTP/HTTPS and XML. The Web server receives requests and forwards them to the index server and queue server. The servers then process the requests.

A TREX component that enhances the Web server with TREX-specific functions is installed on the Web server. Technically, this component is implemented as follows:

·        On Windows, as an ISAPI server extension for the Microsoft Internet Information Server

·        On UNIX, as a shared library for the Apache Web server

Queue Server

The queue server enables the asynchronous indexing of documents. It maintains a separate queue for each index and collects the documents to be indexed in the appropriate queue. At regular intervals, it transfers the queued documents to the index server for the actual indexing process. You can use the queue parameters to control when and how many documents are transmitted. This allows you to schedule indexing for times at which the index server receives few search requests.

The queue server forwards the documents to the preprocessor before transmitting them to the index server.
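The queueing behavior described above can be sketched as follows. This is a minimal illustration, not TREX code: the class name `QueueServerSketch`, the `batch_size` parameter, and the `preprocess` and `index` callables are all hypothetical stand-ins, and the caller decides when to call `flush` instead of a real schedule.

```python
from collections import defaultdict, deque


class QueueServerSketch:
    """Toy model of a queue server: one FIFO queue per index."""

    def __init__(self, preprocess, index, batch_size=2):
        self.preprocess = preprocess      # stand-in for the preprocessor
        self.index = index                # stand-in for the index server
        self.batch_size = batch_size      # hypothetical queue parameter
        self.queues = defaultdict(deque)  # index name -> pending documents

    def enqueue(self, index_name, document):
        """Collect a document; nothing is indexed at this point."""
        self.queues[index_name].append(document)

    def flush(self, index_name):
        """Pass at most batch_size documents on for indexing."""
        queue = self.queues[index_name]
        sent = 0
        while queue and sent < self.batch_size:
            # Documents go through the preprocessor before the index server.
            prepared = self.preprocess(queue.popleft())
            self.index(index_name, prepared)
            sent += 1
        return sent


indexed = []
server = QueueServerSketch(
    preprocess=str.lower,
    index=lambda name, doc: indexed.append((name, doc)),
)
for doc in ["Alpha", "Beta", "Gamma"]:
    server.enqueue("manuals", doc)
server.flush("manuals")  # transmits two documents, leaves one queued
```

Because indexing happens only when `flush` is called, the search side is not affected while documents are merely being collected.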

Preprocessor

The preprocessor has two tasks:

·        Before a search takes place, the preprocessor carries out a linguistic analysis of the search query. The preprocessor passes the results of the analysis to the index server, which then processes the query further.

·        Before indexing takes place, the preprocessor prepares the documents for the indexing process. The preparation consists of the following steps:

○        Loading the document

In the portal, documents are not normally transferred directly to TREX. Instead, they are forwarded in the form of a URI that references the storage location of the document in question. The preprocessor resolves the URI and collects the actual document from the repository.

○        Filtering the document

Documents can exist in various formats (Microsoft Word, Microsoft PowerPoint, PDF, and so on). The preprocessor filters the documents, that is, it extracts the text content and converts it to the Unicode format UTF-8 for further processing.

○        Analyzing the document linguistically

The preprocessor uses a lexicon to analyze texts in various languages.
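The three preparation steps can be pictured as a small pipeline. The sketch below is illustrative only: the in-memory `REPOSITORY`, the `repo://` URI scheme, the assumed Latin-1 source encoding, and the regular-expression "filter" and "analysis" are hypothetical simplifications, not the actual TREX filters or lexicon.

```python
import re

# Hypothetical in-memory repository: URI -> raw document bytes.
REPOSITORY = {
    "repo://docs/intro": b"<p>Indexing <b>documents</b> with TREX</p>",
}


def load(uri):
    """Step 1: resolve the URI and collect the document from the repository."""
    return REPOSITORY[uri]


def filter_text(raw):
    """Step 2: extract the text content and return it as UTF-8 bytes."""
    text = raw.decode("latin-1")          # assumed source encoding
    text = re.sub(r"<[^>]+>", " ", text)  # crude markup removal
    text = " ".join(text.split())         # normalize whitespace
    return text.encode("utf-8")


def analyze(utf8_text):
    """Step 3: stand-in for linguistic analysis (lowercased word tokens)."""
    return re.findall(r"\w+", utf8_text.decode("utf-8").lower())


def prepare(uri):
    """Run the three steps in order: load, filter, analyze."""
    return analyze(filter_text(load(uri)))


tokens = prepare("repo://docs/intro")
```

Each step takes only the output of the previous one, which mirrors the order given above: the document is loaded first, then filtered, then analyzed.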

Index Server

The index server is responsible for indexing, classifying, and searching. It receives requests and forwards them to the TREX engines. The engines provide the actual core functions of TREX.

·        The search engine is responsible for standard search functions such as exact, error-tolerant, linguistic, Boolean, and phrase search.

·        The text-mining engine is responsible for classification, searching for similar documents (‘See Also’), the extraction of keywords, and so on.

·        The attribute engine is responsible for searching in document properties such as author, creation date, change date, and so on.
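The way the index server hands requests to its engines can be sketched as a simple dispatch table. The request format (a dictionary with a `type` key) and all function names are assumptions made for this illustration; the source does not describe the real request structure.

```python
def search_engine(request):
    """Stand-in for the search engine (exact, Boolean, phrase search, ...)."""
    return f"search for {request['query']!r}"


def text_mining_engine(request):
    """Stand-in for the text-mining engine (similar documents, ...)."""
    return f"documents similar to {request['document']!r}"


def attribute_engine(request):
    """Stand-in for the attribute engine (author, creation date, ...)."""
    return f"documents where {request['attribute']} = {request['value']!r}"


# Hypothetical mapping from request type to the responsible engine.
ENGINES = {
    "search": search_engine,
    "similar": text_mining_engine,
    "attribute": attribute_engine,
}


def index_server(request):
    """Receive a request and forward it to the matching engine."""
    try:
        engine = ENGINES[request["type"]]
    except KeyError:
        raise ValueError(f"unsupported request type: {request.get('type')!r}")
    return engine(request)


result = index_server(
    {"type": "attribute", "attribute": "author", "value": "Smith"}
)
```

In this sketch the index server itself contains no search logic; it only routes, which matches the statement that the engines provide the core functions.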

Name Server

The name server is used with large distributed TREX installations. It uses its database to store and coordinate system-wide information. It also ensures that the TREX servers can communicate with each other and that TREX can communicate with Content Management. The name server is also responsible for distributing the system load if more than one TREX server is capable of carrying out a task.

In a distributed scenario, you can install several name servers to ensure that a name server is always available. A replication procedure ensures that the databases of the different name servers are synchronized.
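The load distribution performed by the name server can be illustrated with a small registry. The source does not say which strategy TREX uses; round-robin is chosen here purely for illustration, and `NameServerSketch`, `register`, and `pick` are hypothetical names.

```python
class NameServerSketch:
    """Toy registry that spreads tasks across the servers able to do them."""

    def __init__(self):
        self._servers = {}  # task -> servers capable of carrying it out
        self._next = {}     # task -> position of the next server to use

    def register(self, server, task):
        """Record that a server is capable of carrying out a task."""
        self._servers.setdefault(task, []).append(server)
        self._next.setdefault(task, 0)

    def pick(self, task):
        """Return the next capable server in round-robin order (assumed)."""
        servers = self._servers.get(task)
        if not servers:
            raise LookupError(f"no server registered for task {task!r}")
        position = self._next[task]
        self._next[task] = (position + 1) % len(servers)
        return servers[position]


names = NameServerSketch()
names.register("index-1", "search")
names.register("index-2", "search")
picks = [names.pick("search") for _ in range(3)]
```

With two servers registered for the same task, consecutive requests alternate between them, so neither server receives the entire load.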
