
Preprocessing Flow


The graphic below depicts the most important steps that take place immediately before, during, and after preprocessing.

The graphic depicts the process flow when the application transmits the URI of a document to TREX. If the application transmits the document directly, the step 'Load document (HTTP/HTTPS Get)' does not take place.

The application sends indexing requests to the TREX Web server or TREX RFC server. This server then forwards the requests to the queue server. The queue server assigns the requests to the correct queues and distributes the requests among one or more preprocessors. The actual preprocessing of documents then takes place on the preprocessor(s).

When the preprocessing has been completed, the preprocessor passes the analyzed document to the queue server. The queue server collects the documents and, depending on its configuration, triggers further processing on the index server.
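The hand-off from the queue server to the index server can be pictured with a minimal Python sketch. It is an illustration only, not TREX code: the class name, the `on_preprocessed` method, and the `flush_threshold` setting are all hypothetical, and a simple document count stands in for whatever the real queue server configuration specifies.

```python
class QueueServer:
    """Toy model of a queue server that collects preprocessed documents."""

    def __init__(self, index_server, flush_threshold):
        # index_server: any callable that accepts a list of documents.
        self.index_server = index_server
        # Hypothetical configuration value: how many documents to collect
        # before further processing is triggered on the index server.
        self.flush_threshold = flush_threshold
        self.pending = []

    def on_preprocessed(self, document):
        # Called when a preprocessor hands back an analyzed document.
        self.pending.append(document)
        if len(self.pending) >= self.flush_threshold:
            batch, self.pending = self.pending, []
            self.index_server(batch)


# Stand-in for the index server: it simply records each batch it receives.
indexed_batches = []
queue_server = QueueServer(indexed_batches.append, flush_threshold=2)

for doc in ["a", "b", "c"]:
    queue_server.on_preprocessed(doc)
```

With a threshold of 2, the first two documents are passed on as one batch and the third stays in the queue until more documents arrive.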

How Does the Distribution of Documents Take Place?

The name server controls the distribution of documents among the preprocessors. Distribution follows a round-robin procedure that takes into account how often each preprocessor has been accessed: preprocessors that have been accessed less often are preferred.
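The selection rule can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not the name server's actual implementation: the `NameServer` class, the `next_preprocessor` method, and the host names are hypothetical, and ties are assumed to be resolved in registration order.

```python
class NameServer:
    """Toy model of least-accessed preprocessor selection."""

    def __init__(self, preprocessors):
        # One access counter per preprocessor address. A dict keeps
        # insertion order, which serves as the tie-break order here.
        self.access_counts = {address: 0 for address in preprocessors}

    def next_preprocessor(self):
        # min() returns the first entry with the lowest count, so equal
        # counts fall back to registration order (an assumption).
        address = min(self.access_counts, key=self.access_counts.get)
        self.access_counts[address] += 1
        return address


# Hypothetical preprocessor addresses.
name_server = NameServer(["host-a", "host-b", "host-c"])
picks = [name_server.next_preprocessor() for _ in range(6)]
```

When all preprocessors start with the same count, this rule behaves like a plain round robin; a preprocessor registered later with a lower count would be preferred until it catches up.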

The process flow is as follows:

  1. When a queue server receives a document, it assigns it to a preprocessor client.

  2. The preprocessor client asks the name server for the address of a preprocessor.

  3. The name server returns the preprocessor that has been accessed least often.

  4. The preprocessor client forwards the document to the preprocessor and waits for a response. Preprocessor clients are busy while waiting for a response. They receive no further documents from the queue server during this time.

  5. When the preprocessing of the document is complete, the preprocessor client receives a response from the preprocessor and returns its own response to the queue server.

  6. Only then is the preprocessor client free to receive further documents from the queue server.
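The six steps above can be traced in a small single-process Python sketch. All names (`NameServer`, `PreprocessorClient`, `queue_server_dispatch`, the host names) are hypothetical, and the preprocessors are modeled as plain functions. The real components communicate over the network, so this only illustrates the order of events and the busy/free state of a preprocessor client.

```python
class NameServer:
    """Returns the address of the least-accessed preprocessor."""

    def __init__(self, preprocessors):
        self.preprocessors = preprocessors  # address -> callable
        self.access_counts = dict.fromkeys(preprocessors, 0)

    def lookup(self):
        address = min(self.access_counts, key=self.access_counts.get)
        self.access_counts[address] += 1
        return address


class PreprocessorClient:
    def __init__(self, name_server):
        self.name_server = name_server
        self.busy = False

    def process(self, document):
        self.busy = True  # busy from assignment until the response is returned
        try:
            address = self.name_server.lookup()        # steps 2 and 3
            preprocess = self.name_server.preprocessors[address]
            result = preprocess(document)              # step 4: forward and wait
            return address, result                     # step 5: respond to queue server
        finally:
            self.busy = False                          # step 6: free again


def queue_server_dispatch(clients, document):
    # Step 1: hand the document to a client that is not busy.
    for client in clients:
        if not client.busy:
            return client.process(document)
    raise RuntimeError("all preprocessor clients are busy")


# Hypothetical setup: two preprocessors that "analyze" text by upper-casing it.
name_server = NameServer({"host-a": str.upper, "host-b": str.upper})
clients = [PreprocessorClient(name_server)]
results = [
    queue_server_dispatch(clients, doc)
    for doc in ["doc one", "doc two", "doc three"]
]
```

Because the sketch runs sequentially, the client is always free again before the next document arrives. In a concurrent setup, the `busy` flag is what would keep the queue server from assigning a second document to a client that is still waiting for its preprocessor.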