The graphic below depicts the process of indexing with the queue server.
The application sends the documents to be indexed to the queue server. The application decides whether to send each document directly or as a URI that references the document's storage location.
For example, the application can do this based on the size of a document: It can send small documents directly but larger documents in the form of URIs.
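The size-based choice described above can be sketched as follows. This is a minimal illustration, not the product's API: the IndexRequest type, the build_request function, and the threshold value are all hypothetical names and numbers chosen for the example.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical threshold; the documentation does not prescribe a value.
SIZE_THRESHOLD_BYTES = 1_000_000


@dataclass
class IndexRequest:
    """Hypothetical message sent from the application to the queue server."""
    index_id: str
    doc_key: str
    content: Optional[bytes] = None  # set when the document is sent directly
    uri: Optional[str] = None        # set when only a reference is sent


def build_request(index_id: str, doc_key: str,
                  content: bytes, uri: str) -> IndexRequest:
    """Send small documents directly and larger ones as a URI reference."""
    if len(content) <= SIZE_THRESHOLD_BYTES:
        return IndexRequest(index_id, doc_key, content=content)
    return IndexRequest(index_id, doc_key, uri=uri)
```

An application could just as well base the decision on other criteria, such as document type. Size is simply the example given here.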
The queue server has a separate queue for each index. It collects documents sent by the application in this queue.
The queue server transmits the documents to the preprocessor. The preprocessor carries out the following steps:
Resolving the URI: If the application has sent a URI, the preprocessor loads the document from the storage location that the URI points to.
Text extraction: The preprocessor extracts the text content of the documents and converts it to Unicode in UTF-8 encoding.
Linguistic analysis: The preprocessor normalizes the text, reduces words to their root forms (stemming), and splits the text into tokens.
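The three preprocessing steps above can be pictured as a small pipeline. The sketch below is a toy model under stated assumptions: all function names are hypothetical, the loader is passed in by the caller, text extraction is reduced to re-encoding plain text (a real preprocessor would also filter formats such as PDF or office documents), and the stemmer is a naive English suffix stripper rather than real linguistic analysis.

```python
import re
import unicodedata
from typing import Callable, List, Optional


def resolve(content: Optional[bytes], uri: Optional[str],
            load: Callable[[str], bytes]) -> bytes:
    """Step 1: use the inline content, or load the document via its URI."""
    if content is not None:
        return content
    if uri is None:
        raise ValueError("request carries neither content nor URI")
    return load(uri)  # 'load' is a caller-supplied, hypothetical fetcher


def extract_text(raw: bytes, source_encoding: str = "latin-1") -> bytes:
    """Step 2: convert text to UTF-8 (assumes the input is plain text)."""
    return raw.decode(source_encoding).encode("utf-8")


def stem(token: str) -> str:
    """Toy root reduction: strip a few common English suffixes."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[:-len(suffix)]
    return token


def analyze(utf8_text: bytes) -> List[str]:
    """Step 3: normalize, tokenize, then stem each token."""
    text = unicodedata.normalize("NFKC", utf8_text.decode("utf-8")).lower()
    tokens = re.findall(r"\w+", text)
    return [stem(token) for token in tokens]
```

With these toy rules, analyze(extract_text(b"Indexing Documents")) yields the tokens "index" and "document". Stemming runs after tokenization here only because the toy stemmer works on single words.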
The preprocessor sends the result back to the queue server.
Each queue has a start condition. The start condition defines when documents are to be indexed. When the start condition is met, the queue server sends the documents to the index server and triggers indexing there.
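A per-index queue with a start condition might look roughly like the sketch below. The IndexQueue class and its callbacks are hypothetical. This section does not say which kinds of start condition are available, so the example uses a simple document-count condition as an assumption.

```python
from typing import Callable, List


class IndexQueue:
    """Hypothetical queue that the queue server keeps for one index."""

    def __init__(self, index_id: str,
                 start_condition: Callable[[List[str]], bool],
                 send_to_index_server: Callable[[str, List[str]], None]) -> None:
        self.index_id = index_id
        self.pending: List[str] = []
        self._start_condition = start_condition
        self._send = send_to_index_server

    def add(self, preprocessed_doc: str) -> None:
        """Collect a document; hand the batch over once the condition holds."""
        self.pending.append(preprocessed_doc)
        if self._start_condition(self.pending):
            batch = list(self.pending)
            self.pending.clear()
            self._send(self.index_id, batch)  # triggers indexing there


# Usage: a count-based start condition (an assumption for illustration).
sent = []
queue = IndexQueue("products",
                   start_condition=lambda docs: len(docs) >= 3,
                   send_to_index_server=lambda idx, docs: sent.append((idx, docs)))
for doc in ("a", "b", "c"):
    queue.add(doc)
```

After the third call to add, the batch has been handed to the send_to_index_server callback and the queue is empty again.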
The index server adds the documents to its internal queue and calculates which changes to the index are required.
Once a certain number of documents have been indexed, the queue server triggers optimization.
The index server now processes its internal queue and makes the necessary changes to the index. It optimizes the index structure so that users can find all documents equally quickly, and then writes the complete index to the hard drive again.
When optimization is complete, the documents can be found using the search function.
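The relationship between indexing, optimization, and searchability can be illustrated with a toy model. Everything in the sketch below is hypothetical (ToyIndexServer, OptimizationTrigger, and the in-memory inverted index), and it leaves out writing the index to disk. It only shows that documents wait in the internal queue until optimization applies them.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


class ToyIndexServer:
    """Hypothetical index server with an internal queue of pending changes."""

    def __init__(self) -> None:
        self._pending: List[Tuple[str, List[str]]] = []
        self._index: Dict[str, Set[str]] = defaultdict(set)

    def index(self, doc_key: str, tokens: List[str]) -> None:
        """Queue the document; it is not searchable yet."""
        self._pending.append((doc_key, tokens))

    def optimize(self) -> None:
        """Apply all queued changes to the searchable index."""
        for doc_key, tokens in self._pending:
            for token in tokens:
                self._index[token].add(doc_key)
        self._pending.clear()

    def search(self, token: str) -> Set[str]:
        return set(self._index.get(token, set()))


class OptimizationTrigger:
    """Hypothetical queue-server side counter that triggers optimization."""

    def __init__(self, server: ToyIndexServer, every_n_docs: int) -> None:
        self._server = server
        self._every_n_docs = every_n_docs
        self._count = 0

    def on_indexed(self, doc_count: int) -> None:
        self._count += doc_count
        if self._count >= self._every_n_docs:
            self._server.optimize()
            self._count = 0
```

In this model, a search issued after index but before optimize returns nothing for the new document, which mirrors the behavior described above.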