In the Preprocessor: View Docs window you can manually carry out processing steps that the preprocessor carries out automatically when indexing takes place.
You can specify a document for the preprocessor to process as follows:
Load and filter
Load, filter, and analyze linguistically
You use this function for test purposes and for troubleshooting.
The graphic below depicts the structure of the Preprocessor: View Docs window.
Function Bars
The function bars contain the following fields and buttons:
Field/Button | Description
File/URL | Path or URI of the document that the preprocessor is to process.
Show Original | Opens the original document. An application that can open the document must be installed.
Show Filtered | Displays the filtered document in the browser.
None | Clears the output area.
Get+Filter | Loads and filters the document. You use this function to check which HTML code the filters generate from the original document.
Get+Filter+Lex | Loads and filters the document and analyzes it linguistically. You can also use this function to check the results of the linguistic analysis.
Index | Only relevant for Get+Filter+Lex. Defines whether the preprocessor is to use global or index-specific settings for processing.
Output Areas
The output areas display the results of the analysis. There are the following output areas:
Output area | Description
Document content | Document content that the preprocessor issues after processing. What you see depends on the selected function.
Python Extensions and Document Attributes | Only relevant for Get+Filter+Lex.
Categories | Only relevant for Get+Filter+Lex. Word types that the linguistic analysis assigned to the terms. If the list contains the category 'unknown', the linguistic analysis was unable to assign a word type to some of the terms. By default, TREX includes these terms in the text-mining index as nouns (category: nn). This setting is defined in TREXMiningIndex.ini.
Note: If a large number of terms fall into the category 'unknown', language recognition may have determined the language incorrectly. Terms may also be classified as unknown even when the document language was determined correctly. This can happen with proper nouns such as the names of people, products, and places. Recognition support for proper nouns (named entities) is planned for a later TREX release but is not available in the current release. The NE node is therefore currently inactive.
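The check described in the note can be sketched in a few lines of Python. This is only an illustration and not a TREX API: the (term, category) pairs are assumed to have been copied by hand from the Categories output area, the function names are hypothetical, and the 30% threshold is an arbitrary assumption rather than a documented value.

```python
from collections import Counter

# Hypothetical input: (term, category) pairs read off the Categories output area.
# Only the categories 'nn' and 'unknown' are taken from the documentation.
sample = [
    ("server", "nn"),
    ("index", "nn"),
    ("Walldorf", "unknown"),
    ("TREX", "unknown"),
]


def unknown_share(tagged_terms):
    """Return the fraction of terms whose category is 'unknown'."""
    if not tagged_terms:
        return 0.0
    counts = Counter(category for _, category in tagged_terms)
    return counts["unknown"] / len(tagged_terms)


def apply_default_category(tagged_terms, default="nn"):
    """Mimic the documented default: unknown terms are treated as nouns (nn)."""
    return [
        (term, default if category == "unknown" else category)
        for term, category in tagged_terms
    ]


# Assumed threshold; a high share only hints at a wrong language guess,
# because proper nouns are also classified as unknown.
SUSPICIOUS_SHARE = 0.3

share = unknown_share(sample)
if share > SUSPICIOUS_SHARE:
    print(f"{share:.0%} of the terms are unknown; check the detected language")
```

A high share is only a hint. As the note says, proper nouns end up in the category 'unknown' even when the language was detected correctly.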
Status Bar
If you have executed the function Get+Filter+Lex, the status bar displays the following information.
Area | Description
Left | Document language that the language recognition determined.
Middle | If you place the cursor on a term in the output area containing the document content, information about that term appears here.
Right | You can use the <html> field to change the view of the document content. You can switch between views with and without HTML tags.
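The difference between the two views can be illustrated outside the tool with Python's standard html.parser module. This is a minimal sketch and does not show how the window itself is implemented: the render function, its show_tags parameter, and the sample document are all assumptions made for illustration.

```python
from html.parser import HTMLParser


class _TextOnly(HTMLParser):
    """Collects the text between tags and ignores the tags themselves."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def render(filtered_html, show_tags):
    """Return the document with HTML tags (show_tags=True) or without them."""
    if show_tags:
        return filtered_html
    parser = _TextOnly()
    parser.feed(filtered_html)
    parser.close()  # flush any text that is still buffered
    return "".join(parser.parts)


# Hypothetical filtered document, standing in for the output of Get+Filter.
doc = "<html><body><p>TREX indexes documents.</p></body></html>"

print(render(doc, show_tags=True))
print(render(doc, show_tags=False))
```

The first call prints the markup unchanged, and the second prints only the text content.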