Example Graph for Simple Text Analysis
Use the text analysis example graph in the SAP Data Hub Modeler to build applications with natural language processing capabilities.
The example graph com.sap.textanalysis.example is a simple text analysis application. You can use it to analyze text documents that are either provided inline or stored as files in HDFS. The graph uses the Text Analysis Connector operator, which sends JSON-formatted requests to a text analysis server.
If the request is received through the inFileData port (a 'data' type of request), the graph analyzes each text document provided as input to the TA Request Creator operator in a separate request. The Text Analysis Connector operator then outputs the result as a string to the terminal.
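To illustrate the one-request-per-document behavior of 'data' requests, the following Python sketch turns a list of documents into separate JSON payloads. The function `build_data_requests` and the payload field names (`type`, `configuration`, `text`) are hypothetical; the actual request schema used by the Text Analysis Connector operator is not described in this section.

```python
import json
from typing import Iterable, List


def build_data_requests(documents: Iterable[str],
                        configuration: str = "EXTRACTION_CORE") -> List[str]:
    """Return one JSON-encoded request per input document.

    Hypothetical schema: the field names below are illustrative only and
    are not the documented format of the Text Analysis Connector.
    """
    requests = []
    for doc in documents:
        # Each document becomes its own request, mirroring the 'data' behavior.
        payload = {"type": "data", "configuration": configuration, "text": doc}
        requests.append(json.dumps(payload))
    return requests
```

For example, passing two documents yields two independent request strings, each analyzed on its own.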
If the request is received through the inFolderPath port (a 'folder' type of request), the request must include the name of the folder that contains the documents to analyze. The graph writes the analysis results directly to two files in that folder, [folder_name]_TA.csv and [folder_name]_TADOC.csv, and then terminates.
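The output file names for a 'folder' request follow the pattern given above. The Python sketch below derives both paths from a folder path, assuming that [folder_name] is the last component of that path; this interpretation and the helper name `folder_output_paths` are assumptions, not documented behavior. It also rejects a trailing '/', because the folderpath parameter of the TA Request Creator operator expects the path without one.

```python
import posixpath
from typing import Tuple


def folder_output_paths(folder_path: str) -> Tuple[str, str]:
    """Return the (_TA.csv, _TADOC.csv) paths for a folder request.

    Assumption: [folder_name] is the last component of folder_path.
    """
    if not folder_path or folder_path.endswith("/"):
        raise ValueError("folder path must be non-empty and have no trailing '/'")
    folder_name = posixpath.basename(folder_path)
    # Both result files are written directly into the analyzed folder.
    return (
        posixpath.join(folder_path, f"{folder_name}_TA.csv"),
        posixpath.join(folder_path, f"{folder_name}_TADOC.csv"),
    )
```

For example, for the hypothetical folder `/user/ta/reviews`, the sketch returns `/user/ta/reviews/reviews_TA.csv` and `/user/ta/reviews/reviews_TADOC.csv`.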
Before you start, make sure that the following prerequisites are met:
- You have installed the vora-textanalysis service and know its port number.
- For folder-type requests:
  - You have an HDFS server that is reachable through your network firewall, if there is one.
  - You know the host name and port number of the HDFS server.
  - If Kerberos is not enabled, the folder and subfolders in HDFS where your files are located grant write permissions to the 'root' user.
For more information, see 624ce81c22f94cb99a1100f4aae925e8.html
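Before submitting folder-type requests, you may want to confirm that the HDFS host and port are reachable from the network where the graph runs. The following Python sketch only tests whether a TCP connection can be opened; it does not check Kerberos configuration or HDFS write permissions. The function name `is_reachable` is hypothetical.

```python
import socket


def is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within timeout seconds.

    This checks network reachability only (e.g. firewall rules), not
    authentication or HDFS permissions.
    """
    try:
        # create_connection resolves the host name and tries each address.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False
```

Call `is_reachable(host, port)` with the host name and port number you noted for the HDFS server, from a machine in the same network as the graph's runtime.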
To run the graph:
- Start the SAP Data Hub Modeler.
- In the navigation pane, select the Graphs tab.
- In the search box, enter com.sap.textanalysis.example and select the graph.
  The tool loads the selected graph in the graph editor.
- Select the TA Request Creator operator and, in the right pane, select the Configuration tab to set the configuration parameter values.
  TA Request Creator configuration parameters:
  - The host name and port number of the vora-textanalysis service.
  - The text analysis configuration. Use one of LINGANALYSIS_BASIC, LINGANALYSIS_STEMS, LINGANALYSIS_FULL, EXTRACTION_CORE, EXTRACTION_CORE_ENTERPRISE, EXTRACTION_CORE_PUBLIC_SECTOR, or EXTRACTION_CORE_VOICEOFCUSTOMER. For a description of each configuration, see the Text Analysis section in the Developer Guide for SAP Vora.
  - A list of languages used for language detection, specified as ISO 639-1 codes, for example 'EN,DE,ES'. If no language is specified, automatic language detection is attempted.
  - mimetype (optional): The type of the input documents. Allowed values are 'text/plain', 'text/html', 'text/xml', and 'text'. The value 'text' indicates that the input is plain text, HTML, or XML. If the value is not set or is 'text', document type identification and conversion are performed.
  - The encoding of the document, if it contains text, for example 'UTF-8'. If the encoding is not set and the MIME type indicates text, encoding detection and conversion are performed.
  - folderpath: The path of the folder to analyze, without a trailing '/'. This value is required only if the inFolderPath port is connected.
  - A recursion flag. If true, the subfolders of the specified location are analyzed recursively, and output files are written locally in each subfolder.
- In the editor toolbar, choose (Save) to save the graph.
- In the editor toolbar, choose (Run) to execute the graph.
  The Status tab in the bottom pane shows the graph status as running while the graph is being executed.
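The constraints on the TA Request Creator parameters can be expressed as a small pre-flight check before running the graph. In the Python sketch below, `mimetype` and `folderpath` are parameter names given in this section; the other field names (`configuration`, `languages`, `encoding`, `recursive`), the class `TARequestSettings`, and the function `validate` are hypothetical, because the original names of those parameters are not shown here. The language check only verifies that each entry looks like a two-letter code; it does not compare it against the ISO 639-1 list.

```python
from dataclasses import dataclass
from typing import List, Optional

# Values listed for the text analysis configuration parameter.
ALLOWED_CONFIGURATIONS = {
    "LINGANALYSIS_BASIC", "LINGANALYSIS_STEMS", "LINGANALYSIS_FULL",
    "EXTRACTION_CORE", "EXTRACTION_CORE_ENTERPRISE",
    "EXTRACTION_CORE_PUBLIC_SECTOR", "EXTRACTION_CORE_VOICEOFCUSTOMER",
}
# Allowed values for the mimetype parameter.
ALLOWED_MIMETYPES = {"text/plain", "text/html", "text/xml", "text"}


@dataclass
class TARequestSettings:
    configuration: str               # hypothetical field name
    languages: str = ""              # hypothetical; e.g. "EN,DE,ES", empty = auto-detect
    mimetype: Optional[str] = None   # None = identification and conversion performed
    encoding: Optional[str] = None   # hypothetical; not validated, e.g. "UTF-8"
    folderpath: Optional[str] = None
    recursive: bool = False          # hypothetical; not validated


def validate(settings: TARequestSettings, folder_port_connected: bool) -> List[str]:
    """Return a list of problems; an empty list means the settings look valid."""
    problems = []
    if settings.configuration not in ALLOWED_CONFIGURATIONS:
        problems.append(f"unknown configuration: {settings.configuration}")
    if settings.languages:
        for code in settings.languages.split(","):
            code = code.strip()
            # Shape check only: two ASCII letters.
            if len(code) != 2 or not (code.isascii() and code.isalpha()):
                problems.append(f"not an ISO 639-1 code: {code!r}")
    if settings.mimetype is not None and settings.mimetype not in ALLOWED_MIMETYPES:
        problems.append(f"unsupported mimetype: {settings.mimetype}")
    if folder_port_connected:
        # folderpath is required only when inFolderPath is connected.
        if not settings.folderpath:
            problems.append("folderpath is required when inFolderPath is connected")
        elif settings.folderpath.endswith("/"):
            problems.append("folderpath must not end with '/'")
    return problems
```

Returning a list of problems rather than raising on the first one lets you see every misconfigured value in a single pass.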
For more information on the results of text analysis, see the section Text Analysis in the Developer Guide for SAP Vora.