Modeling Guide

Example Graph for Simple Text Analysis

Use the text analysis example graph in the SAP Data Hub Modeler to build applications with natural language processing capabilities.

The example graph com.sap.textanalysis.example is a simple Text Analysis application. You can use the graph to analyze text documents that are provided inline or stored as files in HDFS. The graph uses the Text Analysis Connector operator, which sends JSON-formatted requests to a text analysis server.

If the request is received through the inFileData port (a 'data' request), the graph sends each text document provided as input to the TA Request Creator operator in a separate request, and the Text Analysis Connector operator outputs each result as a string to the terminal.

If the request is received through the inFolderPath port (a 'folder' request), the request must include a folder name that identifies the location of the documents to analyze. The graph writes the analysis results directly to the specified location in two files, [folder_name]_TA.csv and [folder_name]_TADOC.csv, and then terminates.
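
The naming of the two result files can be sketched in a few lines of Python. This is only an illustration: the helper name ta_output_files is hypothetical, and the sketch assumes that [folder_name] is the last component of the folder path, which the text above does not state explicitly.

```python
import posixpath


def ta_output_files(folder_path: str) -> tuple[str, str]:
    """Return the two result file paths expected for a 'folder' request.

    Assumption: [folder_name] is the last component of the folder path.
    """
    # HDFS paths use '/' separators, so posixpath is used on every platform.
    folder = folder_path.rstrip("/")
    name = posixpath.basename(folder)
    if not name:
        raise ValueError("folder path must name a folder, not the root")
    return (
        posixpath.join(folder, f"{name}_TA.csv"),
        posixpath.join(folder, f"{name}_TADOC.csv"),
    )


print(ta_output_files("/data/reviews"))
# ('/data/reviews/reviews_TA.csv', '/data/reviews/reviews_TADOC.csv')
```

A helper like this is useful mainly for locating the output after the graph terminates; the graph itself decides the actual file names.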

Prerequisites to execute the text analysis example graph
  • You have installed the vora-textanalysis service and know its port number.
  • For 'folder' requests:
    • You have an HDFS server that is reachable across any network firewall that is in place.
    • You know the hostname and port number of the HDFS server.
    • If Kerberos is not enabled, the folder and subfolders in HDFS where your files are located grant write permission to the 'root' user.

      For more information, see 624ce81c22f94cb99a1100f4aae925e8.html
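
Before running the graph, it can help to confirm that the required endpoints accept connections from the machine in question. The following is a minimal sketch using only the Python standard library; the function name is_reachable is hypothetical, and the check only shows that a TCP port is open, not that the service behind it is healthy or that HDFS permissions are correct.

```python
import socket


def is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the host name and connects in one step.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution errors.
        return False
```

For example, is_reachable("hdfs.example.com", 8020) would test an HDFS server; the host name and port here are placeholders, so substitute the values for your own landscape.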

Executing the com.sap.textanalysis.example graph
  1. Start the SAP Data Hub Modeler.
  2. In the navigation pane, select the Graphs tab.
  3. In the search box, enter com.sap.textanalysis.example and select the graph from the search results.

    The tool loads the selected graph in the graph editor.

  4. Select the TA Request Creator operator and, in the right pane, select the Configuration tab. Set the following configuration parameter values.

    Operator: TA Request Creator (Operator id: javascriptoperator1)

    Configuration parameters and values:
      • serverendpoints: Host name and port number of the vora-textanalysis service.
      • taconfig: One of LINGANALYSIS_BASIC, LINGANALYSIS_STEMS, LINGANALYSIS_FULL, EXTRACTION_CORE, EXTRACTION_CORE_ENTERPRISE, EXTRACTION_CORE_PUBLIC_SECTOR, or EXTRACTION_CORE_VOICEOFCUSTOMER. For a description of each configuration, see the Text Analysis section in the Developer Guide for SAP Vora.
      • languages (Optional): A list of languages used for language detection, specified as ISO 639-1 codes. For example: 'EN,DE,ES'. If no language is specified, automatic detection is attempted.
      • mimetype (Optional): The type of the input documents. Allowed values are 'text/plain', 'text/html', 'text/xml', and 'text'. The value 'text' indicates that the input is plain text, HTML, or XML. If the parameter is not set, or if the value is 'text', document identification and conversion are performed.
      • encoding (Optional): If the document contains text, this parameter indicates the encoding. For example: 'UTF-8'. If the parameter is not set and the MIME type indicates text, encoding detection and conversion are performed.
      • folderpath: Folder path to analyze, without a trailing '/'. This value is required only if the operator is connected to the inFolderPath port.
      • recursive: If true, the analysis is also performed recursively on the subfolders of the specified location. Output files are written locally in each subfolder.

  5. In the editor toolbar, choose (Save) to save the graph.
  6. In the editor toolbar, choose (Run) to execute the graph.
    The Status tab in the bottom pane shows the graph status as running while the graph is being executed.

    For more information on the results of text analysis, see the section Text Analysis in the Developer Guide for SAP Vora.
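
The constraints on the TA Request Creator configuration parameters in step 4 can be checked before the values are entered in the Modeler. The sketch below is illustrative only: validate_ta_config is a hypothetical helper, not part of SAP Data Hub, and it assumes that serverendpoints and taconfig are mandatory because they are not marked as optional. The language check tests only the two-letter format, not whether each code is a registered ISO 639-1 code.

```python
import re

# Values taken from the parameter descriptions in step 4.
TA_CONFIGS = {
    "LINGANALYSIS_BASIC",
    "LINGANALYSIS_STEMS",
    "LINGANALYSIS_FULL",
    "EXTRACTION_CORE",
    "EXTRACTION_CORE_ENTERPRISE",
    "EXTRACTION_CORE_PUBLIC_SECTOR",
    "EXTRACTION_CORE_VOICEOFCUSTOMER",
}
MIME_TYPES = {"text/plain", "text/html", "text/xml", "text"}


def validate_ta_config(params: dict, folder_request: bool = False) -> list[str]:
    """Return a list of problems; an empty list means no problem was found."""
    problems = []
    # Assumption: parameters not marked "(Optional)" are mandatory.
    if not params.get("serverendpoints"):
        problems.append("serverendpoints is required")
    if params.get("taconfig") not in TA_CONFIGS:
        problems.append("taconfig must be one of the documented configurations")
    languages = params.get("languages")
    if languages and not re.fullmatch(r"[A-Za-z]{2}(,[A-Za-z]{2})*", languages):
        problems.append("languages must be comma-separated two-letter codes")
    mimetype = params.get("mimetype")
    if mimetype and mimetype not in MIME_TYPES:
        problems.append("mimetype must be text/plain, text/html, text/xml, or text")
    if folder_request:
        # folderpath matters only when the inFolderPath port is connected.
        folderpath = params.get("folderpath", "")
        if not folderpath:
            problems.append("folderpath is required for folder requests")
        elif folderpath.endswith("/"):
            problems.append("folderpath must not end with '/'")
    return problems
```

Calling validate_ta_config with a dictionary of the intended values and folder_request=True returns one message per violated constraint, so a trailing '/' in folderpath or a misspelled taconfig value is caught before the graph is run.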