text_analysis

hana_ml.text.ta.text_analysis(data, thread_ratio=None, timeout=None)

Performs text analysis on the input data. Supported tasks are POS (part-of-speech) tagging, NER (named-entity recognition) and sentiment-phrase-score.

Parameters:

data : DataFrame

The input data for text analysis, must be a 4-column DataFrame structured as follows:

1st column : ID of input text, of type INT, VARCHAR or NVARCHAR
2nd column : Text content, of type VARCHAR, NVARCHAR or NCLOB
3rd column : Specifies the language of the text content, which can be 'en', 'de', 'fr', 'es', 'pt' or empty (in which case the language is detected automatically)
4th column : Specifies the task, which can be 'pos', 'ner', 'sentiment-phrase-score' or a combination of them (separated by comma, e.g. 'pos, sentiment-phrase-score').
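
As an illustration of the four-column layout above, the following minimal sketch checks candidate rows before they are uploaded to a table. The helper `validate_row` and the sample rows are hypothetical and not part of hana_ml; the allowed languages and tasks are taken from the column descriptions above.

```python
# Hypothetical helper, not part of hana_ml: checks one row against the
# four-column layout (ID, text, language, task) described above.
ALLOWED_LANGUAGES = {"en", "de", "fr", "es", "pt", ""}  # "" = auto-detect
ALLOWED_TASKS = {"pos", "ner", "sentiment-phrase-score"}


def validate_row(row):
    """Return the row unchanged if it matches the documented layout."""
    if len(row) != 4:
        raise ValueError("expected 4 columns: id, text, language, task")
    doc_id, text, language, task = row
    if not isinstance(doc_id, (int, str)):
        raise ValueError("ID must be an integer or a string")
    if not isinstance(text, str):
        raise ValueError("text content must be a string")
    if language not in ALLOWED_LANGUAGES:
        raise ValueError(f"unsupported language: {language!r}")
    # Tasks may be combined in one comma-separated string.
    tasks = [t.strip() for t in task.split(",")]
    unknown = [t for t in tasks if t not in ALLOWED_TASKS]
    if unknown:
        raise ValueError(f"unsupported task(s): {unknown}")
    return row


# Sample rows (made up for illustration).
rows = [
    (1, "The weather in Berlin is lovely today.", "en", "pos, ner"),
    (2, "Das Essen war ausgezeichnet.", "", "sentiment-phrase-score"),
]
checked = [validate_row(r) for r in rows]
```

Rows that pass the check could then be loaded into a HANA table, for example through a pandas DataFrame and `hana_ml.dataframe.create_dataframe_from_pandas`, to obtain the DataFrame passed as `data`.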

thread_ratio : float, optional

Specifies the ratio of threads that can be used by this function, with valid range from 0 to 1.

Values outside valid range are ignored (no error thrown), and in such case the function heuristically determines the number of threads to use.

Defaults to 0.0.
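
Because out-of-range values are ignored without an error, a caller may want to detect them on the client side. The sketch below is one way to do that; `check_thread_ratio` is a hypothetical helper and not part of hana_ml.

```python
def check_thread_ratio(thread_ratio):
    """Return True if the value lies in the documented range [0, 1]."""
    return thread_ratio is not None and 0.0 <= thread_ratio <= 1.0


# Report which candidate values the function would honour.
for candidate in (0.0, 0.5, 1.0, 1.5, -0.1):
    if check_thread_ratio(candidate):
        status = "in range"
    else:
        status = "out of range, would be ignored"
    print(f"thread_ratio={candidate}: {status}")
```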

timeout : int, optional

Specifies the maximum amount of time (in seconds) the client will wait for a response from the server.

Defaults to 10.

Returns:

A tuple of DataFrames. The example below unpacks it into seven results.

Examples

>>> from hana_ml.text.ta import text_analysis
>>> sentences, pos, ner, doc_sentiment, sentence_sentiment, phrase_sentiment, extra = text_analysis(data=df, thread_ratio=0.5, timeout=20)