hana_ml.text.tm package

This module provides text mining functions. The following functions and classes are available:

hana_ml.text.tm

tf_analysis(data[, lang, enable_stopwords, ...])

Performs term frequency (TF) analysis on the given document.
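
As a rough illustration of what a term frequency analysis computes, the following is a minimal pure-Python sketch. It does not call hana_ml and is not the library's implementation. The helper name `term_frequency`, the dictionary input format, and the whitespace tokenization are assumptions made for this example only.

```python
from collections import Counter


def term_frequency(docs):
    """Count how often each term occurs in each document.

    docs: mapping of document id -> text (hypothetical input format).
    Returns a mapping of document id -> Counter of lowercase terms.
    """
    # Assumption: terms are separated by whitespace; no stopword handling.
    return {doc_id: Counter(text.lower().split()) for doc_id, text in docs.items()}


docs = {
    "doc1": "term1 term2 term2 term3",
    "doc2": "term2 term3 term4",
}
tf = term_frequency(docs)
print(tf["doc1"]["term2"])  # number of times "term2" occurs in doc1
```

A `Counter` returns 0 for terms that do not occur, so lookups for unseen terms do not raise an error.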

text_classification(pred_data[, ref_data, ...])

This function classifies (categorizes) an input document with respect to sets of categories (taxonomies) using a TF-IDF text vectorizer and a KNN classifier.
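
The combination of a TF-IDF vectorizer and a KNN classifier can be sketched in plain Python as follows. This is a conceptual illustration under stated assumptions, not the hana_ml implementation. The helper names (`fit_idf`, `vectorize`, `cosine`, `knn_classify`), the smoothed IDF variant `log(N / df) + 1`, cosine similarity as the distance measure, and majority voting are all choices made for this example.

```python
import math
from collections import Counter


def fit_idf(texts):
    """Compute a smoothed IDF weight per term from the reference texts."""
    n = len(texts)
    df = Counter(term for t in texts for term in set(t.lower().split()))
    # Assumption: IDF variant log(N / df) + 1, so no known term gets weight 0.
    return {term: math.log(n / d) + 1.0 for term, d in df.items()}


def vectorize(text, idf):
    """Build a sparse TF-IDF vector; terms unseen in the reference are dropped."""
    tf = Counter(text.lower().split())
    return {t: f * idf[t] for t, f in tf.items() if t in idf}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def knn_classify(query, ref_texts, ref_labels, k=3):
    """Return the majority label among the k most similar reference texts."""
    idf = fit_idf(ref_texts)
    qv = vectorize(query, idf)
    sims = [cosine(qv, vectorize(t, idf)) for t in ref_texts]
    nearest = sorted(range(len(ref_texts)), key=lambda i: sims[i], reverse=True)[:k]
    return Counter(ref_labels[i] for i in nearest).most_common(1)[0][0]


# Hypothetical reference data: texts with known categories.
ref_texts = [
    "goal match team score",
    "team player goal league",
    "stock market price trade",
    "market bank price rate",
]
ref_labels = ["sports", "sports", "finance", "finance"]

print(knn_classify("stock price rose", ref_texts, ref_labels))
```

In this sketch the reference texts play the role of labeled reference data and the query plays the role of the document to be classified.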

get_related_doc(pred_data[, ref_data, top, ...])

This function returns the top-ranked related documents for one or more query documents, based on the Term Frequency - Inverse Document Frequency (TF-IDF) result or reference data.

get_related_term(pred_data[, ref_data, top, ...])

This function returns the top-ranked related terms for one or more query terms, based on the Term Frequency - Inverse Document Frequency (TF-IDF) result or reference data.

get_relevant_doc(pred_data[, ref_data, top, ...])

This function returns the top-ranked documents that are relevant to one or more terms, based on the Term Frequency - Inverse Document Frequency (TF-IDF) result or reference data.
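
One plausible way to rank documents for a set of terms is to sum the TF-IDF weights of those terms per document. The sketch below illustrates that idea in plain Python. The scoring rule, the helper name `relevant_docs`, and the IDF variant `log(N / df) + 1` are assumptions for illustration and may differ from what hana_ml computes.

```python
import math
from collections import Counter


def relevant_docs(terms, docs, top=2):
    """Rank document ids by the summed TF-IDF weight of the query terms."""
    counts = {doc_id: Counter(text.lower().split()) for doc_id, text in docs.items()}
    n = len(counts)
    scores = Counter()
    for term in terms:
        df = sum(1 for c in counts.values() if term in c)
        if df == 0:
            continue  # term does not occur in any document
        idf = math.log(n / df) + 1.0  # assumed smoothed IDF variant
        for doc_id, c in counts.items():
            if c[term]:
                scores[doc_id] += c[term] * idf
    return [doc_id for doc_id, _ in scores.most_common(top)]


# Hypothetical corpus: document id -> text.
docs = {
    "doc1": "term1 term2 term2 term3",
    "doc2": "term2 term3 term4",
    "doc3": "term1 term4 term5",
}
print(relevant_docs(["term2"], docs))
print(relevant_docs(["term4", "term5"], docs))
```

Documents that contain none of the query terms receive no score and are left out of the result.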

get_relevant_term(pred_data[, ref_data, ...])

This function returns the top-ranked relevant terms that describe one or more documents, based on the Term Frequency - Inverse Document Frequency (TF-IDF) result or reference data.

get_suggested_term(pred_data[, ref_data, ...])

This function returns the top-ranked terms that match one or more initial substrings, based on the Term Frequency - Inverse Document Frequency (TF-IDF) result or reference data.
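
Matching terms against an initial substring amounts to a prefix search over the corpus vocabulary. The following plain-Python sketch shows the idea. Ranking matches by corpus frequency (with alphabetical tie-breaking) is an assumption made for this example, as is the helper name `suggest_terms`; hana_ml may rank differently.

```python
from collections import Counter


def suggest_terms(prefix, texts, top=3):
    """Return up to `top` terms starting with `prefix`, most frequent first."""
    freq = Counter(term for t in texts for term in t.lower().split())
    matches = [(term, n) for term, n in freq.items() if term.startswith(prefix.lower())]
    # Sort by descending frequency, then alphabetically to break ties.
    matches.sort(key=lambda item: (-item[1], item[0]))
    return [term for term, _ in matches[:top]]


# Hypothetical corpus of short texts.
texts = [
    "classify classification class",
    "classification cluster",
    "clustering classification",
]
print(suggest_terms("clas", texts))
print(suggest_terms("clu", texts))
```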

TFIDF()

Class for term frequency–inverse document frequency.

TextClassificationWithModel([language, ...])

Text classification class.