hana_ml.text package
This package contains a collection of text-related classes and functions for tasks such as text analysis, text mining, and text chunking.
Note
If you use the text mining functions, note that the functionality available in HANA On-Premise differs significantly from that in HANA Cloud. The functions supported by each platform, along with their reference links, are listed below.
To support both HANA On-Premise and HANA Cloud, hana_ml uses the same function names for equivalent functionalities. For example, text_classification maps to the TM_CATEGORIZE_KNN SQL procedure in HANA On-Premise and PAL_TEXTCLASSIFICATION in HANA Cloud. Additionally, certain parameters may be marked as supported only by HANA On-Premise.
If you call a function that the HANA system you are using does not support, an error is raised.
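The name mapping described above can be pictured as a small lookup table. The following is only an illustrative sketch and not hana_ml's internal code: the `PROCEDURES` table, the `resolve_procedure` helper, the platform keys, and the choice of `NotImplementedError` are all assumptions made for the example. Only the two procedure names for text_classification come from the text.

```python
# Illustrative only: a hypothetical lookup, not hana_ml's actual implementation.
PROCEDURES = {
    "text_classification": {
        "on_premise": "TM_CATEGORIZE_KNN",    # named in the note above
        "cloud": "PAL_TEXTCLASSIFICATION",    # named in the note above
    },
}


def resolve_procedure(function_name, platform):
    """Return the SQL procedure that backs function_name on platform.

    Raises NotImplementedError (an assumed error type) when the platform
    has no equivalent procedure for the requested function.
    """
    by_platform = PROCEDURES.get(function_name, {})
    if platform not in by_platform:
        raise NotImplementedError(
            f"{function_name} is not supported on {platform}"
        )
    return by_platform[platform]
```

A caller would use the same Python name on either platform and only learn about a missing capability through the raised error.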
HANA On-Premise Text Mining
TextClassificationWithModel (PAL_TEXTCLASSIFICATION_TRAIN / PAL_TEXTCLASSIFICATION_PREDICT): supported since HANA 2.0 SPS08.
TFIDF (PAL_TEXT_COLLECT / PAL_TEXT_TFIDF): supported since HANA 2.0 SPS07.
HANA Cloud Text Mining
The algorithms are distributed across the following sub-packages.
hana_ml.text.tm
Perform Term Frequency (TF) analysis on the given document.
This function splits the given document into tokens.
This function classifies (categorizes) an input document with respect to sets of categories (taxonomies) using a TF-IDF text vectorizer and a KNN classifier.
This function returns the top-ranked related documents for one or more query documents, based on a Term Frequency - Inverse Document Frequency (TF-IDF) result or reference data.
This function returns the top-ranked related terms for one or more query terms, based on a Term Frequency - Inverse Document Frequency (TF-IDF) result or reference data.
This function returns the top-ranked documents that are relevant to one or more terms, based on a Term Frequency - Inverse Document Frequency (TF-IDF) result or reference data.
This function returns the top-ranked relevant terms that describe one or more documents, based on a Term Frequency - Inverse Document Frequency (TF-IDF) result or reference data.
This function returns the top-ranked terms that match one or more initial substrings, based on a Term Frequency - Inverse Document Frequency (TF-IDF) result or reference data.
This function searches for the best matching documents based on the given keywords.
Class for term frequency–inverse document frequency.
Text classification class.
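To make the TF-IDF plus KNN idea behind the classification function more concrete, here is a minimal pure-Python sketch. It does not call hana_ml or HANA, and it is not how the SQL procedures are implemented. The helper names (`tokenize`, `fit_idf`, `vectorize`, `cosine`, `knn_classify`), the whitespace tokenizer, the smoothed IDF formula, and cosine similarity as the distance measure are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def fit_idf(docs):
    """Compute a smoothed inverse document frequency for every known term."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for doc in docs for term in set(tokenize(doc)))
    # Assumed smoothing: ln((1 + n) / (1 + df)) + 1 keeps weights positive.
    return {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in df.items()}


def vectorize(text, idf):
    """Return a sparse {term: tf * idf} vector; unknown terms are dropped."""
    tf = Counter(tokenize(text))
    return {t: count * idf[t] for t, count in tf.items() if t in idf}


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_classify(query, train_docs, train_labels, k=3):
    """Label the query by majority vote among its k most similar documents."""
    idf = fit_idf(train_docs)
    q_vec = vectorize(query, idf)
    scored = sorted(
        ((cosine(q_vec, vectorize(doc, idf)), label)
         for doc, label in zip(train_docs, train_labels)),
        key=lambda pair: pair[0],
        reverse=True,
    )
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]
```

The same similarity ranking, without the vote, is the idea behind returning the top-ranked related documents for a query document.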
hana_ml.text.anns_model
ANNS (approximate nearest neighbor search) model creation with IVF indexing.
List the ANNS models.
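For readers unfamiliar with IVF (inverted file) indexing, the sketch below shows the general technique in pure Python: vectors are grouped around centroids, and a query scans only the groups whose centroids are closest. This is a generic illustration, not the hana_ml or HANA implementation. The function names, the naive centroid initialisation, and the `n_lists`, `n_probe`, and `top_k` parameters are assumptions made for the example.

```python
import math


def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(point, centroids):
    """Index of the centroid closest to point."""
    return min(range(len(centroids)), key=lambda i: dist(point, centroids[i]))


def build_ivf(vectors, n_lists, n_iter=10):
    """Cluster vectors with a few k-means steps and build inverted lists."""
    # Naive initialisation: use the first n_lists vectors as centroids.
    centroids = [list(v) for v in vectors[:n_lists]]
    for _ in range(n_iter):
        groups = [[] for _ in centroids]
        for v in vectors:
            groups[nearest(v, centroids)].append(v)
        for i, group in enumerate(groups):
            if group:  # keep the old centroid when a cluster is empty
                centroids[i] = [sum(col) / len(group) for col in zip(*group)]
    # Each inverted list stores the indices of the vectors in that cluster.
    lists = [[] for _ in centroids]
    for idx, v in enumerate(vectors):
        lists[nearest(v, centroids)].append(idx)
    return centroids, lists


def search(query, vectors, centroids, lists, top_k=1, n_probe=1):
    """Scan only the n_probe lists whose centroids are closest to query."""
    order = sorted(range(len(centroids)), key=lambda i: dist(query, centroids[i]))
    candidates = [idx for i in order[:n_probe] for idx in lists[i]]
    return sorted(candidates, key=lambda idx: dist(query, vectors[idx]))[:top_k]
```

Raising `n_probe` trades speed for recall: more lists are scanned, so fewer true neighbors are missed.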
hana_ml.text.pal_embeddings
Embeds input documents into vectors.
hana_ml.text.text_splitter
For a long text, it may be necessary to transform it to better suit.
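One common way to transform a long text is to split it into overlapping chunks. The sketch below is a generic pure-Python illustration and does not reflect the interface of the hana_ml text splitter. The `split_text` function and its `chunk_size` and `overlap` parameters (both counted in words) are hypothetical.

```python
def split_text(text, chunk_size=200, overlap=20):
    """Split text into chunks of up to chunk_size words.

    Consecutive chunks share `overlap` words so that context spanning a
    chunk boundary is not lost entirely.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks
```

Splitting on words keeps the example short. A real splitter might instead respect sentence or paragraph boundaries.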
hana_ml.text.ta
Text analysis function that can perform POS (part-of-speech) tagging, NER (named entity recognition), and sentiment-phrase scoring.
Part-of-speech (POS) tagging is a natural language processing technique that assigns grammatical categories or labels (such as noun, verb, adjective, adverb, or pronoun) to individual words within a sentence.
This is a wrapper around the named entity recognition (NER) functionality of text analysis, intended to simplify text analysis aimed specifically at named entity recognition.
A sentiment score, often referred to as a sentiment analysis score, is a numerical representation of the sentiment or emotion conveyed in a piece of text, be it a tweet, a product review, or an article.