hana_ml.text package
This package contains classes and functions for text processing, such as text analysis, text mining, and text chunking.
Note
If you wish to use Text Mining related functions, be aware that the Text Mining functionality in HANA On-Premise differs significantly from that in HANA Cloud. The functions supported by each system, together with the reference links, are listed below.
In order to support both HANA On-Premise and HANA Cloud, hana_ml uses the same function name for the same functionality. For instance, 'text_classification' could map to the 'TM_CATEGORIZE_KNN' SQL procedure in HANA On-Premise, and 'PAL_TEXTCLASSIFICATION' in HANA Cloud. Moreover, certain parameters might be marked as only supported by HANA On-Premise.
If you call a function that the HANA system you are using does not support, an error is raised.
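The name mapping described in this note can be pictured as a simple lookup. The sketch below is illustrative only and is not how hana_ml is implemented: the `PROCEDURES` table, the `resolve_procedure` helper, and the `"on_premise"` / `"cloud"` keys are hypothetical names, and only the `text_classification` row is taken from the note.

```python
# Hypothetical lookup table; only this one row comes from the note above.
PROCEDURES = {
    "text_classification": {
        "on_premise": "TM_CATEGORIZE_KNN",
        "cloud": "PAL_TEXTCLASSIFICATION",
    },
}


def resolve_procedure(function_name, system):
    """Return the SQL procedure behind a function name on the given system.

    Raises NotImplementedError when the system has no matching procedure,
    mirroring the "an error is raised" behaviour described in the note.
    """
    try:
        return PROCEDURES[function_name][system]
    except KeyError:
        raise NotImplementedError(
            f"{function_name!r} is not supported on {system!r}"
        ) from None
```

The point of the design is that calling code stays the same on both systems; only the lookup result changes.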
HANA On-Premise Text mining
TextClassificationWithModel (PAL_TEXTCLASSIFICATION_TRAIN/PAL_TEXTCLASSIFICATION_PREDICT): supported in HANA 2.0 SPS08.
TFIDF (PAL_TEXT_COLLECT/PAL_TEXT_TFIDF): supported since HANA 2.0 SPS07.
HANA Cloud Text mining
The algorithms are distributed into the following sub-packages.
hana_ml.text.tm
Performs term frequency (TF) analysis on the given document.
Classifies (categorizes) an input document with respect to sets of categories (taxonomies) using a TF-IDF text vectorizer and a KNN classifier.
Returns the top-ranked related documents for one or more query documents, based on a Term Frequency - Inverse Document Frequency (TF-IDF) result or reference data.
Returns the top-ranked related terms for one or more query terms, based on a TF-IDF result or reference data.
Returns the top-ranked documents that are relevant to one or more terms, based on a TF-IDF result or reference data.
Returns the top-ranked relevant terms that describe one or more documents, based on a TF-IDF result or reference data.
Returns the top-ranked terms that match one or more initial substrings, based on a TF-IDF result or reference data.
Searches for the best matching documents based on the given keywords.
Class for term frequency–inverse document frequency.
Text classification class.
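To make the "TF-IDF text vectorizer and KNN classifier" combination concrete, here is a minimal pure-Python sketch of the idea. It does not use or reproduce hana_ml or the underlying SQL procedures. The function names, the smoothed IDF formula, and the cosine-similarity ranking are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return ({term: weight} per tokenised document, idf table)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    # Assumed smoothing: the +1 keeps terms that occur in every document.
    idf = {t: math.log(n / df[t]) + 1.0 for t in df}
    vectors = [
        {t: c / len(doc) * idf[t] for t, c in Counter(doc).items()}
        for doc in docs
    ]
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def knn_classify(query, train_docs, labels, k=3):
    """Label a tokenised query by majority vote of its k most similar docs."""
    vectors, idf = tfidf_vectors(train_docs)
    counts = Counter(t for t in query if t in idf)  # ignore unseen terms
    q = {t: c / len(query) * idf[t] for t, c in counts.items()}
    ranked = sorted(
        range(len(train_docs)),
        key=lambda i: cosine(q, vectors[i]),
        reverse=True,
    )
    votes = Counter(labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


train = [
    ["cheap", "flight", "ticket"],
    ["hotel", "booking", "cheap"],
    ["football", "match", "score"],
    ["tennis", "match", "final"],
]
labels = ["travel", "travel", "sport", "sport"]
category = knn_classify(["flight", "hotel"], train, labels, k=2)
```

The same TF-IDF vectors and similarity ranking are also the basis for the "top-ranked related documents" and "top-ranked related terms" style of query, which rank by similarity instead of voting on a label.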
hana_ml.text.anns_model
Creates an ANNS model with IVF indexing.
Lists the ANNS models.
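For readers unfamiliar with IVF indexing, the sketch below shows the general idea, assuming IVF here means the usual inverted-file index for approximate nearest neighbor search. Vectors are grouped under their nearest centroid, and a query scans only the lists of its closest centroids. This is not the hana_ml API. The helpers `build_ivf` and `search_ivf`, the `nprobe` parameter, and the fixed centroids are hypothetical; a real index would typically learn the centroids by clustering.

```python
import math


def build_ivf(vectors, centroids):
    """Assign each vector index to the inverted list of its nearest centroid."""
    lists = [[] for _ in centroids]
    for idx, vec in enumerate(vectors):
        nearest = min(
            range(len(centroids)),
            key=lambda c: math.dist(vec, centroids[c]),
        )
        lists[nearest].append(idx)
    return lists


def search_ivf(query, vectors, centroids, lists, top_k=1, nprobe=1):
    """Scan only the nprobe closest lists and return the best top_k indices."""
    probe = sorted(
        range(len(centroids)),
        key=lambda c: math.dist(query, centroids[c]),
    )[:nprobe]
    candidates = [i for c in probe for i in lists[c]]
    return sorted(candidates, key=lambda i: math.dist(query, vectors[i]))[:top_k]


vectors = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
centroids = [(0.0, 0.5), (10.0, 10.5)]  # assumed given, not learned here
lists = build_ivf(vectors, centroids)
hits = search_ivf((9.0, 9.0), vectors, centroids, lists, top_k=1, nprobe=1)
```

Raising `nprobe` scans more lists, which trades speed for recall; with `nprobe` equal to the number of centroids the search becomes exact.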
hana_ml.text.pal_embeddings
Embeds input documents into vectors.
hana_ml.text.text_splitter
A long text may need to be transformed, for example split into smaller chunks, to better suit later processing.
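As an illustration of what transforming a long text can look like, here is a minimal character-based chunker with overlap. It is a sketch only and does not reflect the splitter in hana_ml.text.text_splitter. The function name `split_text` and the `chunk_size` and `overlap` parameters are hypothetical.

```python
def split_text(text, chunk_size=200, overlap=20):
    """Split text into chunks of at most chunk_size characters.

    Consecutive chunks share `overlap` characters so that a sentence cut
    at a boundary still appears whole in one of the two neighbours.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
    return chunks


chunks = split_text("abcdefghij", chunk_size=4, overlap=1)
```

Splitting on characters keeps the sketch short; splitting on sentence or paragraph boundaries is a common refinement when chunks are later embedded or searched.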