TF.Analysis — hanaml.TF.Analysis • hana.ml.r

hanaml.TF.Analysis is an R wrapper for the SAP Text Mining TF Analysis algorithm.

hanaml.TF.Analysis(data, lang = NULL)

Arguments

data: DataFrame
DataFrame containing the data.
lang: c('EN', 'DE', 'ES', 'FR', 'RU'), optional
Language of the input text. The HANA cloud instance currently supports 'EN', 'DE', 'ES', 'FR' and 'RU'. If NULL, the language is detected automatically.
Defaults to NULL.

Value

List of DataFrames
DataFrames of TF-IDF results:

DataFrame 1: TF-IDF result.
DataFrame 2: Document term frequency table.
DataFrame 3: Document category table.

Examples

Input DataFrame data:


> data$Collect()
      ID                                                  CONTENT       CATEGORY
1   doc1                      term1 term2 term2 term3 term3 term3     CATEGORY_1
2   doc2                      term2 term3 term3 term4 term4 term4     CATEGORY_1
3   doc3                      term3 term4 term4 term5 term5 term5     CATEGORY_2
4   doc4    term3 term4 term4 term5 term5 term5 term5 term5 term5     CATEGORY_2
5   doc5                                              term4 term6     CATEGORY_3
6   doc6                                  term4 term6 term6 term6     CATEGORY_3

Call the function:


> result <- hanaml.TF.Analysis(data)

Output:


> result[[1]]$head(3)$Collect()
  TM_TERMS TM_TERM_TF_F  TM_TERM_IDF_F  TM_TERM_TF_V  TM_TERM_IDF_V
1    term1            1              1      0.030303       1.791759
2    term2            3              2      0.090909       1.098612
3    term3            7              4      0.212121       0.405465
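
The numbers in the first result table can be reproduced locally, which helps when reading the columns. The following is a minimal base-R sketch, not the HANA implementation. It assumes, based only on the example values above, that TM_TERM_TF_F is the total count of a term across all documents, TM_TERM_IDF_F is the number of documents containing the term, TM_TERM_TF_V is the count divided by the total number of tokens, and TM_TERM_IDF_V is the natural logarithm of (number of documents / TM_TERM_IDF_F). The names `docs`, `tokens` and `tf_summary` are illustrative and not part of hana.ml.r.

```r
# Local copy of the CONTENT column from the example input (no HANA connection needed).
docs <- c(
  doc1 = "term1 term2 term2 term3 term3 term3",
  doc2 = "term2 term3 term3 term4 term4 term4",
  doc3 = "term3 term4 term4 term5 term5 term5",
  doc4 = "term3 term4 term4 term5 term5 term5 term5 term5 term5",
  doc5 = "term4 term6",
  doc6 = "term4 term6 term6 term6"
)

# Split each document into space-separated tokens.
tokens <- strsplit(docs, " ", fixed = TRUE)
all_tokens <- unlist(tokens, use.names = FALSE)

# Total occurrences of each term across all documents.
tf_f <- table(all_tokens)

# Number of documents that contain each term at least once.
idf_f <- table(unlist(lapply(tokens, unique), use.names = FALSE))

terms <- names(tf_f)
tf_summary <- data.frame(
  TM_TERMS      = terms,
  TM_TERM_TF_F  = as.integer(tf_f[terms]),
  TM_TERM_IDF_F = as.integer(idf_f[terms]),
  # Assumed formula: term count relative to all tokens in the corpus.
  TM_TERM_TF_V  = as.numeric(tf_f[terms]) / length(all_tokens),
  # Assumed formula: natural log of (documents / documents containing the term).
  TM_TERM_IDF_V = log(length(docs) / as.numeric(idf_f[terms]))
)

print(head(tf_summary, 3))
```

Under these assumptions the first three rows agree with the output shown above after rounding to six decimals, for example 1/33 for term1 and log(6/4) for term3.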