hanaml.TF.Analysis is an R wrapper for the SAP HANA PAL TF Analysis algorithm.

hanaml.TF.Analysis(data)

Arguments

data

DataFrame
DataFrame containing the data.

Value

List of DataFrames
DataFrames of TF-IDF results:

  • DataFrame 1: TF-IDF result,

  • DataFrame 2: Document term frequency table,

  • DataFrame 3: Document category table

Examples

Input DataFrame data:

> data$Collect()
        ID                                                  CONTENT       CATEGORY
  0   doc1                      term1 term2 term2 term3 term3 term3     CATEGORY_1
  1   doc2                      term2 term3 term3 term4 term4 term4     CATEGORY_1
  2   doc3                      term3 term4 term4 term5 term5 term5     CATEGORY_2
  3   doc4    term3 term4 term4 term5 term5 term5 term5 term5 term5     CATEGORY_2
  4   doc5                                              term4 term6     CATEGORY_3
  5   doc6                                  term4 term6 term6 term6     CATEGORY_3

Call the function:

> result <- hanaml.TF.Analysis(data)

Output:

> result[[1]]$Head(3)$Collect()
    TM_TERMS TM_TERM_TF_F  TM_TERM_IDF_F  TM_TERM_TF_V  TM_TERM_IDF_V
  0    term1            1              1      0.030303       1.791759
  1    term2            3              2      0.090909       1.098612
  2    term3            7              4      0.212121       0.405465
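
The values in this output can be checked by hand. The following is a minimal base R sketch, not the PAL implementation: it assumes that TM_TERM_TF_F is the corpus-wide count of a term, TM_TERM_IDF_F is the number of documents containing it, TM_TERM_TF_V is the count divided by the total number of terms, and TM_TERM_IDF_V is the natural log of the document count divided by TM_TERM_IDF_F. These definitions are inferred from the example numbers, and the variable names (docs, tokens, sketch) are illustrative only.

```r
# Example documents copied from the input DataFrame above.
docs <- c(
  doc1 = "term1 term2 term2 term3 term3 term3",
  doc2 = "term2 term3 term3 term4 term4 term4",
  doc3 = "term3 term4 term4 term5 term5 term5",
  doc4 = "term3 term4 term4 term5 term5 term5 term5 term5 term5",
  doc5 = "term4 term6",
  doc6 = "term4 term6 term6 term6"
)

# Split each document into space-separated terms.
tokens <- strsplit(docs, " ", fixed = TRUE)
all_terms <- unlist(tokens, use.names = FALSE)

# Corpus-wide count of each term (assumed meaning of TM_TERM_TF_F).
tf_f <- table(all_terms)

# Number of documents containing each term (assumed meaning of TM_TERM_IDF_F).
idf_f <- table(unlist(lapply(tokens, unique), use.names = FALSE))

terms <- names(tf_f)
sketch <- data.frame(
  TM_TERMS      = terms,
  TM_TERM_TF_F  = as.integer(tf_f[terms]),
  TM_TERM_IDF_F = as.integer(idf_f[terms]),
  # Assumed: relative frequency over all terms in the corpus.
  TM_TERM_TF_V  = as.numeric(tf_f[terms]) / length(all_terms),
  # Assumed: natural log of (number of documents / document frequency).
  TM_TERM_IDF_V = log(length(docs) / as.numeric(idf_f[terms])),
  stringsAsFactors = FALSE
)

print(head(sketch, 3))
```

Under these assumptions the first three rows agree with the output above: for example, term3 occurs 7 times among 33 terms (7/33 ≈ 0.212121) and appears in 4 of 6 documents (log(6/4) ≈ 0.405465).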