hanaml.TF.Analysis is an R wrapper for the SAP HANA PAL TF Analysis algorithm.
hanaml.TF.Analysis(data)
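| data | DataFrame holding the input documents (document ID, content, and category, as in the example input). |
|---|---|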
Returns a list of three DataFrames holding the TF-IDF results:
DataFrame 1: TF-IDF result,
DataFrame 2: Document term frequency table,
DataFrame 3: Document category table
Input DataFrame data:
> data$Collect()
  ID   CONTENT                                               CATEGORY
0 doc1 term1 term2 term2 term3 term3 term3                   CATEGORY_1
1 doc2 term2 term3 term3 term4 term4 term4                   CATEGORY_1
2 doc3 term3 term4 term4 term5 term5 term5                   CATEGORY_2
3 doc4 term3 term4 term4 term5 term5 term5 term5 term5 term5 CATEGORY_2
4 doc5 term4 term6                                           CATEGORY_3
5 doc6 term4 term6 term6 term6                               CATEGORY_3
Call the function:
> result <- hanaml.TF.Analysis(data)
Output:
> result[[1]]$head(3)$Collect()
  TM_TERMS TM_TERM_TF_F TM_TERM_IDF_F TM_TERM_TF_V TM_TERM_IDF_V
0    term1            1             1     0.030303      1.791759
1    term2            3             2     0.090909      1.098612
2    term3            7             4     0.212121      0.405465
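
The figures in the output above can be cross-checked locally. The following base-R sketch recomputes the four numeric columns from the example input without a HANA connection. It is an illustration, not the PAL implementation: the formulas (TF as the term count divided by the total number of tokens in the corpus, IDF as the natural log of the number of documents divided by the number of documents containing the term) are assumptions, chosen because they reproduce the printed values. The names `docs`, `tokens`, and `sketch` are hypothetical.

```r
# Local stand-in for the CONTENT column of the example input (not a HANA DataFrame).
docs <- c(
  doc1 = "term1 term2 term2 term3 term3 term3",
  doc2 = "term2 term3 term3 term4 term4 term4",
  doc3 = "term3 term4 term4 term5 term5 term5",
  doc4 = "term3 term4 term4 term5 term5 term5 term5 term5 term5",
  doc5 = "term4 term6",
  doc6 = "term4 term6 term6 term6"
)

# Split each document on single spaces: one character vector per document.
tokens <- strsplit(docs, " ", fixed = TRUE)
all_terms <- unlist(tokens, use.names = FALSE)

# Corpus-wide count of each term (compare with TM_TERM_TF_F).
tf_f <- table(all_terms)

# Number of documents that contain each term (compare with TM_TERM_IDF_F).
idf_f <- table(unlist(lapply(tokens, unique), use.names = FALSE))

# Assumed formulas; they match the example output but are not taken from PAL itself.
tf_v  <- as.numeric(tf_f) / length(all_terms)
idf_v <- log(length(docs) / as.numeric(idf_f[names(tf_f)]))

sketch <- data.frame(
  TM_TERMS      = names(tf_f),
  TM_TERM_TF_F  = as.integer(tf_f),
  TM_TERM_IDF_F = as.integer(idf_f[names(tf_f)]),
  TM_TERM_TF_V  = round(tf_v, 6),
  TM_TERM_IDF_V = round(idf_v, 6)
)
print(head(sketch, 3))
```

Under these assumptions the first three rows agree with the TM_TERM_TF_V and TM_TERM_IDF_V values shown for term1, term2, and term3. Row labels differ, since a local data.frame is numbered from 1.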