hanaml.Text.TFIDF is an R wrapper for the SAP HANA PAL text TF-IDF algorithm.

hanaml.Text.TFIDF(data, idf = NULL)

Arguments

data

DataFrame
Data to be analyzed.

idf

DataFrame, optional
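Precomputed inverse document frequency (IDF) values, such as the first element of the result returned by hanaml.Text.Collector (see Examples).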

Value

DataFrame
TF-IDF result with one row per document and term, in the columns ID, TERMS, TF_VALUE and TFIDF_VALUE.

Examples

Input DataFrame data:

> data$Collect()
     ID    CONTENT
  0  doc1  term1 term2 term2 term3 term3 term3
  1  doc2  term2 term3 term3 term4 term4 term4
  2  doc3  term3 term4 term4 term5 term5 term5
  3  doc5  term3 term4 term4 term5 term5 term5 term5 term5 term5
  4  doc4  term4 term6
  5  doc6  term4 term6 term6 term6

Call the function:

> result <- hanaml.Text.Collector(data)
> tfidf <- hanaml.Text.TFIDF(data, result[[1]])

Output:

> tfidf$Head(3)$Collect()
     ID    TERMS  TF_VALUE  TFIDF_VALUE
  0  doc1  term1       1.0     1.791759
  1  doc1  term2       2.0     2.197225
  2  doc1  term3       3.0     1.216395
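
The three rows above are consistent with TFIDF_VALUE = TF_VALUE * ln(N / df), where N is the number of documents (6) and df is the number of documents that contain the term. This formula is inferred from the example values only and is not a statement of how PAL computes the result internally. The following is a minimal base R sketch under that assumption. It runs on an ordinary data.frame without a HANA connection, and the names docs, tokens, idf and local_tfidf are hypothetical helpers, not part of hana.ml.r.

```r
# Example corpus copied from the input shown above, held in a plain
# R data.frame (not a HANA DataFrame).
docs <- data.frame(
  ID = c("doc1", "doc2", "doc3", "doc5", "doc4", "doc6"),
  CONTENT = c(
    "term1 term2 term2 term3 term3 term3",
    "term2 term3 term3 term4 term4 term4",
    "term3 term4 term4 term5 term5 term5",
    "term3 term4 term4 term5 term5 term5 term5 term5 term5",
    "term4 term6",
    "term4 term6 term6 term6"
  ),
  stringsAsFactors = FALSE
)

# Split each document into terms on single spaces.
tokens <- strsplit(docs$CONTENT, " ", fixed = TRUE)
names(tokens) <- docs$ID

# Document frequency: number of documents that contain each term.
doc_freq <- table(unlist(lapply(tokens, unique)))

# Assumed IDF definition: natural log of (document count / document frequency).
idf <- setNames(log(nrow(docs) / as.numeric(doc_freq)), names(doc_freq))

# Hypothetical helper: raw term counts of one document times the IDF per term.
local_tfidf <- function(id) {
  tf <- table(tokens[[id]])
  data.frame(
    ID = id,
    TERMS = names(tf),
    TF_VALUE = as.numeric(tf),
    TFIDF_VALUE = as.numeric(tf) * unname(idf[names(tf)]),
    stringsAsFactors = FALSE
  )
}

print(local_tfidf("doc1"))
```

For doc1 this gives 1 * ln(6/1) for term1, 2 * ln(6/2) for term2 and 3 * ln(6/4) for term3, which agree with the TFIDF_VALUE column above to the six decimals shown.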