hanaml.Text.TFIDF is an R wrapper for the SAP Text Mining TF-IDF (term frequency-inverse document frequency) algorithm.

hanaml.Text.TFIDF(data, idf = NULL)

Arguments

data

DataFrame
Data to be analyzed.

idf

DataFrame, optional
Inverse document frequency of the terms in the documents, for example the first element of the result returned by hanaml.Text.Collector.

Value

DataFrame
Term frequency (TF_VALUE) and TF-IDF value (TFIDF_VALUE) of each term in each document.

Examples

Input DataFrame data:


> data$Collect()
       ID                                CONTENT
1    doc1    term1 term2 term2 term3 term3 term3
2    doc2    term2 term3 term3 term4 term4 term4
3    doc3    term3 term4 term4 term5 term5 term5
4    doc5    term3 term4 term4 term5 term5 term5 term5 term5 term5
5    doc4    term4 term6
6    doc6    term4 term6 term6 term6

Call the function:


> result <- hanaml.Text.Collector(data)
> tfidf <- hanaml.Text.TFIDF(data, result[[1]])

Output:


> tfidf$Head(3)$Collect()
     ID  TERMS   TF_VALUE    TFIDF_VALUE
1  doc1  term1        1.0       1.791759
2  doc1  term2        2.0       2.197225
3  doc1  term3        3.0       1.216395
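
The three output rows can be checked by hand. The sketch below is not part of the package and does not call SAP HANA; it recomputes the doc1 values in base R from a local copy of the example documents. It assumes TF_VALUE is the raw term count and that the inverse document frequency is the natural logarithm of (number of documents / number of documents containing the term). That weighting is inferred from the output above rather than taken from the algorithm's documentation, and the names docs, tokens, doc_freq, and local_result are illustrative only.

```r
# Local copy of the example documents (base R only, no HANA connection needed).
docs <- c(
  doc1 = "term1 term2 term2 term3 term3 term3",
  doc2 = "term2 term3 term3 term4 term4 term4",
  doc3 = "term3 term4 term4 term5 term5 term5",
  doc5 = "term3 term4 term4 term5 term5 term5 term5 term5 term5",
  doc4 = "term4 term6",
  doc6 = "term4 term6 term6 term6"
)

# Split each document into terms on single spaces.
tokens <- strsplit(docs, " ", fixed = TRUE)

# Document frequency: the number of documents that contain each term.
doc_freq <- table(unlist(lapply(tokens, unique)))

# Assumed weighting: natural log of (number of documents / document frequency).
idf <- log(length(docs) / doc_freq)

# Raw term counts for doc1, multiplied by the matching IDF values.
tf <- table(tokens[["doc1"]])
tfidf <- as.numeric(tf) * as.numeric(idf[names(tf)])

local_result <- data.frame(
  ID = "doc1",
  TERMS = names(tf),
  TF_VALUE = as.numeric(tf),
  TFIDF_VALUE = round(tfidf, 6)
)
print(local_result)
```

Under these assumptions the TFIDF_VALUE column comes out as 1.791759, 2.197225, and 1.216395, the same values shown for doc1 in the output above. For instance, term2 appears twice in doc1 and in 2 of the 6 documents, so its value is 2 * log(6 / 2).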