hanaml.Text.Classification.Rd
hanaml.Text.Classification is an R wrapper for the SAP Text Mining Text Classification algorithm.
hanaml.Text.Classification(
pred.data,
ref.data = NULL,
k.nearest.neighbours = NULL,
thread.ratio = NULL,
lang = NULL
)
pred.data
DataFrame
The prediction data for classification.
ref.data
DataFrame, optional
The reference data for classification.
Defaults to NULL.
k.nearest.neighbours
integer, optional
Number of nearest neighbors (k).
thread.ratio
double, optional
Specifies the ratio of the total number of threads that can be used by this function.
The range of this parameter is from 0 to 1, where 0 means using only 1 thread,
and 1 means using at most all the currently available threads.
Values outside this range are ignored, and the function heuristically determines the number of threads to use.
lang
c('EN', 'DE', 'ES', 'FR', 'RU'), optional
Specifies the language of the text. The HANA cloud instance currently supports 'EN', 'DE', 'ES', 'FR' and 'RU'.
If NULL, the language is detected automatically.
Defaults to NULL.
List of DataFrames
DataFrames of text classification results:
DataFrame 1: Text classification result.
DataFrame 2: Statistics table.
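Because the return value is a list of two DataFrames, a small wrapper can make the two results easier to reach by name. The following is only a sketch: `collect_classification` is a hypothetical helper name, not part of the package, and it assumes the hana.ml.r package is loaded and that the inputs are HANA DataFrames on a live connection.

```r
# Hypothetical helper (not part of hana.ml.r): runs the classification and
# pulls both result tables from HANA into local R data.frames.
collect_classification <- function(pred.data, ref.data = NULL,
                                   k.nearest.neighbours = NULL,
                                   thread.ratio = NULL, lang = NULL) {
  result <- hanaml.Text.Classification(
    pred.data,
    ref.data = ref.data,
    k.nearest.neighbours = k.nearest.neighbours,
    thread.ratio = thread.ratio,
    lang = lang
  )
  list(
    predictions = result[[1]]$Collect(),  # DataFrame 1: classification result
    statistics  = result[[2]]$Collect()   # DataFrame 2: statistics table
  )
}
```

A call might then look like `out <- collect_classification(pred, data, k.nearest.neighbours = 1L, lang = "EN")`, where `pred` and `data` are assumed to be existing HANA DataFrames.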
Input DataFrame data:
> data$Collect()
  ID   CONTENT                                               CATEGORY
1 doc1 term1 term2 term2 term3 term3 term3                   CATEGORY_1
2 doc2 term2 term3 term3 term4 term4 term4                   CATEGORY_1
3 doc3 term3 term4 term4 term5 term5 term5                   CATEGORY_2
4 doc4 term3 term4 term4 term5 term5 term5 term5 term5 term5 CATEGORY_2
5 doc5 term4 term6                                           CATEGORY_3
6 doc6 term4 term6 term6 term6                               CATEGORY_3
Call the function:
> result <- hanaml.Text.Classification(data$Select(data$columns[1], data$columns[2]), data)
Output:
> result[[1]]$head(1)$Collect()
ID TARGET
1 doc1 CATEGORY_1
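To build intuition for the role of k.nearest.neighbours, here is a local, base-R sketch of k-nearest-neighbour text classification on the same six documents. It is not the HANA implementation: the use of raw term counts, cosine similarity, and a simple majority vote are assumptions made for illustration, and the names `term_counts`, `cosine`, and `knn_classify` are hypothetical.

```r
# Reference documents from the example above (content and known category).
ref <- data.frame(
  ID = paste0("doc", 1:6),
  CONTENT = c(
    "term1 term2 term2 term3 term3 term3",
    "term2 term3 term3 term4 term4 term4",
    "term3 term4 term4 term5 term5 term5",
    "term3 term4 term4 term5 term5 term5 term5 term5 term5",
    "term4 term6",
    "term4 term6 term6 term6"
  ),
  CATEGORY = c("CATEGORY_1", "CATEGORY_1", "CATEGORY_2",
               "CATEGORY_2", "CATEGORY_3", "CATEGORY_3"),
  stringsAsFactors = FALSE
)

# Count how often each vocabulary term occurs in one document.
term_counts <- function(text, vocab) {
  tokens <- strsplit(text, " +")[[1]]
  as.numeric(table(factor(tokens, levels = vocab)))
}

# Cosine similarity between two term-count vectors (assumed measure).
cosine <- function(a, b) {
  denom <- sqrt(sum(a^2)) * sqrt(sum(b^2))
  if (denom == 0) 0 else sum(a * b) / denom
}

# Majority vote among the k most similar reference documents.
knn_classify <- function(text, ref, k = 1L) {
  vocab <- sort(unique(unlist(strsplit(ref$CONTENT, " +"))))
  query <- term_counts(text, vocab)
  sims <- vapply(
    ref$CONTENT,
    function(doc) cosine(query, term_counts(doc, vocab)),
    numeric(1)
  )
  nearest <- order(sims, decreasing = TRUE)[seq_len(k)]
  votes <- table(ref$CATEGORY[nearest])
  names(votes)[which.max(votes)]
}

knn_classify("term1 term2 term2 term3 term3 term3", ref, k = 1L)
# "CATEGORY_1"
```

With k = 1 the query matches doc1 exactly, so its category is returned, which agrees with the TARGET shown for doc1 above. Larger values of k let several neighbours vote, which can smooth out a single misleading match.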