hanaml.HGBTClassifier is an R wrapper for the SAP HANA PAL Hybrid Gradient Boosting Tree (HGBT) algorithm.
hanaml.HGBTClassifier(
  data = NULL,
  key = NULL,
  features = NULL,
  label = NULL,
  formula = NULL,
  n.estimators = NULL,
  random.state = NULL,
  subsample = NULL,
  max.depth = NULL,
  split.threshold = NULL,
  learning.rate = NULL,
  split.method = NULL,
  sketch.eps = NULL,
  fold.num = NULL,
  min.sample.weight.leaf = NULL,
  min.samples.leaf = NULL,
  max.w.in.split = NULL,
  col.subsample.split = NULL,
  col.subsample.tree = NULL,
  lambda = NULL,
  alpha = NULL,
  adopt.prior = NULL,
  evaluation.metric = NULL,
  reference.metric = NULL,
  parameter.range = NULL,
  parameter.values = NULL,
  resampling.method = NULL,
  repeat.times = NULL,
  param.search.strategy = NULL,
  random.search.times = NULL,
  timeout = NULL,
  progress.indicator.id = NULL,
  calculate.importance = NULL,
  calculate.cm = NULL,
  base.score = NULL,
  thread.ratio = NULL,
  categorical.variable = NULL
)
| Argument | Description |
|---|---|
| data | |
| key | |
| features | |
| label | |
| formula | |
| n.estimators | |
| random.state | |
| subsample | |
| max.depth | |
| split.threshold | |
| learning.rate | |
| split.method | |
| sketch.eps | |
| fold.num | |
| min.sample.weight.leaf | |
| min.samples.leaf | |
| max.w.in.split | |
| col.subsample.split | |
| col.subsample.tree | |
| lambda | |
| alpha | |
| adopt.prior | |
| evaluation.metric | |
| reference.metric | |
| parameter.range | |
| parameter.values | |
| resampling.method | If no value is specified for this parameter, then no model evaluation or parameter selection will be activated. |
| repeat.times | |
| param.search.strategy | If this parameter is not set, then only model evaluation is activated. |
| random.search.times | |
| timeout | |
| progress.indicator.id | |
| calculate.importance | |
| calculate.cm | |
| base.score | |
| thread.ratio | |
| categorical.variable | Valid only for variables of "INTEGER" type, omitted otherwise. |
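
The notes on `resampling.method` and `param.search.strategy` imply three operating modes. The sketch below restates that decision in plain R. `hgbt_mode` is a hypothetical helper written only for illustration and is not part of hanaml; the values `"cv"` and `"grid"` are assumed placeholders, since the accepted values are not listed here.

```r
# Hypothetical helper (not part of hanaml): reports which mode the two
# arguments would select, following the two rules stated in the table.
hgbt_mode <- function(resampling.method = NULL, param.search.strategy = NULL) {
  if (is.null(resampling.method)) {
    # No resampling method: neither evaluation nor parameter selection runs.
    return("training only")
  }
  if (is.null(param.search.strategy)) {
    # Resampling method without a search strategy: evaluation only.
    return("model evaluation")
  }
  # Both set: evaluation plus parameter selection.
  "model evaluation and parameter selection"
}

# "cv" and "grid" are assumed example values.
hgbt_mode()
hgbt_mode(resampling.method = "cv")
hgbt_mode(resampling.method = "cv", param.search.strategy = "grid")
```

Note that setting `param.search.strategy` alone has no effect under these rules, because the check on `resampling.method` comes first.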
A "HGBTClassifier" object with the following attributes:
- model DataFrame
  - ROW_INDEX - model row index
  - TREE_INDEX - tree index (-1 indicates the global information)
  - MODEL_CONTENT - model content
- feature.importances DataFrame
  - VARIABLE_NAME - independent variable name
  - IMPORTANCE - variable importance
- confusion.matrix DataFrame
  - ACTUAL_CLASS - the actual class name
  - PREDICTED_CLASS - the predicted class name
  - COUNT - number of records
- stats DataFrame
  - STAT_NAME - statistics name
  - STAT_VALUE - statistics value
- cv DataFrame
  - PARM_NAME - parameter name
  - INT_VALUE - integer value
  - DOUBLE_VALUE - double value
  - STRING_VALUE - character value
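
The confusion matrix is returned in long form (one row per actual/predicted pair), so a little local post-processing is usually needed. The following is a minimal sketch that assumes the table has already been collected into a local data.frame with the three columns listed above. The counts are invented for illustration and are not output from a real model.

```r
# Illustrative counts only; in practice this data.frame would presumably
# come from collecting the confusion.matrix attribute of the fitted object.
cm <- data.frame(
  ACTUAL_CLASS    = c("A", "A", "B", "B"),
  PREDICTED_CLASS = c("A", "B", "A", "B"),
  COUNT           = c(5, 0, 1, 1),
  stringsAsFactors = FALSE
)

# Accuracy: records where prediction matches the actual class,
# divided by all records.
correct  <- sum(cm$COUNT[cm$ACTUAL_CLASS == cm$PREDICTED_CLASS])
accuracy <- correct / sum(cm$COUNT)

# Wide layout: actual classes as rows, predicted classes as columns.
wide <- xtabs(COUNT ~ ACTUAL_CLASS + PREDICTED_CLASS, data = cm)

print(accuracy)
print(wide)
```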
Input DataFrame data:
> data$Collect()
ATT1 ATT2 ATT3 ATT4 LABEL
1 1.0 10.0 100.0 1.0 A
2 1.1 10.1 100.0 1.0 A
3 1.2 10.2 100.0 1.0 A
4 1.3 10.4 100.0 1.0 A
5 1.2 10.3 100.0 1.0 A
6 4.0 40.0 400.0 4.0 B
7 4.1 40.1 400.0 4.0 B
Call the function:
> ghc <- hanaml.HGBTClassifier(data = data,
                               features = c("ATT1", "ATT2", "ATT3", "ATT4"),
                               label = "LABEL",
                               n.estimators = 4, split.threshold = 0,
                               learning.rate = 0.5, fold.num = 5, max.depth = 6,
                               evaluation.metric = "error.rate",
                               reference.metric = c("auc"),
                               parameter.range = list("learning.rate" = c(0.1, 1.0, 3),
                                                      "n.estimators" = c(4, 3, 10),
                                                      "split.threshold" = c(0.1, 0.3, 1.0)))
Output:
> ghc$stats$Collect()
STAT_NAME STAT_VALUE
1 ERROR_RATE_MEAN 0.133333
2 ERROR_RATE_VAR 0.0266666
3 AUC_MEAN 0.9
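
The stats table is a name/value listing, which is awkward to query directly. A small sketch of turning it into a named numeric vector follows. The values are copied from the output above; that `STAT_VALUE` arrives as character after collection is an assumption, and `as.numeric` is harmless if it is already numeric.

```r
# Local copy of the stats output shown above.
# Assumption: STAT_VALUE is stored as character in the collected data.frame.
stats <- data.frame(
  STAT_NAME  = c("ERROR_RATE_MEAN", "ERROR_RATE_VAR", "AUC_MEAN"),
  STAT_VALUE = c("0.133333", "0.0266666", "0.9"),
  stringsAsFactors = FALSE
)

# Named numeric vector: look statistics up by name.
stat <- setNames(as.numeric(stats$STAT_VALUE), stats$STAT_NAME)

stat[["ERROR_RATE_MEAN"]]

# Standard deviation of the error rate, derived from the reported variance.
sqrt(stat[["ERROR_RATE_VAR"]])
```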