hanaml.HGBTRegressor

hanaml.HGBTRegressor is an R wrapper for the SAP HANA PAL Hybrid Gradient Boosting Tree (HGBT) algorithm.
hanaml.HGBTRegressor(data = NULL, key = NULL, features = NULL, label = NULL,
                     formula = NULL, n.estimators = NULL, random.state = NULL,
                     subsample = NULL, max.depth = NULL, split.threshold = NULL,
                     learning.rate = NULL, split.method = NULL, sketch.eps = NULL,
                     fold.num = NULL, min.sample.weight.leaf = NULL,
                     min.samples.leaf = NULL, max.w.in.split = NULL,
                     col.subsample.split = NULL, col.subsample.tree = NULL,
                     lambda = NULL, alpha = NULL, adopt.prior = NULL,
                     evaluation.metric = NULL, reference.metric = NULL,
                     parameter.range = NULL, parameter.values = NULL,
                     resampling.method = NULL, repeat.times = NULL,
                     param.search.strategy = NULL, random.search.times = NULL,
                     timeout = NULL, progress.indicator.id = NULL,
                     calculate.importance = NULL, base.score = NULL,
                     thread.ratio = NULL, categorical.variable = NULL)
| Argument | Description |
|---|---|
| data | DataFrame containing the training data. |
| key | Name of the ID column. |
| features | Names of the feature columns; defaults to all non-key, non-label columns when not provided. |
| label | Name of the dependent-variable column. |
| formula | Model formula of the form `label~<feature_list>`; an alternative to specifying features and label separately. |
| n.estimators | Number of trees in the ensemble. |
| random.state | Seed for the random number generator. |
| subsample | Fraction of the training rows sampled for growing each tree. |
| max.depth | Maximum depth of each tree. |
| split.threshold | Minimum gain required for a node to be split. |
| learning.rate | Shrinkage factor applied to each tree's contribution. |
| split.method | Method used to find split points. The exact method generally achieves the highest test accuracy but is the most time-consuming; the other two methods are computationally more efficient at the possible cost of test accuracy, and are worth considering when the training data set is large. |
| sketch.eps | Approximation precision for the sketch split method. |
| fold.num | Number of folds when cross-validation is used as the resampling method. |
| min.sample.weight.leaf | Minimum summed sample weight required in a leaf node. |
| min.samples.leaf | Minimum number of samples required in a leaf node. |
| max.w.in.split | Maximum weight constraint assigned to a split. |
| col.subsample.split | Fraction of columns sampled when determining each split. |
| col.subsample.tree | Fraction of columns sampled when building each tree. |
| lambda | L2 regularization weight. |
| alpha | L1 regularization weight. |
| adopt.prior | Whether to use the prior distribution of the label as the initial prediction. |
| evaluation.metric | Metric used for model evaluation or parameter selection, e.g. "rmse". |
| reference.metric | Additional metric(s) to report alongside the evaluation metric, e.g. "mae". |
| parameter.range | Ranges of parameters to search during parameter selection. |
| parameter.values | Candidate parameter values to try during parameter selection. |
| resampling.method | Resampling method used for model evaluation or parameter selection. If no value is specified for this parameter, then no model evaluation or parameter selection will be activated. |
| repeat.times | Number of times the resampling is repeated. |
| param.search.strategy | Search strategy for parameter selection. If this parameter is not set, then only model evaluation is activated. |
| random.search.times | Number of parameter sets to try when the search strategy is random search. |
| timeout | Maximum running time, in seconds. |
| progress.indicator.id | ID used to monitor the function's execution progress. |
| calculate.importance | Whether to calculate variable importances. |
| base.score | Initial prediction score for all instances (a global bias). |
| thread.ratio | Fraction of available threads to use, in the range [0, 1]. |
| categorical.variable | Names of columns to treat as categorical. VALID only for variables of "INTEGER" type, omitted otherwise. |
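Several of the arguments above (n.estimators, learning.rate, max.depth, base.score, the exact split method) map onto the generic gradient-boosting recipe: fit each new tree to the current residuals and add it with a shrinkage factor. The sketch below is an illustration of that recipe only, in Python with depth-1 trees (stumps), not the PAL implementation:

```python
import statistics

def fit_stump(xs, ys):
    """Fit a single-threshold split on a 1-D feature minimizing squared
    error -- the 'exact' split method: scan every candidate split point."""
    best = None
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    for k in range(1, len(xs)):
        i, j = order[k - 1], order[k]
        if xs[i] == xs[j]:
            continue  # no split between identical feature values
        thr = (xs[i] + xs[j]) / 2.0
        left = [ys[t] for t in range(len(xs)) if xs[t] <= thr]
        right = [ys[t] for t in range(len(xs)) if xs[t] > thr]
        lmean, rmean = statistics.fmean(left), statistics.fmean(right)
        sse = (sum((y - lmean) ** 2 for y in left)
               + sum((y - rmean) ** 2 for y in right))
        if best is None or sse < best[0]:
            best = (sse, thr, lmean, rmean)
    _, thr, lmean, rmean = best
    return lambda x: lmean if x <= thr else rmean

def boost(xs, ys, n_estimators=20, learning_rate=0.75):
    """Gradient boosting for squared loss: each stump fits the current
    residuals and is added with shrinkage (learning_rate)."""
    base = statistics.fmean(ys)  # analogous to base.score
    residuals = [y - base for y in ys]
    stumps = []
    for _ in range(n_estimators):
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        residuals = [r - learning_rate * stump(x) for x, r in zip(xs, residuals)]
    return lambda x: base + learning_rate * sum(s(x) for s in stumps)

# Toy data: a noisy step function.
xs = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5]
ys = [1.0, 1.1, 0.9, 1.0, 3.0, 3.1, 2.9, 3.0]
model = boost(xs, ys)
```

A larger learning.rate corrects residuals faster per iteration but increases the risk of overfitting; a smaller one usually needs more estimators.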
An "HGBTRegressor" object with the following attributes:

model: DataFrame
  ROW_INDEX - model row index
  TREE_INDEX - tree index (-1 indicates global information)
  MODEL_CONTENT - model content

feature.importances: DataFrame
  VARIABLE_NAME - independent variable name
  IMPORTANCE - variable importance

stats: DataFrame
  STAT_NAME - statistics name
  STAT_VALUE - statistics value

cv: DataFrame
  PARM_NAME - parameter name
  INT_VALUE - integer value
  DOUBLE_VALUE - double value
  STRING_VALUE - character value
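As a rough illustration of what the IMPORTANCE column represents: tree ensembles commonly attribute importance by summing the split gain each variable contributes across all trees and normalizing so the values sum to 1. The Python sketch below assumes hypothetical per-split gain records; it is not PAL's exact definition:

```python
from collections import defaultdict

# Hypothetical (variable_name, gain) records, one per split in the ensemble.
splits = [("ATT1", 5.2), ("ATT2", 1.1), ("ATT1", 2.3), ("ATT3", 0.6)]

def importances(splits):
    """Aggregate gain per variable, then normalize so importances sum to 1."""
    totals = defaultdict(float)
    for name, gain in splits:
        totals[name] += gain
    total = sum(totals.values())
    return {name: g / total for name, g in totals.items()}

imp = importances(splits)
```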
Input DataFrame data for training:
> data$Collect()
ATT1 ATT2 ATT3 ATT4 TARGET
1 19.76 6235.0 100.00 100.00 25.10
2 17.85 46230.0 43.67 84.53 19.23
3 19.96 7360.0 65.51 81.57 21.42
4 16.80 28715.0 45.16 93.33 18.11
5 18.20 21934.0 49.20 83.07 19.24
6 16.71 1337.0 74.84 94.99 19.31
7 18.81 17881.0 70.66 92.34 20.07
Call the function:
> hgr <- HGBTRegressor(data,
features = c("ATT1","ATT2","ATT3", "ATT4"),
label = "TARGET",
n.estimators = 20, split.threshold = 0.75,
split.method = "exact", learning.rate = 0.75,
fold.num = 5, max.depth = 6,
evaluation.metric = "rmse", reference.metric = c("mae"),
parameter.range = list("learning.rate" = c(0.25, 1.0, 4),
"n.estimators" = c(10, 1, 20),
"split.threshold" = c(0.0, 0.2, 1.0)))
Output:
> hgr$feature.importances$Collect()
  VARIABLE_NAME IMPORTANCE
1          ATT1   0.744019
2          ATT2   0.164429
3          ATT3   0.078935
4          ATT4   0.012617
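Each parameter.range entry in the call above is a numeric triple; judging from the example values (n.estimators from 10 to 20 in steps of 1, split.threshold from 0.0 to 1.0 in steps of 0.2), it reads as (start, step, end). As an illustration of how such triples expand into a grid of candidate parameter sets for grid search (a Python sketch, not PAL's actual search):

```python
import itertools

def expand(start, step, end):
    """Expand a (start, step, end) triple into candidate values, inclusive."""
    vals, v = [], start
    while v <= end + 1e-9:  # tolerance guards against float drift
        vals.append(round(v, 10))
        v += step
    return vals

def grid(param_range):
    """Cartesian product of the per-parameter candidate lists."""
    names = list(param_range)
    lists = [expand(*param_range[n]) for n in names]
    return [dict(zip(names, combo)) for combo in itertools.product(*lists)]

# Triples mirroring the parameter.range list in the example call.
param_range = {
    "learning.rate": (0.25, 1.0, 4.0),
    "n.estimators": (10, 1, 20),
    "split.threshold": (0.0, 0.2, 1.0),
}
candidates = grid(param_range)  # 4 * 11 * 6 = 264 parameter sets
```

With evaluation.metric = "rmse", each candidate set would be scored by the resampling procedure and the best one selected; reference.metric values are reported alongside without influencing the choice.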