hanaml.RandomForestRegressor {hana.ml.r} | R Documentation |
Random Forest for Regression
Description
hanaml.RandomForestRegressor is an R wrapper for PAL
Random Decision Trees.
Usage
hanaml.RandomForestRegressor(conn.context, data = NULL,
formula = NULL,
features = NULL,
label = NULL, key = NULL,
n.estimators = NULL,
max.features = NULL,
max.depth = NULL,
min.samples.leaf = NULL,
split.threshold = NULL,
calculate.oob = TRUE,
random.state = NULL,
thread.ratio = NULL,
allow.missing.dependent = TRUE,
categorical.variable = NULL,
sample.fraction = NULL)
Arguments
conn.context |
ConnectionContext
The connection to the SAP HANA system.
|
data |
DataFrame
DataFrame containing the data.
|
formula |
formula type, optional
Formula to be used for model generation.
Format: label ~ <feature_list>
e.g. formula = CATEGORY ~ V1 + V2 + V3
Provide either the formula
or a features and label combination,
not both.
Defaults to NULL.
|
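As an illustration of the label ~ <feature_list> format, the following base R sketch shows how a formula corresponds to a label name and a list of feature names. It does not reproduce how hana.ml.r parses the formula internally, which this page does not describe; the column names are the hypothetical ones from the example above.

```r
# Hypothetical formula in the documented format label ~ <feature_list>.
f <- CATEGORY ~ V1 + V2 + V3

# all.vars() returns the variable names in order of appearance,
# so for a two-sided formula the response (label) comes first.
vars <- all.vars(f)
label <- vars[1]
features <- as.list(vars[-1])

print(label)             # "CATEGORY"
print(unlist(features))  # "V1" "V2" "V3"
```

Under this reading, passing the formula is equivalent to passing label = "CATEGORY" together with features = list("V1", "V2", "V3"), which is why only one of the two forms should be supplied.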
key |
character, optional
Name of the ID column of data.
If not provided, then data is assumed to have no ID column.
|
features |
list of character, optional
Names of the feature columns.
If features is not provided, it defaults
to all non-ID, non-label columns.
|
label |
character, optional
Name of the column in data
that specifies the dependent variable.
Defaults to the last non-ID column if not provided.
|
n.estimators |
integer, optional
Specifies the number of trees in the random forest.
Defaults to 100.
|
max.features |
integer, optional
Specifies the number of randomly selected splitting variables.
Should not be larger than the number of input features.
Defaults to 'sqrt(p)' (for classification) or 'p/3' (for regression),
where p is the number of input features.
|
max.depth |
integer, optional
The maximum depth of a tree.
By default it is unlimited.
|
min.samples.leaf |
integer, optional
Specifies the minimum number of records in a leaf.
Defaults to 5 for regression.
|
split.threshold |
double, optional
Specifies the stop condition: if the improvement value of the best
split is less than this value, the tree stops growing.
Defaults to 1e-5.
|
calculate.oob |
logical, optional
If TRUE, calculate the out-of-bag error.
Defaults to TRUE.
|
random.state |
integer, optional
Specifies the seed for the random number generator.
0: Uses the current time (in seconds) as the seed.
Others: Uses the specified value as the seed.
Defaults to 0.
|
thread.ratio |
double, optional
Controls the proportion of available threads to use.
The value range is from 0 to 1, where 0 indicates a single thread
and 1 indicates up to all available threads. Values between 0 and
1 will use up to that percentage of available threads. For values
outside this range, the number of threads is heuristically determined.
Defaults to -1 (heuristically determined).
|
allow.missing.dependent |
logical, optional
Specifies whether a missing target value is allowed.
FALSE: Not allowed. An error occurs if a missing target is present.
TRUE: Allowed. Records with a missing target are removed.
Defaults to TRUE.
|
categorical.variable |
character or list of characters, optional
Names of the integer features that should be treated as categorical.
By default, the treatment depends on the column type:
'string' columns are categorical; 'integer' and 'double' columns are continuous.
Valid only for integer variables; ignored otherwise.
The default value is detected from input data.
|
sample.fraction |
double, optional
The fraction of data used for training.
Given n records and a sample fraction r, n*r
records are selected for training.
Defaults to 1.0.
|
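Two constraints stated in the argument descriptions above can be checked before the call is made: formula must not be combined with features or label, and max.features should not exceed the number of input features. The helper below is a minimal sketch in base R; check_rfr_args is a hypothetical name and is not part of hana.ml.r.

```r
# Hypothetical helper (not part of hana.ml.r): checks two documented
# constraints before hanaml.RandomForestRegressor is called.
check_rfr_args <- function(formula = NULL, features = NULL, label = NULL,
                           max.features = NULL) {
  # The formula and the features/label combination are mutually exclusive.
  if (!is.null(formula) && (!is.null(features) || !is.null(label))) {
    stop("Provide either 'formula' or 'features'/'label', not both.")
  }
  # max.features should not be larger than the number of input features.
  # This can only be checked here when the feature names are given.
  if (!is.null(max.features) && !is.null(features) &&
      max.features > length(features)) {
    stop("'max.features' is larger than the number of input features.")
  }
  invisible(TRUE)
}

# Passes: features/label only, max.features within bounds.
check_rfr_args(features = list("A", "B", "C", "D"), label = "CLASS",
               max.features = 2)

# Fails: formula and label given together; the error message is captured.
res <- tryCatch(check_rfr_args(formula = CLASS ~ A + B, label = "CLASS"),
                error = function(e) conditionMessage(e))
print(res)
```

The helper deliberately leaves thread.ratio unchecked, since values outside 0 to 1 are documented as accepted and heuristically handled.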
Format
R6Class object.
Value
Returns a "RandomForestRegressor" object with the following values:
model : DataFrame
Trained model content.
feature.importance : DataFrame
The feature importance (the higher, the more important the feature).
oob.error : DataFrame
Out-of-bag error rate or mean squared error for the random forest up
to the indexed tree.
Set to NULL if calculate.oob is FALSE.
Note
Using Summary and Print
Summary provides a general summary of the output of the model.
Usage: summary(rfr), where rfr is the generated model.
Print provides information on the coefficients and
the optional parameter values given by the user.
Usage: print(rfr), where rfr is the generated model.
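The following sketch puts training, summary and print together. It uses only the arguments listed under Usage; fit_rfr is a hypothetical wrapper name, and the column names ID, A to D and CLASS are taken from the training table in the Examples section. Calling it requires the hana.ml.r package and a live ConnectionContext, so only the function definition runs on its own.

```r
# Sketch only: calling fit_rfr() needs hana.ml.r and a live connection.
# 'cc' is assumed to be a ConnectionContext and 'df' a DataFrame with
# columns ID, A, B, C, D and CLASS, as in the Examples section.
fit_rfr <- function(cc, df) {
  hanaml.RandomForestRegressor(conn.context = cc, data = df,
                               key = "ID", label = "CLASS",
                               features = list("A", "B", "C", "D"),
                               random.state = 3)
}

# Intended usage, once 'cc' and 'df' exist:
# rfr <- fit_rfr(cc, df)
# summary(rfr)   # general summary of the model output
# print(rfr)     # parameter values given by the user
```

Naming key, label and features explicitly avoids relying on the documented defaults (last non-ID column as label, all remaining columns as features).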
See Also
predict.RandomForestRegressor
Examples
## Not run:
Input DataFrame df for training:
> df$Collect()
ID A B C D CLASS
0 0 -0.965679 1.142985 -0.019274 -1.598807 -23.633813
1 1 2.249528 1.459918 0.153440 -0.526423 212.532559
2 2 -0.631494 1.484386 -0.335236 0.354313 26.342585
3 3 -0.967266 1.131867 -0.684957 -1.397419 -62.563666
4 4 -1.175179 -0.253179 -0.775074 0.996815 -115.534935
......
Creating RandomForestRegressor instance and generating model:
> rfr <- hanaml.RandomForestRegressor(conn.context=cc, data = df, random.state=3)
> rfr$feature.importances$Collect()
VARIABLE_NAME IMPORTANCE
0 A 0.249593
1 B 0.381879
2 C 0.291403
3 D 0.077125
Input DataFrame for scoring:
> head(df3$Collect(),5)
ID A B C D CLASS
0 0 1.081277 0.204114 1.220580 -0.750665 139.10170
1 1 0.524813 -0.012192 -0.418597 2.946886 52.17203
2 2 -0.280871 0.100554 -0.343715 -0.118843 -34.69829
3 3 -0.113992 -0.045573 0.957154 0.090350 51.93602
4 4 0.287476 1.266895 0.466325 -0.432323 106.63425
..
Performing score() on given DataFrame:
> rfr$score(data = df3, features = list("A", "B", "C", "D"))
0.8490768
## End(Not run)
[Package hana.ml.r version 1.0.8]