hanaml.IsolationForest is an R wrapper for the SAP HANA PAL Isolation Forest algorithm.

hanaml.IsolationForest(
  data = NULL,
  key = NULL,
  features = NULL,
  n.estimators = NULL,
  max.samples = NULL,
  max.features = NULL,
  bootstrap = NULL,
  random.state = NULL,
  thread.ratio = NULL
)

Arguments

data

DataFrame
Input data which includes key and feature columns.

key

character, optional
Name of the ID column. If not provided, the data is assumed to have no ID column.
No default value.

features

character or list of characters, optional
Name of feature columns.
If not provided, it defaults to all non-key columns of data.

n.estimators

integer, optional
Specifies the number of trees to grow.
Defaults to 100.

max.samples

integer, optional
Specifies the number of samples to draw from input to train each tree. If max.samples is larger than the number of samples provided, all samples will be used for all trees.
Defaults to 256.

max.features

integer, optional
Specifies the number of features to draw from input to train each tree.
0 means no sampling.
Defaults to 0.

bootstrap

logical, optional
Specifies sampling method.

  • FALSE: Sampling without replacement.

  • TRUE: Sampling with replacement.

Defaults to FALSE.
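
The interplay of max.samples, max.features and bootstrap can be sketched in base R. This is only an illustration of the documented semantics, not PAL's implementation: the helper name draw_tree_sample and its arguments n.rows and n.cols are hypothetical, and the handling of max.features values at or above the column count is an assumption.

```r
# Hypothetical helper: which rows and columns one tree would be trained on.
draw_tree_sample <- function(n.rows, n.cols, max.samples = 256L,
                             max.features = 0L, bootstrap = FALSE) {
  # If max.samples exceeds the number of rows, every row is used.
  rows <- if (max.samples > n.rows) {
    seq_len(n.rows)
  } else {
    # bootstrap = TRUE draws with replacement, FALSE without.
    sample.int(n.rows, size = max.samples, replace = bootstrap)
  }
  # max.features = 0 means no feature sampling (assumption: same for values
  # at or above the number of columns).
  cols <- if (max.features <= 0L || max.features >= n.cols) {
    seq_len(n.cols)
  } else {
    sort(sample.int(n.cols, size = max.features))
  }
  list(rows = rows, cols = cols)
}

set.seed(2)
s <- draw_tree_sample(n.rows = 8L, n.cols = 2L)
length(s$rows)         # 8: all rows, because 8 < 256
anyDuplicated(s$rows)  # 0: no row appears twice

b <- draw_tree_sample(n.rows = 100L, n.cols = 5L, max.samples = 10L,
                      max.features = 2L, bootstrap = TRUE)
length(b$rows)         # 10 rows, possibly with repeats
length(b$cols)         # 2 feature columns
```

With the defaults, a small input such as the 8-row example on this page is used in full by every tree, so bootstrap only matters once the input has at least max.samples rows.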

random.state

integer, optional
Specifies the seed for random number generator.

  • 0: Uses the current time (in seconds) as seed.

  • Others: Uses the specified value as seed.

Defaults to 0.
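
The practical consequence of the two random.state modes can be shown with R's own random number generator. This is a local analogy only: resolve_seed is a hypothetical helper, and PAL's generator is not R's, so the actual numbers drawn on the server will differ.

```r
# Hypothetical helper mirroring the documented random.state semantics.
resolve_seed <- function(random.state = 0L) {
  if (random.state == 0L) {
    as.integer(Sys.time())    # current time in seconds, changes between runs
  } else {
    as.integer(random.state)  # fixed seed, reproducible
  }
}

set.seed(resolve_seed(2L)); a <- runif(3)
set.seed(resolve_seed(2L)); b <- runif(3)
identical(a, b)  # TRUE: a fixed non-zero seed reproduces the same draws
```

Passing a non-zero random.state, as in the example call further down (random.state=2), is therefore the option to choose when a fitted model needs to be reproducible.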

thread.ratio

numeric, optional
Specifies the ratio of available threads to use.

  • 0: single thread.

  • 0~1: percentage.

  • Others: heuristically determined.

Defaults to -1.

Value

Returns a "hanaml.IsolationForest" object with the following values:

  • model : DataFrame
    The fitted model.

Details

Isolation Forest generates an anomaly score for each sample.
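
For orientation, the anomaly score in the original Isolation Forest formulation (Liu, Ting and Zhou) is 2^(-E[h(x)] / c(n)), where E[h(x)] is the mean path length of a sample across the trees and c(n) normalizes by the expected path length for n training samples. The base R sketch below implements that textbook formula; whether PAL uses exactly this normalization is an assumption, and the function names are hypothetical.

```r
# Expected path length of an unsuccessful binary search tree lookup over
# n points, used as the normalizing constant c(n).
avg_path_length <- function(n) {
  if (n <= 1) return(0)
  if (n == 2) return(1)
  harmonic <- log(n - 1) + 0.5772156649  # H(n-1) via Euler's constant
  2 * harmonic - 2 * (n - 1) / n
}

# Score from the mean path length of one sample across all trees.
anomaly_score <- function(mean.path.length, n.samples) {
  2^(-mean.path.length / avg_path_length(n.samples))
}

anomaly_score(avg_path_length(256), 256)        # 0.5: an average point
anomaly_score(1, 256) > anomaly_score(10, 256)  # TRUE: short paths score higher
```

Under this formulation, scores close to 1 indicate samples that are isolated after very few splits (likely anomalies), while scores well below 0.5 indicate samples buried deep in the trees.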

Examples

Input DataFrame data:


> data$Collect()
   ID  V000  V001
1   0  -2.0  -1.0
2   1  -1.0  -1.0
3   2  -1.0  -2.0
4   3   1.0   1.0
5   4   1.0   2.0
6   5   2.0   1.0
7   6   6.0   3.0
8   7  -4.0   7.0

Call the function:


> isof <- hanaml.IsolationForest(data=data,
                                 key='ID',
                                 features=list("V000", "V001"),
                                 random.state=2,
                                 thread.ratio=0)

Output:


> isof$model$Collect()
   TREE_INDEX                                        MODEL_CONTENT
1           0  {"NS":8,"NF":2,"FX":[0,1],"1":{"SF":0,"SV":5.5...}}
2           1  {"NS":8,"NF":2,"FX":[0,1],"1":{"SF":0,"SV":5.5...}}
3           2  {"NS":8,"NF":2,"FX":[1,0],"1":{"SF":0,"SV":4.6...}}
4           3  {"NS":8,"NF":2,"FX":[1,0],"1":{"SF":0,"SV":4.6...}}
5           4  {"NS":8,"NF":2,"FX":[1,0],"1":{"SF":0,"SV":5.3...}}