hanaml.MLPClassifier {hana.ml.r}    R Documentation

Multi-layer perceptron (MLP) Classifier

Description

hanaml.MLPClassifier is an R wrapper for the SAP HANA Predictive Analysis Library (PAL) multi-layer perceptron algorithm, used here for classification.

Usage

hanaml.MLPClassifier(conn.context, data = NULL, key = NULL,
                     features = NULL, label = NULL,
                     formula = NULL, hidden.layer.size = NULL,
                     activation = NULL, output.activation = NULL,
                     learning.rate = NULL, momentum = NULL,
                     training.style = NULL, max.iter = NULL,
                     normalization = NULL, weight.init = NULL,
                     thread.ratio = NULL, categorical.variable = NULL,
                     batch.size = NULL, resampling.method = NULL,
                     evaluation.metric = NULL, fold.num = NULL,
                     repeat.times = NULL, param.search.strategy = NULL,
                     random.search.times = NULL, seed = NULL,
                     timeout = NULL, progress.indicator.id = NULL,
                     param.range = NULL, param.values = NULL)

Arguments

conn.context

ConnectionContext
The connection to the SAP HANA system.

data

DataFrame
DataFrame containing the training data.

key

character, optional
Name of the ID column. If 'key' is not provided, it is assumed that the input has no ID column.

features

list of characters, optional
Names of the feature columns. If 'features' is not provided, it defaults to all the non-ID and non-label columns.

label

character, optional
Name of the label column. If 'label' is not provided, it defaults to the last column.
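The defaulting rules for key, features and label described above can be sketched in base R. This is a hypothetical helper that only mirrors the documented behaviour (label falls back to the last column; features fall back to every column that is neither the key nor the label); it is not the package's internal code, and the column names are taken from the example table further down this page.

```r
# Hypothetical helper (not part of hana.ml.r): mirrors the documented defaults.
resolve.columns <- function(cols, key = NULL, label = NULL, features = NULL) {
  # label defaults to the last column
  if (is.null(label)) label <- cols[length(cols)]
  # features default to all non-ID, non-label columns
  if (is.null(features)) features <- setdiff(cols, c(key, label))
  list(label = label, features = features)
}

res <- resolve.columns(c("V000", "V001", "V002", "V003", "LABEL"))
res.with.key <- resolve.columns(c("ID", "A", "B", "Y"), key = "ID")
```

With no key, all four V columns become features; with key = "ID", the ID column is excluded as well.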

formula

formula, optional
Formula to be used for model generation, in the form label ~ <feature_list>, e.g. formula = LABEL ~ V1 + V2 + V3. Provide either the formula or a features and label combination, not both.
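A formula can also be built programmatically from a vector of column names with base R's stats::reformulate, which avoids typing long feature lists by hand. The column names below are taken from the example table on this page; passing the result as formula = f is shown only as a comment because it needs a live SAP HANA connection.

```r
library(stats)

# Column names from the example training table on this page.
feature.cols <- c("V000", "V001", "V002", "V003")

# Builds LABEL ~ V000 + V001 + V002 + V003
f <- reformulate(feature.cols, response = "LABEL")
print(f)

# all.vars() lists the response first, then the predictors.
label <- all.vars(f)[1]
features <- all.vars(f)[-1]

# mlpc <- hanaml.MLPClassifier(conn.context, data = df, formula = f, ...)
```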

activation

character
Activation function for the hidden layers. Must be one of: 'tanh', 'linear', 'sigmoid-asymmetric', 'sigmoid-symmetric', 'gaussian-asymmetric', 'gaussian-symmetric', 'elliot-asymmetric', 'elliot-symmetric', 'sin-asymmetric', 'sin-symmetric', 'cos-asymmetric', 'cos-symmetric', 'relu'.

output.activation

character
Activation function for the output layer. Must be one of: 'tanh', 'linear', 'sigmoid-asymmetric', 'sigmoid-symmetric', 'gaussian-asymmetric', 'gaussian-symmetric', 'elliot-asymmetric', 'elliot-symmetric', 'sin-asymmetric', 'sin-symmetric', 'cos-asymmetric', 'cos-symmetric', 'relu'.

hidden.layer.size

list of integers, optional
The size of each hidden layer.

max.iter

numeric, optional
The maximum number of iterations.
Defaults to 100.

training.style

{"batch", "stochastic"}, optional
Specifies the training style.
Defaults to "stochastic".

learning.rate

double, optional
Specifies the learning rate. Only valid when training.style is "stochastic".

momentum

double, optional
Specifies the momentum for gradient descent update. Only valid when training.style is "stochastic".

batch.size

int, optional
Specifies the mini-batch size. Only valid when training.style is "stochastic".
Defaults to 1.

normalization

{"no", "z-transform", "scalar"}, optional
Specifies the normalization type.
Defaults to "no".

weight.init

character, optional
Specifies the weight initialization method. Must be one of: 'all-zeros', 'normal', 'uniform', 'variance-scale-normal', 'variance-scale-uniform'.
Defaults to 'all-zeros'.

categorical.variable

character or list of characters, optional
Names of the columns in the data that should be treated as categorical variables.

thread.ratio

double, optional
Controls the proportion of available threads to use. The value range is from 0 to 1, where 0 indicates a single thread, and 1 indicates up to all available threads. Values between 0 and 1 will use that percentage of available threads. Values outside this range tell PAL to heuristically determine the number of threads to use.
Defaults to 0.

resampling.method

character, optional
Specifies the resampling method. Must be one of: 'cv', 'stratified_cv', 'bootstrap', 'stratified_bootstrap'.
If no value is specified for this parameter, neither model evaluation nor parameter selection is activated.

evaluation.metric

character, optional
Specifies the evaluation metric for model evaluation or parameter selection. Must be one of: 'ACCURACY', 'F1_SCORE', 'AUC_1VsRest', 'AUC_pairwise', 'RMSE'.

fold.num

numeric, optional
Specifies the number of folds for cross-validation. Only valid when resampling.method is 'cv' or 'stratified_cv'.

repeat.times

numeric, optional
Specifies the number of times resampling is repeated.
Defaults to 1.

param.search.strategy

character, optional
Specifies the parameter search strategy; setting it activates parameter selection. Must be either 'GRID' or 'RANDOM'.

random.search.times

numeric, optional
Specifies the number of times to randomly select candidate parameters for selection. Only valid when param.search.strategy is 'RANDOM'.

seed

numeric, optional
Specifies the seed for random number generation.
If 0 is specified, the system time is used.

timeout

numeric, optional
Specifies maximum running time for model evaluation or parameter selection, in seconds. No timeout when 0 is specified.

progress.indicator.id

character, optional
Sets the ID of the progress indicator for model evaluation or parameter selection.
No progress indicator is active if no value is provided.

param.values

list, optional
Specifies candidate values of the following parameters for parameter selection:
'activation', 'output.activation', 'hidden.layer.size', 'learning.rate', 'momentum', 'batch.size'.

param.range

list, optional
Specifies the value ranges of the following parameters for parameter selection:
'learning.rate', 'momentum', 'batch.size'.
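A misspelled name in param.values is easy to miss before the job reaches the server. The following is a hypothetical client-side check, not part of hana.ml.r, that compares the names of a param.values list against the parameter names used in the parameter selection example on this page (activation, output.activation, hidden.layer.size) plus learning.rate, momentum and batch.size. The candidate values are copied from that example.

```r
# Hypothetical helper (not part of hana.ml.r).
allowed.param.values <- c("activation", "output.activation", "hidden.layer.size",
                          "learning.rate", "momentum", "batch.size")

check.param.values <- function(param.values) {
  unknown <- setdiff(names(param.values), allowed.param.values)
  if (length(unknown) > 0) {
    stop("Unsupported param.values entries: ", paste(unknown, collapse = ", "))
  }
  invisible(TRUE)
}

# Candidate values from the parameter selection example below.
pv <- list("hidden.layer.size" = list(c(10, 10), c(5, 5, 5)),
           "activation" = c("tanh", "linear", "sigmoid-asymmetric"),
           "output.activation" = c("sigmoid-symmetric", "gaussian-asymmetric",
                                   "gaussian-symmetric"))
check.param.values(pv)
```

Note that hidden.layer.size candidates are given as a list of vectors, one vector per candidate architecture, while activation candidates are a plain character vector.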

Format

R6Class object.

Value

An "MLPClassifier" object with the following attributes:

model: DataFrame

log: DataFrame

statistics: DataFrame

Examples

## Not run: 
Training data df:

> df <- conn.context$table("PAL_TRAIN_MLP_REG_DATA_TBL")
> df$Collect()
   V000  V001 V002  V003 LABEL
0     1  1.71   AC     0    AA
1    10  1.78   CA     5    AB
2    17  2.36   AA     6    AA
3    12  3.15   AA     2     C
4     7  1.05   CA     3    AB
5     6  1.50   CA     2    AB
6     9  1.97   CA     6     C
7     5  1.26   AA     1    AA
8    12  2.13   AC     4     C
9    18  1.87   AC     6    AA

Training the model:

> mlpc <- hanaml.MLPClassifier(conn.context = conn.context, data = df, key = NULL,
                              features = NULL, label = NULL,
                              hidden.layer.size = c(10, 10),
                              activation = "tanh", output.activation = "tanh",
                              learning.rate = 0.001, momentum = 0.0001,
                              training.style = "stochastic", max.iter = 100,
                              normalization = "z-transform", weight.init = "normal",
                              thread.ratio = 0.3, categorical.variable = "V003")

Your training results may differ from the following because of randomness in model training.


> mlpc$train.log$Collect()

        ITERATION     ERROR
    0           1  1.080261
    1           2  1.008358
    2           3  0.947069
    3           4  0.894585
    4           5  0.849411
    5           6  0.810309
    6           7  0.776256
    7           8  0.746413
    8           9  0.720093
    9          10  0.696737
    10         11  0.675886
    11         12  0.657166
    12         13  0.640270
    13         14  0.624943
    14         15  0.609432
    ..        ...       ...
    91         92  0.317840
    92         93  0.316630
    93         94  0.315376
    94         95  0.314210
    95         96  0.313066
    96         97  0.312021
    97         98  0.310916
    98         99  0.309770
    99        100  0.308704

Model evaluation example:

> df <- conn.context$table("PAL_TRAIN_MLP_EVAL_DATA_TBL")
> df$Collect()
   V000  V001 V002  V003 LABEL
0     1  1.71   AC     0    AA
1    10  1.78   CA     5    AB
2    17  2.36   AA     6    AA
3    12  3.15   AA     2     C
4     7  1.05   CA     3    AB
5     6  1.50   CA     2    AB
6     9  1.97   CA     6     C
7     5  1.26   AA     1    AA
8    12  2.13   AC     4     C
9    18  1.87   AC     6    AA

Training the model:

> mlpc <- hanaml.MLPClassifier(conn.context, data = df, label = "LABEL",
                               hidden.layer.size = c(10, 10),
                               activation = "tanh", output.activation = "tanh",
                               learning.rate = 0.001, momentum = 0.00001,
                               training.style = "stochastic",
                               categorical.variable = "V003", max.iter = 100,
                               normalization = "z-transform",
                               weight.init = "normal", thread.ratio = 0.3,
                               resampling.method = "cv",
                               evaluation.metric = "f1_score",
                               fold.num = 10, repeat.times = 2,
                               seed = 1, progress.indicator.id = "TEST")


Parameter Selection Example:

> df <- conn.context$table("PAL_TRAIN_MLP_EVAL_DATA_TBL")

Training the model:

> mlpc <- hanaml.MLPClassifier(conn.context, data = df, label = "LABEL",
                               learning.rate = 0.001, momentum = 0.00001,
                               training.style = "stochastic",
                               categorical.variable = "V003",
                               max.iter = 100, normalization = "z-transform",
                               weight.init = "normal", thread.ratio = 0.3,
                               resampling.method = "stratified_bootstrap",
                               evaluation.metric = "ACCURACY",
                               param.search.strategy = "grid",
                               repeat.times = 2, seed = 1,
                               progress.indicator.id = "TEST",
                               param.values = list(
                                 "hidden.layer.size" = list(c(10, 10), c(5, 5, 5)),
                                 "activation" = c("tanh", "linear",
                                                  "sigmoid-asymmetric"),
                                 "output.activation" = c("sigmoid-symmetric",
                                                         "gaussian-asymmetric",
                                                         "gaussian-symmetric")))


Optimal Parameters:

   PARAM_NAME                INT_VALUE  DOUBLE_VALUE  STRING_VALUE
0  HIDDEN_LAYER_SIZE         ?          ?             10,10
1  OUTPUT_LAYER_ACTIVE_FUNC  4          ?             ?
2  HIDDEN_LAYER_ACTIVE_FUNC  1          ?             ?
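The Optimal Parameters table reports the selected activation functions as integer codes. A minimal decoding sketch follows, assuming the codes are 1-based positions in the activation list documented under the activation argument. That assumption is consistent with this output (4 is one of the output.activation candidates and 1 is one of the activation candidates), but it should be checked against the PAL reference for your SAP HANA release. decode.activation is a hypothetical helper.

```r
# ASSUMPTION: PAL activation codes are 1-based positions in this documented list.
activation.names <- c("tanh", "linear", "sigmoid-asymmetric", "sigmoid-symmetric",
                      "gaussian-asymmetric", "gaussian-symmetric",
                      "elliot-asymmetric", "elliot-symmetric",
                      "sin-asymmetric", "sin-symmetric",
                      "cos-asymmetric", "cos-symmetric", "relu")

# Hypothetical helper (not part of hana.ml.r): maps integer codes to names.
decode.activation <- function(code) activation.names[code]

output.activation <- decode.activation(4)  # OUTPUT_LAYER_ACTIVE_FUNC above
hidden.activation <- decode.activation(1)  # HIDDEN_LAYER_ACTIVE_FUNC above
```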


## End(Not run)

[Package hana.ml.r version 1.0.8 Index]