R: Multi-layer perceptron (MLP) Classifier

hanaml.MLPClassifier {hana.ml.r}

R Documentation

Multi-layer perceptron (MLP) Classifier

Description

hanaml.MLPClassifier is an R wrapper for the PAL Multi-layer Perceptron algorithm.

Usage

hanaml.MLPClassifier(conn.context, data = NULL, key = NULL,
                     features = NULL, label = NULL,
                     formula = NULL, hidden.layer.size = NULL,
                     activation = NULL, output.activation = NULL,
                     learning.rate = NULL, momentum = NULL,
                     training.style = NULL, max.iter = NULL,
                     normalization = NULL, weight.init = NULL,
                     thread.ratio = NULL, categorical.variable = NULL,
                     batch.size = NULL, resampling.method = NULL,
                     evaluation.metric = NULL, fold.num = NULL,
                     repeat.times = NULL, param.search.strategy = NULL,
                     random.search.times = NULL, seed = NULL,
                     timeout = NULL, progress.indicator.id = NULL,
                     param.range = NULL, param.values = NULL)

Arguments

`conn.context`	`ConnectionContext` The connection to the SAP HANA system.
`data`	`DataFrame` DataFrame containing the data.
`key`	`character, optional` Name of the ID column. If 'key' is not provided, it is assumed that the input has no ID column.
`features`	`list of characters, optional` Names of the feature columns. If 'features' is not provided, it defaults to all the non-ID and non-label columns.
`label`	`character, optional` Name of the label column. If 'label' is not provided, it defaults to the last column.
`formula`	`formula type, optional` Formula to be used for model generation, in the form label ~ <feature_list>, e.g. formula = LABEL~V1+V2+V3. Provide either the formula or a features/label combination, not both.
`activation`	`character` Activation function for the hidden layers. Must be one of: `'tanh', 'linear', 'sigmoid-asymmetric', 'sigmoid-symmetric', 'gaussian-asymmetric', 'gaussian-symmetric', 'elliot-asymmetric', 'elliot-symmetric', 'sin-asymmetric', 'sin-symmetric', 'cos-asymmetric', 'cos-symmetric', 'relu'`
`output.activation`	`character` Activation function for the output layer. Must be one of: `'tanh', 'linear', 'sigmoid-asymmetric', 'sigmoid-symmetric', 'gaussian-asymmetric', 'gaussian-symmetric', 'elliot-asymmetric', 'elliot-symmetric', 'sin-asymmetric', 'sin-symmetric', 'cos-asymmetric', 'cos-symmetric', 'relu'`
`hidden.layer.size`	`list of integers, optional` The size of each hidden layer.
`max.iter`	`numeric, optional` The maximum number of iterations. Defaults to 100.
`training.style`	`{"batch", "stochastic"}, optional` Specifies the training style. Defaults to "stochastic".
`learning.rate`	`double, optional` Specifies the learning rate. Only valid when training.style is "stochastic".
`momentum`	`double, optional` Specifies the momentum for gradient descent update. Only valid when training.style is "stochastic".
`batch.size`	`int, optional` Specifies the size of the mini batch. Only valid when training.style is "stochastic". Defaults to 1.
`normalization`	`{"no", "z-transform", "scalar"}, optional` Specifies the normalization type. Defaults to "no".
`weight.init`	`character, optional` Specifies the weight initialization method. Must be one of: `'all-zeros', 'normal', 'uniform', 'variance-scale-normal', 'variance-scale-uniform'` Defaults to 'all-zeros'.
`categorical.variable`	`character or list of characters, optional` Column names in the data table used as category variable.
`thread.ratio`	`double, optional` Controls the proportion of available threads to use. The value range is from 0 to 1, where 0 indicates a single thread, and 1 indicates up to all available threads. Values between 0 and 1 will use that percentage of available threads. Values outside this range tell PAL to heuristically determine the number of threads to use. Defaults to 0.
`resampling.method`	`character, optional` Specifies the resampling method. Must be one of: `'cv', 'stratified_cv', 'bootstrap', 'stratified_bootstrap'` If no value is specified for this parameter, neither model evaluation nor parameter selection is activated.
`evaluation.metric`	`character, optional` Specifies the evaluation metric for model evaluation or parameter selection. Must be one of: `'ACCURACY', 'F1_SCORE', 'AUC_1VsRest', 'AUC_pairwise', 'RMSE'`
`fold.num`	`numeric, optional` Specifies the fold number for the cross validation method.
`repeat.times`	`numeric, optional` Specifies the number of repeat times for resampling. Defaults to 1.
`param.search.strategy`	`character, optional` Specifies the method used for parameter selection. Must be either 'GRID' or 'RANDOM'.
`random.search.times`	`numeric, optional` Specifies the number of times to randomly select candidate parameters for selection.
`seed`	`numeric, optional` Specifies the seed for random generation. Use system time when 0 is specified.
`timeout`	`numeric, optional` Specifies maximum running time for model evaluation or parameter selection, in seconds. No timeout when 0 is specified.
`progress.indicator.id`	`character, optional` Sets an ID of progress indicator for model evaluation or parameter selection. No progress indicator is active if no value is provided.
`param.values`	`list, optional` Specifies values of the following parameters for parameter selection: 'activation', 'output.activation', 'hidden.layer.size', 'learning.rate', 'momentum', 'batch.size'.
`param.range`	`list, optional` Specifies range of the following parameters for parameter selection: 'learning.rate', 'momentum', 'batch.size'.
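
The equivalence between `formula` and the `label`/`features` pair can be illustrated in plain R, without a HANA connection. The sketch below uses only base R's `all.vars()`. The helper name `formula_parts` is hypothetical and not part of hana.ml.r, and how the package itself parses the formula is not described on this page.

```r
# Hypothetical helper (not part of hana.ml.r): split a formula of the form
# label ~ feature1 + feature2 + ... into its label and feature names.
formula_parts <- function(f) {
  # Require a two-sided formula: `~`, left-hand side, right-hand side.
  stopifnot(inherits(f, "formula"), length(f) == 3)
  list(
    label    = all.vars(f[[2]]),  # variables on the left-hand side
    features = all.vars(f[[3]])   # variables on the right-hand side
  )
}

parts <- formula_parts(LABEL ~ V1 + V2 + V3)
print(parts$label)
print(parts$features)
```

In other words, `formula = LABEL~V1+V2+V3` names the same columns as passing `label` "LABEL" together with `features` V1, V2 and V3, which is why only one of the two forms should be supplied.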

Format

R6Class object.

Value

An "MLPClassifier" object with the following attributes:

model: DataFrame

ROW_INDEX - model row index
MODEL_CONTENT - model content

log: DataFrame

ITERATION - iteration number
ERROR - Mean squared error between predicted values and target values for each iteration

statistics: DataFrame

STAT_NAME - statistics name
STAT_VALUE - values of the statistics

Examples

## Not run: 
Training data df:

> df <- conn.context$table("PAL_TRAIN_MLP_REG_DATA_TBL")
> df$Collect()
   V000  V001 V002  V003 LABEL
0     1  1.71   AC     0    AA
1    10  1.78   CA     5    AB
2    17  2.36   AA     6    AA
3    12  3.15   AA     2     C
4     7  1.05   CA     3    AB
5     6  1.50   CA     2    AB
6     9  1.97   CA     6     C
7     5  1.26   AA     1    AA
8    12  2.13   AC     4     C
9    18  1.87   AC     6    AA

Training the model:

> mlpc <- hanaml.MLPClassifier(conn.context = conn.context, data = df, key = NULL,
                              features = NULL, label = NULL,
                              hidden.layer.size = c(10,10),
                              activation = "tanh", output.activation = "tanh",
                              learning.rate = 0.001, momentum = 0.0001,
                              training.style = "stochastic", max.iter = 100,
                              normalization = "z-transform", weight.init = "normal",
                              thread.ratio = 0.3, categorical.variable = "V003")

Training result may look different from the following results due to model randomness.


> mlpc$train.log$Collect()

         ITERATION ERROR
    0           1  1.080261
    1           2  1.008358
    2           3  0.947069
    3           4  0.894585
    4           5  0.849411
    5           6  0.810309
    6           7  0.776256
    7           8  0.746413
    8           9  0.720093
    9          10  0.696737
    10         11  0.675886
    11         12  0.657166
    12         13  0.640270
    13         14  0.624943
    14         15  0.609432
    ..        ...       ...
    91         92  0.317840
    92         93  0.316630
    93         94  0.315376
    94         95  0.314210
    95         96  0.313066
    96         97  0.312021
    97         98  0.310916
    98         99  0.309770
    99        100  0.308704
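
A quick way to sanity-check such a log is to confirm that the error keeps falling. The sketch below rebuilds the first five rows printed above as a local data frame so that it runs without a HANA connection. It assumes the collected log behaves like an ordinary R data frame with ITERATION and ERROR columns; the variable names are illustrative.

```r
# First five rows of the training log printed above, rebuilt locally so the
# snippet runs without a HANA connection.
train_log <- data.frame(
  ITERATION = 1:5,
  ERROR     = c(1.080261, 1.008358, 0.947069, 0.894585, 0.849411)
)

# Per-iteration change in error; negative values mean the error went down.
delta <- diff(train_log$ERROR)
all_decreasing <- all(delta < 0)
print(all_decreasing)

# Relative improvement from the first to the last listed iteration.
rel_improvement <- 1 - train_log$ERROR[5] / train_log$ERROR[1]
print(round(rel_improvement, 3))
```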

Model evaluation example:

> df <- conn.context$table("PAL_TRAIN_MLP_EVAL_DATA_TBL")
> df$Collect()
   V000  V001 V002  V003 LABEL
0     1  1.71   AC     0    AA
1    10  1.78   CA     5    AB
2    17  2.36   AA     6    AA
3    12  3.15   AA     2     C
4     7  1.05   CA     3    AB
5     6  1.50   CA     2    AB
6     9  1.97   CA     6     C
7     5  1.26   AA     1    AA
8    12  2.13   AC     4     C
9    18  1.87   AC     6    AA

Training the model:

> mlpc <- hanaml.MLPClassifier(conn.context, data = df, label= "LABEL",
                               hidden.layer.size = c(10,10),
                               activation = "tanh" ,output.activation = "tanh",
                               learning.rate = 0.001, momentum=0.00001,
                               training.style = "stochastic",
                               categorical.variable = "V003", max.iter = 100,
                               normalization = "z-transform",
                               weight.init = "normal", thread.ratio = 0.3,
                               resampling.method = "cv",
                               evaluation.metric = "F1_SCORE",
                               fold.num = 10, repeat.times = 2,
                               seed = 1, progress.indicator.id = "TEST")
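
To make the resampling settings concrete: with `resampling.method = "cv"`, `fold.num = 10` and `repeat.times = 2`, the rows are split into 10 folds, and the whole split is repeated twice. The base-R sketch below illustrates the general idea of repeated k-fold assignment only. It is not PAL's implementation, `make_folds` is a hypothetical helper, and the actual fold membership chosen by PAL for a given seed will differ.

```r
# Hypothetical illustration of repeated k-fold partitioning (not PAL's code).
make_folds <- function(n, fold.num, repeat.times, seed) {
  set.seed(seed)
  lapply(seq_len(repeat.times), function(r) {
    # Shuffle fold labels 1..fold.num so each row lands in exactly one fold.
    sample(rep_len(seq_len(fold.num), n))
  })
}

# 10 rows, as in the example table above.
folds <- make_folds(n = 10, fold.num = 10, repeat.times = 2, seed = 1)

# Each repetition holds out every fold once, so the number of fits is
# repeat.times * fold.num.
n_fits <- length(folds) * length(unique(folds[[1]]))
print(n_fits)
```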


Parameter Selection Example:

> df <- conn.context$table("PAL_TRAIN_MLP_EVAL_DATA_TBL")

Training the model:

> mlpc <- hanaml.MLPClassifier(conn.context, data = df, label= "LABEL",
                              learning.rate=0.001, momentum=0.00001,
                              training.style="stochastic",
                              categorical.variable = "V003",
                              max.iter = 100, normalization = "z-transform",
                              weight.init = "normal", thread.ratio = 0.3,
                              resampling.method = "stratified_bootstrap",
                              evaluation.metric = "ACCURACY",
                              param.search.strategy = "GRID",
                              repeat.times = 2, seed = 1,
                              progress.indicator.id = "TEST",
                              param.values = list("hidden.layer.size" =
                                                   list(c(10,10), c(5,5,5)),
                                                  "activation" =
                                                  c("tanh",
                                                  "linear",
                                                  "sigmoid-asymmetric"),
                                                  "output.activation" =
                                                  c("sigmoid-symmetric",
                                                  "gaussian-asymmetric",
                                                  "gaussian-symmetric")))
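
The size of the search space in this call can be worked out in plain R. The sketch below assumes that the grid strategy evaluates the full Cartesian product of the candidate values; this page does not state that explicitly. It only counts and indexes the combinations and does not call hana.ml.r.

```r
# The candidate values from the call above, as plain R objects.
param.values <- list(
  "hidden.layer.size" = list(c(10, 10), c(5, 5, 5)),
  "activation"        = c("tanh", "linear", "sigmoid-asymmetric"),
  "output.activation" = c("sigmoid-symmetric", "gaussian-asymmetric",
                          "gaussian-symmetric")
)

# ASSUMPTION: a full grid tries every combination, so its size is the
# product of the number of candidates per parameter (2 * 3 * 3).
n_candidates <- prod(lengths(param.values))
print(n_candidates)

# Enumerate the combinations as indices into each candidate list.
grid <- expand.grid(lapply(param.values, seq_along))
print(nrow(grid))
```

Each candidate is additionally resampled according to `resampling.method` and `repeat.times`, so the running time grows quickly with the number of values; `timeout` can be used to bound it.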


Optimal Parameters:

    PARAM_NAME                INT_VALUE  DOUBLE_VALUE  STRING_VALUE
1   HIDDEN_LAYER_SIZE                 ?             ?         10,10
2   OUTPUT_LAYER_ACTIVE_FUNC          4             ?             ?
3   HIDDEN_LAYER_ACTIVE_FUNC          1             ?             ?
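
The activation functions are reported as integer codes rather than names. The sketch below decodes them under the assumption that INT_VALUE is the 1-based position in the activation list given under Arguments. This page does not state that mapping; it is inferred from the result above, where codes 1 and 4 would correspond to 'tanh' and 'sigmoid-symmetric', both of which were among the candidates. `decode_activation` is a hypothetical helper.

```r
# Activation names in the order listed under Arguments.
activations <- c("tanh", "linear", "sigmoid-asymmetric", "sigmoid-symmetric",
                 "gaussian-asymmetric", "gaussian-symmetric",
                 "elliot-asymmetric", "elliot-symmetric",
                 "sin-asymmetric", "sin-symmetric",
                 "cos-asymmetric", "cos-symmetric", "relu")

# ASSUMPTION: INT_VALUE is the 1-based position in that list.
decode_activation <- function(code) activations[code]

hidden_fn <- decode_activation(1)  # HIDDEN_LAYER_ACTIVE_FUNC
output_fn <- decode_activation(4)  # OUTPUT_LAYER_ACTIVE_FUNC
print(c(hidden = hidden_fn, output = output_fn))
```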


## End(Not run)

[Package hana.ml.r version 1.0.8 Index]