Multi-Layer Perceptron (MLP) Classifier

hanaml.MLPClassifier is an R wrapper for the SAP HANA PAL Multi-layer Perceptron algorithm for classification.

hanaml.MLPClassifier(
  data = NULL,
  key = NULL,
  features = NULL,
  label = NULL,
  formula = NULL,
  hidden.layer.size = NULL,
  activation = NULL,
  output.activation = NULL,
  learning.rate = NULL,
  momentum = NULL,
  training.style = NULL,
  max.iter = NULL,
  normalization = NULL,
  weight.init = NULL,
  thread.ratio = NULL,
  categorical.variable = NULL,
  batch.size = NULL,
  resampling.method = NULL,
  evaluation.metric = NULL,
  fold.num = NULL,
  repeat.times = NULL,
  param.search.strategy = NULL,
  random.search.times = NULL,
  random.state = NULL,
  timeout = NULL,
  progress.indicator.id = NULL,
  parameter.range = NULL,
  parameter.values = NULL
)

Arguments

data	`DataFrame` DataFrame containing the data.
key	`character, optional` Name of the ID column. If not provided, the data is assumed to have no ID column. No default value.
features	`character or list of characters, optional` Names of the feature columns. If not provided, defaults to all non-key, non-label columns of data.
label	`character, optional` Name of the column which specifies the dependent variable. Defaults to the last column of data if not provided.
formula	`formula type, optional` Formula to be used for model generation, in the format label~<feature_list>, e.g. formula=CATEGORY~V1+V2+V3. Provide either the formula or a features and label combination, but not both. Defaults to NULL.
hidden.layer.size	`vector/list of integers, mandatory` Specifies the sizes of hidden layers. The value 0 will be ignored, for example, c(2, 0, 3) is equivalent to c(2, 3).
activation	`character` Activation function for the hidden layers. Options: `"tanh", "linear", "sigmoid-asymmetric", "sigmoid-symmetric", "gaussian-asymmetric", "gaussian-symmetric", "elliot-asymmetric", "elliot-symmetric", "sin-asymmetric", "sin-symmetric", "cos-asymmetric", "cos-symmetric", "relu"`
output.activation	`character` Activation function for the output layer. Options: `"tanh", "linear", "sigmoid-asymmetric", "sigmoid-symmetric", "gaussian-asymmetric", "gaussian-symmetric", "elliot-asymmetric", "elliot-symmetric", "sin-asymmetric", "sin-symmetric", "cos-asymmetric", "cos-symmetric", "relu"`
learning.rate	`double, optional` Specifies the learning rate. Only valid when training.style is "stochastic". No default value.
momentum	`double, optional` Specifies the momentum for gradient descent update. Only valid when training.style is "stochastic". No default value.
training.style	`{"batch", "stochastic"}, optional` Specifies the training style. Defaults to "stochastic".
max.iter	`integer, optional` The maximum number of iterations. Defaults to 100.
normalization	`{"no", "z-transform", "scalar"}, optional` Specifies the normalization type. Defaults to "no".
weight.init	`character, optional` Specifies the initial value of weight from the options below. `"all-zeros", "normal", "uniform", "variance-scale-normal", "variance-scale-uniform"` Defaults to "all-zeros".
thread.ratio	`double, optional` Controls the proportion of available threads to use. The value range is from 0 to 1, where 0 indicates a single thread, and 1 indicates up to all available threads. Values between 0 and 1 will use that percentage of available threads. Values outside this range tell PAL to heuristically determine the number of threads to use. Defaults to 0.
categorical.variable	`character or list of characters, optional` Column names in the data table used as category variable. No default value.
batch.size	`integer, optional` Specifies the size of the mini batch. Only valid when training.style is "stochastic". Defaults to 1.
resampling.method	`character, optional` Specifies the resampling method for model evaluation or parameter selection. Valid values: `"cv", "stratified_cv", "bootstrap", "stratified_bootstrap"` If no value is specified for this parameter, neither model evaluation nor parameter selection is activated. No default value.
evaluation.metric	`character, optional` Specifies the evaluation metric for model evaluation or parameter selection. Valid values include: `"ACCURACY", "F1_SCORE", "AUC_1VsRest", "AUC_pairwise"` No default value.
fold.num	`integer, optional` Specifies the fold number for the cross validation method. Mandatory and valid only when resampling.method is set to "cv" or "stratified_cv". No default value.
repeat.times	`integer, optional` Specifies the number of repeat times for resampling. Defaults to 1.
param.search.strategy	`character, optional` Specifies the method used to activate parameter selection. The value must be either "grid" or "random". No default value.
random.search.times	`integer, optional` Specifies the number of times to randomly select candidate parameters for selection. No default value.
random.state	`integer, optional` Specifies the seed for random number generation. The system time is used when 0 is specified.
timeout	`integer, optional` Specifies maximum running time for model evaluation or parameter selection, in seconds. No timeout when 0 is specified. Default value is 0.
progress.indicator.id	`character, optional` Sets an ID of progress indicator for model evaluation or parameter selection. No progress indicator is active if no value is provided. No default value.
parameter.range	`list, optional` Specifies range of the following parameters for parameter selection: `learning.rate`, `momentum`, `batch.size`.
parameter.values	`list, optional` Specifies values of the following parameters for parameter selection: `activation`, `output.activation`, `hidden.layer.size`, `learning.rate`, `momentum`, `batch.size`.
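
Some of the argument rules above can be checked locally before a call is sent to the database. The following is a minimal base R sketch, not part of hana.ml.r: the helper names `dropZeroLayers`, `splitFormula` and `checkResampling` are hypothetical and only mirror the documented behaviour of `hidden.layer.size`, `formula` and `fold.num`.

```r
# Hypothetical helpers (not part of hana.ml.r) mirroring the documented rules.

# hidden.layer.size: zeros are ignored, so c(2, 0, 3) behaves like c(2, 3).
dropZeroLayers <- function(sizes) {
  sizes <- as.integer(unlist(sizes))
  sizes[sizes != 0L]
}

# formula: label ~ features names the same columns as label plus features.
splitFormula <- function(f) {
  stopifnot(inherits(f, "formula"), length(f) == 3L)
  list(label = all.vars(f[[2]]), features = all.vars(f[[3]]))
}

# fold.num is mandatory for "cv" and "stratified_cv" and not valid otherwise.
checkResampling <- function(resampling.method = NULL, fold.num = NULL) {
  needs.fold <- !is.null(resampling.method) &&
    resampling.method %in% c("cv", "stratified_cv")
  if (needs.fold && is.null(fold.num)) {
    stop("fold.num is required when resampling.method is 'cv' or 'stratified_cv'")
  }
  if (!needs.fold && !is.null(fold.num)) {
    warning("fold.num is only valid with 'cv' or 'stratified_cv'")
  }
  invisible(TRUE)
}

layers <- dropZeroLayers(c(2, 0, 3))
parts <- splitFormula(CATEGORY ~ V1 + V2 + V3)
print(layers)
print(parts)
checkResampling("cv", fold.num = 10)
```

The checks are deliberately strict about `fold.num` only; how PAL itself reports an invalid combination is not covered by this sketch.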

Value

An "MLPClassifier" object with the following attributes:

model: DataFrame

ROW_INDEX - model row index.
MODEL_CONTENT - model content.

log: DataFrame

ITERATION - iteration number.
ERROR - Mean squared error between predicted values and target values for each iteration.

statistics: DataFrame

STAT_NAME - statistics name.
STAT_VALUE - values of the statistics.

optim.param: DataFrame

Selected optimal parameters.

Examples

Input DataFrame data:

> data$Collect()
   V000  V001 V002  V003 LABEL
1     1  1.71   AC     0    AA
2    10  1.78   CA     5    AB
3    17  2.36   AA     6    AA
4    12  3.15   AA     2     C
5     7  1.05   CA     3    AB
6     6  1.50   CA     2    AB
7     9  1.97   CA     6     C
8     5  1.26   AA     1    AA
9    12  2.13   AC     4     C
10   18  1.87   AC     6    AA
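
For experimenting without a database connection, the rows shown above can be rebuilt as a local base R data frame. This is only a sketch: the name `local.data` and the column types are assumptions, and uploading the frame to SAP HANA is not shown here.

```r
# Local copy of the example rows; column types are an assumption.
local.data <- data.frame(
  V000  = c(1L, 10L, 17L, 12L, 7L, 6L, 9L, 5L, 12L, 18L),
  V001  = c(1.71, 1.78, 2.36, 3.15, 1.05, 1.50, 1.97, 1.26, 2.13, 1.87),
  V002  = c("AC", "CA", "AA", "AA", "CA", "CA", "CA", "AA", "AC", "AC"),
  V003  = c(0L, 5L, 6L, 2L, 3L, 2L, 6L, 1L, 4L, 6L),
  LABEL = c("AA", "AB", "AA", "C", "AB", "AB", "C", "AA", "C", "AA"),
  stringsAsFactors = FALSE
)

# Class distribution of the dependent variable.
class.counts <- table(local.data$LABEL)
print(class.counts)

# V003 is numeric but is passed as categorical.variable in the examples,
# so its distinct values act as category levels.
v003.levels <- sort(unique(local.data$V003))
print(v003.levels)
```

Looking at the class counts first is useful when choosing a stratified resampling method, since every class has only a few rows in this data set.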

Training the model:

> mlpc <- hanaml.MLPClassifier(data = data,
                               hidden.layer.size = c(10,10),
                               activation = "TANH",
                               output.activation ="TANH",
                               learning.rate = 0.001,
                               momentum = 0.0001,
                               training.style = "stochastic",
                               max.iter = 100,
                               normalization = "z-transform",
                               weight.init = "normal",
                               thread.ratio = 0.3,
                               categorical.variable = "V003")

Output:

    > mlpc$train.log$Collect()

        ITERATION     ERROR
    1           1  1.080261
    2           2  1.008358
    3           3  0.947069
    4           4  0.894585
    5           5  0.849411
    ..        ...       ...
    92         92  0.317840
    93         93  0.316630
    94         94  0.315376
    95         95  0.314210
    96         96  0.313066
    97         97  0.312021
    98         98  0.310916
    99         99  0.309770
    100       100  0.308704

Model evaluation example (training the model with cross-validation enabled):

> mlpc <- hanaml.MLPClassifier(data = df, label= "LABEL",
                               hidden.layer.size = c(10,10),
                               activation = "tanh" ,output.activation = "tanh",
                               learning.rate = 0.001, momentum=0.00001,
                               training.style = "stochastic",
                               categorical.variable = "V003", max.iter = 100,
                               normalization = "z-transform",
                               weight.init = "normal", thread.ratio = 0.3,
                               resampling.method = "cv",
                               evaluation.metric = "f1_score",
                               fold.num = 10, repeat.times = 2,
                               random.state = 1, progress.indicator.id = "TEST")

Parameter Selection Example:

> mlpc <- hanaml.MLPClassifier(data = df, label= "LABEL",
                              learning.rate=0.001, momentum=0.00001,
                              training.style="stochastic",
                              categorical.variable = "V003",
                              max.iter = 100, normalization = "z-transform",
                              weight.init = "normal", thread.ratio = 0.3,
                              resampling.method = "stratified_bootstrap",
                              evaluation.metric = "ACCURACY",
                              param.search.strategy = "grid",
                              repeat.times = 2, random.state = 1,
                              progress.indicator.id = "TEST",
                              parameter.values = list("hidden.layer.size" =
                                                   list(c(10,10), c(5,5,5)),
                                                  "activation" =
                                                  c("tanh",
                                                  "linear",
                                                  "sigmoid-asymmetric"),
                                                  "output.activation" =
                                                  c("sigmoid-symmetric",
                                                  "gaussian-asymmetric",
                                                  "gaussian-symmetric")))

Output:

Optimal Parameters:

                PARAM_NAME INT_VALUE DOUBLE_VALUE STRING_VALUE
1        HIDDEN_LAYER_SIZE        NA           NA        10,10
2 OUTPUT_LAYER_ACTIVE_FUNC         4           NA         <NA>
3 HIDDEN_LAYER_ACTIVE_FUNC         1           NA         <NA>
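
To get a feel for how many candidates the grid in the parameter selection example contains, the values passed in `parameter.values` can be enumerated locally. This is a base R sketch that assumes the "grid" strategy evaluates the full Cartesian product of the supplied values; the variable names `grid` and `n.candidates` are illustrative only.

```r
# Candidate values copied from the parameter.values argument in the example.
# Hidden layer sizes are written as strings, as in the STRING_VALUE column.
grid <- expand.grid(
  hidden.layer.size = c("10,10", "5,5,5"),
  activation = c("tanh", "linear", "sigmoid-asymmetric"),
  output.activation = c("sigmoid-symmetric", "gaussian-asymmetric",
                        "gaussian-symmetric"),
  stringsAsFactors = FALSE
)

# 2 layer layouts * 3 hidden activations * 3 output activations.
n.candidates <- nrow(grid)
print(n.candidates)
print(head(grid, 3))
```

Each candidate is additionally subject to the resampling settings (`resampling.method`, `repeat.times`), so the total training effort grows quickly as more values are added.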
