hanaml.MLPClassifier is an R wrapper for the SAP HANA PAL Multi-layer Perceptron algorithm for classification.

hanaml.MLPClassifier(
  data = NULL,
  key = NULL,
  features = NULL,
  label = NULL,
  formula = NULL,
  hidden.layer.size = NULL,
  activation = NULL,
  output.activation = NULL,
  learning.rate = NULL,
  momentum = NULL,
  training.style = NULL,
  max.iter = NULL,
  normalization = NULL,
  weight.init = NULL,
  thread.ratio = NULL,
  categorical.variable = NULL,
  batch.size = NULL,
  resampling.method = NULL,
  evaluation.metric = NULL,
  fold.num = NULL,
  repeat.times = NULL,
  param.search.strategy = NULL,
  random.search.times = NULL,
  random.state = NULL,
  timeout = NULL,
  progress.indicator.id = NULL,
  parameter.range = NULL,
  parameter.values = NULL
)

Arguments

data

DataFrame
DataFrame containing the data.

key

character, optional
Name of the ID column. If not provided, the data is assumed to have no ID column.
No default value.

features

character or list of characters, optional
Names of the feature columns.
If not provided, defaults to all non-key, non-label columns of data.

label

character, optional
Name of the column which specifies the dependent variable.
Defaults to the last column of data if not provided.
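
The defaults for key, features and label can be illustrated with a small base R sketch. The helper resolve.columns below is hypothetical and is not part of hana.ml.r; it only mirrors the rules stated above, using the column names of the example data.

```r
# Hypothetical helper mirroring the documented defaults; not part of hana.ml.r.
resolve.columns <- function(cols, key = NULL, features = NULL, label = NULL) {
  # label defaults to the last column of the data
  if (is.null(label)) label <- cols[length(cols)]
  # features default to all non-key, non-label columns
  if (is.null(features)) features <- setdiff(cols, c(key, label))
  list(key = key, features = features, label = label)
}

cols <- c("V000", "V001", "V002", "V003", "LABEL")
resolved <- resolve.columns(cols)
print(resolved$label)     # "LABEL"
print(resolved$features)  # "V000" "V001" "V002" "V003"
```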

formula

formula type, optional
Formula to be used for model generation, in the form label ~ <feature_list>, e.g. formula = CATEGORY ~ V1 + V2 + V3.
Provide either the formula or a features and label combination, but not both.
Defaults to NULL.
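
As a rough illustration of how a formula corresponds to a label and a feature list, the following base R sketch splits the example formula into its two sides. How the wrapper parses the formula internally is not stated here, so this only shows the equivalence between the two ways of specifying columns.

```r
# Split a formula into its label (left-hand side) and features (right-hand side).
f <- CATEGORY ~ V1 + V2 + V3

label <- all.vars(f[[2]])     # variables on the left of ~
features <- all.vars(f[[3]])  # variables on the right of ~

print(label)     # "CATEGORY"
print(features)  # "V1" "V2" "V3"
```

Under this reading, formula = CATEGORY ~ V1 + V2 + V3 would select the same columns as label = "CATEGORY" together with features = c("V1", "V2", "V3").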

hidden.layer.size

vector/list of integers, mandatory
Specifies the sizes of hidden layers.
The value 0 will be ignored, for example, c(2, 0, 3) is equivalent to c(2, 3).

activation

character
Activation function for the hidden layers. Options: "tanh", "linear", "sigmoid-asymmetric", "sigmoid-symmetric", "gaussian-asymmetric", "gaussian-symmetric", "elliot-asymmetric", "elliot-symmetric", "sin-asymmetric", "sin-symmetric", "cos-asymmetric", "cos-symmetric", "relu"

output.activation

character
Activation function for the output layer. Options: "tanh", "linear", "sigmoid-asymmetric", "sigmoid-symmetric", "gaussian-asymmetric", "gaussian-symmetric", "elliot-asymmetric", "elliot-symmetric", "sin-asymmetric", "sin-symmetric", "cos-asymmetric", "cos-symmetric", "relu"

learning.rate

double, optional
Specifies the learning rate.
Only valid when training.style is "stochastic".
No default value.

momentum

double, optional
Specifies the momentum for gradient descent update.
Only valid when training.style is "stochastic".
No default value.

training.style

{"batch", "stochastic"}, optional
Specifies the training style.
Defaults to "stochastic".

max.iter

integer, optional
The maximum number of iterations.
Defaults to 100.

normalization

{"no", "z-transform", "scalar"}, optional
Specifies the normalization type.
Defaults to "no".

weight.init

character, optional
Specifies how the initial weights are set. Options: "all-zeros", "normal", "uniform", "variance-scale-normal", "variance-scale-uniform"
Defaults to "all-zeros".

thread.ratio

double, optional
Controls the proportion of available threads to use. The value range is from 0 to 1, where 0 indicates a single thread, and 1 indicates up to all available threads. Values between 0 and 1 will use that percentage of available threads. Values outside this range tell PAL to heuristically determine the number of threads to use.
Defaults to 0.

categorical.variable

character or list of characters, optional
Names of the columns in the data table that are treated as categorical variables.
No default value.

batch.size

integer, optional
Specifies the size of mini batch.
Only valid when training.style is "stochastic".
Defaults to 1.

resampling.method

character, optional
Specifies the resampling method for model evaluation or parameter selection.
Valid values include: "cv", "stratified_cv", "bootstrap", "stratified_bootstrap"
If no value is specified for this parameter, neither model evaluation nor parameter selection is activated.
No default value.

evaluation.metric

character, optional
Specifies the evaluation metric for model evaluation or parameter selection.
Valid values include: "ACCURACY", "F1_SCORE", "AUC_1VsRest", "AUC_pairwise"
No default value.

fold.num

integer, optional
Specifies the fold number for the cross validation method.
Mandatory and valid only when resampling.method is set to "cv" or "stratified_cv".
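
The dependency between fold.num and resampling.method can be checked before calling the function. The helper check.fold.num below is hypothetical and not part of hana.ml.r; whether the library itself raises an error or a warning in these cases is an assumption.

```r
# Hypothetical pre-check for the fold.num rule; not part of hana.ml.r.
check.fold.num <- function(resampling.method = NULL, fold.num = NULL) {
  needs.fold <- !is.null(resampling.method) &&
    resampling.method %in% c("cv", "stratified_cv")
  if (needs.fold && is.null(fold.num)) {
    stop("fold.num is mandatory when resampling.method is 'cv' or 'stratified_cv'")
  }
  if (!needs.fold && !is.null(fold.num)) {
    # Assumption: treat an unneeded fold.num as a warning rather than an error.
    warning("fold.num is only valid with 'cv' or 'stratified_cv'")
  }
  invisible(TRUE)
}

check.fold.num(resampling.method = "cv", fold.num = 10)  # passes silently
```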
No default value.

repeat.times

integer, optional
Specifies the number of repeat times for resampling.
Defaults to 1.

param.search.strategy

character, optional
Specifies the method to activate parameter selection.
Valid values are "grid" and "random".
No default value.

random.search.times

integer, optional
Specifies the number of times to randomly select candidate parameters for selection.
No default value.

random.state

integer, optional
Specifies the seed for random generation.
Use system time when 0 is specified.

timeout

integer, optional
Specifies maximum running time for model evaluation or parameter selection, in seconds. No timeout when 0 is specified.
Default value is 0.

progress.indicator.id

character, optional
Sets an ID of progress indicator for model evaluation or parameter selection.
No progress indicator is active if no value is provided.
No default value.

parameter.range

list, optional
Specifies the ranges of the following parameters for parameter selection:
learning.rate, momentum, batch.size.

parameter.values

list, optional
Specifies the candidate values of the following parameters for parameter selection:
activation, output.activation, hidden.layer.size, learning.rate, momentum, batch.size.
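
To get a feel for the cost of parameter selection, the sketch below builds the parameter.values list used in the parameter selection example of this page and counts the candidate combinations. It assumes that the "grid" strategy evaluates the full Cartesian product of the listed values, which this page does not state explicitly.

```r
# Candidate values taken from the parameter selection example on this page.
parameter.values <- list(
  "hidden.layer.size" = list(c(10, 10), c(5, 5, 5)),
  "activation" = c("tanh", "linear", "sigmoid-asymmetric"),
  "output.activation" = c("sigmoid-symmetric",
                          "gaussian-asymmetric",
                          "gaussian-symmetric")
)

# Assumption: a grid search tries every combination of the listed values.
n.candidates <- prod(lengths(parameter.values))
print(n.candidates)  # 18
```

Each candidate is evaluated once per resampling round, so the total training effort also grows with repeat.times.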

Value

An "MLPClassifier" object with the following attributes:

model: DataFrame

  • ROW_INDEX - model row index.

  • MODEL_CONTENT - model content.

log: DataFrame

  • ITERATION - iteration number.

  • ERROR - Mean squared error between predicted values and target values for each iteration.

statistics: DataFrame

  • STAT_NAME - statistics name.

  • STAT_VALUE - values of the statistics.

optim.param: DataFrame

  • Selected optimal parameters.

Examples

Input DataFrame data:

> data$Collect()
   V000  V001 V002  V003 LABEL
1     1  1.71   AC     0    AA
2    10  1.78   CA     5    AB
3    17  2.36   AA     6    AA
4    12  3.15   AA     2     C
5     7  1.05   CA     3    AB
6     6  1.50   CA     2    AB
7     9  1.97   CA     6     C
8     5  1.26   AA     1    AA
9    12  2.13   AC     4     C
10   18  1.87   AC     6    AA

Training the model:

> mlpc <- hanaml.MLPClassifier(data = data,
                               hidden.layer.size = c(10,10),
                               activation = "tanh",
                               output.activation = "tanh",
                               learning.rate = 0.001,
                               momentum = 0.0001,
                               training.style = "stochastic",
                               max.iter = 100,
                               normalization = "z-transform",
                               weight.init = "normal",
                               thread.ratio = 0.3,
                               categorical.variable = "V003")

Output:

    > mlpc$train.log$Collect()

        ITERATION     ERROR
    1           1  1.080261
    2           2  1.008358
    3           3  0.947069
    4           4  0.894585
    5           5  0.849411
    ..        ...       ...
    92         92  0.317840
    93         93  0.316630
    94         94  0.315376
    95         95  0.314210
    96         96  0.313066
    97         97  0.312021
    98         98  0.310916
    99         99  0.309770
    100       100  0.308704

Model evaluation example:

> mlpc <- hanaml.MLPClassifier(data = data, label = "LABEL",
                               hidden.layer.size = c(10,10),
                               activation = "tanh", output.activation = "tanh",
                               learning.rate = 0.001, momentum = 0.00001,
                               training.style = "stochastic",
                               categorical.variable = "V003", max.iter = 100,
                               normalization = "z-transform",
                               weight.init = "normal", thread.ratio = 0.3,
                               resampling.method = "cv",
                               evaluation.metric = "f1_score",
                               fold.num = 10, repeat.times = 2,
                               random.state = 1, progress.indicator.id = "TEST")

Parameter selection example:

> mlpc <- hanaml.MLPClassifier(data = data, label = "LABEL",
                               learning.rate = 0.001, momentum = 0.00001,
                               training.style = "stochastic",
                               categorical.variable = "V003",
                               max.iter = 100, normalization = "z-transform",
                               weight.init = "normal", thread.ratio = 0.3,
                               resampling.method = "stratified_bootstrap",
                               evaluation.metric = "ACCURACY",
                               param.search.strategy = "grid",
                               repeat.times = 2, random.state = 1,
                               progress.indicator.id = "TEST",
                               parameter.values = list(
                                 "hidden.layer.size" = list(c(10,10), c(5,5,5)),
                                 "activation" = c("tanh",
                                                  "linear",
                                                  "sigmoid-asymmetric"),
                                 "output.activation" = c("sigmoid-symmetric",
                                                         "gaussian-asymmetric",
                                                         "gaussian-symmetric")))

Output:

Optimal parameters:

PARAM_NAME                  INT_VALUE   DOUBLE_VALUE   STRING_VALUE
HIDDEN_LAYER_SIZE                  NA             NA          10,10
OUTPUT_LAYER_ACTIVE_FUNC            4             NA           <NA>
HIDDEN_LAYER_ACTIVE_FUNC            1             NA           <NA>

See also