MLPClassifier

class hana_ml.algorithms.pal.neural_network.MLPClassifier(activation=None, activation_options=None, output_activation=None, output_activation_options=None, hidden_layer_size=None, hidden_layer_size_options=None, max_iter=None, training_style=None, learning_rate=None, momentum=None, batch_size=None, normalization=None, weight_init=None, categorical_variable=None, resampling_method=None, evaluation_metric=None, fold_num=None, repeat_times=None, search_strategy=None, random_search_times=None, random_state=None, timeout=None, progress_indicator_id=None, param_values=None, param_range=None, thread_ratio=None, reduction_rate=None, aggressive_elimination=None)

Multi-layer perceptron (MLP) Classifier.

Parameters:
activationstr

Specifies the activation function for the hidden layer.

Valid activation functions include:
  • 'tanh',

  • 'linear',

  • 'sigmoid_asymmetric',

  • 'sigmoid_symmetric',

  • 'gaussian_asymmetric',

  • 'gaussian_symmetric',

  • 'elliot_asymmetric',

  • 'elliot_symmetric',

  • 'sin_asymmetric',

  • 'sin_symmetric',

  • 'cos_asymmetric',

  • 'cos_symmetric',

  • 'relu'

Must be specified unless activation_options is provided.

activation_optionslist of str, optional

A list of activation functions for parameter selection.

See activation for the full set of valid activation functions.

output_activationstr

Specifies the activation function for the output layer.

Valid activation functions are the same as those for activation.

Must be specified unless output_activation_options is provided.

output_activation_optionslist of str, optional

A list of activation functions for the output layer for parameter selection.

See activation for the full set of activation functions.

hidden_layer_sizelist of int or tuple of int

Sizes of all hidden layers.

Must be specified unless hidden_layer_size_options is provided.

hidden_layer_size_optionslist of tuples, optional

A list of optional sizes of all hidden layers for parameter selection.

max_iterint, optional

Maximum number of iterations.

Defaults to 100.

training_style{'batch', 'stochastic'}, optional

Specifies the training style.

Defaults to 'stochastic'.

learning_ratefloat, optional

Specifies the learning rate. Mandatory and valid only when training_style is 'stochastic'.

momentumfloat, optional

Specifies the momentum for gradient descent update. Mandatory and valid only when training_style is 'stochastic'.

batch_sizeint, optional

Specifies the size of mini batch. Valid only when training_style is 'stochastic'.

Defaults to 1.

normalization{'no', 'z-transform', 'scalar'}, optional

Specifies the normalization type.

Defaults to 'no'.

weight_init{'all-zeros', 'normal', 'uniform', 'variance-scale-normal', 'variance-scale-uniform'}, optional

Specifies the weight initialization method.

Defaults to 'all-zeros'.

categorical_variablestr or list of str, optional

Specifies the column name(s) in the data table to be treated as categorical variable(s).

Valid only for columns of INTEGER type.

thread_ratiofloat, optional

Controls the proportion of available threads to use for training.

The value range is from 0 to 1, where 0 indicates a single thread, and 1 indicates up to all available threads.

Values between 0 and 1 will use that percentage of available threads.

Values outside this range tell PAL to heuristically determine the number of threads to use.

Defaults to 0.

resampling_methodstr, optional

Specifies the resampling method for model evaluation or parameter selection.

Valid options include: 'cv', 'stratified_cv', 'bootstrap', 'stratified_bootstrap', 'cv_sha', 'stratified_cv_sha', 'bootstrap_sha', 'stratified_bootstrap_sha', 'cv_hyperband', 'stratified_cv_hyperband', 'bootstrap_hyperband', 'stratified_bootstrap_hyperband'.

If no value is specified for this parameter, neither model evaluation nor parameter selection is activated.

No default value.

Note

Resampling methods with suffix 'sha' or 'hyperband' are used for parameter selection only, not for model evaluation.

evaluation_metric{'accuracy', 'f1_score', 'auc_onevsrest', 'auc_pairwise'}, optional

Specifies the evaluation metric for model evaluation or parameter selection.

Must be specified together with resampling_method to activate model evaluation or parameter selection.

No default value.

fold_numint, optional

Specifies the number of folds for cross-validation.

Mandatory and valid only when resampling_method is specified to be one of the following: 'cv', 'stratified_cv', 'cv_sha', 'stratified_cv_sha', 'cv_hyperband', 'stratified_cv_hyperband'.

repeat_timesint, optional

Specifies the number of times resampling is repeated.

Defaults to 1.

search_strategy{'grid', 'random'}, optional

Specifies the method for parameter selection.

  • mandatory if resampling_method is specified with suffix 'sha'

  • defaults to 'random' and cannot be changed if resampling_method is specified with suffix 'hyperband'

  • otherwise no default value, and parameter selection will not be activated if not specified
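
The three rules above can be condensed into a small helper. This is a sketch of the documented behavior only: `resolve_search_strategy` is a hypothetical name, not part of hana_ml, and raising an error for a conflicting value is a choice made for this illustration rather than confirmed PAL behavior.

```python
def resolve_search_strategy(resampling_method, search_strategy=None):
    """Return the effective search strategy, or None if parameter selection is off.

    Hypothetical helper mirroring the documented rules; not a hana_ml function.
    """
    if resampling_method is None:
        # Without a resampling method nothing is activated.
        return None
    if resampling_method.endswith('_hyperband'):
        # Hyperband always uses random search and cannot be overridden.
        if search_strategy not in (None, 'random'):
            raise ValueError("search_strategy must be 'random' with hyperband")
        return 'random'
    if resampling_method.endswith('_sha'):
        # SHA requires an explicit search strategy.
        if search_strategy is None:
            raise ValueError("search_strategy is mandatory with SHA")
        return search_strategy
    # Plain cv/bootstrap: no default, so None means no parameter selection.
    return search_strategy
```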

random_search_timesint, optional

Specifies the number of times to randomly select candidate parameters.

Mandatory and valid only when search_strategy is set to 'random'.

random_stateint, optional

Specifies the seed for random generation.

When 0 is specified, system time is used.

Defaults to 0.

timeoutint, optional

Specifies maximum running time for model evaluation/parameter selection, in seconds.

No timeout when 0 is specified.

Defaults to 0.

progress_indicator_idstr, optional

Sets the ID of the progress indicator for model evaluation/parameter selection.

If not provided, no progress indicator is activated.

param_valuesdict or list of tuples, optional

Specifies the values of the following parameters for model parameter selection:

learning_rate, momentum, batch_size.

If the input is a list of tuples, then each tuple must contain exactly two elements:

  • 1st element is the parameter name (str type),

  • 2nd element is a list of valid values for that parameter.

If the input is a dict, then for each element the key must be the parameter name, and the value must be a list of valid values for that parameter.

A simple example for illustration:

[('learning_rate', [0.1, 0.2, 0.5]), ('momentum', [0.2, 0.6])],

or

dict(learning_rate=[0.1, 0.2, 0.5], momentum=[0.2, 0.6]).

Valid only when resampling_method and search_strategy are both specified, and training_style is 'stochastic'.
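
To make the two accepted input forms concrete, the sketch below normalizes either one and lists every combination a grid search over those values would have to cover. `grid_candidates` is a hypothetical helper written for this illustration; how PAL enumerates candidates internally is not described here.

```python
from itertools import product

def grid_candidates(param_values):
    """Enumerate every combination of the listed values (hypothetical helper)."""
    # Accept both documented forms: a dict or a list of (name, values) tuples.
    if isinstance(param_values, dict):
        items = list(param_values.items())
    else:
        items = list(param_values)
    names = [name for name, _ in items]
    value_lists = [values for _, values in items]
    # Cartesian product of all value lists, one dict per candidate.
    return [dict(zip(names, combo)) for combo in product(*value_lists)]

as_list = [('learning_rate', [0.1, 0.2, 0.5]), ('momentum', [0.2, 0.6])]
as_dict = dict(learning_rate=[0.1, 0.2, 0.5], momentum=[0.2, 0.6])
candidates = grid_candidates(as_list)
```

With three learning rates and two momentum values, the two forms describe the same six candidates.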

param_rangedict or list of tuples, optional

Specifies the range of the following parameters for model parameter selection:

learning_rate, momentum, batch_size.

If the input is a list of tuples, then each tuple should contain exactly two elements:

  • 1st element is the parameter name (str type),

  • 2nd element is a list that specifies the range of that parameter as follows: first value is the start value, second value is the step, and third value is the end value. The step value can be omitted, and will be ignored, if search_strategy is set to 'random'.

Otherwise, if the input is a dict, then for each element the key should be the parameter name, and the value should specify the range of that parameter.

Valid only when resampling_method and search_strategy are both specified, and training_style is 'stochastic'.
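
As an illustration of the [start, step, end] form, the sketch below expands a range into explicit values. `expand_range` is a hypothetical helper, and treating the end value as inclusive is an assumption of this sketch, not something the description above confirms.

```python
def expand_range(spec):
    """Expand [start, step, end] into explicit values.

    Hypothetical helper; the end value is ASSUMED to be inclusive.
    """
    start, step, end = spec
    if step <= 0:
        raise ValueError("step must be positive")
    # Count the steps with a small tolerance to absorb floating-point error.
    count = int((end - start) / step + 1e-9) + 1
    # Round each value to hide accumulated floating-point noise.
    return [round(start + i * step, 10) for i in range(count)]

learning_rates = expand_range([0.1, 0.1, 0.5])
batch_sizes = expand_range([2, 2, 8])
```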

reduction_ratefloat, optional

Specifies reduction rate in SHA or Hyperband method.

For each round, the number of available parameter candidates is divided by the value of this parameter. Valid values must therefore be greater than 1.0.

Valid only when resampling_method is specified with suffix 'sha' or 'hyperband' (e.g. 'cv_sha', 'stratified_bootstrap_hyperband').

Defaults to 3.0.
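
The effect of the reduction rate can be sketched as a schedule of candidate counts per round. `sha_schedule` is a hypothetical helper, and rounding down with a floor of one candidate is an assumption of this sketch; PAL's exact rounding is not documented here.

```python
import math

def sha_schedule(n_candidates, reduction_rate=3.0):
    """Candidate counts per round under successive halving (illustrative only)."""
    if reduction_rate <= 1.0:
        raise ValueError("reduction_rate must be greater than 1.0")
    counts = [n_candidates]
    while counts[-1] > 1:
        # Each round keeps roughly 1/reduction_rate of the candidates
        # (rounding down is an assumption of this sketch).
        counts.append(max(1, math.floor(counts[-1] / reduction_rate)))
    return counts

default_rate = sha_schedule(27)        # reduction_rate=3.0
halving = sha_schedule(18, 2.0)
```

A rate of 1.0 or less would never shrink the candidate set, which is why such values are rejected.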

aggressive_eliminationbool, optional

Specifies whether to apply aggressive elimination while using SHA method.

Aggressive elimination applies when the data size and the number of candidate parameters do not match, so that many candidates remain to be searched when the data size reaches its upper limit. If aggressive elimination is applied, the lower bound of the data size is used for multiple rounds first, to reduce the number of candidates.

Valid only when resampling_method is specified with suffix 'sha'.

Defaults to False.

Examples

Training data:

>>> df.collect()
   V000  V001 V002  V003 LABEL
0     1  1.71   AC     0    AA
1    10  1.78   CA     5    AB
2    17  2.36   AA     6    AA
3    12  3.15   AA     2     C
4     7  1.05   CA     3    AB
5     6  1.50   CA     2    AB
6     9  1.97   CA     6     C
7     5  1.26   AA     1    AA
8    12  2.13   AC     4     C
9    18  1.87   AC     6    AA

Training the model:

>>> mlpc = MLPClassifier(hidden_layer_size=(10,10),
...                      activation='tanh', output_activation='tanh',
...                      learning_rate=0.001, momentum=0.0001,
...                      training_style='stochastic',max_iter=100,
...                      normalization='z-transform', weight_init='normal',
...                      thread_ratio=0.3, categorical_variable='V003')
>>> mlpc.fit(data=df)

Training result may look different from the following results due to model randomness.

>>> mlpc.model_.collect()
   ROW_INDEX                                      MODEL_CONTENT
0          1  {"CurrentVersion":"1.0","DataDictionary":[{"da...
1          2  t":0.2700182926188939},{"from":13,"weight":0.0...
2          3  ht":0.2414416413305134},{"from":21,"weight":0....
>>> mlpc.train_log_.collect()
    ITERATION     ERROR
0           1  1.080261
1           2  1.008358
2           3  0.947069
3           4  0.894585
4           5  0.849411
5           6  0.810309
6           7  0.776256
7           8  0.746413
8           9  0.720093
9          10  0.696737
10         11  0.675886
11         12  0.657166
12         13  0.640270
13         14  0.624943
14         15  0.609432
15         16  0.595204
16         17  0.582101
17         18  0.569990
18         19  0.558757
19         20  0.548305
20         21  0.538553
21         22  0.529429
22         23  0.521457
23         24  0.513893
24         25  0.506704
25         26  0.499861
26         27  0.493338
27         28  0.487111
28         29  0.481159
29         30  0.475462
..        ...       ...
70         71  0.349684
71         72  0.347798
72         73  0.345954
73         74  0.344071
74         75  0.342232
75         76  0.340597
76         77  0.338837
77         78  0.337236
78         79  0.335749
79         80  0.334296
80         81  0.332759
81         82  0.331255
82         83  0.329810
83         84  0.328367
84         85  0.326952
85         86  0.325566
86         87  0.324232
87         88  0.322899
88         89  0.321593
89         90  0.320242
90         91  0.318985
91         92  0.317840
92         93  0.316630
93         94  0.315376
94         95  0.314210
95         96  0.313066
96         97  0.312021
97         98  0.310916
98         99  0.309770
99        100  0.308704

Prediction:

>>> pred_df.collect()
>>> res, stat = mlpc.predict(data=pred_df, key='ID')

Prediction result may look different from the following results due to model randomness.

>>> res.collect()
   ID TARGET     VALUE
0   1      C  0.472751
1   2      C  0.417681
2   3      C  0.543967
>>> stat.collect()
   ID CLASS  SOFT_MAX
0   1    AA  0.371996
1   1    AB  0.155253
2   1     C  0.472751
3   2    AA  0.357822
4   2    AB  0.224496
5   2     C  0.417681
6   3    AA  0.349813
7   3    AB  0.106220
8   3     C  0.543967

Model Evaluation:

>>> mlpc = MLPClassifier(activation='tanh',
...                      output_activation='tanh',
...                      hidden_layer_size=(10,10),
...                      learning_rate=0.001,
...                      momentum=0.0001,
...                      training_style='stochastic',
...                      max_iter=100,
...                      normalization='z-transform',
...                      weight_init='normal',
...                      resampling_method='cv',
...                      evaluation_metric='f1_score',
...                      fold_num=10,
...                      repeat_times=2,
...                      random_state=1,
...                      progress_indicator_id='TEST',
...                      thread_ratio=0.3)
>>> mlpc.fit(data=df, label='LABEL', categorical_variable='V003')

Model evaluation result may look different from the following result due to randomness.

>>> mlpc.stats_.collect()
            STAT_NAME                                         STAT_VALUE
0             timeout                                              FALSE
1     TEST_1_F1_SCORE                       1, 0, 1, 1, 0, 1, 0, 1, 1, 0
2     TEST_2_F1_SCORE                       0, 0, 1, 1, 0, 1, 0, 1, 1, 1
3  TEST_F1_SCORE.MEAN                                                0.6
4   TEST_F1_SCORE.VAR                                           0.252631
5      EVAL_RESULTS_1  {"candidates":[{"TEST_F1_SCORE":[[1.0,0.0,1.0,...
6     solution status  Convergence not reached after maximum number o...
7               ERROR                                 0.2951168443145714
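
The summary rows in this output can be reproduced from the per-fold rows. Judging only from the numbers shown, TEST_F1_SCORE.MEAN and TEST_F1_SCORE.VAR appear to be the mean and the sample variance of all fold scores across both repeats; this is an inference from the example, not a documented definition.

```python
from statistics import mean, variance

# Per-fold F1 scores copied from TEST_1_F1_SCORE and TEST_2_F1_SCORE above.
test_1 = "1, 0, 1, 1, 0, 1, 0, 1, 1, 0"
test_2 = "0, 0, 1, 1, 0, 1, 0, 1, 1, 1"

# Pool the scores of both repeats into one list of floats.
scores = [float(s) for row in (test_1, test_2) for s in row.split(",")]

score_mean = mean(scores)      # compare with TEST_F1_SCORE.MEAN
score_var = variance(scores)   # sample variance, compare with TEST_F1_SCORE.VAR
```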

Parameter selection:

>>> act_opts = ['tanh', 'linear', 'sigmoid_asymmetric']
>>> out_act_opts = ['sigmoid_symmetric', 'gaussian_asymmetric', 'gaussian_symmetric']
>>> layer_size_opts = [(10, 10), (5, 5, 5)]
>>> mlpc = MLPClassifier(activation_options=act_opts,
...                      output_activation_options=out_act_opts,
...                      hidden_layer_size_options=layer_size_opts,
...                      learning_rate=0.001,
...                      batch_size=2,
...                      momentum=0.0001,
...                      training_style='stochastic',
...                      max_iter=100,
...                      normalization='z-transform',
...                      weight_init='normal',
...                      resampling_method='stratified_bootstrap',
...                      evaluation_metric='accuracy',
...                      search_strategy='grid',
...                      fold_num=10,
...                      repeat_times=2,
...                      random_state=1,
...                      progress_indicator_id='TEST',
...                      thread_ratio=0.3)
>>> mlpc.fit(data=df, label='LABEL', categorical_variable='V003')

Parameter selection result may look different from the following result due to randomness.

>>> mlpc.stats_.collect()
            STAT_NAME                                         STAT_VALUE
0             timeout                                              FALSE
1     TEST_1_ACCURACY                                               0.25
2     TEST_2_ACCURACY                                           0.666666
3  TEST_ACCURACY.MEAN                                           0.458333
4   TEST_ACCURACY.VAR                                          0.0868055
5      EVAL_RESULTS_1  {"candidates":[{"TEST_ACCURACY":[[0.50],[0.0]]...
6      EVAL_RESULTS_2  PUT_LAYER_ACTIVE_FUNC=6;HIDDEN_LAYER_ACTIVE_FU...
7      EVAL_RESULTS_3  FUNC=2;"},{"TEST_ACCURACY":[[0.50],[0.33333333...
8      EVAL_RESULTS_4  rs":"HIDDEN_LAYER_SIZE=10, 10;OUTPUT_LAYER_ACT...
9               ERROR                                  0.684842661926971
>>> mlpc.optim_param_.collect()
                 PARAM_NAME  INT_VALUE DOUBLE_VALUE STRING_VALUE
0         HIDDEN_LAYER_SIZE        NaN         None      5, 5, 5
1  OUTPUT_LAYER_ACTIVE_FUNC        4.0         None         None
2  HIDDEN_LAYER_ACTIVE_FUNC        3.0         None         None

Attributes:
model_DataFrame

Model content.

train_log_DataFrame

Provides mean squared error between predicted values and target values for each iteration.

stats_DataFrame

Names and values of statistics.

optim_param_DataFrame

Provides optimal parameters selected.

Available only when parameter selection is triggered.

Methods

create_model_state([model, function, ...])

Create PAL model state.

delete_model_state([state])

Delete PAL model state.

fit(data[, key, features, label, ...])

Fit the model when the training dataset is given.

predict(data[, key, features, thread_ratio])

Predict using the multi-layer perceptron model.

score(data[, key, features, label, thread_ratio])

Returns the accuracy on the given test data and labels.

set_model_state(state)

Set the model state by state information.

fit(data, key=None, features=None, label=None, categorical_variable=None)

Fit the model when the training dataset is given.

Parameters:
dataDataFrame

DataFrame containing the data.

keystr, optional

Name of the ID column.

If key is not provided, then:

  • if data is indexed by a single column, then key defaults to that index column

  • otherwise, it is assumed that data contains no ID column

featureslist of str, optional

Names of the feature columns.

If features is not provided, it defaults to all the non-ID and non-label columns.

labelstr, optional

Name of the label column. If label is not provided, it defaults to the last column.

categorical_variablestr or list of str, optional

Specifies INTEGER column(s) that should be treated as categorical. Other INTEGER columns will be treated as continuous.
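
The defaults for label and features can be sketched in plain Python. `resolve_fit_columns` is a hypothetical helper, not a hana_ml function, and it ignores the case where key defaults to an index column.

```python
def resolve_fit_columns(columns, key=None, features=None, label=None):
    """Apply the documented defaults for label and features (hypothetical helper)."""
    if label is None:
        # label defaults to the last column
        label = columns[-1]
    if features is None:
        # features default to every column that is neither the ID nor the label
        features = [c for c in columns if c != key and c != label]
    return features, label

# Column names taken from the training data in the Examples section.
columns = ['V000', 'V001', 'V002', 'V003', 'LABEL']
features, label = resolve_fit_columns(columns)
```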

create_model_state(model=None, function=None, pal_funcname='PAL_MULTILAYER_PERCEPTRON', state_description=None, force=False)

Create PAL model state.

Parameters:
modelDataFrame, optional

Specify the model for AFL state.

Defaults to self.model_.

functionstr, optional

Specify the function in the unified API.

A placeholder parameter, not effective for Multilayer Perceptron.

pal_funcnameint or str, optional

PAL function name.

Defaults to 'PAL_MULTILAYER_PERCEPTRON'.

state_descriptionstr, optional

Description of the state as model container.

Defaults to None.

forcebool, optional

If True, the existing state is deleted.

Defaults to False.

delete_model_state(state=None)

Delete PAL model state.

Parameters:
stateDataFrame, optional

Specifies the state.

Defaults to self.state.

property fit_hdbprocedure

Returns the generated hdbprocedure for fit.

predict(data, key=None, features=None, thread_ratio=None)

Predict using the multi-layer perceptron model.

Parameters:
dataDataFrame

DataFrame containing the data.

keystr, optional

Name of the ID column.

Mandatory if data is not indexed, or the index of data contains multiple columns.

Defaults to the single index column of data if not provided.

featureslist of str, optional

Names of the feature columns.

If features is not provided, it defaults to all the non-ID columns.

thread_ratiofloat, optional

Controls the proportion of available threads to be used for prediction.

The value range is from 0 to 1, where 0 indicates a single thread, and 1 indicates up to all available threads.

Values between 0 and 1 will use that percentage of available threads.

Values outside this range tell PAL to heuristically determine the number of threads to use.

Defaults to 0.

Returns:
DataFrames

Two DataFrames are returned (the Prediction example unpacks them as res and stat).

First DataFrame: predicted classes, structured as follows:

  • ID column, with the same name and type as data's ID column.

  • TARGET, type NVARCHAR, predicted class name.

  • VALUE, type DOUBLE, softmax value for the predicted class.

Second DataFrame: softmax values for all classes, structured as follows:

  • ID column, with the same name and type as data's ID column.

  • CLASS, type NVARCHAR, class name.

  • VALUE, type DOUBLE, softmax value for that class.
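
The relationship between the two results can be checked in plain Python: for each ID, the predicted class is the one with the largest softmax value. The rows below are copied from the stat.collect() output in the Examples section; `top_class_per_id` is a hypothetical helper for this illustration.

```python
# (ID, CLASS, softmax value) rows copied from the stat.collect() example output.
stat_rows = [
    (1, 'AA', 0.371996), (1, 'AB', 0.155253), (1, 'C', 0.472751),
    (2, 'AA', 0.357822), (2, 'AB', 0.224496), (2, 'C', 0.417681),
    (3, 'AA', 0.349813), (3, 'AB', 0.106220), (3, 'C', 0.543967),
]

def top_class_per_id(rows):
    """Return {id: (class, value)} for the highest softmax value of each ID."""
    best = {}
    for id_, cls, value in rows:
        if id_ not in best or value > best[id_][1]:
            best[id_] = (cls, value)
    return best

predicted = top_class_per_id(stat_rows)
```

The resulting classes and values match the TARGET and VALUE columns of res.collect() in the example.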

property predict_hdbprocedure

Returns the generated hdbprocedure for predict.

set_model_state(state)

Set the model state by state information.

Parameters:
state: DataFrame or dict

If state is DataFrame, it has the following structure:

  • NAME: VARCHAR(100), it must have STATE_ID, HINT, HOST and PORT.

  • VALUE: VARCHAR(1000), the values according to NAME.

If state is dict, the key must have STATE_ID, HINT, HOST and PORT.
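
A dict passed as state can be checked for the required keys before use. The helper and the placeholder values below are hypothetical and serve only to illustrate the structure; real values come from an existing model state.

```python
REQUIRED_STATE_KEYS = {'STATE_ID', 'HINT', 'HOST', 'PORT'}

def missing_state_keys(state):
    """Return the required keys absent from a state dict (hypothetical helper)."""
    return sorted(REQUIRED_STATE_KEYS - set(state))

# Placeholder values for illustration only.
state = {'STATE_ID': '<state id>', 'HINT': '<hint>',
         'HOST': '<host>', 'PORT': '<port>'}
missing = missing_state_keys(state)
```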

score(data, key=None, features=None, label=None, thread_ratio=None)

Returns the accuracy on the given test data and labels.

Parameters:
dataDataFrame

DataFrame containing the data.

keystr, optional

Name of the ID column.

Mandatory if data is not indexed, or the index of data contains multiple columns.

Defaults to the single index column of data if not provided.

featureslist of str, optional

Names of the feature columns.

If features is not provided, it defaults to all the non-ID and non-label columns.

labelstr, optional

Name of the label column.

If label is not provided, it defaults to the last column.

Returns:
float

Scalar value of accuracy after comparing the predicted result and original label.
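
Accuracy here is the fraction of rows whose predicted class equals the original label. A minimal sketch in plain Python, with made-up labels for illustration (the `accuracy` function is hypothetical and is not how score is implemented internally):

```python
def accuracy(predicted, actual):
    """Fraction of predictions equal to the true labels (illustrative only)."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("at least one row is required")
    matches = sum(p == a for p, a in zip(predicted, actual))
    return matches / len(actual)

# Hypothetical labels, not taken from the examples above.
acc = accuracy(['C', 'C', 'C'], ['C', 'AA', 'C'])
```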

Inherited Methods from PALBase

Besides the methods mentioned above, the MLPClassifier class also inherits methods from the PALBase class. Please refer to PAL Base for more details.