hana_ml.model_storage

This module provides the features of model storage.

All these features are accessible to the end user via the ModelStorage class.

exception hana_ml.model_storage.ModelStorageError

Bases: hana_ml.ml_exceptions.Error

Exception class used in Model Storage

class hana_ml.model_storage.ModelStorage(connection_context, schema=None, meta=None)

Bases: object

The ModelStorage class allows users to save, list, load and delete models.

Models are saved into SAP HANA tables in a schema specified by the user. A model is identified by:

  • A name (string of 255 characters maximum). It must not contain any of the following characters: comma, semicolon, double quote, single quote, end-of-line, tab (',', ';', '"', ''', '\n', '\t').

  • A version (positive integer starting from 1).
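The identification rules above can be checked on the client side before a model is saved. The following is a minimal sketch, not part of hana_ml: the helper names `check_model_name` and `check_model_version` are hypothetical, and the forbidden-character set is taken from the list above.

```python
# Characters the documentation lists as forbidden in a model name.
FORBIDDEN_CHARS = {",", ";", "\t", "\n", "'", '"'}
MAX_NAME_LENGTH = 255


def check_model_name(name):
    """Return a list of problems found in ``name`` (empty if none were found)."""
    if not isinstance(name, str):
        return ["name must be a string"]
    problems = []
    if len(name) > MAX_NAME_LENGTH:
        problems.append("name is longer than %d characters" % MAX_NAME_LENGTH)
    bad = sorted(set(name) & FORBIDDEN_CHARS)
    if bad:
        problems.append("name contains forbidden characters: %r" % bad)
    return problems


def check_model_version(version):
    """Return True if ``version`` is a positive integer (1, 2, 3, ...)."""
    # bool is a subclass of int in Python, so it is excluded explicitly.
    return isinstance(version, int) and not isinstance(version, bool) and version >= 1
```

A name such as 'MLPClassifier 1' passes this check, while 'a,b' or a 256-character string does not.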

A model can be saved in three ways:

  1. It can be saved for the first time. No model with the same name and version may already exist.

  2. It can be saved as a replacement. If a model with the same name and version already exists, it is overwritten.

  3. It can be saved with a higher version. The model will be saved with an incremented version number.
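The three save modes above can be pictured with a small in-memory stand-in. This is only a toy sketch and not how hana_ml is implemented: `ToyModelStore` is a hypothetical class, the mode names mirror the `if_exists` values of `save_model`, and the choice to increment from the highest stored version is an assumption.

```python
class ToyModelStore:
    """In-memory stand-in that mimics the three save modes (not hana_ml code)."""

    def __init__(self):
        self._models = {}  # maps (name, version) -> payload

    def latest_version(self, name):
        """Return the highest stored version for ``name``, or 0 if none exists."""
        versions = [v for (n, v) in self._models if n == name]
        return max(versions) if versions else 0

    def save(self, name, payload, version=1, if_exists="upgrade"):
        """Store ``payload`` and return the version it was stored under."""
        key = (name, version)
        if key not in self._models:
            # First-time save: nothing to overwrite or upgrade.
            self._models[key] = payload
            return version
        if if_exists == "fail":
            raise ValueError("model %r version %d already exists" % key)
        if if_exists == "replace":
            self._models[key] = payload
            return version
        if if_exists == "upgrade":
            # Assumption: the new version is one above the highest stored one.
            new_version = self.latest_version(name) + 1
            self._models[(name, new_version)] = payload
            return new_version
        raise ValueError("unknown if_exists value: %r" % if_exists)
```

Saving the same name twice with 'replace' keeps a single version, whereas 'upgrade' leaves the earlier version in place and adds a new one.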

Internally, a model is stored as two parts:

  1. The metadata. It contains the model identification (name, version, algorithm class) and the attributes of the Python model object that are required to reinstantiate it. It is saved in a table named HANAML_MODEL_STORAGE by default.

  2. The back-end model. It consists of the model returned by SAP HANA APL or SAP HANA PAL. For SAP HANA APL, it is always saved into the table HANAMl_APL_MODELS_DEFAULT, while for SAP HANA PAL, a model can be saved into different tables depending on the nature of the specified algorithm.

Parameters
connection_context : ConnectionContext

The connection object to an SAP HANA database. It must be the same as the one used by the model.

schema : str, optional

The schema name where the model storage tables are created.

Defaults to the current schema used by the user.

meta : str, optional

The name of the meta table stored in SAP HANA.

Defaults to 'HANAML_MODEL_STORAGE'.

Examples

Creating and training models with the MLPClassifier and AutoClassifier classes:

Assume the training data is data and the connection to SAP HANA is conn.

>>> model_pal_name = 'MLPClassifier 1'
>>> model_pal = MLPClassifier(conn, hidden_layer_size=[10, ], activation='TANH', output_activation='TANH', learning_rate=0.01, momentum=0.001)
>>> model_pal.fit(data, label='IS_SETOSA', key='ID')
>>> model_apl_name = 'AutoClassifier 1'
>>> model_apl = AutoClassifier(conn_context=conn)
>>> model_apl.fit(data, label='IS_SETOSA', key='ID')

Creating an instance of ModelStorage:

>>> MODEL_SCHEMA = 'MODEL_STORAGE' # HANA schema in which models are to be saved
>>> model_storage = ModelStorage(connection_context=conn, schema=MODEL_SCHEMA)

Saving these two trained models for the first time:

>>> model_pal.name = model_pal_name
>>> model_storage.save_model(model=model_pal)
>>> model_apl.name = model_apl_name
>>> model_storage.save_model(model=model_apl)

Listing saved models:

>>> print(model_storage.list_models())
               NAME  VERSION LIBRARY                         ...
0  AutoClassifier 1        1     APL  hana_ml.algorithms.apl ...
1  MLPClassifier 1         1     PAL  hana_ml.algorithms.pal ...

Reloading saved models:

>>> model1 = model_storage.load_model(name=model_pal_name, version=1)
>>> model2 = model_storage.load_model(name=model_apl_name, version=1)

Using loaded model model2 for new prediction:

>>> out = model2.predict(data=data_test)
>>> print(out.head(3).collect())
   ID PREDICTED  PROBABILITY IS_SETOSA
0   1      True     0.999492      None    ...
1   2      True     0.999478      None
2   3      True     0.999460      None

Other examples of functions:

Saving a model by overwriting the original model:

>>> model_storage.save_model(model=model_apl, if_exists='replace')
>>> print(model_storage.list_models(name=model_apl.name))
               NAME  VERSION LIBRARY                            ...
0  AutoClassifier 1        1     APL  hana_ml.algorithms.apl    ...

Saving a model by upgrading the version:

>>> model_storage.save_model(model=model_apl, if_exists='upgrade')
>>> print(model_storage.list_models(name=model_apl.name))
               NAME  VERSION LIBRARY                            ...
0  AutoClassifier 1        1     APL  hana_ml.algorithms.apl    ...
1  AutoClassifier 1        2     APL  hana_ml.algorithms.apl    ...

Deleting a model with a specified version:

>>> model_storage.delete_model(name=model_apl.name, version=model_apl.version)

Deleting all versions of a model with the same name:

>>> model_storage.delete_models(name=model_apl.name)

Cleaning up all models and metadata at once:

>>> model_storage.clean_up()

Methods

clean_up()

Be cautious! This function will delete all the models and the meta table.

delete_model(name, version)

Deletes a model with a given name and version.

delete_models(name[, start_time, end_time])

Deletes models in batch mode within a specified time range.

disable_persistent_memory(name, version)

Disable persistent memory.

enable_persistent_memory(name, version)

Enable persistent memory.

list_models([name, version])

Lists existing models.

load_into_memory(name, version)

Load a model into the memory.

load_model(name[, version])

Loads an existing model from the SAP HANA database.

model_already_exists(name, version)

Checks if a model with specified name and version already exists.

save_model(model[, if_exists, storage_type, ...])

Saves a model.

set_data_lake_container(name)

Set HDL container name.

set_logfile(loc)

Set log file location.

set_schedule(name, version, schedule_time, ...)

Create the schedule plan.

start_schedule(name, version)

Execute the schedule plan.

terminate_schedule(name, version)

Terminate the schedule plan.

unload_from_memory(name, version[, ...])

Unload a model from the memory.

upgrade_meta()

Upgrade the meta table to the latest changes.

list_models(name=None, version=None)

Lists existing models.

Parameters
name : str, optional

The model name pattern to be matched.

Defaults to None.

version : int, optional

The model version.

Defaults to None.

Returns
pandas.DataFrame

The model metadata matching the provided name and version.
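A listing like this is often reduced to the latest version per model name. The sketch below is illustrative only: `latest_versions` is a hypothetical helper that works on plain dictionaries keyed by the NAME and VERSION columns shown in the class examples. With a real result, such rows could be obtained through pandas' `DataFrame.to_dict("records")`.

```python
def latest_versions(records):
    """Map each model name to its highest version number."""
    latest = {}
    for row in records:
        name, version = row["NAME"], int(row["VERSION"])
        if version > latest.get(name, 0):
            latest[name] = version
    return latest


# Rows shaped like the NAME and VERSION columns of a model listing.
rows = [
    {"NAME": "AutoClassifier 1", "VERSION": 1},
    {"NAME": "AutoClassifier 1", "VERSION": 2},
    {"NAME": "MLPClassifier 1", "VERSION": 1},
]
print(latest_versions(rows))
```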

model_already_exists(name, version)

Checks if a model with specified name and version already exists.

Parameters
name : str

The model name.

version : int

The model version.

Returns
bool

If True, there is already a model with the same name and version. If False, there is no model with the same name and version.
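This check can guard a first-time save. The following is a minimal sketch under stated assumptions: `save_if_absent` is a hypothetical helper, it relies only on `model_already_exists` and `save_model` as documented in this section, and it assumes that a first-time save is stored as version 1. `FakeStorage` and `FakeModel` are stand-ins so that the example runs without an SAP HANA connection.

```python
def save_if_absent(storage, model, version=1):
    """Save ``model`` only when no model with its name and ``version`` exists.

    Returns True if the model was saved, False if it was already present.
    """
    if storage.model_already_exists(model.name, version):
        return False
    storage.save_model(model=model, if_exists="fail")
    return True


class FakeStorage:
    """Hypothetical in-memory stand-in for ModelStorage (not hana_ml code)."""

    def __init__(self):
        self.saved = set()

    def model_already_exists(self, name, version):
        return (name, version) in self.saved

    def save_model(self, model, if_exists="upgrade"):
        # Assumption: a first-time save is recorded as version 1.
        self.saved.add((model.name, 1))


class FakeModel:
    """Hypothetical model object that only carries a name."""

    name = "MLPClassifier 1"
```

With a real ModelStorage instance, the same helper would be called as `save_if_absent(model_storage, model_pal)`.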

save_model(model, if_exists='upgrade', storage_type='default', force=False)

Saves a model.

Parameters
model : a model instance

The model to save. Its name must be set before saving. The name and version together serve as the unique ID of a model.

if_exists : str, optional

Specifies what happens if a model with the same name and version already exists:

  • 'fail': Raises an error.

  • 'replace': Overwrites the previous model.

  • 'upgrade': Saves the model with an incremented version.

Defaults to 'upgrade'.

storage_type : {'default', 'HDL'}, optional

Specifies the storage type of the model:

  • 'default': HANA default storage.

  • 'HDL': HANA data lake.

Defaults to 'default'.

force : bool, optional

Drops the existing table if True.

Defaults to False.

delete_model(name, version)

Deletes a model with a given name and version.

Parameters
name : str

The model name.

version : int

The model version.

delete_models(name, start_time=None, end_time=None)

Deletes models in batch mode within a specified time range.

Parameters
name : str

The model name.

start_time : str, optional

The start timestamp for deleting.

Defaults to None.

end_time : str, optional

The end timestamp for deleting.

Defaults to None.
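Since both bounds are strings, a time window is usually computed with `datetime` and then formatted. The sketch below is an illustration only: `time_window` is a hypothetical helper, and the 'YYYY-MM-DD HH:MM:SS' layout is an assumption because this section does not state the expected timestamp format.

```python
from datetime import datetime, timedelta

# Assumed timestamp layout; the expected format is not stated in this section.
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


def time_window(end, days):
    """Return (start_time, end_time) strings covering ``days`` days up to ``end``."""
    start = end - timedelta(days=days)
    return start.strftime(TIMESTAMP_FORMAT), end.strftime(TIMESTAMP_FORMAT)


start_time, end_time = time_window(datetime(2024, 1, 31, 12, 0, 0), days=30)
print(start_time, end_time)
# With a ModelStorage instance, these strings would be passed on as
# delete_models(name=..., start_time=start_time, end_time=end_time).
```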

clean_up()

Be cautious! This function will delete all the models and the meta table.

load_model(name, version=None, **kwargs)

Loads an existing model from the SAP HANA database.

Parameters
name : str

The model name.

version : int, optional

The model version. By default, the latest version is loaded.

Returns
PAL/APL object

The loaded model ready for use.

enable_persistent_memory(name, version)

Enable persistent memory.

Parameters
name : str

The name of the model.

version : int

The model version.

disable_persistent_memory(name, version)

Disable persistent memory.

Parameters
name : str

The name of the model.

version : int

The model version.

load_into_memory(name, version)

Load a model into the memory.

Parameters
name : str

The name of the model.

version : int

The model version.

unload_from_memory(name, version, persistent_memory=None)

Unload a model from the memory. The dataset will be loaded back into memory after the next query.

Parameters
name : str

The name of the model.

version : int

The model version.

persistent_memory : {'retain', 'delete'}, optional

Only works when persistent memory is enabled.

Defaults to None.

set_data_lake_container(name)

Set HDL container name.

Parameters
name : str

The name of the HDL container.

set_schedule(name, version, schedule_time, connection_userkey, init_params, fit_params, training_dataset_select_statement, storage_type='default', encrypt=None, sslValidateCertificate=None)

Create the schedule plan.

Parameters
name : str

The model name.

version : int

The model version.

schedule_time : {'every x seconds', 'every x minutes', 'every x hours', 'every x weeks'}

The interval at which the training is scheduled, where x is a number.

connection_userkey : str

Userkey generated by HANA hdbuserstore.

init_params

The parameters of the hana_ml object initialization.

fit_params

The parameters of the fit function.

training_dataset_select_statement

The select statement of the training dataset to be scheduled.

storage_type : {'default', 'HDL'}, optional

If 'HDL', the model will be saved in HANA Data Lake.

Defaults to 'default'.
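Because schedule_time is a free-form string, it can help to build it from validated parts. The following is a minimal sketch and not part of hana_ml: `make_schedule_time` is a hypothetical helper, and the accepted units are the four listed for schedule_time above.

```python
# Units listed for the schedule_time parameter in this section.
ALLOWED_UNITS = ("seconds", "minutes", "hours", "weeks")


def make_schedule_time(count, unit):
    """Build an 'every x <unit>' string after validating both parts."""
    # bool is a subclass of int in Python, so it is excluded explicitly.
    if not isinstance(count, int) or isinstance(count, bool) or count < 1:
        raise ValueError("count must be a positive integer")
    if unit not in ALLOWED_UNITS:
        raise ValueError("unit must be one of: %s" % ", ".join(ALLOWED_UNITS))
    return "every %d %s" % (count, unit)
```

For example, `make_schedule_time(2, "hours")` yields the string 'every 2 hours', which could then be passed as the schedule_time argument.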

start_schedule(name, version)

Execute the schedule plan.

Parameters
name : str

The model name.

version : int

The model version.

terminate_schedule(name, version)

Terminate the schedule plan.

Parameters
name : str

The model name.

version : int

The model version.

set_logfile(loc)

Set log file location.

upgrade_meta()

Upgrade the meta table to the latest changes.