hana_ml.model_storage

This module provides the features of model storage.

All of these features are accessible to the end user through the ModelStorage class:

ModelStorage

ModelStorageError

exception hana_ml.model_storage.ModelStorageError

Bases: Error

Exception class used in Model Storage

class hana_ml.model_storage.ModelStorage(connection_context, schema=None, meta=None)

Bases: object

The ModelStorage class allows users to save, list, load and delete models.

Models are saved into SAP HANA tables in a schema specified by the user. A model is identified with:

A name (string of 255 characters maximum). It must not contain characters such as comma, semicolon, tab, end-of-line, single quote or double quote (',', ';', '\t', '\n', "'", '"').
A version (positive integer starting from 1).
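The naming rules above can be checked on the client before a model is saved. The following is a minimal sketch, not part of hana_ml: the helper name `is_valid_model_id` and the two constants are hypothetical, the forbidden set contains only the characters listed above (the library may reject others), and rejecting an empty name is an assumption.

```python
# Characters the description above rules out in a model name.
# Assumption: this list is taken from the text and may not be exhaustive.
FORBIDDEN_NAME_CHARS = frozenset(",;\"'\n\t")
MAX_NAME_LENGTH = 255


def is_valid_model_id(name, version):
    """Return True if (name, version) satisfies the documented constraints."""
    # Assumption: an empty name is treated as invalid.
    if not isinstance(name, str) or not 0 < len(name) <= MAX_NAME_LENGTH:
        return False
    if any(ch in FORBIDDEN_NAME_CHARS for ch in name):
        return False
    # bool is a subclass of int, so it is rejected explicitly.
    if isinstance(version, bool) or not isinstance(version, int):
        return False
    return version >= 1
```

Such a check only gives an early, local failure; the database-side validation performed by ModelStorage remains authoritative.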

A model can be saved in three ways:

It can be saved for the first time. No model with the same name and version may already exist.
It can be saved as a replacement. If a model with the same name and version already exists, it is overwritten.
It can be saved with a higher version. The model is saved with an incremented version number.
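The three ways of saving correspond to the `if_exists` values 'fail', 'replace' and 'upgrade' of `save_model`. The sketch below models the resulting version number as a pure function. It is an illustration only, not the library's implementation: the function name `resolve_version` is hypothetical, and it assumes that 'upgrade' uses the highest existing version plus one.

```python
def resolve_version(existing_versions, requested_version, if_exists="upgrade"):
    """Return the version a model would be stored under.

    existing_versions: versions already stored under the same model name.
    requested_version: the version carried by the model being saved.
    """
    existing = set(existing_versions)
    if requested_version not in existing:
        # First save: nothing to collide with.
        return requested_version
    if if_exists == "fail":
        raise ValueError("a model with this name and version already exists")
    if if_exists == "replace":
        # The stored model is overwritten under the same version.
        return requested_version
    if if_exists == "upgrade":
        # Assumption: the new version is the highest existing one plus one.
        return max(existing) + 1
    raise ValueError("unknown if_exists value: %r" % (if_exists,))
```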

Internally, a model is stored as two parts:

The metadata. It contains the model identification (name, version, algorithm class) and the attributes of the Python model object required for reinstantiation. It is saved in a table named HANAML_MODEL_STORAGE by default.
The back-end model. It consists of the model returned by SAP HANA APL or SAP HANA PAL. For SAP HANA APL, it is always saved into the table HANAMl_APL_MODELS_DEFAULT, while for SAP HANA PAL, a model can be saved into different tables depending on the nature of the specified algorithm.

Parameters:

connection_context: ConnectionContext

The connection object to a SAP HANA database. It must be the same as the one used by the model.

schema: str, optional

The schema name where the model storage tables are created.

Defaults to the current schema used by the user.

meta: str, optional

The name of the metadata table stored in SAP HANA.

Defaults to 'HANAML_MODEL_STORAGE'.

Examples

Creating and training a model with functions MLPClassifier and AutoClassifier:

Assume the training data is data and the connection to SAP HANA is conn.

>>> model_pal_name = 'MLPClassifier 1'
>>> model_pal = MLPClassifier(conn, hidden_layer_size=[10, ], activation='TANH', output_activation='TANH', learning_rate=0.01, momentum=0.001)
>>> model_pal.fit(data, label='IS_SETOSA', key='ID')

>>> model_apl_name = 'AutoClassifier 1'
>>> model_apl = AutoClassifier(conn_context=conn)
>>> model_apl.fit(data, label='IS_SETOSA', key='ID')

Creating an instance of ModelStorage:

>>> MODEL_SCHEMA = 'MODEL_STORAGE' # HANA schema in which models are to be saved
>>> model_storage = ModelStorage(connection_context=conn, schema=MODEL_SCHEMA)

Saving these two trained models for the first time:

>>> model_pal.name = model_pal_name
>>> model_storage.save_model(model=model_pal)
>>> model_apl.name = model_apl_name
>>> model_storage.save_model(model=model_apl)

Listing saved models:

>>> print(model_storage.list_models())
               NAME  VERSION LIBRARY                         ...
0  AutoClassifier 1        1     APL  hana_ml.algorithms.apl ...
1  MLPClassifier 1         1     PAL  hana_ml.algorithms.pal ...

Reloading saved models:

>>> model1 = model_storage.load_model(name=model_pal_name, version=1)
>>> model2 = model_storage.load_model(name=model_apl_name, version=1)

Using loaded model model2 for new prediction:

>>> out = model2.predict(data=data_test)
>>> print(out.head(3).collect())
   ID PREDICTED  PROBABILITY IS_SETOSA
0   1      True     0.999492      None    ...
1   2      True     0.999478      None
2   3      True     0.999460      None

Other examples of functions:

Saving a model by overwriting the original model:

>>> model_storage.save_model(model=model_apl, if_exists='replace')
>>> print(model_storage.list_models(name=model_apl.name))
               NAME  VERSION LIBRARY                            ...
0  AutoClassifier 1        1     APL  hana_ml.algorithms.apl    ...

Saving a model by upgrading the version:

>>> model_storage.save_model(model=model_apl, if_exists='upgrade')
>>> print(model_storage.list_models(name=model_apl.name))
               NAME  VERSION LIBRARY                            ...
0  AutoClassifier 1        1     APL  hana_ml.algorithms.apl    ...
1  AutoClassifier 1        2     APL  hana_ml.algorithms.apl    ...

Deleting a model with specified version:

>>> model_storage.delete_model(name=model_apl.name, version=model_apl.version)

Deleting models with same model name and different versions:

>>> model_storage.delete_models(name=model_apl.name)

Cleaning up all models and metadata at once:

>>> model_storage.clean_up()

Methods

`change_storage_type`(name, version, storage_type)	Change storage type for model tables.
`clean_up`()	Be cautious! This function will delete all the models and the meta table.
`delete_model`(name, version)	Deletes a model with a given name and version.
`delete_models`(name[, start_time, end_time])	Deletes models in batch mode within a specified time range.
`disable_persistent_memory`(name, version)	Disable persistent memory.
`display_hana_schedule`(name, version)	Display the server-side schedule plan.
`display_model_report`(name[, version])	Display model report.
`enable_persistent_memory`(name, version)	Enable persistent memory.
`export_model`(name, version[, directory])	Export model to client.
`import_model`(path[, model_schema, force, ...])	Import model from client to model storage.
`list_models`([name, version, display_type])	Lists existing models.
`load_into_memory`(name, version)	Load a model into the memory.
`load_mlflow_model`(connection_context, model_uri)	Load mlflow model by given model_uri.
`load_model`(name[, version])	Loads an existing model from the SAP HANA database.
`load_model_from_files`(path[, model_schema, ...])	Load model from client and create hana-ml object.
`model_already_exists`(name, version)	Checks if a model with specified name and version already exists.
`save_model`(model[, if_exists, storage_type, ...])	Saves a model.
`save_model_to_files`(model, directory[, ...])	Export model to local files.
`set_data_lake_container`(name)	Set HDL container name.
`set_logfile`(loc)	Set log file location.
`set_schedule`(name, version, schedule_time, ...)	Create the schedule plan.
`start_schedule`(name, version)	Execute the schedule plan.
`terminate_schedule`(name, version)	Terminate the schedule plan.
`unload_from_memory`(name, version[, ...])	Unload a model from the memory.
`upgrade_meta`()	Upgrade the meta table to the latest changes.

export_model(name, version, directory=None)

Export model to client.

Parameters:

name: str

The model name.

version: int

The model version.

directory: str, optional

The directory to export the model to.

Defaults to the current directory.

load_model_from_files(path, model_schema=None, use_temporary_table=True, force=False)

Load model from client and create hana-ml object.

Parameters:

path: str

The location of models.

model_schema: str, optional

The schema to save model tables.

Defaults to the current schema.

use_temporary_table: bool, optional

Whether to import the models into temporary tables.

Defaults to True.

force: bool, optional

If True, it will drop the models with the same table name.

Defaults to False.

Returns:

hana-ml object

import_model(path, model_schema=None, force=False, table_structure=None)

Import model from client to model storage.

Parameters:

path: str

The location of models.

model_schema: str, optional

The schema to save model tables.

Defaults to the schema of the model storage.

force: bool, optional

If True, it will drop the models with the same name and version in the model storage.

Defaults to False.

list_models(name=None, version=None, display_type='complete')

Lists existing models.

Parameters:

name: str, optional

The model name pattern to be matched.

Defaults to None.

version: int, optional

The model version.

Defaults to None.

display_type: {'complete', 'simple', 'no_reports'}, optional

Whether to fetch only part of the model information:

'complete' : fetch all the information.
'simple' : exclude the JSON and MODEL_REPORT columns.
'no_reports' : exclude the MODEL_REPORT column.

Defaults to 'complete'.

Returns:

pandas.DataFrame: The model metadata matching the provided name and version.

model_already_exists(name, version)

Checks if a model with specified name and version already exists.

Parameters:

name: str

The model name.

version: int

The model version.

Returns:

bool: True if a model with the same name and version already exists, False otherwise.
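A typical use of this check is to guard a first-time save. The sketch below is duck-typed: `storage` is expected to be a ModelStorage instance (or anything exposing `model_already_exists` and `save_model` as documented here), and `model` is assumed to carry `name` and `version` attributes, as in the examples above. The helper name `save_once` is hypothetical.

```python
def save_once(storage, model):
    """Save `model` only if its (name, version) pair is not stored yet.

    Returns True if the model was saved and False if it already existed.
    """
    if storage.model_already_exists(model.name, model.version):
        return False
    # if_exists='fail' makes a concurrent duplicate surface as an error
    # instead of silently creating a new version.
    storage.save_model(model=model, if_exists="fail")
    return True
```

Checking first avoids relying on an exception for the common "already saved" case, while `if_exists='fail'` still protects against a model being saved by another session between the check and the save.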

change_storage_type(name, version, storage_type)

Change storage type for model tables.

Parameters:

name: str

The name of model.

version: int

The version of model.

storage_type: {'default', 'HDL'}

Specifies the storage type of the model:

'default' : HANA default storage.
'HDL' : HANA data lake.

save_model(model, if_exists='upgrade', storage_type='default', force=False, save_report=False)

Saves a model.

Parameters:

model: a model instance

The model name must have been set before saving the model. The name and version together serve as the unique ID of a model.

if_exists: str, optional

Specifies how a model is saved if a model with the same name and version already exists:

'fail': Raises an Error.
'replace': Overwrites the previous model.
'upgrade': Saves the model with an incremented version.

Defaults to 'upgrade'.

storage_type: {'default', 'HDL'}, optional

Specifies the storage type of the model:

'default' : HANA default storage.
'HDL' : HANA data lake.

force: bool, optional

Drop the existing table if True.

Defaults to False.

save_report: bool, optional

Save the model report if True.

Defaults to False.

save_model_to_files(model, directory, save_report=False, storage_type='default')

Export model to local files.

Parameters:

model: a model instance

The model name and version must have been set before saving the model. The name and version together serve as the unique ID of a model.

directory: str

The directory in which to save the models.

delete_model(name, version)

Deletes a model with a given name and version.

Parameters:

name: str

The model name.

version: int

The model version.

delete_models(name, start_time=None, end_time=None)

Deletes models in batch mode within a specified time range.

Parameters:

name: str

The model name.

start_time: str, optional

The start timestamp for deleting.

Defaults to None.

end_time: str, optional

The end timestamp for deleting.

Defaults to None.
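Because `start_time` and `end_time` are strings, a caller working with `datetime` objects has to format them first. The following sketch shows one way to do so. The helper name `delete_models_between` is hypothetical, and the timestamp layout 'YYYY-MM-DD HH:MM:SS' is an assumption; this reference does not state which format the method expects.

```python
from datetime import datetime

# Assumed timestamp layout; not confirmed by this reference.
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


def delete_models_between(storage, name, start, end):
    """Delete the versions of `name` within [start, end].

    `storage` is expected to expose delete_models(name, start_time, end_time)
    as documented above; `start` and `end` are datetime objects.
    Returns the two formatted timestamps that were passed on.
    """
    if start > end:
        raise ValueError("start must not be later than end")
    start_time = start.strftime(TIMESTAMP_FORMAT)
    end_time = end.strftime(TIMESTAMP_FORMAT)
    storage.delete_models(name=name, start_time=start_time, end_time=end_time)
    return start_time, end_time
```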

classmethod load_mlflow_model(connection_context, model_uri)

Load an MLflow model from the given model_uri.

clean_up(): Be cautious! This function will delete all the models and the meta table.

load_model(name, version=None, **kwargs)

Loads an existing model from the SAP HANA database.

Parameters:

name: str

The model name.

version: int, optional

The model version. By default, the last version will be loaded.

Returns:

PAL/APL object: The loaded model ready for use.

display_model_report(name, version=None)

Display model report.

Parameters:

name: str

The model name.

version: int, optional

The model version. By default, the report of the last version is displayed.

enable_persistent_memory(name, version)

Enable persistent memory.

Parameters:

name: str

The name of the model.

version: int

The model version.

disable_persistent_memory(name, version)

Disable persistent memory.

Parameters:

name: str

The name of the model.

version: int

The model version.

load_into_memory(name, version)

Load a model into the memory.

Parameters:

name: str

The name of the model.

version: int

The model version.

unload_from_memory(name, version, persistent_memory=None)

Unload a model from the memory. The dataset will be loaded back into memory after next query.

Parameters:

name: str

The name of the model.

version: int

The model version.

persistent_memory: {'retain', 'delete'}, optional

Only works when persistent memory is enabled.

Defaults to None.
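`load_into_memory` and `unload_from_memory` form a natural pair, so they can be wrapped in a context manager that guarantees the unload even when the code in between fails. This is a sketch under the assumption that `storage` exposes the two methods with the signatures documented above; the name `model_in_memory` is hypothetical and not part of hana_ml.

```python
from contextlib import contextmanager


@contextmanager
def model_in_memory(storage, name, version):
    """Keep a stored model loaded in memory for the duration of a block."""
    storage.load_into_memory(name, version)
    try:
        yield
    finally:
        # Runs on normal exit and on exceptions alike.
        storage.unload_from_memory(name, version)
```

With a ModelStorage instance `model_storage`, a block such as `with model_in_memory(model_storage, 'AutoClassifier 1', 1): ...` would run its body while the model is loaded.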

set_data_lake_container(name)

Set HDL container name.

Parameters:

name: str

The name of the HDL container.

set_schedule(name, version, schedule_time, training_dataset_select_statement, init_params=None, fit_params=None, storage_type='default', connection_userkey=None, encrypt=None, sslValidateCertificate=None, server_side_scheduler=False, job_name=None, job_start_time=None, job_end_time=None)

Create the schedule plan.

Parameters:

name: str

The model name.

version: int

The model version.

schedule_time: str

For the client-side scheduler, valid values are {'every x seconds', 'every x minutes', 'every x hours', 'every x weeks'}.

For the HANA scheduler, it is a <cron> expression with <cron> ::= <year> <month> <date> <weekday> <hour> <minute> <seconds>, where:

<year> A four-digit number.
<month> A number from 1 to 12.
<date> A number from 1 to 31.
<weekday> A three-character day of the week: mon,tue,wed,thu,fri,sat,sun.
<hour> A number from 0 to 23 (expressed in 24-hour format).
<minute> A number from 0 to 59.
<seconds> A number from 0 to 59.

Each <cron> field also supports wildcard characters as follows:

* - Any value.
*/n - Any n-th value. For example, */1 for the day of the month means run every day of the month, */3 means run every third day of the month.
a:b - Any value between a and b.
a:b/n - Any n-th value between a and b. For example, 1:10/3 for the day of the month means every 3rd day between 1 and 10 or the 3rd, 6th, and 9th day of the month.
n.a - (For <weekday> only) A day of the week where n is a number from -5 to 5 for the n-th occurrence of the day in week a. For example, for the year 2019, 2.3 means Tuesday, January 15th. -3.22 means Friday, May 31st.

training_dataset_select_statement: str

The select statement of the training dataset to be scheduled.

init_params: dict, optional

The parameters of the hana_ml object initialization.

fit_params: dict, optional

The parameters of the fit function.

storage_type: {'default', 'HDL'}, optional

If 'HDL', the model will be saved in HANA Data Lake.

connection_userkey: str, mandatory for client side scheduler

Userkey generated by HANA hdbuserstore.

server_side_scheduler: bool

If True, it will use HANA scheduler.

Defaults to False.

job_start_time: str, optional when server_side_scheduler is True

Specifies the earliest time after which the scheduled job can start to run.

job_end_time: str, optional when server_side_scheduler is True

Specifies the latest time before which the scheduled job can start to run.
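The seven-field <cron> value used for `schedule_time` with the HANA scheduler is easy to get wrong by position, so it can help to assemble it from named fields. The sketch below is a hypothetical helper (`build_cron` is not part of hana_ml). It assumes the fields are separated by single spaces, as in the grammar above, and it range-checks only plain numbers, passing wildcard forms such as `*/3` or `a:b` through unchanged.

```python
# Numeric ranges taken from the field descriptions above.
_RANGES = {
    "month": (1, 12),
    "date": (1, 31),
    "hour": (0, 23),
    "minute": (0, 59),
    "seconds": (0, 59),
}


def build_cron(year="*", month="*", date="*", weekday="*",
               hour="*", minute="*", seconds="*"):
    """Join the seven fields in the documented order into one string."""
    fields = {
        "year": year, "month": month, "date": date, "weekday": weekday,
        "hour": hour, "minute": minute, "seconds": seconds,
    }
    parts = []
    for field_name, value in fields.items():
        text = str(value)
        if not text or any(ch.isspace() for ch in text):
            raise ValueError("invalid %s field: %r" % (field_name, value))
        # Only plain numbers are range-checked; wildcards pass through.
        if text.isdigit() and field_name in _RANGES:
            low, high = _RANGES[field_name]
            if not low <= int(text) <= high:
                raise ValueError("%s out of range: %s" % (field_name, text))
        parts.append(text)
    return " ".join(parts)
```

For example, `build_cron(hour=2, minute=30, seconds=0)` yields `'* * * * 2 30 0'`, which could then be passed as `schedule_time` together with `server_side_scheduler=True`.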

display_hana_schedule(name, version)

Display the server-side schedule plan.

Parameters:

name: str

The model name.

version: int

The model version.

start_schedule(name, version)

Execute the schedule plan.

Parameters:

name: str

The model name.

version: int

The model version.

terminate_schedule(name, version)

Terminate the schedule plan.

Parameters:

name: str

The model name.

version: int

The model version.

set_logfile(loc)

Set log file location.

upgrade_meta()

Upgrade the meta table to the latest changes.