This module provides model storage features.
All of these features are accessible to the end user through the ModelStorage class:
- exception hana_ml.model_storage.ModelStorageError
Exception class used in Model Storage
- class hana_ml.model_storage.ModelStorage(connection_context, schema=None, meta=None)
The ModelStorage class allows users to save, list, load and delete models.
Models are saved into SAP HANA tables in a schema specified by the user. A model is identified with:
A name (a string of at most 255 characters). It must not contain characters such as a comma, semicolon, tab, end-of-line, single quote, or double quote (',', ';', '\t', '\n', ''', '"').
A version (positive integer starting from 1).
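The identification rules above can be checked on the client before saving. The following is a minimal sketch, not part of hana_ml: the helper names `is_valid_model_name` and `is_valid_model_version` are hypothetical, and the forbidden-character set only covers the characters listed above, which may not be exhaustive.

```python
# Hypothetical client-side checks mirroring the documented naming rules.
# These helpers are illustrative only and are not part of hana_ml.
FORBIDDEN_CHARS = {",", ";", '"', "'", "\n", "\t"}
MAX_NAME_LENGTH = 255


def is_valid_model_name(name):
    """Return True if `name` is a string of at most 255 characters
    that contains none of the documented forbidden characters."""
    if not isinstance(name, str):
        return False
    if len(name) > MAX_NAME_LENGTH:
        return False
    return not any(ch in FORBIDDEN_CHARS for ch in name)


def is_valid_model_version(version):
    """Return True if `version` is a positive integer starting from 1."""
    # bool is a subclass of int, so it is rejected explicitly.
    return isinstance(version, int) and not isinstance(version, bool) and version >= 1
```

Running such a check before calling save_model makes a naming problem visible early instead of surfacing as a database-side error.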
A model can be saved in three ways:
It can be saved for the first time. No model with the same name and version may already exist.
It can be saved as a replacement. If a model with the same name and version already exists, it is overwritten.
It can be saved with a higher version. The model will be saved with an incremented version number.
Internally, a model is stored as two parts:
The metadata. It contains the model identification (name, version, algorithm class) and the Python model object attributes required for reinstantiation. It is saved in a table named HANAML_MODEL_STORAGE by default.
The back-end model. It consists of the model returned by SAP HANA APL or SAP HANA PAL. For SAP HANA APL, it is always saved into the table HANAMl_APL_MODELS_DEFAULT, while for SAP HANA PAL, a model can be saved into different tables depending on the nature of the specified algorithm.
- Parameters:
- connection_context: ConnectionContext
The connection object to a SAP HANA database. It must be the same as the one used by the model.
- schema: str, optional
The schema name where the model storage tables are created.
Defaults to the current schema used by the user.
- meta: str, optional
The name of the metadata table stored in SAP HANA.
Creating and training models with MLPClassifier and AutoClassifier:
Assume the training data is data and the connection to SAP HANA is conn.
>>> model_pal_name = 'MLPClassifier 1'
>>> model_pal = MLPClassifier(conn, hidden_layer_size=[10, ], activation='TANH', output_activation='TANH', learning_rate=0.01, momentum=0.001)
>>> model_pal.fit(data, label='IS_SETOSA', key='ID')
>>> model_apl_name = 'AutoClassifier 1'
>>> model_apl = AutoClassifier(conn_context=conn)
>>> model_apl.fit(data, label='IS_SETOSA', key='ID')
Creating an instance of ModelStorage:
>>> from hana_ml.model_storage import ModelStorage
>>> MODEL_SCHEMA = 'MODEL_STORAGE'  # HANA schema in which models are to be saved
>>> model_storage = ModelStorage(connection_context=conn, schema=MODEL_SCHEMA)
Saving these two trained models for the first time:
>>> model_pal.name = model_pal_name
>>> model_storage.save_model(model=model_pal)
>>> model_apl.name = model_apl_name
>>> model_storage.save_model(model=model_apl)
Listing saved models:
>>> print(model_storage.list_models())
               NAME  VERSION LIBRARY                          ...
0  AutoClassifier 1        1     APL  hana_ml.algorithms.apl  ...
1   MLPClassifier 1        1     PAL  hana_ml.algorithms.pal  ...
Reloading saved models:
>>> model1 = model_storage.load_model(name=model_pal_name, version=1)
>>> model2 = model_storage.load_model(name=model_apl_name, version=1)
Using the loaded model model2 for a new prediction:
>>> out = model2.predict(data=data_test)
>>> print(out.head(3).collect())
   ID PREDICTED  PROBABILITY IS_SETOSA
0   1      True     0.999492      None  ...
1   2      True     0.999478      None
2   3      True     0.999460      None
Other examples of functions:
Saving a model by overwriting the original model:
>>> model_storage.save_model(model=model_apl, if_exists='replace')
>>> print(model_storage.list_models(name=model_apl.name))
               NAME  VERSION LIBRARY                          ...
0  AutoClassifier 1        1     APL  hana_ml.algorithms.apl  ...
Saving a model by upgrading the version:
>>> model_storage.save_model(model=model_apl, if_exists='upgrade')
>>> print(model_storage.list_models(name=model_apl.name))
               NAME  VERSION LIBRARY                          ...
0  AutoClassifier 1        1     APL  hana_ml.algorithms.apl  ...
1  AutoClassifier 1        2     APL  hana_ml.algorithms.apl  ...
Deleting a model with a specified version:
>>> model_storage.delete_model(name=model_apl.name, version=model_apl.version)
Deleting all versions of the models that share the same name:
>>> model_storage.delete_models(name=model_apl.name)
Cleaning up all models and metadata at once:
>>> model_storage.clean_up()
change_storage_type(name, version, storage_type): Change storage type for model tables.
clean_up(): Be cautious! This function will delete all the models and the meta table.
delete_model(name, version): Deletes a model with a given name and version.
delete_models(name[, start_time, end_time]): Deletes models in batch mode within a specified time range.
disable_persistent_memory(name, version): Disable persistent memory.
display_hana_schedule(name, version): Display the server-side schedule plan.
display_model_report(name[, version]): Display model report.
enable_persistent_memory(name, version): Enable persistent memory.
export_model(name, version[, directory]): Export model to client.
get_model_card(name[, version]): Get model card.
import_model(path[, model_schema, force, ...]): Import model from client to model storage.
list_models([name, version, display_type]): Lists existing models.
load_into_memory(name, version): Load a model into the memory.
load_mlflow_model(connection_context, model_uri): Load mlflow model by given model_uri.
load_model(name[, version]): Loads an existing model from the SAP HANA database.
load_model_from_files(path[, model_schema, ...]): Load model from client and create hana-ml object.
model_already_exists(name, version): Checks if a model with specified name and version already exists.
save_model(model[, if_exists, storage_type, ...]): Saves a model.
save_model_to_files(model, directory[, ...]): Export model to local files.
set_data_lake_container(name): Set HDL container name.
set_logfile(loc): Set log file location.
set_schedule(name, version, schedule_time, ...): Create the schedule plan.
start_schedule(name, version): Execute the schedule plan.
terminate_schedule(name, version): Terminate the schedule plan.
unload_from_memory(name, version[, ...]): Unload a model from the memory.
upgrade_meta(): Upgrade the meta table to the latest changes.
- export_model(name, version, directory=None)
Export model to client.
- Parameters:
- name: str
The model name.
- version: int
The model version.
- directory: str, optional
The directory to export the model to.
Defaults to the current directory.
- load_model_from_files(path, model_schema=None, use_temporary_table=True, force=False)
Load model from client and create hana-ml object.
- Parameters:
- path: str
The location of the models.
- model_schema: str, optional
The schema in which to save the model tables.
Defaults to the current schema.
- use_temporary_table: bool, optional
Whether to import the models into temporary tables.
Defaults to True.
- force: bool, optional
If True, existing models with the same table name are dropped.
Defaults to False.
- Returns:
- hana-ml object
- import_model(path, model_schema=None, force=False, table_structure=None)
Import model from client to model storage.
- Parameters:
- path: str
The location of the models.
- model_schema: str, optional
The schema in which to save the model tables.
Defaults to the schema of the model storage.
- force: bool, optional
If True, existing models with the same name and version in the model storage are dropped.
Defaults to False.
- list_models(name=None, version=None, display_type='complete')
Lists existing models.
- Parameters:
- name: str, optional
The model name pattern to be matched. The pattern follows SQL string pattern matching; wildcard characters such as % (matching any number of characters) and _ (matching a single character) are supported.
For example, to list models that start with the word "HGBT":
>>> model_storage = ModelStorage(connection_context=conn)
>>> model_storage.list_models(name="HGBT%")
Defaults to None.
- version: int, optional
The model version.
Defaults to None.
- display_type: {'complete', 'simple', 'no_reports'}, optional
Specifies whether to fetch only part of the model information:
'complete': fetch all the information.
'simple': exclude the JSON and MODEL_REPORT columns.
'no_reports': exclude the MODEL_REPORT column.
Defaults to 'complete'.
- Returns:
- pandas.DataFrame
The model metadata matching the provided name and version.
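To make the wildcard semantics of the name pattern concrete, here is a small client-side sketch that approximates the % and _ wildcards with a regular expression. It is an illustration only: the helper names `like_to_regex` and `like_match` are hypothetical, the actual matching is done by the database, and details such as escape characters are not modeled.

```python
import re


def like_to_regex(pattern):
    """Translate a SQL-style pattern into a compiled regular expression.

    '%' matches any number of characters and '_' matches exactly one
    character; every other character is matched literally.
    """
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def like_match(pattern, name):
    """Return True if the whole `name` matches the SQL-style `pattern`."""
    return like_to_regex(pattern).fullmatch(name) is not None


# 'HGBT%' matches names starting with "HGBT"; '_' stands for one character.
print(like_match("HGBT%", "HGBT_model_1"))
print(like_match("Auto_lassifier 1", "AutoClassifier 1"))
```

Note that a model name containing a literal underscore is itself treated as a pattern, so a name such as MY_MODEL also matches MYXMODEL.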
- model_already_exists(name, version)
Checks if a model with specified name and version already exists.
- Parameters:
- name: str
The model name.
- version: int
The model version.
- Returns:
- bool
If True, there is already a model with the same name and version. If False, there is no model with the same name and version.
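An existence check is typically used to guard a first-time save. The sketch below wraps the two documented calls, model_already_exists(name, version) and save_model(model, if_exists=...), in a hypothetical helper named `save_if_absent`. It assumes that the model object exposes `name` and `version` attributes that have already been set.

```python
def save_if_absent(model_storage, model):
    """Save `model` only if no model with the same name and version exists.

    `model_storage` is expected to behave like a ModelStorage instance.
    Assumption: `model.name` and `model.version` have already been set.
    Returns True if the model was saved and False if it was skipped.
    """
    if model_storage.model_already_exists(model.name, model.version):
        return False
    # 'fail' makes the save raise an error instead of silently
    # overwriting or upgrading if the model appears in the meantime.
    model_storage.save_model(model=model, if_exists='fail')
    return True
```

Because the helper only relies on the two method names, it can be exercised with a small stand-in object in unit tests without a database connection.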
- change_storage_type(name, version, storage_type)
Change storage type for model tables.
- Parameters:
- name: str
The name of the model.
- version: str
The version of the model.
- storage_type: {'default', 'HDL'}
Specifies the storage type of the model:
'default' : HANA default storage.
'HDL' : HANA data lake.
- save_model(model, if_exists='upgrade', storage_type='default', force=False, save_report=False)
Saves a model.
- Parameters:
- model: a model instance.
The model name must have been set before saving the model. The name and version together serve as a unique ID of the model.
- if_exists: str, optional
Specifies how the model is saved if a model with the same name and version already exists:
'fail': Raises an error.
'replace': Overwrites the previous model.
'upgrade': Saves the model with an incremented version.
Defaults to 'upgrade'.
- storage_type: {'default', 'HDL'}, optional
Specifies the storage type of the model:
'default' : HANA default storage.
'HDL' : HANA data lake.
- force: bool, optional
Drops the existing table if True.
Defaults to False.
- save_report: bool, optional
Saves the model report if True.
Defaults to False.
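The three if_exists behaviors can be summarized with a small in-memory sketch. It does not call hana_ml and is not its implementation: the function `resolve_save_version` and the exception `ModelAlreadyExistsError` are hypothetical, and the rule that 'upgrade' uses the highest existing version plus one is an assumption that is merely consistent with the example in which version 1 is upgraded to version 2.

```python
class ModelAlreadyExistsError(Exception):
    """Hypothetical error raised when if_exists='fail' hits a conflict."""


def resolve_save_version(existing_versions, version, if_exists="upgrade"):
    """Return the version under which a model would be stored.

    existing_versions: versions already stored under the same model name.
    version: the version carried by the model being saved.
    """
    existing = set(existing_versions)
    if version not in existing:
        # First-time save: no conflict, so if_exists is irrelevant.
        return version
    if if_exists == "fail":
        raise ModelAlreadyExistsError(f"version {version} already exists")
    if if_exists == "replace":
        # The stored model with this version is overwritten.
        return version
    if if_exists == "upgrade":
        # Assumption: increment from the highest stored version.
        return max(existing) + 1
    raise ValueError(f"unknown if_exists value: {if_exists!r}")


print(resolve_save_version([], 1, "fail"))
print(resolve_save_version([1], 1, "replace"))
print(resolve_save_version([1], 1, "upgrade"))
```

Since 'upgrade' is the default, repeatedly saving the same model without changing if_exists accumulates versions rather than overwriting the stored one.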
- save_model_to_files(model, directory, save_report=False, storage_type='default')
Export model to local files.
- Parameters:
- model: a model instance.
The model name and version must have been set before saving the model. The name and version together serve as a unique ID of the model.
- directory: str
The directory in which to save the models.
- delete_model(name, version)
Deletes a model with a given name and version.
- Parameters:
- name: str
The model name.
- version: int
The model version.
- delete_models(name, start_time=None, end_time=None)
Deletes models in batch mode within a specified time range.
- Parameters:
- name: str
The model name pattern to be matched. The pattern follows SQL string pattern matching; wildcard characters such as % (matching any number of characters) and _ (matching a single character) are supported.
- start_time: str, optional
The start timestamp for deleting.
Defaults to None.
- end_time: str, optional
The end timestamp for deleting.
Defaults to None.
- classmethod load_mlflow_model(connection_context, model_uri)
Load mlflow model by given model_uri.
- clean_up()
Be cautious! This function will delete all the models and the meta table.
- load_model(name, version=None, **kwargs)
Loads an existing model from the SAP HANA database.
- Parameters:
- name: str
The model name.
- version: int, optional
The model version. By default, the latest version is loaded.
- Returns:
- PAL/APL object
The loaded model ready for use.
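When the version number that load_model would pick by default is needed explicitly, it can be derived from the listing returned by list_models. The helper below is a hypothetical sketch, not part of hana_ml. It assumes the listing exposes NAME and VERSION columns, as in the example output of list_models, and it filters on exact name equality because list_models treats its name argument as a pattern.

```python
def latest_version(model_storage, name):
    """Return the highest stored version of the model called `name`.

    `model_storage` is expected to behave like a ModelStorage instance.
    Returns None if no model with exactly this name is stored.
    """
    listing = model_storage.list_models(name=name)
    # list_models matches `name` as a pattern, so keep exact matches only.
    versions = [
        int(version)
        for stored_name, version in zip(listing["NAME"], listing["VERSION"])
        if stored_name == name
    ]
    return max(versions) if versions else None
```

A call such as model_storage.load_model(name, version=latest_version(model_storage, name)) then makes the chosen version explicit in logs or audit output.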
- get_model_card(name, version=None)
Get model card.
- Parameters:
- name: str
The model name.
- version: int, optional
The model version. By default, the latest version is used.
- display_model_report(name, version=None)
Display model report.
- Parameters:
- name: str
The model name.
- version: int, optional
The model version. By default, the latest version is used.
- enable_persistent_memory(name, version)
Enable persistent memory.
- Parameters:
- name: str
The name of the model.
- version: int
The model version.
- disable_persistent_memory(name, version)
Disable persistent memory.
- Parameters:
- name: str
The name of the model.
- version: int
The model version.
- load_into_memory(name, version)
Load a model into the memory.
- Parameters:
- name: str
The name of the model.
- version: int
The model version.
- unload_from_memory(name, version, persistent_memory=None)
Unload a model from the memory. The dataset will be loaded back into memory after the next query.
- Parameters:
- name: str
The name of the model.
- version: int
The model version.
- persistent_memory: {'retain', 'delete'}, optional
Only works when persistent memory is enabled.
Defaults to None.
- set_data_lake_container(name)
Set HDL container name.
- Parameters:
- name: str
The name of the HDL container.
- set_schedule(name, version, schedule_time, training_dataset_select_statement, init_params=None, fit_params=None, storage_type='default', connection_userkey=None, encrypt=None, sslValidateCertificate=None, server_side_scheduler=True, job_name=None, job_start_time=None, job_end_time=None, procedure_name=None, procedure_schema=None)
Create the schedule plan.
- Parameters:
- name: str
The model name.
- version: int
The model version.
- schedule_time: str
For the client-side scheduler, valid values are {'every x seconds', 'every x minutes', 'every x hours', 'every x weeks'}.
For the HANA scheduler, it uses <cron> with
<cron> ::= <year> <month> <date> <weekday> <hour> <minute> <seconds>
such that
<year>: A four-digit number.
<month>: A number from 1 to 12.
<date>: A number from 1 to 31.
<weekday>: A three-character day of the week: mon, tue, wed, thu, fri, sat, sun.
<hour>: A number from 0 to 23 (expressed in 24-hour format).
<minute>: A number from 0 to 59.
<seconds>: A number from 0 to 59.
Each field also supports wildcard characters as follows:
* - Any value.
*/n - Any n-th value. For example, */1 for the day of the month means run every day of the month, */3 means run every third day of the month.
a:b - Any value between a and b.
a:b/n - Any n-th value between a and b. For example, 1:10/3 for the day of the month means every 3rd day between 1 and 10 or the 3rd, 6th, and 9th day of the month.
n.a - (For <weekday> only) A day of the week where n is a number from -5 to 5 for the n-th occurrence of the day in week a. For example, for the year 2019, 2.3 means Tuesday, January 15th. -3.22 means Friday, May 31st.
- training_dataset_select_statement: str
The select statement of the training dataset to be scheduled.
- init_params: dict, optional
The parameters of the hana_ml object initialization.
- fit_params: dict, optional
The parameters of the fit function.
- storage_type: {'default', 'HDL'}, optional
If 'HDL', the model will be saved in HANA Data Lake.
- connection_userkey: str, mandatory for the client-side scheduler
The user key generated by HANA hdbuserstore.
- server_side_scheduler: bool
If True, the HANA scheduler is used.
Defaults to True.
- job_name: str
The name of the scheduled job in the HANA scheduler. It must be set when the HANA scheduler is used.
No default value.
- job_start_time: str, optional when server_side_scheduler is True
Specifies the earliest time after which the scheduled job can start to run.
- job_end_time: str, optional when server_side_scheduler is True
Specifies the latest time before which the scheduled job can start to run.
- procedure_name: str, optional
Specifies the name of the procedure in the scheduled job. If not specified, it will use "PROC_<job_name>".
- procedure_schema: str, optional
Specifies the schema of the procedure in the scheduled job. If not specified, it will use the current schema.
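Because the order of the seven cron fields is easy to get wrong, a small helper can assemble the schedule_time string for the HANA scheduler. This is a sketch under the field order given for schedule_time (year, month, date, weekday, hour, minute, seconds); the function `build_cron` is hypothetical and performs no validation of the individual values.

```python
def build_cron(year="*", month="*", date="*", weekday="*",
               hour="*", minute="*", seconds="*"):
    """Join the seven cron fields in the documented order.

    Every field defaults to '*' (any value). Values are converted to
    strings as given, so wildcard forms such as '*/3' or '1:10/3' pass
    through unchanged.
    """
    fields = (year, month, date, weekday, hour, minute, seconds)
    return " ".join(str(field) for field in fields)


# Every day at 02:30:00.
print(build_cron(hour=2, minute=30, seconds=0))    # * * * * 2 30 0
# Every third day of the month at midnight.
print(build_cron(date="*/3", hour=0, minute=0, seconds=0))  # * * */3 * 0 0 0
```

The resulting string would be passed as schedule_time together with job_name when server_side_scheduler is True.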
- display_hana_schedule(name, version)
Display the server-side schedule plan.
- Parameters:
- name: str
The model name.
- version: int
The model version.
- start_schedule(name, version)
Execute the schedule plan.
- Parameters:
- name: str
The model name.
- version: int
The model version.
- terminate_schedule(name, version)
Terminate the schedule plan.
- Parameters:
- name: str
The model name.
- version: int
The model version.
- set_logfile(loc)
Set log file location.
- upgrade_meta()
Upgrade the meta table to the latest changes.