hana_ml.artifacts package

The artifacts package consists of the following sections:

The hana_ml.artifacts.generators.abap module provides methods that help you embed machine learning algorithms of SAP HANA (e.g. Predictive Analysis Library (PAL)) via the Python API into SAP S/4HANA business applications through the Intelligent Scenario Lifecycle Management (ISLM) framework. The ISLM framework is integrated into the ABAP layer (SAP Basis) so that intelligent scenarios in the layers above it in the SAP S/4HANA stack can fully utilize the framework. Specifically, a custom ABAP Managed Database Procedure (AMDP) class needs to be created for a machine learning model so that it can be consumed by ISLM.

Suppose you have a machine learning model developed in hana-ml and decide to embed it into an SAP S/4HANA business application. First, you create an AMDPGenerator to produce a corresponding AMDP class, and then import the generated ABAP class code into your ABAP development environment. The ABAP class can then be used within an intelligent scenario managed by ISLM. Within ISLM, you can perform operations such as training, activating, and monitoring the intelligent scenario for a specific SAP S/4HANA system.

Note

SAP S/4HANA System Requirement: S/4HANA 2020 FPS1 or higher.

Supported hana-ml algorithm for AMDP: UnifiedClassification.

AMDP Examples

Let's assume we have a connection to SAP HANA called connection_context and a basic Random Decision Trees Classifier 'rfc' with training data 'diabetes_train_valid' and prediction data 'diabetes_test'. Remember that every model has to contain fit and predict logic; therefore, the methods fit() and predict() have to be called at least once. Note that the SQL trace also needs to be enabled before training.

>>> from hana_ml.algorithms.pal.unified_classification import UnifiedClassification
>>> connection_context.sql_tracer.enable_sql_trace(True)
>>> connection_context.sql_tracer.enable_trace_history(True)
>>> rfc_params = dict(n_estimators=5, split_threshold=0, max_depth=10)
>>> rfc = UnifiedClassification(func="randomdecisiontree", **rfc_params)
>>> rfc.fit(diabetes_train_valid,
...         key='ID',
...         label='CLASS',
...         categorical_variable=['CLASS'],
...         partition_method='stratified',
...         stratified_column='CLASS')
>>> rfc.predict(diabetes_test.drop(cols=['CLASS']), key="ID")

Then, generate the ABAP Managed Database Procedure (AMDP) artifact by creating an AMDPGenerator:

>>> from hana_ml.artifacts.generators.abap import AMDPGenerator
>>> generator = AMDPGenerator(project_name="PIMA_DIAB", version="1", connection_context=connection_context, outputdir="out/")
>>> generator.generate()

The generate() process creates a .abap file on your local machine based on the work done previously. This .abap file contains the SQL logic, wrapped in AMDPs, that you created by interacting with the hana-ml package.

hana_ml.artifacts.generators.abap

This module handles the generation of all AMDP (ABAP Managed Database Procedure) related artifacts based on the provided consumption layer elements. Currently this is experimental code only.

The following class is available:

class hana_ml.artifacts.generators.abap.AMDPGenerator(project_name, version, connection_context, outputdir)

Bases: object

This class provides AMDP (ABAP Managed Database Procedure) specific generation functionality. It also extends the configuration to cater for AMDP-generation-specific settings.

Note

Supported hana-ml algorithm for AMDP: UnifiedClassification.

Parameters:
project_name : str

Name of the project.

version : str

The version.

connection_context : ConnectionContext

The connection to the SAP HANA system.

outputdir : str

The output directory.

Examples

Let's assume we have a connection to SAP HANA called connection_context and a basic Random Decision Trees Classifier 'rfc' with training data 'diabetes_train_valid' and prediction data 'diabetes_test'. Remember that every model has to contain fit and predict logic; therefore, the methods fit() and predict() have to be called at least once.

>>> from hana_ml.algorithms.pal.unified_classification import UnifiedClassification
>>> rfc_params = dict(n_estimators=5, split_threshold=0, max_depth=10)
>>> rfc = UnifiedClassification(func="randomdecisiontree", **rfc_params)
>>> rfc.fit(diabetes_train_valid,
...         key='ID',
...         label='CLASS',
...         categorical_variable=['CLASS'],
...         partition_method='stratified',
...         stratified_column='CLASS')
>>> rfc.predict(diabetes_test.drop(cols=['CLASS']), key="ID")

Then, generate the ABAP Managed Database Procedure (AMDP) artifact by creating an AMDPGenerator:

>>> from hana_ml.artifacts.generators.abap import AMDPGenerator
>>> generator = AMDPGenerator(project_name="PIMA_DIAB", version="1", connection_context=connection_context, outputdir="out/")
>>> generator.generate()

The generate() process creates a .abap file on your local machine based on the work done previously. This .abap file contains the SQL logic, wrapped in AMDPs, that you created by interacting with the hana-ml package.

Methods

generate([training_dataset, apply_dataset, ...])

Generate artifacts by first building up the required folder structure for artifacts storage and then generating different required files.

generate(training_dataset='', apply_dataset='', no_reason_features=3)

Generate artifacts by first building up the required folder structure for artifacts storage and then generating different required files.

Parameters:
training_dataset : str, optional

Name of the training dataset.

Defaults to ''.

apply_dataset : str, optional

Name of the apply dataset.

Defaults to ''.

no_reason_features : int, optional

The number of features that contribute most to the classification decision. This reason code information is displayed during the prediction phase.

Defaults to 3.
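
For example, a minimal sketch of a call with explicit dataset names (the table names here are hypothetical placeholders for illustration):

>>> generator.generate(training_dataset="PIMA_DIABETES_TRAIN",
...                    apply_dataset="PIMA_DIABETES_TEST",
...                    no_reason_features=3)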

hana_ml.artifacts.generators.hana

This module handles the generation of all HANA design-time artifacts based on the provided base and consumption layer elements. These artifacts can be incorporated into development projects in SAP Web IDE for SAP HANA or SAP Business Application Studio and deployed via the HANA Deployment Infrastructure (HDI) into an SAP HANA system.

The following classes are available:

class hana_ml.artifacts.generators.hana.HANAGeneratorForCAP(project_name, output_dir, namespace=None)

Bases: object

HANA artifacts generator for an existing CAP project.

Parameters:
project_name : str

The name of the project.

output_dir : str

The output directory.

namespace : str, optional

Specifies the namespace for the project.

Defaults to "hana.ml".

Examples

>>> from hana_ml.algorithms.pal.pipeline import Pipeline
>>> from hana_ml.algorithms.pal.decomposition import PCA
>>> from hana_ml.algorithms.pal.trees import HybridGradientBoostingClassifier
>>> from hana_ml.artifacts.generators.hana import HANAGeneratorForCAP
>>> my_pipeline = Pipeline([
...     ('PCA', PCA(scaling=True, scores=True)),
...     ('HGBT_Classifier', HybridGradientBoostingClassifier(
...         n_estimators=4, split_threshold=0,
...         learning_rate=0.5, fold_num=5,
...         max_depth=6))])
>>> my_pipeline.fit(diabetes_train, key="ID", label="CLASS")
>>> my_pipeline.predict(diabetes_test_m, key="ID")
>>> hanagen = HANAGeneratorForCAP(project_name="my_proj",
...                               output_dir=".",
...                               namespace="hana.ml")
>>> hanagen.generate_artifacts(my_pipeline)

Methods

generate_artifacts(obj[, cds_gen, ...])

Generate CAP artifacts.

materialize_ds_data([to_materialize])

Create input table for the input dataframe.

materialize_ds_data(to_materialize=True)

Create input table for the input dataframe.

Parameters:
to_materialize : bool, optional

If True, the input dataframe will be materialized.

Defaults to True.
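
Continuing the HANAGeneratorForCAP example above, a minimal sketch that materializes the input data before generating artifacts:

>>> hanagen.materialize_ds_data(to_materialize=True)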

generate_artifacts(obj, cds_gen=False, model_position=None, tudf=False)

Generate CAP artifacts.

Parameters:
obj : hana-ml object

The hana-ml object that has generated the execution statement.

cds_gen : bool, optional

Controls whether the Python client generates HANA tables, procedures, and so on. If True, the HANA artifacts are generated from CDS.

Defaults to False.

model_position : bool or dict, optional

Specifies the model table position in the procedure outputs and the procedure inputs, e.g. {"out": 0, "in": 1}. If True, the model position {"out": 0, "in": 1} is used.

Defaults to None.

tudf : bool, optional

If True, a table UDF is generated for the apply step.

Defaults to False.
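
For instance, a sketch (reusing my_pipeline from the example above) that pins the model table position and additionally generates a table UDF for the apply step:

>>> hanagen.generate_artifacts(my_pipeline, model_position=True, tudf=True)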

class hana_ml.artifacts.generators.hana.HanaGenerator(project_name, version, grant_service, connection_context, outputdir, generation_merge_type=1, generation_group_type=12, sda_grant_service=None, remote_source='')

Bases: object

This class provides HANA-specific generation functionality. It also extends the config file to cater for HANA-specific config generation.

Parameters:
project_name : str

The name of the project.

version : str

The version name.

grant_service : str

The grant service.

connection_context : ConnectionContext

The connection to the SAP HANA system.

outputdir : str

The output directory.

generation_merge_type : int, optional

The merge type determines which operations are merged together. At this stage there are only two options:

  • 1: GENERATION_MERGE_NONE: All operations are generated separately (i.e. as individual procedures in HANA).

  • 2: GENERATION_MERGE_PARTITION: A partition operation is merged into the respective related operation and generated as one procedure in HANA.

Defaults to 1.

generation_group_type : int, optional

  • 11: GENERATION_GROUP_NONE: No grouping is applied. This means that the solution-specific implementation will define how to deal with this.

  • 12: GENERATION_GROUP_FUNCTIONAL: Grouping is based on functional grouping, meaning that logically related elements, such as partition, fit, and the related score, are put together.

Defaults to 12.

sda_grant_service : str, optional

The grant service of Smart Data Access (SDA).

Defaults to None.

remote_source : str, optional

The name of the remote source.

Defaults to ''.

Examples

Let's assume we have a connection to SAP HANA called connection_context and a basic Random Decision Trees Classifier 'rfc' with training data 'diabetes_train_valid' and prediction data 'diabetes_test'.

>>> from hana_ml.algorithms.pal.unified_classification import UnifiedClassification
>>> rfc_params = dict(n_estimators=5, split_threshold=0, max_depth=10)
>>> rfc = UnifiedClassification(func="randomdecisiontree", **rfc_params)
>>> rfc.fit(diabetes_train_valid,
...         key='ID',
...         label='CLASS',
...         categorical_variable=['CLASS'],
...         partition_method='stratified',
...         stratified_column='CLASS')
>>> rfc.predict(diabetes_test.drop(cols=['CLASS']), key="ID")

Then, we could generate HDI artifacts:

>>> from hana_ml.artifacts.generators import hana
>>> hg = hana.HanaGenerator(project_name="test", version='1', grant_service='', connection_context=connection_context, outputdir="./hana_out")
>>> hg.generate_artifacts()

This returns the output path of the root folder where the HANA-related artifacts are stored:

'./hana_out\test\hana'
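
As a further sketch under the same setup, the merge and group behavior described above can be adjusted through the constructor; here a partition operation is merged into its related operation (generation_merge_type=2) and no grouping is applied (generation_group_type=11):

>>> hg = hana.HanaGenerator(project_name="test", version='1', grant_service='',
...                         connection_context=connection_context, outputdir="./hana_out",
...                         generation_merge_type=2, generation_group_type=11)
>>> hg.generate_artifacts()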

Methods

generate_artifacts([base_layer, ...])

Generate the artifacts by first building up the required folder structure for artifacts storage and then generating the different required files.

generate_artifacts(base_layer=True, consumption_layer=True, sda_data_source_mapping_only=False)

Generate the artifacts by first building up the required folder structure for artifact storage and then generating the different required files. Be aware that this method only generates the generic files and offloads the generation of artifacts that require traversal of the base and consumption layer elements.

Parameters:
base_layer : bool, optional

The base layer consists of the low-level procedures that will be generated.

Defaults to True.

consumption_layer : bool, optional

The consumption layer is the layer that will consume the base layer artifacts.

Defaults to True.

sda_data_source_mapping_only : bool, optional

If a data source mapping is provided, you can force it to be applied only to the Smart Data Access (SDA) HANA Deployment Infrastructure (HDI) container.

Defaults to False.

Returns:
str

The output path of the root folder where the HANA-related artifacts are stored.
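
A minimal usage sketch, continuing the earlier example (the keyword values shown are simply the defaults):

>>> path = hg.generate_artifacts(base_layer=True,
...                              consumption_layer=True,
...                              sda_data_source_mapping_only=False)
>>> print(path)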