hana_ml.artifacts package
The artifacts package consists of the following sections:
The hana_ml.artifacts.generators.abap module provides methods that help you embed machine learning algorithms of SAP HANA (e.g. the Predictive Analysis Library (PAL)) via the Python API into SAP S/4HANA business applications through the Intelligent Scenario Lifecycle Management (ISLM) framework. The ISLM framework is integrated into the ABAP layer (SAP Basis) so that intelligent scenarios from the layers above in the SAP S/4HANA stack can utilize the framework fully. Specifically, a custom ABAP Managed Database Procedure (AMDP) class for a machine learning model needs to be created so that it can be consumed by ISLM.
Suppose you have a machine learning model developed in hana-ml and want to embed it into an SAP S/4HANA business application. First, create an AMDPGenerator to establish a corresponding AMDP class, then import the generated ABAP class code into your ABAP development environment. The ABAP class can then be used within an Intelligent Scenario managed by ISLM. Within ISLM, you can perform operations such as training, activating, and monitoring the intelligent scenario for a specific SAP S/4HANA system.
Note
SAP S/4HANA System Requirement: S/4HANA 2020 FPS1 or higher.
Supported hana-ml algorithm for AMDP: UnifiedClassification.
AMDP Examples
Let's assume we have a connection to SAP HANA called connection_context and a basic Random Decision Trees classifier 'rfc' with training data 'diabetes_train_valid' and prediction data 'diabetes_test'. Remember that every model has to contain fit and predict logic, so the methods fit() and predict() have to be called at least once. Note that the SQL trace also needs to be enabled before the training.
>>> connection_context.sql_tracer.enable_sql_trace(True)
>>> connection_context.sql_tracer.enable_trace_history(True)
>>> rfc_params = dict(n_estimators=5, split_threshold=0, max_depth=10)
>>> rfc = UnifiedClassification(func="randomdecisiontree", **rfc_params)
>>> rfc.fit(diabetes_train_valid,
key='ID',
label='CLASS',
categorical_variable=['CLASS'],
partition_method='stratified',
stratified_column='CLASS')
>>> rfc.predict(diabetes_test.drop(cols=['CLASS']), key="ID")
Then, generate the ABAP Managed Database Procedure (AMDP) artifact by creating an AMDPGenerator:
>>> generator = AMDPGenerator(project_name="PIMA_DIAB", version="1", connection_context=connection_context, outputdir="out/")
>>> generator.generate()
The generate() call creates a .abap file on your local machine based on the work done previously. This .abap file contains the SQL logic, wrapped in AMDPs, that you created by interacting with the hana-ml package.
hana_ml.artifacts.generators.abap
This module handles the generation of all AMDP (ABAP Managed Database Procedure) related artifacts based on the provided consumption layer elements. Currently this is experimental code only.
The following class is available:
- class hana_ml.artifacts.generators.abap.AMDPGenerator(project_name, version, connection_context, outputdir)
Bases:
object
This class provides AMDP (ABAP Managed Database Procedure) specific generation functionality. It also extends the config to cater for AMDP-generation-specific configuration.
Note
Supported hana-ml algorithm for AMDP: UnifiedClassification.
- Parameters:
- project_name : str
  Name of the project.
- version : str
  The version.
- connection_context : ConnectionContext
  The connection to the SAP HANA system.
- outputdir : str
  The output directory.
Examples
Let's assume we have a connection to SAP HANA called connection_context and a basic Random Decision Trees classifier 'rfc' with training data 'diabetes_train_valid' and prediction data 'diabetes_test'. Remember that every model has to contain fit and predict logic, so the methods fit() and predict() have to be called at least once.
>>> rfc_params = dict(n_estimators=5, split_threshold=0, max_depth=10)
>>> rfc = UnifiedClassification(func="randomdecisiontree", **rfc_params)
>>> rfc.fit(diabetes_train_valid,
            key='ID',
            label='CLASS',
            categorical_variable=['CLASS'],
            partition_method='stratified',
            stratified_column='CLASS')
>>> rfc.predict(diabetes_test.drop(cols=['CLASS']), key="ID")
Then, generate the ABAP Managed Database Procedure (AMDP) artifact by creating an AMDPGenerator:
>>> generator = AMDPGenerator(project_name="PIMA_DIAB", version="1", connection_context=connection_context, outputdir="out/")
>>> generator.generate()
The generate() call creates a .abap file on your local machine based on the work done previously. This .abap file contains the SQL logic, wrapped in AMDPs, that you created by interacting with the hana-ml package.
Methods
generate([training_dataset, apply_dataset, ...])
    Generate artifacts by first building up the required folder structure for artifacts storage and then generating the different required files.
- generate(training_dataset='', apply_dataset='', no_reason_features=3)
Generate artifacts by first building up the required folder structure for artifacts storage and then generating different required files.
- Parameters:
- training_dataset : str, optional
  Name of the training dataset.
  Defaults to ''.
- apply_dataset : str, optional
  Name of the apply dataset.
  Defaults to ''.
- no_reason_features : int, optional
  The number of features that contribute most to the classification decision. This reason code information is displayed during the prediction phase.
  Defaults to 3.
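For example, the dataset names can be passed to generate() explicitly (a sketch, reusing the generator from the example above; the table names below are hypothetical placeholders):

>>> generator.generate(training_dataset='PIMA_DIABETES_TRAIN',
...                    apply_dataset='PIMA_DIABETES_APPLY',
...                    no_reason_features=3)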
hana_ml.artifacts.generators.hana
This module handles the generation of all SAP HANA design-time artifacts based on the provided base and consumption layer elements. These artifacts can be incorporated into development projects in SAP Web IDE for SAP HANA or SAP Business Application Studio and deployed via the SAP HANA Deployment Infrastructure (HDI) into an SAP HANA system.
The following class is available:
- class hana_ml.artifacts.generators.hana.HANAGeneratorForCAP(project_name, output_dir, namespace=None)
Bases:
object
HANA artifacts generator for the existing CAP project.
- Parameters:
- project_name : str
  The name of the project.
- output_dir : str
  The output directory.
- namespace : str, optional
  Specifies the namespace for the project.
  Defaults to "hana.ml".
Examples
>>> my_pipeline = Pipeline([
        ('PCA', PCA(scaling=True, scores=True)),
        ('HGBT_Classifier', HybridGradientBoostingClassifier(
            n_estimators=4, split_threshold=0,
            learning_rate=0.5, fold_num=5, max_depth=6))])
>>> my_pipeline.fit(diabetes_train, key="ID", label="CLASS")
>>> my_pipeline.predict(diabetes_test_m, key="ID")
>>> hanagen = HANAGeneratorForCAP(project_name="my_proj", output_dir=".", namespace="hana.ml")
>>> hanagen.generate_artifacts(my_pipeline)
Methods
generate_artifacts(obj[, cds_gen, ...])
    Generate CAP artifacts.
materialize_ds_data([to_materialize])
    Create input table for the input dataframe.
- materialize_ds_data(to_materialize=True)
Create input table for the input dataframe.
- Parameters:
- to_materialize : bool, optional
  If True, the input dataframe will be materialized.
  Defaults to True.
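For instance, materialization of the input data can be disabled before generating the artifacts (a sketch, assuming a HANAGeneratorForCAP instance hanagen and a fitted hana-ml object my_pipeline as in the example above):

>>> hanagen.materialize_ds_data(to_materialize=False)
>>> hanagen.generate_artifacts(my_pipeline)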
- generate_artifacts(obj, cds_gen=False, model_position=None, tudf=False)
Generate CAP artifacts.
- Parameters:
- obj : hana-ml object
  The hana-ml object that has generated the execution statement.
- cds_gen : bool, optional
  Controls whether the Python client is allowed to generate HANA tables, procedures, and so on. If True, the HANA artifacts will be generated from CDS.
  Defaults to False.
- model_position : bool or dict, optional
  Specifies the model table position among the procedure outputs and the procedure inputs, e.g. {"out": 0, "in": 1}. If True, the model position {"out": 0, "in": 1} will be used.
  Defaults to None.
- tudf : bool, optional
  If True, a table UDF will be generated for the apply step.
  Defaults to False.
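As an illustration, the model table position can be specified explicitly and a table UDF requested for the apply step (a sketch, reusing the fitted my_pipeline and hanagen from the example above):

>>> hanagen.generate_artifacts(my_pipeline,
...                            model_position={"out": 0, "in": 1},
...                            tudf=True)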
- class hana_ml.artifacts.generators.hana.HanaGenerator(project_name, version, grant_service, connection_context, outputdir, generation_merge_type=1, generation_group_type=12, sda_grant_service=None, remote_source='')
Bases:
object
This class provides SAP HANA specific generation functionality. It also extends the config file to cater for HANA-specific config generation.
- Parameters:
- project_name : str
  The name of the project.
- version : str
  The version name.
- grant_service : str
  The grant service.
- connection_context : ConnectionContext
  The connection to the SAP HANA system.
- outputdir : str
  The output directory.
- generation_merge_type : int, optional
  The merge type determines which operations are merged together. At this stage there are only two options:
  1: GENERATION_MERGE_NONE: all operations are generated separately (i.e. as individual procedures in SAP HANA).
  2: GENERATION_MERGE_PARTITION: a partition operation is merged into the respective related operation and generated as one procedure in SAP HANA.
  Defaults to 1.
- generation_group_type : int, optional
  11: GENERATION_GROUP_NONE: no grouping is applied; the solution-specific implementation defines how to deal with this.
  12: GENERATION_GROUP_FUNCTIONAL: grouping is based on functional grouping, meaning that logically related elements such as partition, fit, and the related score are put together.
  Defaults to 12.
- sda_grant_service : str, optional
  The grant service of Smart Data Access (SDA).
  Defaults to None.
- remote_source : str, optional
  The name of the remote source.
  Defaults to ''.
Examples
Let's assume we have a connection to SAP HANA called connection_context and a basic Random Decision Trees Classifier 'rfc' with training data 'diabetes_train_valid' and prediction data 'diabetes_test'.
>>> rfc_params = dict(n_estimators=5, split_threshold=0, max_depth=10)
>>> rfc = UnifiedClassification(func="randomdecisiontree", **rfc_params)
>>> rfc.fit(diabetes_train_valid,
            key='ID',
            label='CLASS',
            categorical_variable=['CLASS'],
            partition_method='stratified',
            stratified_column='CLASS')
>>> rfc.predict(diabetes_test.drop(cols=['CLASS']), key="ID")
Then, we could generate HDI artifacts:
>>> hg = hana.HanaGenerator(project_name="test", version='1', grant_service='', connection_context=connection_context, outputdir="./hana_out")
>>> hg.generate_artifacts()
generate_artifacts() returns the output path of the root folder where the HANA-related artifacts are stored:
>>> './hana_out\test\hana'
Methods
generate_artifacts([base_layer, ...])
    Generate the artifacts by first building up the required folder structure for artifacts storage and then generating the different required files.
- generate_artifacts(base_layer=True, consumption_layer=True, sda_data_source_mapping_only=False)
Generate the artifacts by first building up the required folder structure for artifacts storage and then generating the different required files. Be aware that this method only generates the generic files and offloads the generation of artifacts that require traversal of base and consumption layer elements.
- Parameters:
- base_layer : bool, optional
  Whether to generate the base layer, that is, the low-level procedures.
  Defaults to True.
- consumption_layer : bool, optional
  Whether to generate the consumption layer, which consumes the base layer artifacts.
  Defaults to True.
- sda_data_source_mapping_only : bool, optional
  If a data source mapping is provided, you can force it to be applied only to the Smart Data Access (SDA) HANA Deployment Infrastructure (HDI) container.
  Defaults to False.
- Returns:
- str
  The output path of the root folder where the HANA-related artifacts are stored.
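For example, both layers can be generated in one call and the returned path captured (a sketch, continuing the hg example above):

>>> output_path = hg.generate_artifacts(base_layer=True,
...                                     consumption_layer=True,
...                                     sda_data_source_mapping_only=False)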