Pipeline
- class hana_ml.algorithms.pal.pipeline.Pipeline(steps)
Pipeline construction to run transformers and estimators sequentially.
- Parameters
- steps : list
List of (name, transform) tuples that are chained. The last object should be an estimator.
- Attributes
fit_hdbprocedure
Returns the generated hdbprocedure for fit.
predict_hdbprocedure
Returns the generated hdbprocedure for predict.
Methods
- abap_class_mapping(value): Map the ABAP class.
- add_amdp_item(template_key, value): Add item.
- add_amdp_name(amdp_name): Add AMDP name.
- add_amdp_template(template_name): Add AMDP template.
- build_amdp_class(): After add_item, generate the AMDP file from the template.
- create_amdp_class(amdp_name, ...): Create the AMDP class file.
- disable_mlflow_autologging(): Disable mlflow autologging.
- enable_mlflow_autologging([schema, meta, ...]): Enable mlflow autologging.
- fit(data[, key, features, label, ...]): Fit all the transforms one after the other and transform the data, then fit the transformed data using the final estimator.
- fit_predict(data[, apply_data, fit_params, ...]): Fit all the transforms one after the other and transform the data, then fit_predict the transformed data using the last estimator.
- fit_transform(data[, fit_params]): Fit all the transforms one after the other and transform the data, then fit the transformed data using the final estimator.
- generate_json_pipeline(): Generate the JSON formatted pipeline for auto-ml's pipeline_fit function.
- get_amdp_notfillin_key(): Get the AMDP keys that are not yet filled in.
- load_abap_class_mapping(): Load the ABAP class mapping.
- load_amdp_template(template_name): Load the AMDP template.
- plot([name, iframe_height]): Plot the pipeline.
- predict(data[, key, features, model]): Predict function for AutoML.
- write_amdp_file([filepath, version, outdir]): Write the template to file.
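Examples
A minimal construction sketch; the PCA and HybridGradientBoostingClassifier steps below are just one possible choice of transformer and estimator, drawn from the method examples further down:
>>> from hana_ml.algorithms.pal.pipeline import Pipeline
>>> from hana_ml.algorithms.pal.decomposition import PCA
>>> from hana_ml.algorithms.pal.trees import HybridGradientBoostingClassifier
>>> my_pipeline = Pipeline([
...     ('pca', PCA(scaling=True, scores=True)),
...     ('hgbt', HybridGradientBoostingClassifier(n_estimators=4, max_depth=6))
... ])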
- enable_mlflow_autologging(schema=None, meta=None, is_exported=False, registered_model_name=None)
Enables mlflow autologging. Only takes effect for the fit function.
- Parameters
- schema : str, optional
Defines the model storage schema for mlflow autologging.
Defaults to the current schema.
- meta : str, optional
Defines the model storage meta table for mlflow autologging.
Defaults to 'HANAML_MLFLOW_MODEL_STORAGE'.
- is_exported : bool, optional
Determines whether to export the HANA model to mlflow.
Defaults to False.
- registered_model_name : str, optional
MLflow registered_model_name.
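Examples
A usage sketch, assuming an MLflow tracking server is already configured and my_pipeline and train_data are set up as in the fit example below; the schema and model names are placeholders:
>>> my_pipeline.enable_mlflow_autologging(schema='MY_SCHEMA',
...                                       registered_model_name='my_pipeline_model')
>>> my_pipeline.fit(data=train_data, key='ID', label='CLASS')  # this fit run is logged to MLflow
>>> my_pipeline.disable_mlflow_autologging()  # switch autologging off again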
- disable_mlflow_autologging()
Disables mlflow autologging.
- fit_transform(data, fit_params=None)
Fit all the transforms one after the other and transform the data, then fit the transformed data using the final estimator.
- Parameters
- data : DataFrame
SAP HANA DataFrame to be transformed in the pipeline.
- fit_params : dict, optional
Parameters corresponding to the transformer/estimator names; each parameter name is prefixed such that parameter p for step s has key s__p.
- Returns
- DataFrame
Transformed SAP HANA DataFrame.
Examples
>>> my_pipeline = Pipeline([
...     ('pca', PCA(scaling=True, scores=True)),
...     ('imputer', Imputer(strategy='mean'))
... ])
>>> fit_params = {'pca__key': 'ID', 'pca__label': 'CLASS'}
>>> my_pipeline.fit_transform(data=train_data, fit_params=fit_params)
- fit(data, key=None, features=None, label=None, fit_params=None, categorical_variable=None, generate_json_pipeline=False, use_pal_pipeline_fit=True, endog=None, exog=None, model_table_name=None)
Fit all the transforms one after the other and transform the data, then fit the transformed data using the final estimator.
- Parameters
- data : DataFrame
SAP HANA DataFrame to be transformed in the pipeline.
- key : str, optional
Name of the ID column.
If key is not provided, then: if data is indexed by a single column, key defaults to that index column; otherwise, it is assumed that data contains no ID column.
- features : list of str, optional
Names of the feature columns.
If features is not provided, it defaults to all non-ID, non-label columns.
- label : str, optional
Name of the dependent variable.
Defaults to the name of the last non-ID column.
- fit_params : dict, optional
Parameters corresponding to the transformer/estimator names; each parameter name is prefixed such that parameter p for step s has key s__p.
- categorical_variable : str or list of str, optional
Specifies INTEGER column(s) that should be treated as categorical data. Other INTEGER columns will be treated as continuous.
- generate_json_pipeline : bool, optional
Helps generate the JSON formatted pipeline.
Defaults to False.
- use_pal_pipeline_fit : bool, optional
Uses PAL's pipeline fit function instead of the original chained execution.
Defaults to True.
- endog : str, optional
Specifies the endogenous variable in time-series data. Please use endog instead of label if data is time-series data.
Defaults to the name of the 1st non-key column in data.
- exog : str or list of str, optional
Specifies the exogenous variables in time-series data. Please use exog instead of features if data is time-series data.
Defaults to the list of names of all non-key, non-endog columns in data if the final estimator is not ExponentialSmoothing based; otherwise defaults to [].
- model_table_name : str, optional
Specifies the HANA model table name instead of the generated temporary table.
Defaults to None.
- Returns
- DataFrame
Transformed SAP HANA DataFrame.
Examples
>>> my_pipeline = Pipeline([
...     ('pca', PCA(scaling=True, scores=True)),
...     ('imputer', Imputer(strategy='mean')),
...     ('hgbt', HybridGradientBoostingClassifier(n_estimators=4, split_threshold=0,
...                                               learning_rate=0.5, fold_num=5, max_depth=6,
...                                               cross_validation_range=cv_range))
... ])
>>> fit_params = {'pca__key': 'ID', 'pca__label': 'CLASS',
...               'hgbt__key': 'ID', 'hgbt__label': 'CLASS',
...               'hgbt__categorical_variable': 'CLASS'}
>>> hgbt_model = my_pipeline.fit(data=train_data, fit_params=fit_params)
- predict(data, key=None, features=None, model=None)
Predict function for AutoML.
- Parameters
- data : DataFrame
Data to be predicted.
- key : str, optional
Name of the ID column.
Mandatory if data is not indexed, or is indexed by multiple columns.
Defaults to the index of data if data is indexed by a single column.
- features : list of str, optional
Names of the feature columns.
If features is not provided, it defaults to all non-ID columns.
- model : DataFrame, optional
The model to be used for prediction.
Defaults to the fitted model (model_).
- Returns
- DataFrame
Predicted result, structured as follows:
- 1st column: data type and name same as the 1st column of data.
- 2nd column: SCORE, predicted values (for regression) or class labels (for classification).
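Examples
A usage sketch, assuming my_pipeline has already been fitted as in the fit example above and test_data is an SAP HANA DataFrame with an 'ID' column:
>>> result = my_pipeline.predict(data=test_data, key='ID')
>>> result.collect().head()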
- fit_predict(data, apply_data=None, fit_params=None, predict_params=None)
Fit all the transforms one after the other and transform the data, then fit_predict the transformed data using the last estimator.
- Parameters
- data : DataFrame
SAP HANA DataFrame to be transformed in the pipeline.
- apply_data : DataFrame
SAP HANA DataFrame to be predicted in the pipeline.
- fit_params : dict, optional
Parameters corresponding to the transformer/estimator names; each parameter name is prefixed such that parameter p for step s has key s__p.
- predict_params : dict, optional
Parameters corresponding to the predictor name; each parameter name is prefixed such that parameter p for step s has key s__p.
- Returns
- DataFrame
Transformed SAP HANA DataFrame.
Examples
>>> my_pipeline = Pipeline([
...     ('pca', PCA(scaling=True, scores=True)),
...     ('imputer', Imputer(strategy='mean')),
...     ('hgbt', HybridGradientBoostingClassifier(n_estimators=4, split_threshold=0,
...                                               learning_rate=0.5, fold_num=5, max_depth=6,
...                                               cross_validation_range=cv_range))
... ])
>>> fit_params = {'pca__key': 'ID', 'pca__label': 'CLASS',
...               'hgbt__key': 'ID', 'hgbt__label': 'CLASS',
...               'hgbt__categorical_variable': 'CLASS'}
>>> hgbt_model = my_pipeline.fit_predict(data=train_data, apply_data=test_data, fit_params=fit_params)
- plot(name='my_pipeline', iframe_height=450)
Plot pipeline.
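Examples
A minimal sketch (the name is arbitrary; the plot is rendered as an embedded figure, e.g. in a Jupyter notebook):
>>> my_pipeline.plot(name='my_pipeline', iframe_height=500)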
- generate_json_pipeline()
Generate the json formatted pipeline for auto-ml's pipeline_fit function.
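Examples
A minimal sketch; the JSON representation is assumed here to be printable for inspection:
>>> pipeline_json = my_pipeline.generate_json_pipeline()
>>> print(pipeline_json)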
- create_amdp_class(amdp_name, training_dataset, apply_dataset)
Create the AMDP class file. build_amdp_class can then be called to generate the AMDP class.
- Parameters
- training_dataset : str
Name of the training dataset.
- apply_dataset : str
Name of the apply dataset.
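Examples
A sketch of the AMDP generation flow using the related methods documented below; the class, table, and file names are placeholders:
>>> my_pipeline.create_amdp_class(amdp_name='Z_CL_MY_PIPELINE',
...                               training_dataset='TRAIN_TABLE',
...                               apply_dataset='APPLY_TABLE')
>>> my_pipeline.build_amdp_class()
>>> my_pipeline.write_amdp_file(filepath='z_cl_my_pipeline.abap')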
- abap_class_mapping(value)
Map the ABAP class.
- add_amdp_item(template_key, value)
Add item.
- add_amdp_name(amdp_name)
Add AMDP name.
- add_amdp_template(template_name)
Add AMDP template.
- build_amdp_class()
After add_item, generate amdp file from template.
- property fit_hdbprocedure
Returns the generated hdbprocedure for fit.
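For example, the generated procedure can be inspected after fitting (assuming my_pipeline has been fitted with use_pal_pipeline_fit=True):
>>> print(my_pipeline.fit_hdbprocedure)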
- get_amdp_notfillin_key()
Get the AMDP keys that are not yet filled in.
- load_abap_class_mapping()
Load ABAP class mapping.
- load_amdp_template(template_name)
Load AMDP template.
- property predict_hdbprocedure
Returns the generated hdbprocedure for predict.
- write_amdp_file(filepath=None, version=1, outdir='out')
Write template to file.