Changelog

What's New and Changed in version 2.13.220701

Bug fixes:
  • Fixed table name too long in model storage save function.

  • Fixed mlflow autologging with additional fit parameters.

  • Fixed no mlflow model info display issue.

  • Fixed metric sampling for model report.

  • Fixed wrong schedule template in model storage.

What's New and Changed in version 2.13.220608

Bug fixes:
  • Fixed identifier length too long issue for function outputs.

What's New and Changed in version 2.13.220511

New functions:
  • Added upsert/update streams data in dataframe function.

  • Added stationarity_test function.

  • Added CrostonTSB function.

  • Added get_temporary_tables and clean_up_temporary_tables functions.

  • Added Pipeline class json outputs for auto-ml pipeline_fit.

  • Added EDA for time series data.
    • Added plot_pacf, plot_acf

    • Added plot_moving_average

    • Added plot_rolling_stddev

    • Added seasonal_plot

    • Added timeseries_box_plot

    • Added plot_seasonal_decompose

    • Added quarter_plot

  • Added rolling window in generate_feature function.

  • Added get_connection_id, restart_session and cancel_session_operation in dataframe function.
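
For readers unfamiliar with the method behind the CrostonTSB entry above, the following is a minimal plain-Python sketch of the textbook TSB (Teunter-Syntetos-Babai) variant of Croston's method for intermittent demand. It is not the hana_ml or PAL implementation: the function name croston_tsb, the parameters alpha and beta, and the initialization rule are assumptions made for illustration only.

```python
def croston_tsb(demand, alpha=0.1, beta=0.1):
    """Return one-step-ahead TSB forecasts for an intermittent demand series.

    alpha smooths the demand size, beta smooths the demand probability.
    Initialization is an assumption of this sketch: the first non-zero
    demand and the observed share of non-zero periods.
    """
    nonzero = [y for y in demand if y > 0]
    if not nonzero:
        return [0.0] * len(demand)
    size = float(nonzero[0])
    prob = len(nonzero) / len(demand)
    forecasts = []
    for y in demand:
        # Forecast for this period, made before y is observed.
        forecasts.append(prob * size)
        occurred = 1.0 if y > 0 else 0.0
        # The demand probability is updated in every period.
        prob += beta * (occurred - prob)
        if y > 0:
            # The demand size is updated only when demand occurs.
            size += alpha * (y - size)
    return forecasts
```

Because the probability estimate is updated in every period, the forecast decays during long runs of zero demand, which is the main difference from the original Croston method.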

Enhancement:
  • Added support of the following collection of new parameters for HGBT in UnifiedClassification & UnifiedRegression: 'replacemissing', 'default_missing_direction', 'feature_grouping', 'tol_rate', 'compression', 'max_bin_num'.

  • Improved the performance of box_plot.

  • Enhanced the massive mode support of UnifiedClassification, UnifiedRegression, ARIMA, AutoARIMA, AdditiveModelForecast.

  • Enhanced MLFlow autologging for unified classification and regression.

API change:
  • Added 'interpret' in predict() of KNNClassifier & KNNRegressor for enabling procedure PAL_KNN_INTERPRET.

  • Added 'sample_size', 'top_k_attributions', 'random_state' in predict() of KNNClassifier & KNNRegressor for generating local interpretation result.

  • Enabled missing value handling for input data by adding imputation related parameters in fit(), predict() and score() functions of both UnifiedClassification & UnifiedRegression.

  • Added 'model_type' in GARCH initialization for allowing variant GARCH models.

Bug fixes:
  • Fixed key error bug for parameter 'param_values' in DecisionTreeClassifier/Regressor.

  • Fixed the encoding error of imputation strategy of NONE type in Imputer.

  • Fixed the key error bug when enabling AFL states for clustering algorithms.

What's New and Changed in version 2.12.220428

Bug Fixes:
  • Adapted the auto-ml logging according to the PAL function changes.

What's New and Changed in version 2.12.220425

Bug Fixes:
  • Fixed the display issue for the pipeline report.

  • Fixed the missing ptype issue in automl evaluate function.

  • Fixed the transform issue in pipeline fit_predict function.

What's New and Changed in version 2.12.220408

Bug Fixes:
  • Fixed cancellation button in auto_ml.

  • Fixed pivot_table for handling NULL values.

  • Fixed tree debriefing dot visualizer for decision trees.

  • Fixed the display issue for dataset report with NULL values.

What's New and Changed in version 2.12.220325

New functions:
  • Added IsolationForest.

  • Added auto_ml including AutomaticClassification, AutomaticRegression and Preprocessing.

  • Added progress monitor called PipelineProgressStatusMonitor for AutomaticClassification and AutomaticRegression.

  • Added best pipeline report called BestPipelineReport for AutomaticClassification and AutomaticRegression.

  • Added to_datetime(), to_tail() in hanaml.dataframe.

Enhancement:
  • Added validation procedure for n_components in CATPCA.

  • Improved display name in pivot_table.

  • Added compression and thresholding in wavelet transform.

  • Moved generate_feature to dataframe function.

  • Enhanced create_dataframe_from_pandas() with upsert option.

  • Added ignore_scatter_matrix option in dataset report.

  • Exposed APL variable selection parameters.

  • Enhanced text mining with German support.

  • Added support for more loss functions in HybridGradientBoostingClassifier and HybridGradientBoostingRegressor.

  • Enhanced white_noise_test() with an option: the degree of freedom, model_df.

  • Enhanced Attention with local interpretability of model.

  • Enhanced integer index support for TimeSeriesExplainer.explain_arima_model() for ARIMA and AutoARIMA.

  • Added precomputed affinity for AgglomerateHierarchicalClustering.

  • Added model compression related parameters for HybridGradientBoostingClassifier and HybridGradientBoostingRegressor.

Bug Fixes:
  • Fixed M4 sampling with lowercase column name.

  • Fixed inconsistent IDs assigned to solvers between LOGR and M_LOGR.

  • Fixed a parameter naming error in fft(): flattop_model -> flattop_mode.

  • Fixed a validation error for endog parameter in Attention predict().

API change:
  • Added 'model_df' in the white_noise_test() for selecting the degree of freedom.

  • Added 'explain_mode' in predict() of GRUAttention for selecting the mechanism for generating the reason code for inference results.

What's New and Changed in version 2.11.220209

Bug fixes:

  • Fixed wrong arg check for 'histogram' in HGBT split method.

  • Fixed bug in deploy_class with transport_request.

What's New and Changed in version 2.11.220107

Bug fixes:
  • Fixed box plot with lower case column name.

  • Fixed add_id when the rel_col input is list type.

  • Fixed shortest_path and shortest_path_one_to_all type cast error.

  • Fixed fast DTW alignment error.

  • Corrected the position of random search times in LOGR.

  • Fixed HANA hint script generation for resource restriction.

What's New and Changed in version 2.11.211211

New functions:
  • Added FeatureSelection.

  • Added BSTS.

  • Added Word Cloud.

  • Added hdbprocedure generation in pal_base and applied to all functions.

  • Added GARCH.

  • APL classification, regression, clustering: a new method, 'export_apply_code', generates code which can be used to apply a trained model outside APL.

Enhancement:
  • Enhanced Preprocessing with FeatureSelection.

  • Enhanced the model storage with fit parameters in json format.

  • Enhanced PCA categorical support.

  • Enhanced model storage with fit parameters info.

  • Enhanced UnifiedExponentialSmoothing with massive mode.

  • Enhanced AMDP generation as a function in unified_classification.

  • Enhanced ARIMA with an explainer in the predict function.

  • Enhanced additive_model_forecast with an explainer in the predict function.

  • Enhanced HybridGradientBoostingClassifier with continue training of a trained HybridGradientBoostingClassifier model.

  • Enhanced APL AutoTimeSeries with advanced predict outputs: the 'APL/ApplyExtraMode' parameter can be set in 'extra_applyout_settings'.

  • Enhanced the stored procedure information retrieval.

  • Enhanced fillna to support non-numeric columns.

  • Enhanced dataset report to convert PAL unsupported type.

API change:
  • Added 'background_size' in the init() and 'thread_ratio', 'top_k_attributions', 'trend_mod', 'trend_width', 'seasonal_width' in the predict() function of ARIMA() and AutoARIMA().

  • Added 'show_explainer', 'decompose_seasonality', 'decompose_holiday' in the predict() function of additive_model_forecast().

  • Added 'warm_start' in the fit() function of HybridGradientBoostingClassifier() and HybridGradientBoostingRegressor() for continuing training with an existing model.

Bug fixes:
  • Fixed index creation bug in on-premise text_classification api.

  • Fixed multi-class logistic regression init check bug.

  • Fixed has_table with local temporary tables.

What's New and Changed in version 2.10.210918

New functions:
  • Added dtw() for generic dynamic time warping with predefined and custom defined step pattern.

  • Added wavedec() for multi-level discrete wavelet transformation, and waverec() for the corresponding inverse transformation.

  • Added wpdec() and wprec() for multi-level (discrete) wavelet packet transformation and inverse.

  • Added OnlineMultiLogisticRegression() which is the online version of Multi-Class Logistic Regression.

  • Added spectral clustering.

  • Added LSTM with attention.

  • Added OneHotEncoding.

  • Added unified preprocessor.

  • Added Pipeline plot.

  • Added UnifiedExponentialSmoothing().
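
As background for the dtw() entry above, here is a minimal plain-Python sketch of classic dynamic time warping with the standard symmetric step pattern (match, insertion, deletion) and absolute difference as the local cost. It only illustrates the general technique. The function name dtw_distance is hypothetical, and the sketch says nothing about the signature, step patterns, or outputs of the hana_ml function.

```python
def dtw_distance(a, b):
    """Return the classic DTW cost between two numeric sequences.

    The local cost is |x - y| and the allowed steps are the three
    standard ones: diagonal, vertical and horizontal.
    """
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] is the minimal accumulated cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b stays
                cost[i][j - 1],      # b advances, a stays
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]
```

For example, dtw_distance([1, 2, 3], [1, 2, 2, 3]) returns 0.0, because the repeated 2 in the second sequence can be aligned with the single 2 in the first at no cost.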

Enhancement:
  • Enhanced the model storage support for OnlineLinearRegression().

  • Enhanced multi-threading in tm functions.

  • Enhanced HDL container option.

  • Enhanced timestamp support for ARIMA(), AutoARIMA(), VectorARIMA(), OnlineARIMA(), SingleExponentialSmoothing(), DoubleExponentialSmoothing(), TripleExponentialSmoothing(), AutoExponentialSmoothing(), BrownExponentialSmoothing(), Croston(), LR_seasonal_adjust().

  • Enhanced new distributions for MCMC sampling.

  • Added support for multiple accuracy_measure methods in Single/Double/Triple ExponentialSmoothing, BrownExponentialSmoothing, Croston and LR_seasonal_adjust.

  • Added plotly support.

API change:
  • Added 'key', 'endog', 'exog', 'categorical_variable' in the fit() function of AdditiveModelForecast().

  • Added 'prediction_confidence_1' and 'prediction_confidence_2' in BrownExponentialSmoothing().

What's New and Changed in version 2.9.210726

Bug fixes:
  • Fixed load_model initialized error in model storage service.

  • Fixed bad link in pypi portal.

What's New and Changed in version 2.9.210709

Bug fixes:
  • Fixed missing WeaklyConnectedComponents in hana_ml.graph.algorithms.

  • Fixed missing statistics in hana_ml.graph.Graph.describe.

  • Fixed a bug where Graph object creation, discover_graph_workspace() and Graph.describe() did not work on an on-premise system.

What's New and Changed in version 2.9.210630

Bug fixes:
  • Fixed accuracy_measure issue in Single/Double/Triple/Auto Exponential Smoothing().

  • Fixed empty input table error in Croston().

  • Fixed class_map error for multiclass logistic regression in UnifiedClassification().

What's New and Changed in version 2.9.210619

Enhancement:
  • Constants for directions used in graph functions can be found in hana_ml.graph.constants.DIRECTION_*

  • The following functions and objects are now available for import from hana_ml.graph:

    • Graph object

    • create_graph_from_dataframes and create_graph_from_hana_dataframes factory methods

    • discover_graph_workspaces

    • discover_graph_workspace

  • Geometries no longer need to be specified when creating a DataFrame instance. They are analyzed automatically.

  • Support list of targets and trans_param in feature_tool.

  • Enhanced unified report for unified_regression to view feature importance.

  • Enhanced join() to support list of DataFrame.

  • Enhanced union() to support list of DataFrame.

  • Streamlined the create_dataframe_from_pandas geo parameters. Now there is only one list of geo_cols, which supports column references as well as (lon, lat) tuples, and one SRID parameter for all columns.

  • When you call create_dataframe_from_pandas and pass a GeoPandas DataFrame, the geometry column is detected automatically and processed as a geometry. You don't need to add it manually to geo_cols.

  • The Graph constructor is simplified. You can instantiate a graph simply by the workspace name.

  • Enhanced ModelStorage for APL to support HANA Data Lake.

New functions:
  • Introduced hana_ml.graph.algorithms, which will contain all graph algorithms in the future. The package provides an AlgorithmBase class which can be used to build additional algorithms for a graph.

  • Add hana_ml.graph.algorithms.ShortestPath, which replaces Graph.shortest_path

  • Add hana_ml.graph.algorithms.Neighbors, which replaces Graph.neighbors

  • Add hana_ml.graph.algorithms.NeighborsSubgraph, which replaces Graph.neighbors_with_edges

  • Add hana_ml.graph.algorithms.KShortestPaths

  • Add hana_ml.graph.algorithms.ShortestPathsOneToAll

  • Add hana_ml.graph.discovery.discover_graph_workspace, which reads the metadata of a graph

  • Add hana_ml.graph.create_graph_from_edges_dataframe

  • Add hana_ml.graph.Graph.has_vertices, to check if a list of vertices exist in a graph

  • Add hana_ml.graph.Graph.subgraph, to create a vertices or edges induced subgraph

  • Add hana_ml.graph.Graph.describe, to get some statistics

  • Add hana_ml.graph.Graph.degree_distribution

  • Add hana_ml.DataFrame.srids, which returns the SRS of each geometry column

  • Add hana_ml.DataFrame.geometries, which returns the geometry columns if there are any

  • Add hana_ml.spatial package, that contains

    • create_predefined_srs

    • is_srs_created

    • get_created_srses

  • Add hana_ml.docstore package, that contains

    • create_collection_from_elements

  • Added BCPD() for bayesian change point detection.

  • Added shape in dataframe.

  • Added sort_values, sort_index in dataframe.

  • Added scheduler for model renew in model_storage.

  • Added min, max, mean, median, sum, value_counts in dataframe.

  • Added SHAP support for unified regression.

  • Added data lake support in model_storage.

  • Added data lake support in dataframe functions.

  • Added line plot for time series forecast.

  • Added split_column().

  • Added concat_columns().

  • Added outlier_detection_kmeans(), which detects outliers in datasets based on the result of k-means clustering.

  • Added intermittent_forecast() for forecasting intermittent demand data (time series).

  • Added OnlineLinearRegression() which is an online version of the Linear Regression.
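
The idea behind the outlier_detection_kmeans() entry above can be sketched in plain Python as follows: once cluster centers are known, the points that lie farthest from their nearest center are reported as outliers. This is an illustration under stated assumptions, not the PAL algorithm. The function name kmeans_outliers, the contamination parameter as used here, and the choice of Euclidean distance are all assumptions, and the cluster centers are taken as given rather than computed.

```python
import math


def kmeans_outliers(points, centers, contamination=0.1):
    """Return the sorted indices of the points farthest from their nearest center.

    points and centers are sequences of equal-length numeric tuples.
    contamination is the assumed fraction of points to report, rounded
    to the nearest whole count.
    """
    # Euclidean distance from each point to its nearest cluster center.
    distances = [min(math.dist(p, c) for c in centers) for p in points]
    n_outliers = round(len(points) * contamination)
    # Rank point indices from the largest distance to the smallest.
    ranked = sorted(range(len(points)), key=lambda i: distances[i], reverse=True)
    return sorted(ranked[:n_outliers])
```

A threshold on the distance itself would be an equally reasonable design. A fixed fraction is used here only because it keeps the sketch short.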

API change:
  • Removed geo_cols from dataframe.create_dataframe_from_shapefile

  • Removed geo_cols from ConnectionContext.sql()

  • Removed geo_cols from ConnectionContext.table()

  • Removed Graph.neighbors and Graph.neighbors_with_edges

  • Removed Graph.shortest_path

  • Removed hana_ml.graph.Path. This is not used anymore

  • Removed hana_ml.graph.create_hana_graph_from_existing_workspace. This is replaced by a simplified Graph object constructor.

  • Renamed hana_ml.graph.create_hana_graph_from_vertex_and_edge_frames to create_graph_from_dataframes

  • Changed the type of geo_cols in create_dataframe_from_pandas to list, which supports direct column references or (lon, lat) tuples for generating POINT geometries
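
To make the two kinds of geo_cols entries concrete, the sketch below shows one way a list mixing direct column references and (lon, lat) tuples could be interpreted. It is plain Python and does not call hana_ml. The function name to_wkt_points, the "<lon>_<lat>_GEO" naming of the generated column, and the use of WKT strings are assumptions for illustration only.

```python
def to_wkt_points(rows, geo_cols):
    """Return, for each row, a dict mapping geometry column name to a WKT string.

    Each geo_cols entry is either a column name that already holds WKT,
    or a (lon, lat) tuple of column names that is combined into a POINT.
    """
    result = []
    for row in rows:
        geometries = {}
        for spec in geo_cols:
            if isinstance(spec, tuple):
                lon_col, lat_col = spec
                # Hypothetical naming scheme for the generated column.
                name = f"{lon_col}_{lat_col}_GEO"
                geometries[name] = f"POINT ({row[lon_col]} {row[lat_col]})"
            else:
                # Direct column reference: pass the value through unchanged.
                geometries[spec] = row[spec]
        result.append(geometries)
    return result
```

Called with rows = [{"LON": 8.64, "LAT": 49.29, "SHAPE": "POINT (1 2)"}] and geo_cols = ["SHAPE", ("LON", "LAT")], the sketch keeps SHAPE as is and derives a second geometry from the coordinate pair.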

Bug fixes:
  • Fixed inflexible default locations of selected columns of input data, e.g. key, features and endog.

  • Fixed model report's feature importance when it has 0 importance.

What's New and Changed in version 2.8.210421

Version 2.8.210421 supports SAP HANA SPS05 and SAP HANA Cloud

Bug fixes:
  • Fixed model report's feature importance when it has 0 importance.

  • Fixed pivot_table with multiple index issue.

  • Fixed the verbose missing for RDT regressor.

  • Fixed the shap display for categorical columns.

What's New and Changed in version 2.8.210321

Version 2.8.210321 supports SAP HANA SPS05 and SAP HANA Cloud

Enhancement:
  • Enhanced sql() to enable multiline execution.

  • Enhanced save() to add append option.

  • Enhanced diff() to enable negative input.

  • Enhanced model report functionality of UnifiedClassification with added model and data visualization.

  • Enhanced dataset_report module with an optimized process of report generation and better user experience.

  • Enhanced UnifiedClustering to support 'distance_level' in AgglomerateHierarchicalClustering and DBSCAN functions. Please refer to documentation for details.

  • Enhanced model storage to support unified report.

New functions:
  • Added generate_html_report() and generate_notebook_iframe_report() functions for UnifiedRegression, which display the output, e.g. statistics and model.

  • APL Gradient Boosting: the other_params parameter is now supported.

  • APL all models: a new method, get_model_info, is created, allowing users to retrieve the summary and the performance metrics of a saved model.

  • APL all models: users can now specify the weight of explanatory variables via the weight parameter.

  • Added LSTM.

  • Added Text Mining functions support for both SAP HANA on-premise and cloud version.

    • tf_analysis

    • text_classification

    • get_related_doc

    • get_related_term

    • get_relevant_doc

    • get_relevant_term

    • get_suggested_term

  • Added unified report.

New dependency:
  • Added new dependency 'htmlmin' for generating dataset and model report.

API change:
  • KMeans with two added parameters 'use_fast_library' and 'use_float'.

  • UnifiedRegression with one added parameter 'build_report'.

  • Added a parameter 'distance_level' in UnifiedClustering when 'func' is AgglomerateHierarchicalClustering and DBSCAN. Please refer to documentation for details.

  • Renamed 'batch_size' to 'chunk_size' in create_dataframe_from_pandas.

  • OnlineARIMA has two added parameters 'random_state', 'random_initialization' and its partial_fit() function supports two parameters 'learning_rate' and 'epsilon' for updating the values in the input model.

Bug fixes:
  • Fixed onlineARIMA model storage support.

  • Fixed inflexible default locations of selected columns of input data, e.g. key, features and endog.

  • Fixed accuracy_measure issue in AutoExponentialSmoothing.

What's New and Changed in version 2.6.210126

Version 2.6.210126 supports SAP HANA SPS05 and SAP HANA Cloud

Bug fixes:
  • Fixed uuid issue for Python 3.8.

  • Fixed wrong legend for unified classification model report.

  • Fixed dataset report to handle the dataset with missing value.

What's New and Changed in version 2.6.210113

Version 2.6.210113 supports SAP HANA SPS05 and SAP HANA Cloud

Bug fixes:
  • Fixed load_model issue for KMeans clustering.

  • Removed pypi installation of Shapely for Windows users.

  • Fixed duplicate rows bug in save() function.

  • Fixed loading issue in model report.

  • Replaced the option 'batch_size' with 'chunk_size' in create_dataframe_from_pandas.

What's New and Changed in version 2.6.201209

Version 2.6.201209 supports SAP HANA SPS05 and SAP HANA Cloud

Bug fixes:
  • Removed shap from installation.

  • Fixed bugs in dataframe functions when autocommit=False.

  • Fixed font properties bugs in eda functions.

  • APL Documentation: other_train_apl_aliases is now documented.

  • APL Gradient Boosting Classification: the target variable won't be displayed in prediction if it is not given in input.

  • APL Gradient Boosting: the default parameter values are now set in the APL backend level. They won't be set in the Python API level.

  • Fixed handling of geometry columns in the context of Dataframe.collect calls.

  • Fixed shapely not being a required dependency.

  • Fixed the displacement of parameter 'dispersion' in CPD.

What's New and Changed in version 2.6.201106

Version 2.6.201116 supports SAP HANA SPS05 and SAP HANA Cloud

Enhancement:
  • Enhanced collect() performance for large datasets.

  • Enhanced create_dataframe_from_pandas performance for large datasets.

New functions:
  • Added kdeplot() for 1D and 2D kde plotting.

  • Added SHAPLEY visualization.

Bug fixes:
  • Fixed incompatibility issue with matplotlib>=3.3.0.

What's New and Changed in version 2.6.201016(2.6.200928)

Version 2.6.201016 supports SAP HANA SPS05 and SAP HANA Cloud

API change:
  • HybridGradientBoostingClassifier, HybridGradientBoostingRegressor: added a parameter 'adopt_prior' to indicate whether to adopt the prior distribution as the initial point.

  • SVC, SVR, OneClassSVM, SVRanking: added parameters 'compression', 'max_bits', 'max_quantization_iter' for model compression.

  • RDTClassifier: added parameters 'compression', 'max_bits', 'quantize_rate' for model compression.

  • RDTRegressor: added parameters 'compression', 'max_bits', 'quantize_rate', 'fittings_quantization' for model compression.

  • In the predict function of ARIMA and AutoARIMA, a new value 'truncation_algorithm' for forecast_method is introduced to improve the prediction performance.

  • New parameters 'string_variable', 'variable_weight' are added in KNNClassifier, KNNRegressor and DBSCAN to enable distance calculation based on String distance.

  • New parameters 'extrapolation', 'smooth_width', 'auxiliary_normalitytest' are added in seasonal_decompose function.

New functions:
  • Added dataset manager.

  • Added graph and spatial modules.

  • Added dataset report.

  • Added clustering function: SlightSilhouette.

  • Added native storage support in model storage service and dataset manager.

  • Added vector ARIMA.

  • Added unified regression.

  • Added unified clustering.

Bug fixes:
  • Fixed ROC curve display in model report with disordered points.

  • Fixed load_model for unified_classification in model storage service.

  • Fixed model_selection for unified_classification.

What's New and Changed in version 2.5.200626

Version 2.5.200626 supports SAP HANA SPS05 and SAP HANA Cloud

API change:
  • Removed parameter ConnectionContext in PAL functions.

  • Updated parameter algorithm from mandatory to optional in DecisionTreeClassifier/Regressor(), with default value 'cart'.

  • Added parameter key in fit() function of tsa.ARIMA() and tsa.AutoARIMA().

  • Added parameter decompose_type in tsa.seasonal_decompose().

  • Added parameter save_alignment and a new output statistic table in tsa.fast_dtw().

  • Added parameter table_structure in create_dataframe_from_pandas().

  • Added parameter resampling_method and param_search_strategy in HybridGradientBoostingClassifier/Regressor().

New functions:
  • Added functions in dataframe.py: melt(), read_pickle().

  • Added unified classification function. In particular, generate_html_report() and generate_notebook_iframe_report() are provided to visualize the output, e.g. confusion matrix and ROC curve.

  • Added mcmc function.

  • Added model selection services.

  • Added visualizers (model Debriefing).

Enhancement:
  • Enhanced smart sampling for visualizers.

  • Enhanced import function to SAP HANA.

  • Enhanced bytes, TIMESTAMP and BIGINT support in create_dataframe_from_pandas() in dataframe.py.

  • Enhanced TIMESTAMP and DATE support in describe() in dataframe.py.

  • Predictions made with APL gradient boosting can now be complemented with the reasons that led to these predictions: number of top or bottom explanatory variables, strength values, etc.

  • Supported more data types, SMALLINT, DECIMAL, TINYINT, BIGINT, CLOB and BLOB in DataFrame.dtypes(), generate_table_type() and is_numeric().

  • Enhanced the missing value handling in hana_ml.visualizers.eda bar/box/pie plot in the groupby column by creating a new class for missing values.

  • APL gradient boosting can provide metrics about feature interactions strength.

  • The connection parameter is no longer required for APL model creation.

Bug fixes:
  • Fixed wrong ID issue in fit function by adding key option in tsa.ARIMA() and tsa.AutoARIMA().

  • Fixed CLOB type issue in create_dataframe_from_pandas() by adding table_structure and drop_exist_tab options.

  • Fixed pivot_table() index naming bug.

  • Fixed temporary view from temporary table issue in APL time series function by adding sort_data and get_horizon_wide_metric.

  • Fixed bugs in create_dataframe_from_pandas() if the table is temporary.

  • Fixed bugs for data type of init centers in GMM().

  • Fixed bugs when some data types, e.g. SMALLINT, DECIMAL or TINYINT are not supported in DataFrame.dtypes(), generate_table_type() and is_numeric().

  • Fixed bugs when data types, e.g. DATE and TIMESTAMP, are not supported in DataFrame.describe().

  • Fixed the table overwrite bug in DataFrame.save() if the table name is duplicate.

  • Fixed missing quotation mark in column name bugs in hana_ml.visualizers.eda.

  • Users can set 'Cutting Strategy' in APL Gradient Boosting.

  • APL models are saved correctly.

Deprecated functions:
  • GradientBoostingClassifier.

  • GradientBoostingRegressor.

What's New and Changed in version 1.0.8

Version 1.0.8 supports SAP HANA SPS04

New functions: Added the following algorithms in the PAL package (there is now 100% coverage in SAP HANA SPS04 PAL algorithms):
  • preprocessing : Multidimensional Scaling(MDS), Synthetic Minority Over-Sampling Technique(SMOTE, only supported in SAP HANA SPS05), Sampling, Variance Test.

  • statistics : condition index, Cumulative Distribution Function(cdf), Distribution fitting, Distribution Quantile, Entropy, Equal Variance Test, Factor Analysis, Grubbs' Test, Kaplan-Meier Survival Analysis, Kernel Density, One-Sample Median Test, Wilcox Signed Rank Test.

  • time series : Linear Regression with Damped Trend and Seasonal Adjust, Additive Model Forecast, Hierarchical Forecast, Correlation Function, online algorithms and dynamic time warping(fast DTW).

  • miscellaneous : ABC Analysis, T-distributed Stochastic Neighbour Embedding(TSNE), Weighted Score Table.

  • Added functions in dataframe.py: data_manipulation().

  • Added cross-validation options to SAP HANA PAL functions.

  • Added visualizers (EDA profiler).

  • Added model storage services.