Changelog

What's New and Changed in version 2.10.210918

New functions:

  • Added dtw() for generic dynamic time warping with predefined and custom step patterns.

  • Added wavedec() for multi-level discrete wavelet transformation, and waverec() for the corresponding inverse transformation.

  • Added wpdec() and wprec() for multi-level (discrete) wavelet packet transformation and inverse.

  • Added OnlineMultiLogisticRegression() which is the online version of Multi-Class Logistic Regression.

  • Added spectral clustering.

  • Added LSTM with attention.

  • Added OneHotEncoding.

  • Added unified preprocessor.

  • Added Pipeline plot.

  • Added UnifiedExponentialSmoothing().
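For readers unfamiliar with step patterns, here is a minimal pure-Python sketch of the idea behind dtw(). It is not hana_ml's API or PAL's implementation. SYMMETRIC1, SYMMETRIC2, CUSTOM and dtw_distance are hypothetical names, and each step pattern is simplified to a list of single moves (di, dj, weight).

```python
import math

# A step pattern is a list of moves (di, dj, weight): cell (i, j) can be reached
# from (i - di, j - dj) at a price of weight * local_cost(i, j).
# Moves must have di, dj >= 0 and must not be (0, 0).
SYMMETRIC1 = [(1, 0, 1.0), (0, 1, 1.0), (1, 1, 1.0)]
SYMMETRIC2 = [(1, 0, 1.0), (0, 1, 1.0), (1, 1, 2.0)]  # diagonal moves count double


def dtw_distance(x, y, step_pattern=SYMMETRIC2, dist=lambda a, b: abs(a - b)):
    """Cumulative DTW cost of aligning sequences x and y under step_pattern."""
    n, m = len(x), len(y)
    # acc[i][j] is the minimal cumulative cost of aligning x[:i+1] with y[:j+1].
    acc = [[math.inf] * m for _ in range(n)]
    acc[0][0] = dist(x[0], y[0])
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                continue
            cost = dist(x[i], y[j])
            for di, dj, weight in step_pattern:
                pi, pj = i - di, j - dj
                # Row-major order guarantees acc[pi][pj] is already final.
                if pi >= 0 and pj >= 0:
                    acc[i][j] = min(acc[i][j], acc[pi][pj] + weight * cost)
    return acc[n - 1][m - 1]


# A custom pattern is simply another list of moves, e.g. one that penalises
# horizontal and vertical moves more heavily than diagonal ones.
CUSTOM = [(1, 0, 2.0), (0, 1, 2.0), (1, 1, 1.0)]
```

For example, dtw_distance([0, 0, 1], [0, 1], SYMMETRIC1) is 0 because the repeated 0 is absorbed by a vertical move.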

Enhancement:

  • Enhanced the model storage support for OnlineLinearRegression().

  • Enhanced multi-threading in text mining (tm) functions.

  • Enhanced HDL container option.

  • Enhanced timestamp support for ARIMA(), AutoARIMA(), VectorARIMA(), OnlineARIMA(), SingleExponentialSmoothing(), DoubleExponentialSmoothing(), TripleExponentialSmoothing(), AutoExponentialSmoothing(), BrownExponentialSmoothing(), Croston(), LR_seasonal_adjust().

  • Enhanced MCMC sampling with new distributions.

  • Supported multiple accuracy_measure methods in Single/Double/Triple ExponentialSmoothing, BrownExponentialSmoothing, Croston and LR_seasonal_adjust.

  • Added plotly support.
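To illustrate what requesting several accuracy measures at once means, the following sketch computes single exponential smoothing and then evaluates the same one-step-ahead errors under several measures. It is an illustration only: the function names and the lowercase measure names ("mse", "rmse", "mape") are assumptions, not the hana_ml parameter values, and MAPE is returned as a fraction rather than a percentage.

```python
import math


def single_exponential_smoothing(series, alpha):
    """Return one-step-ahead fitted values; fitted[0] is initialised to series[0]."""
    fitted = [series[0]]
    for t in range(1, len(series)):
        # Forecast for t = alpha * last observation + (1 - alpha) * last forecast.
        fitted.append(alpha * series[t - 1] + (1 - alpha) * fitted[t - 1])
    return fitted


def accuracy(actual, fitted, measures=("mse",)):
    """Compute several error measures from one set of residuals.

    The first point is skipped because its fitted value is only the initialisation.
    """
    pairs = list(zip(actual[1:], fitted[1:]))
    errors = [a - f for a, f in pairs]
    n = len(errors)
    results = {}
    for name in measures:
        if name == "mse":
            results[name] = sum(e * e for e in errors) / n
        elif name == "rmse":
            results[name] = math.sqrt(sum(e * e for e in errors) / n)
        elif name == "mape":
            # Fraction, not percent; assumes no actual value is zero.
            results[name] = sum(abs(e / a) for e, (a, _) in zip(errors, pairs)) / n
        else:
            raise ValueError(f"unsupported measure: {name}")
    return results
```

With series [10, 12, 11, 13] and alpha 0.5, the fitted values are [10, 10, 11, 11], and all three measures are derived from the same residuals [2, 0, 2].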

API change:

  • Added parameters 'key', 'endog', 'exog' and 'categorical_variable' to the fit() function of AdditiveModelForecast().

  • Added parameters 'prediction_confidence_1' and 'prediction_confidence_2' to BrownExponentialSmoothing().

What's New and Changed in version 2.9.210709

Bug fixes:

  • Fixed missing WeaklyConnectedComponents in hana_ml.graph.algorithms.

  • Fixed missing statistics in hana_ml.graph.Graph.describe.

  • Fixed a bug where Graph object creation, discover_graph_workspace() and Graph.describe() did not work on on-premise systems.

What's New and Changed in version 2.9.210630

Bug fixes:

  • Fixed accuracy_measure issue in Single/Double/Triple/Auto Exponential Smoothing().

  • Fixed empty input table error in Croston().

  • Fixed class_map error for multiclass LogisticRegression in UnifiedClassification().

What's New and Changed in version 2.9.210619

Enhancement:

  • Constants for the directions used in graph functions are available in hana_ml.graph.constants.DIRECTION_*.

  • The following functions and objects can now be imported from hana_ml.graph:
    • Graph object

    • create_graph_from_dataframes and create_graph_from_hana_dataframes factory methods

    • discover_graph_workspaces

    • discover_graph_workspace

  • Geometries no longer need to be specified when creating a DataFrame instance; they are analyzed automatically.

  • Supported lists of targets and trans_param in feature_tool.

  • Enhanced unified report for unified_regression to view feature importance.

  • Enhanced join() to support a list of DataFrames.

  • Enhanced union() to support a list of DataFrames.

  • Streamlined the geo parameters of create_dataframe_from_pandas. There is now a single geo_cols list, which accepts column references as well as (lon, lat) tuples, and a single SRID parameter for all columns.

  • When you pass a GeoPandas DataFrame to create_dataframe_from_pandas, the geometry column is detected automatically and processed as a geometry. You don't need to add it to geo_cols manually.

  • The Graph constructor has been simplified: you can now instantiate a graph by its workspace name alone.

  • Enhanced ModelStorage for APL to support HANA Data Lake.

New functions:

  • Introduced hana_ml.graph.algorithms, which is intended to contain all graph algorithms in the future. The package provides an AlgorithmBase class that can be used to build additional graph algorithms.

  • Added hana_ml.graph.algorithms.ShortestPath, which replaces Graph.shortest_path.

  • Added hana_ml.graph.algorithms.Neighbors, which replaces Graph.neighbors.

  • Added hana_ml.graph.algorithms.NeighborsSubgraph, which replaces Graph.neighbors_with_edges.

  • Added hana_ml.graph.algorithms.KShortestPaths.

  • Added hana_ml.graph.algorithms.ShortestPathsOneToAll.

  • Added hana_ml.graph.discovery.discover_graph_workspace, which reads the metadata of a graph.

  • Added hana_ml.graph.create_graph_from_edges_dataframe.

  • Added hana_ml.graph.Graph.has_vertices, to check whether a list of vertices exists in a graph.

  • Added hana_ml.graph.Graph.subgraph, to create a vertex-induced or edge-induced subgraph.

  • Added hana_ml.graph.Graph.describe, to get statistics about a graph.

  • Added hana_ml.graph.Graph.degree_distribution.

  • Added hana_ml.DataFrame.srids, which returns the SRS of each geometry column.

  • Added hana_ml.DataFrame.geometries, which returns the geometry columns, if there are any.

  • Added the hana_ml.spatial package, which contains:
    • create_predefined_srs

    • is_srs_created

    • get_created_srses

  • Added the hana_ml.docstore package, which contains:
    • create_collection_from_elements

  • Added BCPD() for Bayesian change point detection.

  • Added shape in dataframe.

  • Added sort_values, sort_index in dataframe.

  • Added a scheduler for model renewal in model_storage.

  • Added min, max, mean, median, sum, value_counts in dataframe.

  • Added SHAP support for unified regression.

  • Added data lake support in model_storage.

  • Added data lake support in dataframe functions.

  • Added line plot for time series forecast.

  • Added split_column().

  • Added concat_columns().

  • Added outlier_detection_kmeans(), which detects outliers in datasets based on the result of k-means clustering.

  • Added intermittent_forecast() for forecasting intermittent demand (time series) data.

  • Added OnlineLinearRegression(), an online version of linear regression.
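Intermittent demand, meaning many zero periods separated by occasional orders, is classically handled by Croston's method, which is also the algorithm behind Croston(). The sketch below shows the textbook method only. It is an assumption that intermittent_forecast() behaves this way, since PAL may use a different variant, and the function name croston is hypothetical.

```python
def croston(demand, alpha=0.1):
    """Classic Croston forecast: smoothed demand size / smoothed inter-demand interval.

    Returns the flat per-period forecast after processing the whole history.
    """
    size = interval = None  # smoothed non-zero demand size and interval
    periods_since = 1       # periods elapsed since the last non-zero demand
    for d in demand:
        if d > 0:
            if size is None:
                # Initialise from the first non-zero demand.
                size, interval = float(d), float(periods_since)
            else:
                size += alpha * (d - size)
                interval += alpha * (periods_since - interval)
            periods_since = 1
        else:
            periods_since += 1
    if size is None:
        return 0.0  # no demand observed at all
    return size / interval
```

For example, croston([0, 0, 4, 0, 4], alpha=0.5) smooths the size to 4 and the interval from 3 to 2.5, which gives a forecast of 1.6 units per period.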

API change:

  • Removed geo_cols from dataframe.create_dataframe_from_shapefile

  • Removed geo_cols from ConnectionContext.sql()

  • Removed geo_cols from ConnectionContext.table()

  • Removed Graph.neighbors and Graph.neighbors_with_edges

  • Removed Graph.shortest_path

  • Removed hana_ml.graph.Path, which is no longer used.

  • Removed hana_ml.graph.create_hana_graph_from_existing_workspace. It is replaced by a simplified Graph object constructor.

  • Renamed hana_ml.graph.create_hana_graph_from_vertex_and_edge_frames to create_graph_from_dataframes.

  • Changed the type of geo_cols in create_dataframe_from_pandas to a list, which supports direct column references or (lon, lat) tuples for generating POINT geometries.

Bug fixes:

  • Fixed inflexible default locations of selected columns of input data, e.g. key, features and endog.

  • Fixed model report's feature importance when it has 0 importance.

What's New and Changed in version 2.8.210421

Version 2.8.210421 supports SAP HANA SPS05 and SAP HANA Cloud

Bug fixes:

  • Fixed model report's feature importance when it has 0 importance.

  • Fixed pivot_table issue with multiple indexes.

  • Fixed the missing verbose option for the RDT regressor.

  • Fixed the SHAP display for categorical columns.

What's New and Changed in version 2.8.210321

Version 2.8.210321 supports SAP HANA SPS05 and SAP HANA Cloud

Enhancement:

  • Enhanced sql() to enable multiline execution.

  • Enhanced save() to add append option.

  • Enhanced diff() to enable negative input.

  • Enhanced model report functionality of UnifiedClassification with added model and data visualization.

  • Enhanced the dataset_report module with an optimized report-generation process and a better user experience.

  • Enhanced UnifiedClustering to support 'distance_level' in AgglomerateHierarchicalClustering and DBSCAN functions. Please refer to the documentation for details.

  • Enhanced model storage to support unified report.

New functions:

  • Added generate_html_report() and generate_notebook_iframe_report() functions for UnifiedRegression, which display the output, e.g. statistics and the model.

  • APL Gradient Boosting: the other_params parameter is now supported.

  • APL all models: a new method, get_model_info, allows users to retrieve the summary and the performance metrics of a saved model.

  • APL all models: users can now specify the weight of explanatory variables via the weight parameter.

  • Added LSTM.

  • Added Text Mining functions support for both SAP HANA on-premise and cloud version.
    • tf_analysis

    • text_classification

    • get_related_doc

    • get_related_term

    • get_relevant_doc

    • get_relevant_term

    • get_suggested_term

  • Added unified report.

New dependency:

  • Added new dependency 'htmlmin' for generating dataset and model report.

API change:

  • Added two parameters, 'use_fast_library' and 'use_float', to KMeans.

  • Added one parameter, 'build_report', to UnifiedRegression.

  • Added a parameter 'distance_level' in UnifiedClustering when 'func' is AgglomerateHierarchicalClustering or DBSCAN. Please refer to the documentation for details.

  • Renamed 'batch_size' to 'chunk_size' in create_dataframe_from_pandas.

  • Added two parameters, 'random_state' and 'random_initialization', to OnlineARIMA. Its partial_fit() function now supports two parameters, 'learning_rate' and 'epsilon', for updating the values in the input model.

Bug fixes:

  • Fixed OnlineARIMA model storage support.

  • Fixed inflexible default locations of selected columns of input data, e.g. key, features and endog.

  • Fixed accuracy_measure issue in AutoExponentialSmoothing.

What's New and Changed in version 2.6.210126

Version 2.6.210126 supports SAP HANA SPS05 and SAP HANA Cloud

Bug fixes:

  • Fixed uuid issue for Python 3.8.

  • Fixed wrong legend for unified classification model report.

  • Fixed dataset report to handle datasets with missing values.

What's New and Changed in version 2.6.210113

Version 2.6.210113 supports SAP HANA SPS05 and SAP HANA Cloud

Bug fixes:

  • Fixed load_model issue for KMeans clustering.

  • Removed the PyPI installation of Shapely for Windows users.

  • Fixed duplicate rows bug in save() function.

  • Fixed loading issue in model report.

  • Replaced the option 'batch_size' with 'chunk_size' in create_dataframe_from_pandas.

What's New and Changed in version 2.6.201209

Version 2.6.201209 supports SAP HANA SPS05 and SAP HANA Cloud

Bug fixes:

  • Removed shap from the installation dependencies.

  • Fixed bugs in dataframe functions when autocommit=False.

  • Fixed font properties bugs in eda functions.

  • APL Documentation: other_train_apl_aliases is now documented.

  • APL Gradient Boosting Classification: the target variable won't be displayed in prediction if it is not given in input.

  • APL Gradient Boosting: the default parameter values are now set at the APL backend level. They are no longer set at the Python API level.

  • Fixed handling of geometry columns in DataFrame.collect calls.

  • Fixed shapely not being a required dependency.

  • Fixed the displacement of parameter 'dispersion' in CPD.

What's New and Changed in version 2.6.201106

Version 2.6.201116 supports SAP HANA SPS05 and SAP HANA Cloud

Enhancement:

  • Enhanced collect() performance for large datasets.

  • Enhanced create_dataframe_from_pandas performance for large datasets.

New functions:

  • Added kdeplot() for 1D and 2D kde plotting.

  • Added Shapley value (SHAP) visualization.
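As background for kdeplot(), a one-dimensional kernel density estimate averages a kernel centred on each sample. The sketch below uses a Gaussian kernel and Silverman's rule-of-thumb bandwidth. Both choices are assumptions, not necessarily what kdeplot() uses internally, and gaussian_kde_1d and silverman_bandwidth are hypothetical names.

```python
import math
import statistics


def silverman_bandwidth(samples):
    """Silverman's rule of thumb for a Gaussian kernel: 1.06 * sigma * n^(-1/5)."""
    return 1.06 * statistics.stdev(samples) * len(samples) ** (-0.2)


def gaussian_kde_1d(samples, bandwidth=None):
    """Return a density function f(x) = 1/(n*h) * sum_i phi((x - x_i) / h)."""
    h = bandwidth if bandwidth is not None else silverman_bandwidth(samples)
    norm = 1.0 / (len(samples) * h * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in samples)

    return density
```

A 2D estimate follows the same pattern with a bivariate kernel; the bandwidth largely controls how smooth the plotted curve looks.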

Bug fixes:

  • Fixed incompatibility issue with matplotlib>=3.3.0.

What's New and Changed in version 2.6.201016 (2.6.200928)

Version 2.6.201016 supports SAP HANA SPS05 and SAP HANA Cloud

API change:

  • HybridGradientBoostingClassifier, HybridGradientBoostingRegressor: added a parameter 'adopt_prior' to indicate whether to adopt the prior distribution as the initial point.

  • SVC, SVR, OneClassSVM, SVRanking: added parameters 'compression', 'max_bits', 'max_quantization_iter' for model compression.

  • RDTClassifier: added parameters 'compression', 'max_bits', 'quantize_rate' for model compression.

  • RDTRegressor: added parameters 'compression', 'max_bits', 'quantize_rate', 'fittings_quantization' for model compression.

  • In the prediction function of ARIMA and AutoARIMA, a new value 'truncation_algorithm' of forecast_method is introduced to improve prediction performance.

  • New parameters 'string_variable' and 'variable_weight' are added in KNNClassifier, KNNRegressor and DBSCAN to enable distance calculation based on string distance.

  • New parameters 'extrapolation', 'smooth_width', 'auxiliary_normalitytest' are added in seasonal_decompose function.

New functions:

  • Added dataset manager.

  • Added graph and spatial modules.

  • Added dataset report.

  • Added clustering function: SlightSilhouette.

  • Added native storage support in model storage service and dataset manager.

  • Added vector ARIMA.

  • Added unified regression.

  • Added unified clustering.
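SlightSilhouette is understood here as a cheaper variant of the silhouette score that measures distances to cluster centres instead of averaging pairwise distances to every point. That reading, the Euclidean metric and the name simplified_silhouette are assumptions for illustration. The sketch follows the common "simplified silhouette" definition, not PAL's code.

```python
import math


def simplified_silhouette(points, labels):
    """Mean of (b - a) / max(a, b) over all points, where a is the distance to the
    point's own cluster centre and b is the distance to the nearest other centre.
    Requires at least two clusters."""
    clusters = sorted(set(labels))
    centers = {}
    for c in clusters:
        members = [p for p, lab in zip(points, labels) if lab == c]
        dim = len(members[0])
        centers[c] = tuple(sum(m[k] for m in members) / len(members) for k in range(dim))

    scores = []
    for p, lab in zip(points, labels):
        a = math.dist(p, centers[lab])
        b = min(math.dist(p, centers[c]) for c in clusters if c != lab)
        denom = max(a, b)
        scores.append(0.0 if denom == 0 else (b - a) / denom)
    return sum(scores) / len(scores)
```

Each point costs one distance per cluster rather than one per data point, which is why this variant scales to large tables. Values close to 1 indicate compact, well-separated clusters.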

Bug fixes:

  • Fixed ROC curve display in model report with disordered points.

  • Fixed load_model for unified_classification in model storage service.

  • Fixed model_selection for unified_classification.

What's New and Changed in version 2.5.200626

Version 2.5.200626 supports SAP HANA SPS05 and SAP HANA Cloud

API change:

  • Removed the ConnectionContext parameter from PAL functions.

  • Updated parameter algorithm from mandatory to optional in DecisionTreeClassifier/Regressor(), with default value 'cart'.

  • Added parameter key in fit() function of tsa.ARIMA() and tsa.AutoARIMA().

  • Added parameter decompose_type in tsa.seasonal_decompose().

  • Added parameter save_alignment and a new output statistic table in tsa.fast_dtw().

  • Added parameter table_structure in create_dataframe_from_pandas().

  • Added parameter resampling_method and param_search_strategy in HybridGradientBoostingClassifier/Regressor().

New functions:

  • Added functions in dataframe.py: melt(), read_pickle().

  • Added unified classification function. In particular, generate_html_report() and generate_notebook_iframe_report() are provided to visualize the output, e.g. confusion matrix and ROC curve.

  • Added mcmc function.

  • Added model selection services.

  • Added visualizers (model Debriefing).

Enhancement:

  • Enhanced smart sampling for visualizers.

  • Enhanced import function to SAP HANA.

  • Enhanced bytes, TIMESTAMP and BIGINT support in create_dataframe_from_pandas() in dataframe.py.

  • Enhanced TIMESTAMP and DATE support in describe() in dataframe.py.

  • Predictions made with APL gradient boosting can now be complemented with the reasons that led to these predictions: number of top or bottom explanatory variables, strength values, etc.

  • Supported more data types, SMALLINT, DECIMAL, TINYINT, BIGINT, CLOB and BLOB in DataFrame.dtypes(), generate_table_type() and is_numeric().

  • Enhanced the missing value handling in hana_ml.visualizers.eda bar/box/pie plot in the groupby column by creating a new class for missing values.

  • APL gradient boosting can provide metrics about feature interactions strength.

  • The connection parameter is no longer required for APL model creation.

Bug fixes:

  • Fixed wrong ID issue in fit function by adding key option in tsa.ARIMA() and tsa.AutoARIMA().

  • Fixed CLOB type issue in create_dataframe_from_pandas() by adding table_structure and drop_exist_tab options.

  • Fixed pivot_table() index naming bug.

  • Fixed temporary view from temporary table issue in APL time series function by adding sort_data and get_horizon_wide_metric.

  • Fixed bugs in create_dataframe_from_pandas() if the table is temporary.

  • Fixed data type bugs for the initial centers in GMM().

  • Fixed bugs when some data types, e.g. SMALLINT, DECIMAL or TINYINT are not supported in DataFrame.dtypes(), generate_table_type() and is_numeric().

  • Fixed bugs when data types, e.g. DATE and TIMESTAMP, are not supported in DataFrame.describe().

  • Fixed the table overwrite bug in DataFrame.save() if the table name already exists.

  • Fixed bugs caused by missing quotation marks around column names in hana_ml.visualizers.eda.

  • Users can set 'Cutting Strategy' in APL Gradient Boosting.

  • APL models are saved correctly.

Deprecated functions:

  • GradientBoostingClassifier.

  • GradientBoostingRegressor.

What's New and Changed in version 1.0.8

Version 1.0.8 supports SAP HANA SPS04

New functions: Added the following algorithms in the PAL package (there is now 100% coverage in SAP HANA SPS04 PAL algorithms):

  • preprocessing: Multidimensional Scaling (MDS), Synthetic Minority Over-Sampling Technique (SMOTE, only supported in SAP HANA SPS05), Sampling, Variance Test.

  • statistics: Condition Index, Cumulative Distribution Function (CDF), Distribution Fitting, Distribution Quantile, Entropy, Equal Variance Test, Factor Analysis, Grubbs' Test, Kaplan-Meier Survival Analysis, Kernel Density, One-Sample Median Test, Wilcoxon Signed Rank Test.

  • time series: Linear Regression with Damped Trend and Seasonal Adjust, Additive Model Forecast, Hierarchical Forecast, Correlation Function, online algorithms and dynamic time warping (fast DTW).

  • miscellaneous: ABC Analysis, T-distributed Stochastic Neighbour Embedding (TSNE), Weighted Score Table.

  • Added functions in dataframe.py: data_manipulation().

  • Added cross-validation options to SAP HANA PAL functions.

  • Added visualizers (EDA profiler).

  • Added model storage services.
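ABC Analysis, listed above among the miscellaneous algorithms, ranks items by value and splits them into classes by cumulative share. The sketch below assumes 70% and 90% cumulative cutoffs and classes each item by the share of all higher-ranked items, so the top item is always A. These conventions and the name abc_classify are illustrative assumptions, not PAL's parameters.

```python
def abc_classify(values, a_cut=0.7, b_cut=0.9):
    """values: mapping item -> value (e.g. annual revenue).

    Items are ranked by value (descending). Each item is classed by the cumulative
    share of all items ranked above it: below a_cut -> "A", below b_cut -> "B",
    otherwise "C".
    """
    total = sum(values.values())
    ranked = sorted(values.items(), key=lambda kv: kv[1], reverse=True)
    classes = {}
    cumulative = 0
    for item, value in ranked:
        share_before = cumulative / total
        if share_before < a_cut:
            classes[item] = "A"
        elif share_before < b_cut:
            classes[item] = "B"
        else:
            classes[item] = "C"
        cumulative += value
    return classes
```

For values {"p1": 50, "p2": 30, "p3": 10, "p4": 5, "p5": 5}, p1 and p2 fall into A, p3 into B, and p4 and p5 into C.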