Changelog

Version 2.17.230808

Bug Fixes
  • Fixed default cmap value in scatter_plot.

  • Fixed display error for monthly data in tsa functions.

  • Fixed index display error in seasonal_decompose.

  • Fixed time series report index sort issue.

Version 2.17.230727

Bug Fixes
  • Fixed Decimal issue in Explainer item and other related items in time series report.

  • Fixed shadow option in EDA pie plot.

  • Fixed enable_stopwords issue in tf_analysis and the same issue in the wordcloud plot.

  • Fixed year legend sort issue in time series report.

Version 2.17.230714

Bug Fixes
  • Fixed wrong error message in HANA scheduler.

  • Fixed corr issue where the column name was missing quotes.

  • Fixed front-end connection reset issue in AutoML by avoiding too many queries on the progress table.

  • Fixed cron missing issue by adding NULL check.

  • Fixed the Decimal issue in time series report.

  • Fixed the x-axis order issue in time series report.

Version 2.17.230628

Bug Fixes
  • Fixed CAP generation issues for APL.

  • Fixed duplicated prefix for predict artifact in CAP generation.

  • Fixed parameter checking for APL.

Version 2.17.230622

Enhancements
  • Enhanced the support of plotly for EDA functions such as quarter_plot() and seasonal_plot().

  • Enhanced the support of spectral clustering in UnifiedClustering.

  • Enhanced HANA artifacts generation for pipeline module.

  • Enhanced auto-ml with reason code option.

  • Enhanced time series report with confidence interval.

  • Enhanced unified regression's RDT with prediction interval.

  • Enhanced model storage with server-side scheduler.

  • Enhanced unified API for pivoted input data.

  • Enhanced diff() to support datetime column.

Version 2.16.230601

Bug Fixes
  • Fixed model load issue for pipeline module in model storage.

  • Fixed parameter missing in HGBT.

Version 2.16.230526

Bug Fixes
  • Fixed pipeline missing evaluation function.

  • Fixed tips and chart width for model report.

  • Fixed built-in operation missing in pipeline module.

  • Fixed wordcloud issues to disable stopwords.

Version 2.16.230519

Bug Fixes
  • Fixed auto-ml time series config_dict template.

  • Fixed progress logging in auto-ml.

  • Fixed the progress monitor issue when early_stop is enabled.

  • Fixed KNN NaN issue caused by recent pandas changes.

  • Fixed describe function to support SMALLINT.

Version 2.16.230508

Bug Fixes
  • Fixed pipeline module for model storage.

  • Fixed the progress monitor getting stuck when the progress table is not empty.

  • Fixed quote issue in serializing pipeline object.

  • Fixed parameter missing in FACCM.

  • Fixed dependency issue for pydotplus.

  • Fixed tail function with default rel_col.

  • Fixed NaN in KNN optimal parameter collect.

Version 2.16.230413

Bug Fixes

Version 2.16.230323

Bug Fixes
  • Fixed wrong error message when "hint" has been used.

  • Fixed load_model issue for APL model.

Version 2.16.230316

New Functions
  • Added model report items to time series report.

  • Added time series report to unified report.

  • Added TimeSeriesClassification.

  • Added TUDF code generation function.

  • Added AMDP generator for pipeline.

  • Added import_csv_from() function for importing CSV files from cloud storage locations such as Azure, Amazon (AWS), Google Cloud, and SAP HANA Cloud, Data Lake Files (HDLFS).

  • Added set_scale_out() to enable APL functions execution in scaling out environment.

API Changes
  • Added a parameter called 'decom_state' in UnifiedExponentialSmoothing for the control of reason code display.

  • Added a parameter called 'lang' in WordCloud for language selection.

Version 2.15.230217

Bug Fixes
  • Fixed cmap issues in eda visualizer.

  • Fixed FFM label bug.

  • Fixed missing WordCloud module issue.

Version 2.15.230111

Bug Fixes
  • Fixed the blank chart issue in DatasetReportBuilder.

  • Fixed dataset report crash due to empty column.

Version 2.15.221223

Bug Fixes
  • Fixed the error where detected season change-points were missing in BCPD.

  • Fixed the change points chart in TimeSeriesReport.

Version 2.15.221216

New Functions
  • Added nullif function in dataframe.

  • Added long-term time series forecast algorithm LTSF.

  • Added plot_change_points() to plot change points.

  • Added plot_psd() to plot power spectral density.

  • Added periodogram() to perform power spectral density estimate of the input signal.

  • Added change_point detection item to TimeSeriesReport.
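
Note on periodogram(): the entry above refers to estimating the power spectral density of a signal. The sketch below illustrates the underlying idea only, using a direct discrete Fourier transform in plain Python. The helper name naive_periodogram, its arguments and the scaling are assumptions made for this illustration and do not reflect the hana_ml function's signature or implementation.

```python
import cmath
import math


def naive_periodogram(signal, sampling_rate=1.0):
    """Return (frequencies, power) for a real-valued signal via a direct DFT.

    Hypothetical helper for illustration only: O(n^2), no windowing,
    no detrending, and one common density scaling |X_k|^2 / (n * fs).
    """
    n = len(signal)
    freqs, power = [], []
    # Only the non-negative frequencies are needed for a real-valued signal.
    for k in range(n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(signal)
        )
        freqs.append(k * sampling_rate / n)
        power.append(abs(coeff) ** 2 / (n * sampling_rate))
    return freqs, power


# A sine wave with 4 full cycles over 32 samples.
samples = [math.sin(2 * math.pi * 4 * t / 32) for t in range(32)]
freqs, power = naive_periodogram(samples)
peak_frequency = freqs[power.index(max(power))]
```

In this toy input the power concentrates at the frequency bin matching the sine wave, which is the kind of peak a power spectral density plot is meant to reveal.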

API Changes
  • Changed the "JSON" column to NCLOB table type in ModelStorage.

Version 2.14.221208

Bug Fixes
  • Fixed dependency issues in dataset report.

  • Fixed documentation link in PyPI portal.

  • Fixed replace function to support NULL replacement.

Version 2.14.221201

Bug Fixes
  • Fixed dataset report display issue:
    • wrong binning method for distribution plot.

    • NA handling issue in scatter matrix.

  • Fixed SQL generation issue in pipeline module.

  • Fixed model state creation in KNN.

  • Fixed missing parameters in AutomaticTimeSeries.

  • Fixed wrong type of split_method in AutomaticTimeSeries.

Version 2.14.221028

Bug Fixes
  • Fixed pipeline monitor when password contains ','.

  • Fixed message not defined error in auto-ml.

  • Fixed pipeline error for PCA, DT and FN when HANA execution is disabled.

  • Fixed json pipeline generation for HGBT and RDT.

  • Fixed parameter name typos for DT and RDT in unified classification.

  • Fixed execute_statement parser when parameters contain special characters.

Version 2.14.221014

Bug Fixes
  • Fixed legend issues in forecast_line_plot.

  • Fixed duplicated outputs issue in artifact generator.

  • Fixed best pipeline report where points exceeded the chart.

  • Fixed progress bar counter issue for auto-ml time series.

  • Fixed predefined partition in unified API.

Version 2.14.220923

Bug Fixes
  • Fixed degree_values issue in unified regression.

  • Fixed legend order in seasonal_plot.

  • Fixed cross validation parameters in automatic time series forecast.

Version 2.14.220918

New Functions
  • Added replace function in dataframe.

  • Added the time series outlier detection algorithm called ts_outlier_detection().

  • Added AutoML Time Series.

  • Added make_future_dataframe.

  • Added force_plot for SHAP explainer.

  • Added time series imputer.

  • Added time series data report.

  • Added KS test.

  • Added create_dataframe_from_spark.

  • Added set_model_state function.

  • Added outlier profiling.

  • Added outlier plot in EDA.
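
Note on the KS test entry: a Kolmogorov-Smirnov test compares empirical distribution functions. The sketch below computes the two-sample KS statistic in plain Python to show the idea. The two-sample variant is chosen here only for illustration, and the helper name ks_statistic is hypothetical; it is not the hana_ml API and it does not compute a p-value.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: the largest gap between the two empirical CDFs.

    Hypothetical helper for illustration; both samples must be non-empty.
    """
    a = sorted(sample_a)
    b = sorted(sample_b)
    # The empirical CDFs only change at observed values, so it is enough
    # to compare them at every distinct observation.
    points = sorted(set(a) | set(b))
    return max(
        abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        for x in points
    )


identical = ks_statistic([1, 2, 3], [1, 2, 3])
disjoint = ks_statistic([1, 2, 3], [4, 5, 6])
shifted = ks_statistic([1, 2, 3, 4], [3, 4, 5, 6])
```

A statistic near 0 means the two samples have similar empirical distributions, while a value of 1 means they do not overlap at all.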

Enhancements
  • Enhanced model storage
    • support pipeline in auto-ml.

    • support model report.

  • MLFlow integration for auto-ml.

  • Pipeline module enhancement with PAL_PIPELINE_FIT and PAL_PIPELINE_PREDICT.

  • Enhanced dataframe function with enable_abap_sql.

  • Successive halving for HGBT, KNN, SVM and MLR.

  • Enhanced auto-ml with lightweight config dict option.

  • Enhanced JSON model support in Multi-class LogisticRegression and LinearRegression.

  • Enhanced the support of French and Russian in tf_analysis, text_classification, get_related_doc, get_related_term, get_relevant_doc, get_relevant_term, get_suggested_term functions.

  • Enhanced the support of pre-defined period setting in seasonal_decompose with a new parameter 'periods'.
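
Note on successive halving: the entry above mentions successive halving for HGBT, KNN, SVM and MLR. As a general illustration of the technique (not the PAL or hana_ml implementation), the sketch below repeatedly scores a set of candidate configurations on a small budget, keeps the best fraction, and increases the budget for the survivors. The function name, the evaluate callback and the eta and min_budget parameters are assumptions introduced for this example.

```python
def successive_halving(configs, evaluate, min_budget=1, eta=2):
    """Return the surviving configuration after successive halving.

    Hypothetical sketch: `configs` is a non-empty list of candidates and
    `evaluate(config, budget)` returns a score where higher is better.
    """
    survivors = list(configs)
    budget = min_budget
    while len(survivors) > 1:
        # Score every remaining candidate with the current budget.
        ranked = sorted(survivors, key=lambda c: evaluate(c, budget), reverse=True)
        # Keep the top 1/eta of the candidates (at least one).
        survivors = ranked[: max(1, len(ranked) // eta)]
        # Survivors get a larger budget in the next round.
        budget *= eta
    return survivors[0]


def toy_score(learning_rate, budget):
    # Toy objective: candidates closer to 0.1 score higher; a larger
    # budget shrinks a penalty term that is the same for all candidates.
    return -abs(learning_rate - 0.1) - 1.0 / budget


best = successive_halving([0.01, 0.1, 0.3, 1.0], toy_score)
```

The benefit is that poor candidates are discarded after cheap evaluations, so most of the budget is spent on the promising ones.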

API Changes
  • Added 'handling_missing' and 'json_export' in LinearRegression.

  • Added 'json_export', 'precompute_lms_sketch', 'stable_sketch_alg', 'sparse_sketch_alg' in Multi-class LogisticRegression.

  • Added 'periods' in seasonal_decompose.

  • Added parameters for APL segmented modeling, segmented forecast and parallel apply: 'max_tasks' and 'segment_column_name' (see APL 2209 and APL 2211 release notes).

Version 2.13.220722

Bug Fixes
  • Fixed early_stop in auto-ml.

  • Fixed display issue in unified report for APL.

Version 2.13.220715

Bug Fixes
  • Fixed class_map0, class_map1 issue in UnifiedClassification.

  • Fixed early_stop parameter missing in AutomaticClassification and AutomaticRegression.

  • Fixed divide-by-zero issue in binary_classification_debriefing.

Version 2.13.220701

Bug Fixes
  • Fixed table name too long in model storage save function.

  • Fixed mlflow autologging with additional fit parameters.

  • Fixed the issue where mlflow model info was not displayed.

  • Fixed metric sampling for model report.

  • Fixed wrong schedule template in model storage.

Version 2.13.220608

Bug Fixes
  • Fixed the identifier-too-long issue for function outputs.

Version 2.13.220511

New Functions
  • Added upsert/update streams data in dataframe function.

  • Added stationarity_test function.

  • Added CrostonTSB function.

  • Added get_temporary_tables and clean_up_temporary_tables functions.

  • Added Pipeline class json outputs for auto-ml pipeline_fit.

  • Added EDA for time series data.
    • Added plot_pacf, plot_acf

    • Added plot_moving_average

    • Added plot_rolling_stddev

    • Added seasonal_plot

    • Added timeseries_box_plot

    • Added plot_seasonal_decompose

    • Added quarter_plot

  • Added rolling window in generate_feature function.

  • Added get_connection_id, restart_session and cancel_session_operation in dataframe function.
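
Note on the rolling-window entries: plot_moving_average, plot_rolling_stddev and the rolling window option in generate_feature all rest on computing a statistic over a sliding window. The sketch below shows that computation in plain Python. The helper name rolling and the choice of population standard deviation are assumptions for illustration; the hana_ml functions run in the database and may differ in window alignment and in the estimator used.

```python
from statistics import mean, pstdev


def rolling(values, window, func):
    """Apply `func` to every full trailing window of length `window`.

    Hypothetical helper for illustration; returns an empty list when the
    series is shorter than the window.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    return [
        func(values[i - window + 1 : i + 1])
        for i in range(window - 1, len(values))
    ]


series = [1, 2, 3, 4, 5]
moving_average = rolling(series, 3, mean)
rolling_stddev = rolling([2, 2, 2], 2, pstdev)
```

Only full windows are emitted here, so the result has len(values) - window + 1 entries.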

Enhancements
  • Added support of the following collection of new parameters for HGBT in UnifiedClassification and UnifiedRegression: 'replacemissing', 'default_missing_direction', 'feature_grouping', 'tol_rate', 'compression', 'max_bin_num'.

  • Improved the performance of box_plot.

  • Enhanced the massive mode support of UnifiedClassification, UnifiedRegression, ARIMA, AutoARIMA, AdditiveModelForecast.

  • Enhanced MLFlow autologging for unified classification and regression.

API Changes
  • Added 'interpret' in predict() of KNNClassifier and KNNRegressor for enabling procedure PAL_KNN_INTERPRET.

  • Added 'sample_size', 'top_k_attributions', 'random_state' in predict() of KNNClassifier and KNNRegressor for generating local interpretation result.

  • Enabled missing value handling for input data by adding imputation related parameters in fit(), predict() and score() functions of both UnifiedClassification and UnifiedRegression.

  • Added 'model_type' in GARCH initialization for allowing variant GARCH models.

Bug Fixes
  • Fixed key error bug for parameter 'param_values' in DecisionTreeClassifier/Regressor.

  • Fixed the encoding error of imputation strategy of NONE type in Imputer.

  • Fixed the key error bug when enabling AFL states for clustering algorithms.

Version 2.12.220428

Bug Fixes
  • Adapted the auto-ml logging according to the PAL function changes.

Version 2.12.220425

Bug Fixes
  • Fixed the display issue for the pipeline report.

  • Fixed the missing ptype issue in automl evaluate function.

  • Fixed the transform issue in pipeline fit_predict function.

Version 2.12.220408

Bug Fixes
  • Fixed cancellation button in auto_ml.

  • Fixed pivot_table for handling NULL values.

  • Fixed tree debriefing dot visualizer for decision trees.

  • Fixed the display issue for dataset report with NULL values.

Version 2.12.220325

New Functions
  • Added IsolationForest.

  • Added auto_ml including AutomaticClassification, AutomaticRegression and Preprocessing.

  • Added progress monitor called PipelineProgressStatusMonitor for AutomaticClassification and AutomaticRegression.

  • Added best pipeline report called BestPipelineReport for AutomaticClassification and AutomaticRegression.

  • Added to_datetime(), to_tail() in hanaml.dataframe.

Enhancements
  • Added validation procedure for n_components in CATPCA.

  • Improved display name in pivot_table.

  • Added compression and thresholding in wavelet transform.

  • Moved generate_feature to dataframe function.

  • Enhanced create_dataframe_from_pandas() with upsert option.

  • Added ignore_scatter_matrix option in dataset report.

  • Exposed APL variable selection parameters.

  • Enhanced text mining with German support.

  • Support more loss functions in HybridGradientBoostingClassifier and HybridGradientBoostingRegressor.

  • Enhanced white_noise_test() with an option: the degree of freedom, model_df.

  • Enhanced Attention with local interpretability of model.

  • Enhanced integer index support for TimeSeriesExplainer.explain_arima_model() for ARIMA and AutoARIMA.

  • Added precomputed affinity for AgglomerateHierarchicalClustering.

  • Added model compression related parameters for HybridGradientBoostingClassifier and HybridGradientBoostingRegressor.

Bug Fixes
  • Fixed M4 sampling with lowercase column name.

  • Fixed inconsistent IDs assigned to solvers between LOGR and M_LOGR.

  • Fixed a parameter naming error in fft(): flattop_model -> flattop_mode.

  • Fixed a validation error for endog parameter in Attention predict().

API Changes
  • Added 'model_df' in the white_noise_test() for selecting the degree of freedom.

  • Added 'explain_mode' in predict() of GRUAttention for selecting the mechanism for generating the reason code for inference results.

Version 2.11.220209

Bug Fixes

  • Fixed wrong arg check for 'histogram' in HGBT split method.

  • Fixed bug in deploy_class with transport_request.

Version 2.11.220107

Bug Fixes
  • Fixed box plot with lower case column name.

  • Fixed add_id when the rel_col input is list type.

  • Fixed shortest_path and shortest_path_one_to_all type cast error.

  • Fixed fast DTW alignment error.

  • Position correction for random search times in LOGR.

  • Fixed HANA hint script generation for resource restriction.

Version 2.11.211211

New Functions
  • Added FeatureSelection.

  • Added BSTS.

  • Added Word Cloud.

  • Added hdbprocedure generation in pal_base and applied to all functions.

  • Added GARCH.

  • APL classification, regression, clustering: a new method, 'export_apply_code', generates code which can be used to apply a trained model outside APL.

Enhancements
  • Enhanced Preprocessing with FeatureSelection.

  • Enhanced the model storage with fit parameters in json format.

  • Enhanced PCA categorical support.

  • Enhanced model storage with fit parameters info.

  • Enhanced UnifiedExponentialSmoothing with massive mode.

  • Enhanced AMDP generation as a function in unified_classification.

  • Enhanced ARIMA with an explainer in the predict function.

  • Enhanced additive_model_forecast with an explainer in the predict function.

  • Enhanced HybridGradientBoostingClassifier to continue training from a trained HybridGradientBoostingClassifier model.

  • Enhanced APL AutoTimeSeries with advanced predict outputs: the 'APL/ApplyExtraMode' parameter can be set in 'extra_applyout_settings'.

  • Enhanced the stored procedure information retrieval.

  • Enhanced fillna to support non-numeric columns.

  • Enhanced dataset report to convert PAL unsupported type.

API Changes
  • Added 'background_size' in the init() and 'thread_ratio', 'top_k_attributions', 'trend_mod', 'trend_width', 'seasonal_width' in the predict() function of ARIMA() and AutoARIMA().

  • Added 'show_explainer', 'decompose_seasonality', 'decompose_holiday' in the predict() function of additive_model_forecast().

  • Added 'warm_start' in the fit() function of HybridGradientBoostingClassifier() and HybridGradientBoostingRegressor() for continuing training with existing model.

Bug Fixes
  • Fixed index creation bug in on-premise text_classification api.

  • Fixed multi-class logistic regression init check bug.

  • Fixed has_table with local temporary tables.

Version 2.10.210918

New Functions
  • Added dtw() for generic dynamic time warping with predefined and custom defined step pattern.

  • Added wavedec() for multi-level discrete wavelet transformation, and waverec() for the corresponding inverse transformation.

  • Added wpdec() and wprec() for multi-level (discrete) wavelet packet transformation and inverse.

  • Added OnlineMultiLogisticRegression() which is the online version of Multi-Class Logistic Regression.

  • Added spectral clustering.

  • Added LSTM with attention.

  • Added OneHotEncoding.

  • Added unified preprocessor.

  • Added Pipeline plot.

  • Added UnifiedExponentialSmoothing().
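
Note on dtw(): dynamic time warping aligns two sequences that may be stretched or compressed relative to each other. The sketch below implements the textbook dynamic programming recurrence with the standard symmetric step pattern and absolute difference as the local cost. The function name dtw_distance is hypothetical, and the step pattern options of the hana_ml dtw() function are not modeled here.

```python
def dtw_distance(a, b):
    """Classic DTW cost between two numeric sequences.

    Hypothetical helper for illustration: symmetric step pattern
    (match, insertion, deletion) and |x - y| as the local distance.
    """
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] is the best cumulative cost aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b stays
                cost[i][j - 1],      # b advances, a stays
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


stretched = dtw_distance([1, 2, 3], [1, 2, 2, 3])
offset = dtw_distance([0, 0], [1, 1])
```

Because the second sequence in the first call is only a stretched copy of the first, the warping path absorbs the repetition and the cost is zero.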

Enhancements
  • Enhanced the model storage support for OnlineLinearRegression().

  • Enhanced multi-threading in tm functions.

  • Enhanced HDL container option.

  • Enhanced timestamp support for ARIMA(), AutoARIMA(), VectorARIMA(), OnlineARIMA(), SingleExponentialSmoothing(), DoubleExponentialSmoothing(), TripleExponentialSmoothing(), AutoExponentialSmoothing(), BrownExponentialSmoothing(), Croston(), LR_seasonal_adjust().

  • Enhanced new distributions for MCMC sampling.

  • Support multiple accuracy_measure methods in Single/Double/Triple ExponentialSmoothing, BrownExponentialSmoothing, Croston and LR_seasonal_adjust.

  • Added plotly support.

API Changes
  • Added 'key', 'endog', 'exog', 'categorical_variable' in the fit() function of AdditiveModelForecast().

  • Added 'prediction_confidence_1' and 'prediction_confidence_2' in BrownExponentialSmoothing().

Version 2.9.210726

Bug Fixes
  • Fixed load_model initialized error in model storage service.

  • Fixed bad link in PyPI portal.

Version 2.9.210709

Bug Fixes
  • Fixed missing WeaklyConnectedComponents in hana_ml.graph.algorithms.

  • Fixed missing statistics in hana_ml.graph.Graph.describe.

  • Fixed a bug where Graph object creation, discover_graph_workspace() and Graph.describe() did not work on an on-premise system.

Version 2.9.210630

Bug Fixes
  • Fixed accuracy_measure issue in Single/Double/Triple/Auto Exponential Smoothing().

  • Fixed empty input table error in Croston().

  • Fixed class_map error for multi-class logistic regression in UnifiedClassification().

Version 2.9.210619

Enhancements
  • Constants for directions used in graph functions can be found in hana_ml.graph.constants.DIRECTION_*

  • The following functions and objects are now available for import from hana_ml.graph:

    • Graph object

    • create_graph_from_dataframes and create_graph_from_hana_dataframes factory methods

    • discover_graph_workspaces

    • discover_graph_workspace

  • The geometries no longer need to be specified when creating a DataFrame instance. The geometries are analyzed automatically.

  • Support list of targets and trans_param in feature_tool.

  • Enhanced unified report for unified_regression to view feature importance.

  • Enhanced join() to support list of DataFrame.

  • Enhanced union() to support list of DataFrame.

  • Streamlined the create_dataframe_from_pandas geo parameters. Now there is only one list of geo_cols, which supports column references as well as (lon, lat) tuples, and one SRID parameter for all columns

  • When you call create_dataframe_from_pandas with a GeoPandas DataFrame, the geometry column is detected automatically and processed as a geometry. You don't need to add it manually to geo_cols.

  • The Graph constructor is simplified. You can instantiate a graph simply by the workspace name.

  • Enhanced ModelStorage for APL to support HANA Data Lake.

New Functions
  • Introduced hana_ml.graph.algorithms, which will contain all graph algorithms in the future. The package provides an AlgorithmBase class which can be used to build additional algorithms for a graph.

  • Added hana_ml.graph.algorithms.ShortestPath, which replaces Graph.shortest_path

  • Added hana_ml.graph.algorithms.Neighbors, which replaces Graph.neighbors

  • Added hana_ml.graph.algorithms.NeighborsSubgraph, which replaces Graph.neighbors_with_edges

  • Added hana_ml.graph.algorithms.KShortestPaths

  • Added hana_ml.graph.algorithms.ShortestPathsOneToAll

  • Added hana_ml.graph.discovery.discover_graph_workspace, which reads the metadata of a graph

  • Added hana_ml.graph.create_graph_from_edges_dataframe

  • Added hana_ml.graph.Graph.has_vertices, to check if a list of vertices exists in a graph

  • Added hana_ml.graph.Graph.subgraph, to create a vertex- or edge-induced subgraph

  • Added hana_ml.graph.Graph.describe, to get some statistics

  • Added hana_ml.graph.Graph.degree_distribution

  • Added hana_ml.DataFrame.srids, which returns the SRS of each geometry column

  • Added hana_ml.DataFrame.geometries, which returns the geometry columns if there are any

  • Added hana_ml.spatial package, which contains

    • create_predefined_srs

    • is_srs_created

    • get_created_srses

  • Added hana_ml.docstore package, which contains

    • create_collection_from_elements

  • Added BCPD() for Bayesian change point detection.

  • Added shape in dataframe.

  • Added sort_values, sort_index in dataframe.

  • Added scheduler for model renew in model_storage.

  • Added min, max, mean, median, sum, value_counts in dataframe.

  • Added SHAP support for unified regression.

  • Added data lake support in model_storage.

  • Added data lake support in dataframe functions.

  • Added line plot for time series forecast.

  • Added split_column().

  • Added concat_columns().

  • Added outlier_detection_kmeans(), which detects outliers in datasets based on the result of k-means clustering.

  • Added intermittent_forecast() for forecasting intermittent demand data (time series).

  • Added OnlineLinearRegression() which is an online version of the Linear Regression.
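
Note on the shortest path algorithms: the hana_ml.graph.algorithms entries above run against a graph workspace in SAP HANA. To illustrate what a shortest path query computes, the sketch below runs Dijkstra's algorithm on an in-memory edge list in plain Python. The function name, the (source, target, weight) edge tuples and the return shape are assumptions for this example and do not mirror the ShortestPath class; edge weights are assumed to be non-negative.

```python
import heapq


def shortest_path(edges, source, target):
    """Return (total_weight, [vertices]) or None if target is unreachable.

    Hypothetical helper for illustration: `edges` is an iterable of directed
    (from_vertex, to_vertex, weight) tuples with non-negative weights.
    """
    graph = {}
    for u, v, w in edges:
        graph.setdefault(u, []).append((v, w))
        graph.setdefault(v, [])
    inf = float("inf")
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            break
        if d > dist.get(u, inf):
            continue  # stale heap entry, a shorter route was already found
        for v, w in graph.get(u, []):
            candidate = d + w
            if candidate < dist.get(v, inf):
                dist[v] = candidate
                prev[v] = u
                heapq.heappush(heap, (candidate, v))
    if target not in dist:
        return None
    # Walk the predecessor links back from the target to rebuild the path.
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return dist[target], path[::-1]


toy_edges = [("A", "B", 1), ("B", "C", 2), ("A", "C", 5)]
result = shortest_path(toy_edges, "A", "C")
```

Here the two-hop route through B is cheaper than the direct edge, so it is the one returned.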

API Changes
  • Removed geo_cols from dataframe.create_dataframe_from_shapefile

  • Removed geo_cols from ConnectionContext.sql()

  • Removed geo_cols from ConnectionContext.table()

  • Removed Graph.neighbors and Graph.neighbors_with_edges

  • Removed Graph.shortest_path

  • Removed hana_ml.graph.Path. It is no longer used.

  • Removed hana_ml.graph.create_hana_graph_from_existing_workspace. This is replaced by a simplified Graph object constructor.

  • Renamed hana_ml.graph.create_hana_graph_from_vertex_and_edge_frames to create_graph_from_dataframes

  • Changed the type of geo_cols in create_dataframe_from_pandas to list, which supports direct column references or (lon, lat) tuples for generating POINT geometries

Bug Fixes
  • Fixed inflexible default locations of selected columns of input data, e.g. key, features and endog.

  • Fixed model report's feature importance when it has 0 importance.

Version 2.8.210421

Version 2.8.210421 supports SAP HANA SPS05 and SAP HANA Cloud

Bug Fixes
  • Fixed model report's feature importance when it has 0 importance.

  • Fixed pivot_table with multiple index issue.

  • Fixed the verbose missing for RDT regressor.

  • Fixed the shap display for categorical columns.

Version 2.8.210321

Version 2.8.210321 supports SAP HANA SPS05 and SAP HANA Cloud

Enhancements
  • Enhanced sql() to enable multiline execution.

  • Enhanced save() to add append option.

  • Enhanced diff() to enable negative input.

  • Enhanced model report functionality of UnifiedClassification with added model and data visualization.

  • Enhanced dataset_report module with an optimized process of report generation and better user experience.

  • Enhanced UnifiedClustering to support 'distance_level' in AgglomerateHierarchicalClustering and DBSCAN functions. Please refer to documentation for details.

  • Enhanced model storage to support unified report.

New Functions
  • Added generate_html_report() and generate_notebook_iframe_report() functions for UnifiedRegression which could display the output, e.g. statistic and model.

  • APL Gradient Boosting: the other_params parameter is now supported.

  • APL all models: a new method, get_model_info, is created, allowing users to retrieve the summary and the performance metrics of a saved model.

  • APL all models: users can now specify the weight of explanatory variables via the weight parameter.

  • Added LSTM.

  • Added Text Mining functions support for both SAP HANA on-premise and cloud version.

    • tf_analysis

    • text_classification

    • get_related_doc

    • get_related_term

    • get_relevant_doc

    • get_relevant_term

    • get_suggested_term

  • Added unified report.

New dependency:
  • Added new dependency 'htmlmin' for generating dataset and model report.

API Changes
  • KMeans with two added parameters 'use_fast_library' and 'use_float'.

  • UnifiedRegression with one added parameter 'build_report'.

  • Added a parameter 'distance_level' in UnifiedClustering when 'func' is AgglomerateHierarchicalClustering and DBSCAN. Please refer to documentation for details.

  • Renamed 'batch_size' with 'chunk_size' in create_dataframe_from_pandas.

  • OnlineARIMA has two added parameters 'random_state', 'random_initialization' and its partial_fit() function supports two parameters 'learning_rate' and 'epsilon' for updating the values in the input model.

Bug Fixes
  • Fixed onlineARIMA model storage support.

  • Fixed inflexible default locations of selected columns of input data, e.g. key, features and endog.

  • Fixed accuracy_measure issue in AutoExponentialSmoothing.

Version 2.6.210126

Version 2.6.210126 supports SAP HANA SPS05 and SAP HANA Cloud

Bug Fixes
  • Fixed uuid issue for Python 3.8.

  • Fixed wrong legend for unified classification model report.

  • Fixed dataset report to handle the dataset with missing value.

Version 2.6.210113

Version 2.6.210113 supports SAP HANA SPS05 and SAP HANA Cloud

Bug Fixes
  • Fixed load_model issue for KMeans clustering.

  • Removed PyPI installation of Shapely for Windows users.

  • Fixed duplicate rows bug in save() function.

  • Fixed loading issue in model report.

  • Replaced the option 'batch_size' with 'chunk_size' in create_dataframe_from_pandas.

Version 2.6.201209

Version 2.6.201209 supports SAP HANA SPS05 and SAP HANA Cloud

Bug Fixes
  • Removed shap from installation.

  • Fixed bugs in dataframe functions when autocommit=False.

  • Fixed font properties bugs in eda functions.

  • APL Documentation: other_train_apl_aliases is now documented.

  • APL Gradient Boosting Classification: the target variable won't be displayed in prediction if it is not given in input.

  • APL Gradient Boosting: the default parameter values are now set in the APL backend level. They won't be set in the Python API level.

  • Fixed handling of geometry columns in the context of Dataframe.collect calls.

  • Fixed shapely not being a required dependency.

  • Fixed the displacement of parameter 'dispersion' in CPD.

Version 2.6.201106

Version 2.6.201116 supports SAP HANA SPS05 and SAP HANA Cloud

Enhancements
  • Enhanced collect() performance for large datasets.

  • Enhanced create_dataframe_from_pandas performance for large datasets.

New Functions
  • Added kdeplot() for 1D and 2D kde plotting.

  • Added SHAPLEY visualization.

Bug Fixes
  • Fixed incompatibility issue with matplotlib>=3.3.0.

Version 2.6.201016 (2.6.200928)

Version 2.6.201016 supports SAP HANA SPS05 and SAP HANA Cloud

API Changes
  • HybridGradientBoostingClassifier, HybridGradientBoostingRegressor: added a parameter 'adopt_prior' to indicate whether to adopt the prior distribution as the initial point.

  • SVC, SVR, OneClassSVM, SVRanking: added parameters 'compression', 'max_bits', 'max_quantization_iter' for model compression.

  • RDTClassifier: added parameters 'compression', 'max_bits', 'quantize_rate' for model compression.

  • RDTRegressor: added parameters 'compression', 'max_bits', 'quantize_rate', 'fittings_quantization' for model compression.

  • In the predict function of ARIMA and AutoARIMA, a new forecast_method value 'truncation_algorithm' is introduced to improve the prediction performance.

  • New parameters 'string_variable', 'variable_weight' are added in KNNClassifier, KNNRegressor and DBSCAN to enable distance calculation based on string distance.

  • New parameters 'extrapolation', 'smooth_width', 'auxiliary_normalitytest' are added in seasonal_decompose function.

New Functions
  • Added dataset manager.

  • Added graph and spatial modules.

  • Added dataset report.

  • Added clustering function: SlightSilhouette.

  • Added native storage support in model storage service and dataset manager.

  • Added vector ARIMA.

  • Added unified regression.

  • Added unified clustering.

Bug Fixes
  • Fixed ROC curve display in model report with disordered points.

  • Fixed load_model for unified_classification in model storage service.

  • Fixed model_selection for unified_classification.

Version 2.5.200626

Version 2.5.200626 supports SAP HANA SPS05 and SAP HANA Cloud

API Changes
  • Removed parameter ConnectionContext in PAL functions.

  • Updated parameter algorithm from mandatory to optional in DecisionTreeClassifier/Regressor(), with default value 'cart'.

  • Added parameter key in fit() function of tsa.ARIMA() and tsa.AutoARIMA().

  • Added parameter decompose_type in tsa.seasonal_decompose().

  • Added parameter save_alignment and a new output statistic table in tsa.fast_dtw().

  • Added parameter table_structure in create_dataframe_from_pandas().

  • Added parameter resampling_method and param_search_strategy in HybridGradientBoostingClassifier/Regressor().

New Functions
  • Added functions in dataframe.py: melt(), read_pickle().

  • Added unified classification function. In particular, generate_html_report() and generate_notebook_iframe_report() are provided to visualize the output, e.g. confusion matrix and ROC curve.

  • Added mcmc function.

  • Added model selection services.

  • Added visualizers (model Debriefing).

Enhancements
  • Enhanced smart sampling for visualizers.

  • Enhanced import function to SAP HANA.

  • Enhanced bytes, TIMESTAMP and BIGINT support in create_dataframe_from_pandas() in dataframe.py.

  • Enhanced TIMESTAMP and DATE support in describe() in dataframe.py.

  • Predictions made with APL gradient boosting can now be complemented with the reasons that led to these predictions: number of top or bottom explanatory variables, strength values, etc.

  • Supported more data types, SMALLINT, DECIMAL, TINYINT, BIGINT, CLOB and BLOB in DataFrame.dtypes(), generate_table_type() and is_numeric().

  • Enhanced the missing value handling in hana_ml.visualizers.eda bar/box/pie plot in the groupby column by creating a new class for missing values.

  • APL gradient boosting can provide metrics about feature interactions strength.

  • The connection parameter is no longer required for APL model creation.

Bug Fixes
  • Fixed wrong ID issue in fit function by adding key option in tsa.ARIMA() and tsa.AutoARIMA().

  • Fixed CLOB type issue in create_dataframe_from_pandas() by adding table_structure and drop_exist_tab options.

  • Fixed pivot_table() index naming bug.

  • Fixed temporary view from temporary table issue in APL time series function by adding sort_data and get_horizon_wide_metric.

  • Fixed bugs in create_dataframe_from_pandas() if the table is temporary.

  • Fixed bugs for data type of init centers in GMM().

  • Fixed bugs where some data types, e.g. SMALLINT, DECIMAL or TINYINT, were not supported in DataFrame.dtypes(), generate_table_type() and is_numeric().

  • Fixed bugs where data types, e.g. DATE and TIMESTAMP, were not supported in DataFrame.describe().

  • Fixed the table overwrite bug in DataFrame.save() when the table name already exists.

  • Fixed missing quotation marks around column names in hana_ml.visualizers.eda.

  • Users can set 'Cutting Strategy' in APL Gradient Boosting.

  • APL models are saved correctly.

Deprecated Functions:
  • GradientBoostingClassifier.

  • GradientBoostingRegressor.

Version 1.0.8

Version 1.0.8 supports SAP HANA SPS04 (100% coverage for SAP HANA SPS04 PAL algorithms)

New Functions in the PAL package:
  • preprocessing : Multidimensional Scaling(MDS), Synthetic Minority Over-Sampling Technique(SMOTE, only supported in SAP HANA SPS05), Sampling, Variance Test.

  • statistics : condition index, Cumulative Distribution Function(cdf), Distribution fitting, Distribution Quantile, Entropy, Equal Variance Test, Factor Analysis, Grubbs' Test, Kaplan-Meier Survival Analysis, Kernel Density, One-Sample Median Test, Wilcox Signed Rank Test.

  • time series : Linear Regression with Damped Trend and Seasonal Adjust, Additive Model Forecast, Hierarchical Forecast, Correlation Function, online algorithms and dynamic time warping(fast DTW).

  • miscellaneous : ABC Analysis, T-distributed Stochastic Neighbour Embedding(TSNE), Weighted Score Table.

  • Added functions in dataframe.py: data_manipulation().

  • Added cross-validation options to SAP HANA PAL functions.

  • Added visualizers (EDA profiler).

  • Added model storage services.