Changelog
=========

**Version 2.18.230927**

``Bug Fixes``

- Fixed cancellation issue in the AutoML progress monitor.
- Fixed log cleanup issue in AutoML.
- Fixed usage of concat in Graph.describe().
- Fixed temporary table issue by replacing the temporary table with a table variable.
- Fixed missing parameter issue in HGBT regression.
- Fixed syntax error in Louvain.

**Version 2.18.230914**

``New Functions``

- Added :func:`~hana_ml.algorithms.pal.tsa.additive_model_forecast.AdditiveModelForecast.make_future_dataframe`.
- Added :func:`~hana_ml.algorithms.pal.stats.interval_quality`.
- Added :func:`~hana_ml.dataframe.ConnectionContext.drop_view`.
- Added :class:`~hana_ml.graph.algorithms.CommunitiesLouvain`.

``Enhancements``

- Enhanced Portuguese support in the tf_analysis, text_classification, get_related_doc, get_related_term, get_relevant_doc, get_relevant_term and get_suggested_term functions.
- Enhanced :class:`~hana_ml.algorithms.pal.tsa.ltsf.LTSF` with support for different network types such as 'NLinear', 'DLinear', 'XLinear' and 'SCINet'.
- Supported massive mode in :func:`~hana_ml.algorithms.pal.tsa.accuracy_measure.accuracy_measure`.
- Supported massive mode in :func:`~hana_ml.algorithms.pal.preprocessing.IsolationForest`.
- Enhanced the model list in model storage with a display option.
- Enhanced model selection with range support.
- Enhanced SHAP plots with a dependence plot.
- Enhanced AutoML with explanation visualization.
- Enhanced unified_regression with prediction intervals and visualization.
- Enhanced unified_classification with feature importance support.
- Enhanced outlier_detection with an auto mode.
- Simplified AutoML fit with background_size.
- Enhanced the AutoML and Pipeline modules with a score function.
- Enhanced the AutoML progress monitor with an evaluating tab.

``API Changes``

- Added parameter 'network_type' to :class:`~hana_ml.algorithms.pal.tsa.ltsf.LTSF` for network selection.
- Enhanced the HANA scheduler by removing manual parameter input.


``Bug Fixes``

- Fixed scatter plot error related to the ax.scatter arguments c and cmap.
- Fixed BAS incompatibility issue.
- Fixed time diff error when creating a new timeframe.
- Fixed date type issue in dataset report.
- Fixed time series report issue in changepoints_item.

**Version 2.17.230808**

``Bug Fixes``

- Fixed default cmap value in scatter_plot.
- Fixed display error for monthly data in tsa functions.
- Fixed index display error in seasonal_decompose.
- Fixed index sort issue in time series report.

**Version 2.17.230727**

``Bug Fixes``

- Fixed Decimal issue in the Explainer item and other related items in time series report.
- Fixed shadow option in EDA pie plot.
- Fixed enable_stopwords issue in tf_analysis and the same issue in the word cloud plot.
- Fixed year legend sort issue in time series report.

**Version 2.17.230714**

``Bug Fixes``

- Fixed wrong error message in HANA scheduler.
- Fixed corr issue where column names were missing quotes.
- Fixed front-end connection reset issue in AutoML by avoiding too many queries to the progress table.
- Fixed missing cron issue by adding a NULL check.
- Fixed Decimal issue in time series report.
- Fixed x-axis order issue in time series report.

**Version 2.17.230628**

``Bug Fixes``

- Fixed CAP generation issues for APL.
- Fixed duplicated prefix for predict artifact in CAP generation.
- Fixed parameter checking for APL.

**Version 2.17.230622**

``New Functions``

- Added :func:`~hana_ml.algorithms.apl.apl_base.APLBase.set_scale_out` to enable APL function execution in a scale-out environment.
- Added :class:`~hana_ml.algorithms.pal.tsa.changepoint.OnlineBCPD`.
- Added :class:`~hana_ml.artifacts.generators.hana.HANAGeneratorForCAP`.
- Added :class:`~hana_ml.algorithms.pal.preprocessing.PowerTransform`.
- Added :class:`~hana_ml.hana_scheduler.HANAScheduler`.

``Enhancements``

- Enhanced plotly support for EDA functions such as quarter_plot(), seasonal_plot() ...

- Enhanced spectral clustering support in :class:`~hana_ml.algorithms.pal.unified_clustering.UnifiedClustering`.
- Enhanced HANA artifact generation for the pipeline module.
- Enhanced auto-ml with a reason code option.
- Enhanced time series report with confidence intervals.
- Enhanced RDT in unified regression with prediction intervals.
- Enhanced model storage with a server-side scheduler.
- Enhanced unified API for pivoted input data.
- Enhanced :func:`~hana_ml.dataframe.diff` to support datetime columns.

**Version 2.16.230601**

``Bug Fixes``

- Fixed model load issue for the pipeline module in model storage.
- Fixed missing parameter in HGBT.

**Version 2.16.230526**

``Bug Fixes``

- Fixed missing evaluation function in pipeline.
- Fixed tips and chart width for model report.
- Fixed missing built-in operation in the pipeline module.
- Fixed word cloud issues when disabling stopwords.

**Version 2.16.230519**

``Bug Fixes``

- Fixed auto-ml time series config_dict template.
- Fixed progress logging in auto-ml.
- Fixed progress monitor issue when early_stop is enabled.
- Fixed KNN NaN issue caused by recent pandas changes.
- Fixed describe function to support SMALLINT.

**Version 2.16.230508**

``Bug Fixes``

- Fixed pipeline module for model storage.
- Fixed progress monitor getting stuck when the progress table is not empty.
- Fixed quote issue when serializing pipeline objects.
- Fixed missing parameter in FACCM.
- Fixed dependency issue for pydotplus.
- Fixed tail function with default rel_col.
- Fixed NaN in KNN optimal parameter collection.

**Version 2.16.230413**

``Bug Fixes``

- Fixed missing support for a precomputed distance matrix in predict() of :class:`~hana_ml.algorithms.pal.unified_clustering.UnifiedClustering`.
- Fixed wrong error message when PAL functions are executed in HANA SPS07.
- Fixed missing timestamp support in :func:`~hana_ml.visualizers.eda.plot_time_series_outlier`.

- Fixed timestamp error in the progress monitor of :func:`~hana_ml.algorithms.pal.auto_ml.AutomaticClassification`, :func:`~hana_ml.algorithms.pal.auto_ml.AutomaticRegression` and :func:`~hana_ml.algorithms.pal.auto_ml.AutomaticTimeSeries`.
- Fixed undefined category issue in polynomial feature generation.

**Version 2.16.230323**

``Bug Fixes``

- Fixed wrong error message when "hint" is used.
- Fixed load_model issue for APL models.

**Version 2.16.230316**

``New Functions``

- Added model report items to time series report.
- Added time series report to unified report.
- Added :class:`~hana_ml.algorithms.pal.tsa.classification.TimeSeriesClassification`.
- Added TUDF code generation function.
- Added AMDP generator for pipeline.
- Added :func:`~hana_ml.dataframe.import_csv_from` function for importing CSV files from cloud storage locations such as Azure, Amazon (AWS), Google Cloud and SAP HANA Cloud, Data Lake Files (HDLFS).
- Added :func:`~hana_ml.algorithms.apl.apl_base.APLBase.set_scale_out` to enable APL function execution in a scale-out environment.

``Enhancements``

- Enhanced model report with a new framework.
- Enhanced MLR with prediction/confidence intervals.
- Enhanced :func:`~hana_ml.algorithms.pal.tsa.accuracy_measure.accuracy_measure` with the SPEC measure.
- Enhanced :class:`~hana_ml.algorithms.pal.unified_exponentialsmoothing.UnifiedExponentialSmoothing` with a reason code control parameter.
- Enhanced precalculated distance matrix input for KMEDOIDS in :class:`~hana_ml.algorithms.pal.unified_clustering.UnifiedClustering`.
- Enhanced :func:`~hana_ml.algorithms.pal.auto_ml.AutomaticClassification` and :func:`~hana_ml.algorithms.pal.auto_ml.AutomaticRegression` with successive halving.

``API Changes``

- Added parameter 'decom_state' to :class:`~hana_ml.algorithms.pal.unified_exponentialsmoothing.UnifiedExponentialSmoothing` to control reason code display.

- Added parameter 'lang' to :class:`~hana_ml.visualizers.word_cloud.WordCloud` for language selection.

**Version 2.15.230217**

``Bug Fixes``

- Fixed cmap issues in EDA visualizer.
- Fixed FFM label bug.
- Fixed missing :class:`~hana_ml.visualizers.word_cloud.WordCloud` module issue.

**Version 2.15.230111**

``Bug Fixes``

- Fixed blank chart issue in :class:`~hana_ml.visualizers.dataset_report.DatasetReportBuilder`.
- Fixed dataset report crash caused by an empty column.

**Version 2.15.221223**

``Bug Fixes``

- Fixed missing detected seasonal change points in :class:`~hana_ml.algorithms.pal.tsa.changepoint.BCPD`.
- Fixed change points chart in :class:`~hana_ml.visualizers.time_series_report.TimeSeriesReport`.

**Version 2.15.221216**

``New Functions``

- Added nullif function in dataframe.
- Added long-term time series forecast algorithm :class:`~hana_ml.algorithms.pal.tsa.ltsf.LTSF`.
- Added :func:`~hana_ml.visualizers.eda.plot_change_points` to plot change points.
- Added :func:`~hana_ml.visualizers.eda.plot_psd` to plot power spectral density.
- Added :func:`~hana_ml.algorithms.pal.tsa.periodogram.periodogram` to estimate the power spectral density of the input signal.
- Added change_point detection item to :class:`~hana_ml.visualizers.time_series_report.TimeSeriesReport`.

``Enhancements``

- Enhanced successive halving/hyperband in :class:`~hana_ml.algorithms.pal.recommender.FRM` and :class:`~hana_ml.algorithms.pal.recommender.ALS`.
- Enhanced :class:`~hana_ml.algorithms.pal.tsa.outlier_detection.OutlierDetectionTS` with IsolationForest and DBSCAN.
- Enhanced TIMESTAMP data type support in :class:`~hana_ml.algorithms.pal.tsa.changepoint.BCPD`.
- Enhanced :class:`~hana_ml.algorithms.pal.unified_clustering.UnifiedClustering` with massive mode.
- Enhanced :class:`~hana_ml.visualizers.unified_report.UnifiedReport` with a scoring report.
- Enhanced :class:`~hana_ml.model_storage.ModelStorage` with model export/import functions.

- Enhanced MLFlow integration with the pipeline module and model export.
- Enhanced :class:`~hana_ml.visualizers.unified_report.UnifiedReport` with model debriefing and pipeline report.
- Enhanced force_plot in :class:`~hana_ml.visualizers.shap.ShapleyExplainer` to handle missing features.
- Enhanced :class:`~hana_ml.algorithms.pal.trees.HybridGradientBoostingClassifier` and :class:`~hana_ml.algorithms.pal.trees.HybridGradientBoostingRegressor` with an early_stop option.
- Enhanced the pipeline and auto-ml modules (:class:`~hana_ml.algorithms.pal.auto_ml.AutomaticClassification`, :class:`~hana_ml.algorithms.pal.auto_ml.AutomaticRegression` and :class:`~hana_ml.algorithms.pal.auto_ml.AutomaticTimeSeries`) with predefined output tables.

``API Changes``

- Changed the type of the "JSON" column in :class:`~hana_ml.model_storage.ModelStorage` to NCLOB.

**Version 2.14.221208**

``Bug Fixes``

- Fixed dependency issues in dataset report.
- Fixed documentation link in the PyPI portal.
- Fixed replace function to support NULL replacement.

**Version 2.14.221201**

``Bug Fixes``

- Fixed dataset report display issues:

  - wrong binning method for distribution plot.
  - NA handling issue in scatter matrix.

- Fixed SQL generation issue in pipeline module.
- Fixed model state creation in KNN.
- Fixed missing parameters in :class:`~hana_ml.algorithms.pal.auto_ml.AutomaticTimeSeries`.
- Fixed wrong type of split_method in :class:`~hana_ml.algorithms.pal.auto_ml.AutomaticTimeSeries`.

**Version 2.14.221028**

``Bug Fixes``

- Fixed pipeline monitor when the password contains ','.
- Fixed "message not defined" error in auto-ml.
- Fixed pipeline error for PCA, DT and FN when HANA execution is disabled.
- Fixed JSON pipeline generation for HGBT and RDT.
- Fixed parameter name typos for DT and RDT in unified classification.
- Fixed execute_statement parser when parameters contain special characters.

**Version 2.14.221014**

``Bug Fixes``

- Fixed legend issues in forecast_line_plot.

- Fixed duplicated outputs issue in artifact generator.
- Fixed best pipeline report where points exceeded the chart.
- Fixed progress bar counter issue for auto-ml time series.
- Fixed predefined partition in unified API.

**Version 2.14.220923**

``Bug Fixes``

- Fixed degree_values issue in unified regression.
- Fixed legend order in seasonal_plot.
- Fixed cross-validation parameters in automatic time series forecast.

**Version 2.14.220918**

``New Functions``

- Added replace function in dataframe.
- Added the time series outlier detection algorithm ts_outlier_detection().
- Added AutoML Time Series.
- Added make_future_dataframe.
- Added force_plot for SHAP explainer.
- Added time series imputer.
- Added time series data report.
- Added KS test.
- Added create_dataframe_from_spark.
- Added set_model_state function.
- Added outlier profiling.
- Added outlier plot in EDA.

``Enhancements``

- Enhanced model storage:

  - support pipeline in auto-ml.
  - support model report.

- Added MLFlow integration for auto-ml.
- Enhanced the pipeline module with PAL_PIPELINE_FIT and PAL_PIPELINE_PREDICT.
- Enhanced dataframe functions with enable_abap_sql.
- Added successive halving for HGBT, KNN, SVM and MLR.
- Enhanced auto-ml with a lightweight config dict option.
- Enhanced JSON model support in Multi-class LogisticRegression and LinearRegression.
- Enhanced French and Russian support in the tf_analysis, text_classification, get_related_doc, get_related_term, get_relevant_doc, get_relevant_term and get_suggested_term functions.
- Enhanced seasonal_decompose with support for a pre-defined period setting via the new parameter 'periods'.

``API Changes``

- Added 'handling_missing' and 'json_export' in LinearRegression.
- Added 'json_export', 'precompute_lms_sketch', 'stable_sketch_alg' and 'sparse_sketch_alg' in Multi-class LinearRegression.
- Added 'periods' in seasonal_decompose.

- Added parameters for APL segmented modeling, segmented forecast and parallel apply: 'max_tasks' and 'segment_column_name' (see APL 2209 and APL 2211 release notes).

**Version 2.13.220722**

``Bug Fixes``

- Fixed early_stop in auto-ml.
- Fixed display issue in unified report for APL.

**Version 2.13.220715**

``Bug Fixes``

- Fixed class_map0 and class_map1 issue in UnifiedClassification.
- Fixed missing early_stop parameter in AutomaticClassification and AutomaticRegression.
- Fixed division-by-zero issue in binary_classification_debriefing.

**Version 2.13.220701**

``Bug Fixes``

- Fixed overly long table names in the model storage save function.
- Fixed MLFlow autologging with additional fit parameters.
- Fixed missing MLFlow model info display.
- Fixed metric sampling for model report.
- Fixed wrong schedule template in model storage.

**Version 2.13.220608**

``Bug Fixes``

- Fixed identifier-too-long issue for function outputs.

**Version 2.13.220511**

``New Functions``

- Added upsert/update of streams data in dataframe functions.
- Added stationarity_test function.
- Added CrostonTSB function.
- Added get_temporary_tables and clean_up_temporary_tables functions.
- Added Pipeline class JSON outputs for auto-ml pipeline_fit.
- Added EDA for time series data.
- Added plot_pacf and plot_acf.
- Added plot_moving_average.
- Added plot_rolling_stddev.
- Added seasonal_plot.
- Added timeseries_box_plot.
- Added plot_seasonal_decompose.
- Added quarter_plot.
- Added rolling window in generate_feature function.
- Added get_connection_id, restart_session and cancel_session_operation in dataframe functions.

``Enhancements``

- Added support for the following new HGBT parameters in UnifiedClassification and UnifiedRegression: 'replacemissing', 'default_missing_direction', 'feature_grouping', 'tol_rate', 'compression', 'max_bin_num'.
- Improved the performance of box_plot.

- Enhanced massive mode support in UnifiedClassification, UnifiedRegression, ARIMA, AutoARIMA and AdditiveModelForecast.
- Enhanced MLFlow autologging for unified classification and regression.

``API Changes``

- Added 'interpret' in predict() of KNNClassifier and KNNRegressor to enable the procedure PAL_KNN_INTERPRET.
- Added 'sample_size', 'top_k_attributions' and 'random_state' in predict() of KNNClassifier and KNNRegressor for generating local interpretation results.
- Enabled missing value handling for input data by adding imputation-related parameters in the fit(), predict() and score() functions of both UnifiedClassification and UnifiedRegression.
- Added 'model_type' in GARCH initialization to allow GARCH model variants.

``Bug Fixes``

- Fixed key error bug for parameter 'param_values' in DecisionTreeClassifier/Regressor.
- Fixed encoding error for the imputation strategy of NONE type in Imputer.
- Fixed key error bug when enabling AFL states for clustering algorithms.

**Version 2.12.220428**

``Bug Fixes``

- Adapted the auto-ml logging to the PAL function changes.

**Version 2.12.220425**

``Bug Fixes``

- Fixed display issue for the pipeline report.
- Fixed missing ptype issue in the automl evaluate function.
- Fixed transform issue in the pipeline fit_predict function.

**Version 2.12.220408**

``Bug Fixes``

- Fixed cancellation button in auto_ml.
- Fixed pivot_table handling of NULL values.
- Fixed tree debriefing dot visualizer for decision trees.
- Fixed display issue for dataset report with NULL values.

**Version 2.12.220325**

``New Functions``

- Added IsolationForest.
- Added auto_ml, including AutomaticClassification, AutomaticRegression and Preprocessing.
- Added progress monitor PipelineProgressStatusMonitor for AutomaticClassification and AutomaticRegression.
- Added best pipeline report BestPipelineReport for AutomaticClassification and AutomaticRegression.
- Added to_datetime() and to_tail() in hana_ml.dataframe.


``Enhancements``

- Added validation procedure for n_components in CATPCA.
- Improved display name in pivot_table.
- Added compression and thresholding in wavelet transform.
- Moved generate_feature to dataframe functions.
- Enhanced create_dataframe_from_pandas() with an upsert option.
- Added ignore_scatter_matrix option in dataset report.
- Exposed APL variable selection parameters.
- Enhanced text mining with German support.
- Supported more loss functions in HybridGradientBoostingClassifier and HybridGradientBoostingRegressor.
- Enhanced white_noise_test() with an option for the degrees of freedom, model_df.
- Enhanced Attention with local model interpretability.
- Enhanced integer index support in TimeSeriesExplainer.explain_arima_model() for ARIMA and AutoARIMA.
- Added precomputed affinity for AgglomerateHierarchicalClustering.
- Added model compression-related parameters for HybridGradientBoostingClassifier and HybridGradientBoostingRegressor.

``Bug Fixes``

- Fixed M4 sampling with lowercase column names.
- Fixed inconsistent IDs assigned to solvers between LOGR and M_LOGR.
- Fixed a parameter naming error in fft(): flattop_model -> flattop_mode.
- Fixed a validation error for the endog parameter in Attention predict().

``API Changes``

- Added 'model_df' in white_noise_test() for selecting the degrees of freedom.
- Added 'explain_mode' in predict() of GRUAttention for selecting the mechanism that generates the reason code for inference results.

**Version 2.11.220209**

``Bug Fixes``

- Fixed wrong argument check for 'histogram' in HGBT split method.
- Fixed bug in deploy_class with transport_request.

**Version 2.11.220107**

``Bug Fixes``

- Fixed box plot with lowercase column names.
- Fixed add_id when the rel_col input is a list.
- Fixed type cast error in shortest_path and shortest_path_one_to_all.
- Fixed fast DTW alignment error.
- Corrected the position of random search times in LOGR.
- Fixed HANA hint script generation for resource restriction.


**Version 2.11.211211**

``New Functions``

- Added FeatureSelection.
- Added BSTS.
- Added Word Cloud.
- Added hdbprocedure generation in pal_base and applied it to all functions.
- Added GARCH.
- APL classification, regression, clustering: a new method, 'export_apply_code', generates code that can be used to apply a trained model outside APL.

``Enhancements``

- Enhanced Preprocessing with FeatureSelection.
- Enhanced model storage with fit parameters in JSON format.
- Enhanced PCA categorical support.
- Enhanced model storage with fit parameters info.
- Enhanced UnifiedExponentialSmoothing with massive mode.
- Enhanced AMDP generation as a function in unified_classification.
- Enhanced ARIMA with an explainer in the predict function.
- Enhanced additive_model_forecast with an explainer in the predict function.
- Enhanced HybridGradientBoostingClassifier with continued training of a trained HybridGradientBoostingClassifier model.
- Enhanced APL AutoTimeSeries with advanced predict outputs: the 'APL/ApplyExtraMode' parameter can be set in 'extra_applyout_settings'.
- Enhanced stored procedure information retrieval.
- Enhanced fillna to support non-numeric columns.
- Enhanced dataset report to convert PAL-unsupported types.

``API Changes``

- Added 'background_size' in init() and 'thread_ratio', 'top_k_attributions', 'trend_mod', 'trend_width', 'seasonal_width' in the predict() function of ARIMA() and AutoARIMA().
- Added 'show_explainer', 'decompose_seasonality', 'decompose_holiday' in the predict() function of additive_model_forecast().
- Added 'warm_start' in the fit() function of HybridGradientBoostingClassifier() and HybridGradientBoostingRegressor() for continuing training with an existing model.

``Bug Fixes``

- Fixed index creation bug in the on-premise text_classification API.
- Fixed multi-class logistic regression init check bug.
- Fixed has_table with local temporary tables.


**Version 2.10.210918**

``New Functions``

- Added dtw() for generic dynamic time warping with predefined and custom step patterns.
- Added wavedec() for multi-level discrete wavelet transformation, and waverec() for the corresponding inverse transformation.
- Added wpdec() and wprec() for multi-level (discrete) wavelet packet transformation and its inverse.
- Added OnlineMultiLogisticRegression(), the online version of Multi-Class Logistic Regression.
- Added spectral clustering.
- Added LSTM with attention.
- Added OneHotEncoding.
- Added unified preprocessor.
- Added Pipeline plot.
- Added UnifiedExponentialSmoothing().

``Enhancements``

- Enhanced model storage support for OnlineLinearRegression().
- Enhanced multi-threading in tm functions.
- Enhanced HDL container option.
- Enhanced timestamp support for ARIMA(), AutoARIMA(), VectorARIMA(), OnlineARIMA(), SingleExponentialSmoothing(), DoubleExponentialSmoothing(), TripleExponentialSmoothing(), AutoExponentialSmoothing(), BrownExponentialSmoothing(), Croston() and LR_seasonal_adjust().
- Enhanced MCMC sampling with new distributions.
- Supported multiple accuracy_measure methods in Single/Double/Triple ExponentialSmoothing, BrownExponentialSmoothing, Croston and LR_seasonal_adjust.
- Added plotly support.

``API Changes``

- Added 'key', 'endog', 'exog', 'categorical_variable' in the fit() function of AdditiveModelForecast().
- Added 'prediction_confidence_1' and 'prediction_confidence_2' in BrownExponentialSmoothing().

**Version 2.9.210726**

``Bug Fixes``

- Fixed load_model initialization error in model storage service.
- Fixed bad link in the PyPI portal.

**Version 2.9.210709**

``Bug Fixes``

- Fixed missing WeaklyConnectedComponents in hana_ml.graph.algorithms.
- Fixed missing statistics in hana_ml.graph.Graph.describe.

- Fixed a bug where `Graph` object creation, `discover_graph_workspace()` and `Graph.describe()` did not work on an on-premise system.

**Version 2.9.210630**

``Bug Fixes``

- Fixed accuracy_measure issue in Single/Double/Triple/Auto Exponential Smoothing().
- Fixed empty input table error in Croston().
- Fixed class_map error for multi-class logistic regression in UnifiedClassification().

**Version 2.9.210619**

``Enhancements``

- Constants for directions used in graph functions can be found in `hana_ml.graph.constants.DIRECTION_*`.
- The following functions and objects are now available for import from `hana_ml.graph`:

  - `Graph` object
  - `create_graph_from_dataframes` and `create_graph_from_hana_dataframes` factory methods
  - `discover_graph_workspaces`
  - `discover_graph_workspace`

- Geometries no longer need to be specified when creating a DataFrame instance; they are analyzed automatically.
- Supported a list of targets and trans_param in feature_tool.
- Enhanced unified report for unified_regression to view feature importance.
- Enhanced join() to support a list of DataFrames.
- Enhanced union() to support a list of DataFrames.
- Streamlined the `create_dataframe_from_pandas` geo parameters. There is now only one list, geo_cols, which supports column references as well as (lon, lat) tuples, and one SRID parameter for all columns.
- When you call `create_dataframe_from_pandas` with a GeoPandas DataFrame, the geometry column is detected automatically and processed as a geometry. You don't need to add it manually to `geo_cols`.
- The `Graph` constructor is simplified. You can instantiate a graph simply with the workspace name.
- Enhanced ModelStorage for APL to support HANA Data Lake.

``New Functions``

- Introduced `hana_ml.graph.algorithms`, which will contain all graph algorithms in the future. The package provides an `AlgorithmBase` class which can be used to build additional algorithms for a graph.

- Added `hana_ml.graph.algorithms.ShortestPath`, which replaces `Graph.shortest_path`.
- Added `hana_ml.graph.algorithms.Neighbors`, which replaces `Graph.neighbors`.
- Added `hana_ml.graph.algorithms.NeighborsSubgraph`, which replaces `Graph.neighbors_with_edges`.
- Added `hana_ml.graph.algorithms.KShortestPaths`.
- Added `hana_ml.graph.algorithms.ShortestPathsOneToAll`.
- Added `hana_ml.graph.discovery.discover_graph_workspace`, which reads the metadata of a graph.
- Added `hana_ml.graph.create_graph_from_edges_dataframe`.
- Added `hana_ml.graph.Graph.has_vertices`, to check whether a list of vertices exists in a graph.
- Added `hana_ml.graph.Graph.subgraph`, to create a vertex- or edge-induced subgraph.
- Added `hana_ml.graph.Graph.describe`, to get some statistics.
- Added `hana_ml.graph.Graph.degree_distribution`.
- Added `hana_ml.DataFrame.srids`, which returns the SRS of each geometry column.
- Added `hana_ml.DataFrame.geometries`, which returns the geometry columns, if there are any.
- Added the `hana_ml.spatial` package, which contains:

  - `create_predefined_srs`
  - `is_srs_created`
  - `get_created_srses`

- Added the `hana_ml.docstore` package, which contains:

  - `create_collection_from_elements`

- Added BCPD() for Bayesian change point detection.
- Added shape in dataframe.
- Added sort_values and sort_index in dataframe.
- Added scheduler for model renewal in model_storage.
- Added min, max, mean, median, sum and value_counts in dataframe.
- Added SHAP support for unified regression.
- Added data lake support in model_storage.
- Added data lake support in dataframe functions.
- Added line plot for time series forecast.
- Added split_column().
- Added concat_columns().
- Added outlier_detection_kmeans(), which detects outliers in datasets based on the result of k-means clustering.
- Added intermittent_forecast() for forecasting intermittent demand data (time series).
- Added OnlineLinearRegression(), an online version of Linear Regression.


``API Changes``

- Removed `geo_cols` from `dataframe.create_dataframe_from_shapefile`.
- Removed `geo_cols` from `ConnectionContext.sql()`.
- Removed `geo_cols` from `ConnectionContext.table()`.
- Removed `Graph.neighbors` and `Graph.neighbors_with_edges`.
- Removed `Graph.shortest_path`.
- Removed `hana_ml.graph.Path`, which is no longer used.
- Removed `hana_ml.graph.create_hana_graph_from_existing_workspace`. It is replaced by a simplified `Graph` object constructor.
- Renamed `hana_ml.graph.create_hana_graph_from_vertex_and_edge_frames` to `create_graph_from_dataframes`.
- Changed the type of `geo_cols` in `create_dataframe_from_pandas` to list, which supports direct column references or (lon, lat) tuples for generating POINT geometries.

``Bug Fixes``

- Fixed inflexible default locations of selected columns of input data, e.g. key, features and endog.
- Fixed model report's feature importance when a feature has 0 importance.

**Version 2.8.210421**

Version 2.8.210421 supports **SAP HANA SPS05** and **SAP HANA Cloud**.

``Bug Fixes``

- Fixed model report's feature importance when a feature has 0 importance.
- Fixed pivot_table issue with multiple indexes.
- Fixed missing verbose option for RDT regressor.
- Fixed SHAP display for categorical columns.

**Version 2.8.210321**

Version 2.8.210321 supports **SAP HANA SPS05** and **SAP HANA Cloud**.

``Enhancements``

- Enhanced sql() to enable multiline execution.
- Enhanced save() with an append option.
- Enhanced diff() to enable negative input.
- Enhanced model report functionality of UnifiedClassification with added model and data visualization.
- Enhanced dataset_report module with an optimized report generation process and a better user experience.
- Enhanced UnifiedClustering to support 'distance_level' in AgglomerateHierarchicalClustering and DBSCAN functions. Please refer to documentation for details.
- Enhanced model storage to support unified report.


``New Functions``

- Added generate_html_report() and generate_notebook_iframe_report() functions for UnifiedRegression, which display the output, e.g. statistics and model.
- APL Gradient Boosting: the **other_params** parameter is now supported.
- APL all models: a new method, **get_model_info**, allows users to retrieve the summary and the performance metrics of a saved model.
- APL all models: users can now specify the weight of explanatory variables via the **weight** parameter.
- Added LSTM.
- Added text mining function support for both SAP HANA on-premise and cloud versions:

  - tf_analysis
  - text_classification
  - get_related_doc
  - get_related_term
  - get_relevant_doc
  - get_relevant_term
  - get_suggested_term

- Added unified report.

``New Dependency``

- Added new dependency 'htmlmin' for generating dataset and model reports.

``API Changes``

- KMeans: added two parameters, 'use_fast_library' and 'use_float'.
- UnifiedRegression: added parameter 'build_report'.
- Added a parameter 'distance_level' in UnifiedClustering when 'func' is AgglomerateHierarchicalClustering or DBSCAN. Please refer to documentation for details.
- Renamed 'batch_size' to 'chunk_size' in create_dataframe_from_pandas.
- OnlineARIMA: added two parameters, 'random_state' and 'random_initialization', and its partial_fit() function supports two parameters, 'learning_rate' and 'epsilon', for updating the values in the input model.

``Bug Fixes``

- Fixed OnlineARIMA model storage support.
- Fixed inflexible default locations of selected columns of input data, e.g. key, features and endog.
- Fixed accuracy_measure issue in AutoExponentialSmoothing.

**Version 2.6.210126**

Version 2.6.210126 supports **SAP HANA SPS05** and **SAP HANA Cloud**.

``Bug Fixes``

- Fixed uuid issue for Python 3.8.
- Fixed wrong legend for unified classification model report.
- Fixed dataset report to handle datasets with missing values.


**Version 2.6.210113**

Version 2.6.210113 supports **SAP HANA SPS05** and **SAP HANA Cloud**.

``Bug Fixes``

- Fixed load_model issue for KMeans clustering.
- Removed PyPI installation of Shapely for Windows users.
- Fixed duplicate rows bug in save() function.
- Fixed loading issue in model report.
- Replaced the option 'batch_size' with 'chunk_size' in create_dataframe_from_pandas.

**Version 2.6.201209**

Version 2.6.201209 supports **SAP HANA SPS05** and **SAP HANA Cloud**.

``Bug Fixes``

- Removed shap from installation.
- Fixed bugs in dataframe functions when autocommit=False.
- Fixed font properties bugs in EDA functions.
- APL Documentation: **other_train_apl_aliases** is now documented.
- APL Gradient Boosting Classification: the target variable won't be displayed in prediction if it is not given in input.
- APL Gradient Boosting: the default parameter values are now set at the APL backend level. They are no longer set at the Python API level.
- Fixed handling of geometry columns in the context of DataFrame.collect calls.
- Fixed shapely not being a required dependency.
- Fixed the displacement of parameter 'dispersion' in CPD.

**Version 2.6.201106**

Version 2.6.201116 supports **SAP HANA SPS05** and **SAP HANA Cloud**.

``Enhancements``

- Enhanced collect() performance for large datasets.
- Enhanced create_dataframe_from_pandas performance for large datasets.

``New Functions``

- Added kdeplot() for 1D and 2D KDE plotting.
- Added SHAPLEY visualization.

``Bug Fixes``

- Fixed incompatibility issue with matplotlib>=3.3.0.

**Version 2.6.201016 (2.6.200928)**

Version 2.6.201016 supports **SAP HANA SPS05** and **SAP HANA Cloud**.

``API Changes``

- HybridGradientBoostingClassifier, HybridGradientBoostingRegressor: added a parameter 'adopt_prior' to indicate whether to adopt the prior distribution as the initial point.
- SVC, SVR, OneClassSVM, SVRanking: added parameters 'compression', 'max_bits', 'max_quantization_iter' for model compression.

- RDTClassifier: added parameters 'compression', 'max_bits', 'quantize_rate' for model compression.
- RDTRegressor: added parameters 'compression', 'max_bits', 'quantize_rate', 'fittings_quantization' for model compression.
- ARIMA, AutoARIMA: introduced a new value 'truncation_algorithm' for the forecast_method parameter of the prediction function to improve prediction performance.
- KNNClassifier, KNNRegressor, DBSCAN: added parameters 'string_variable' and 'variable_weight' to enable distance calculation based on string distance.
- seasonal_decompose: added parameters 'extrapolation', 'smooth_width' and 'auxiliary_normalitytest'.

New functions:

- Added dataset manager.
- Added graph and spatial modules.
- Added dataset report.
- Added clustering function: SlightSilhouette.
- Added native storage support in model storage service and dataset manager.
- Added vector ARIMA.
- Added unified regression.
- Added unified clustering.

``Bug Fixes``

- Fixed ROC curve display in model report with disordered points.
- Fixed load_model for unified_classification in model storage service.
- Fixed model_selection for unified_classification.

**Version 2.5.200626**

Version 2.5.200626 supports **SAP HANA SPS05** and **SAP HANA Cloud**

``API Changes``

- Removed the ConnectionContext parameter from PAL functions.
- Changed the parameter 'algorithm' in DecisionTreeClassifier/Regressor() from mandatory to optional, with default value 'cart'.
- Added parameter key in fit() function of tsa.ARIMA() and tsa.AutoARIMA().
- Added parameter decompose_type in tsa.seasonal_decompose().
- Added parameter save_alignment and a new output statistic table in tsa.fast_dtw().
- Added parameter table_structure in create_dataframe_from_pandas().
- Added parameters resampling_method and param_search_strategy in HybridGradientBoostingClassifier/Regressor().

``New Functions``

- Added functions in dataframe.py: melt(), read_pickle().
- Added unified classification function. In particular, generate_html_report() and generate_notebook_iframe_report() are provided to visualize the output, e.g. confusion matrix and ROC curve.
- Added mcmc function.
- Added model selection services.
- Added visualizers (model debriefing).

``Enhancements``

- Enhanced smart sampling for visualizers.
- Enhanced import function to SAP HANA.
- Enhanced bytes, TIMESTAMP and BIGINT support in create_dataframe_from_pandas() in dataframe.py.
- Enhanced TIMESTAMP and DATE support in describe() in dataframe.py.
- Predictions made with APL gradient boosting can now be complemented with the reasons that led to these predictions: number of top or bottom explanatory variables, strength values, etc.
- Supported more data types (SMALLINT, DECIMAL, TINYINT, BIGINT, CLOB and BLOB) in DataFrame.dtypes(), generate_table_type() and is_numeric().
- Enhanced the missing value handling in hana_ml.visualizers.eda bar/box/pie plots by creating a new class for missing values in the groupby column.
- APL gradient boosting can now provide metrics on the strength of feature interactions.
- The connection parameter is no longer required for APL model creation.

``Bug Fixes``

- Fixed wrong ID issue in fit function by adding key option in tsa.ARIMA() and tsa.AutoARIMA().
- Fixed CLOB type issue in create_dataframe_from_pandas() by adding table_structure and drop_exist_tab options.
- Fixed pivot_table() index naming bug.
- Fixed temporary view from temporary table issue in APL time series function by adding sort_data and get_horizon_wide_metric.
- Fixed bugs in create_dataframe_from_pandas() if the table is temporary.
- Fixed bugs in the data type of init centers in GMM().
- Fixed bugs when some data types, e.g. SMALLINT, DECIMAL or TINYINT, are not supported in DataFrame.dtypes(), generate_table_type() and is_numeric().
- Fixed bugs when data types, e.g. DATE and TIMESTAMP, are not supported in DataFrame.describe().
- Fixed the table overwrite bug in DataFrame.save() if the table name is a duplicate.
- Fixed bugs caused by missing quotation marks around column names in hana_ml.visualizers.eda.
- Users can set 'Cutting Strategy' in APL Gradient Boosting.
- APL models are saved correctly.

Deprecated Functions:

- GradientBoostingClassifier.
- GradientBoostingRegressor.

**Version 1.0.8**

Version 1.0.8 supports **SAP HANA SPS04** (100% coverage for SAP HANA SPS04 PAL algorithms)

New Functions in the **PAL** package:

- preprocessing: Multidimensional Scaling (MDS), Synthetic Minority Over-Sampling Technique (SMOTE, only supported in **SAP HANA SPS05**), Sampling, Variance Test.
- statistics: Condition Index, Cumulative Distribution Function (cdf), Distribution Fitting, Distribution Quantile, Entropy, Equal Variance Test, Factor Analysis, Grubbs' Test, Kaplan-Meier Survival Analysis, Kernel Density, One-Sample Median Test, Wilcoxon Signed Rank Test.
- time series: Linear Regression with Damped Trend and Seasonal Adjust, Additive Model Forecast, Hierarchical Forecast, Correlation Function, online algorithms and dynamic time warping (fast DTW).
- miscellaneous: ABC Analysis, T-distributed Stochastic Neighbour Embedding (TSNE), Weighted Score Table.
- Added functions in dataframe.py: data_manipulation().
- Added cross-validation options to SAP HANA PAL functions.
- Added visualizers (EDA profiler).
- Added model storage services.