Changelog
Version 2.16.230601
Bug Fixes
Fixed model load issue for pipeline module in model storage.
Fixed parameter missing in HGBT.
Version 2.16.230526
Bug Fixes
Fixed pipeline missing evaluation function.
Fixed tips and chart width for model report.
Fixed built-in operation missing in pipeline module.
Fixed wordcloud issues to disable stopwords.
Version 2.16.230519
Bug Fixes
Fixed auto-ml time series config_dict template.
Fixed progress logging in auto-ml.
Fixed the progress monitor issue when early_stop is enabled.
Fixed KNN NaN issue due to the pandas new changes.
Fixed describe function to support SMALLINT.
Version 2.16.230508
Bug Fixes
Fixed pipeline module for model storage.
Fixed the progress monitor getting stuck when the progress table is not empty.
Fixed quote issue in serializing pipeline object.
Fixed parameter missing in FACCM.
Fixed dependency issue for pydotplus.
Fixed tail function with default rel_col.
Fixed NaN in KNN optimal parameter collect.
Version 2.16.230413
Bug Fixes
Fixed unsupported issue of precomputed distance matrix in predict() of UnifiedClustering.
Fixed wrong error message when PAL functions are executed in HANA SPS07.
Fixed no timestamp support in plot_time_series_outlier.
Fixed timestamp error in progress monitor in AutomaticClassification(), AutomaticRegression(), AutomaticTimeSeries.
Fixed polynomial feature generation category undefined issue.
Version 2.16.230323
Bug Fixes
Fixed wrong error message when "hint" has been used.
Fixed load_model issue for APL model.
Version 2.16.230316
New Functions
Added model report items to time series report.
Added time series report to unified report.
Added TimeSeriesClassification for time series classification.
Added TUDF code generation function.
Added AMDP generator for pipeline.
Added import_csv_from() function for importing csv file from the cloud storage locations like Azure, Amazon(AWS), Google Cloud, SAP HANA Cloud, Data Lake Files(HDLFS).
Added set_scale_out() to enable APL functions execution in scaling out environment.
Enhancements
Enhanced model report with new framework.
Enhanced MLR with prediction/confidence interval.
Enhanced accuracy_measure() with SPEC measure.
Enhanced UnifiedExponentialSmoothing with reason code control parameter.
Enhanced precalculated distances matrix input for KMEDOIDS in UnifiedClustering.
Enhanced AutomaticClassification() and AutomaticRegression() with successive halving.
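Note on successive halving: the entry above names the technique but not how it works. As a rough illustration only, the following plain-Python sketch shows the general idea: score all candidates on a small budget, keep the best fraction, then repeat with a larger budget. It does not use hana_ml or PAL and is not how AutomaticClassification() or AutomaticRegression() implement it. The names successive_halving, toy_score, min_budget and eta are hypothetical.

```python
def successive_halving(configs, evaluate, min_budget=1, eta=2):
    """Generic successive halving (illustrative, not the PAL implementation).

    configs  -- candidate configurations (any objects)
    evaluate -- hypothetical callback: evaluate(config, budget) -> score,
                where a higher score is better
    eta      -- keep the best 1/eta of the candidates after each round
    """
    survivors = list(configs)
    if not survivors:
        raise ValueError("at least one configuration is required")
    budget = min_budget
    while len(survivors) > 1:
        # Score every surviving candidate with the current budget.
        ranked = sorted(survivors, key=lambda c: evaluate(c, budget), reverse=True)
        # Keep the top fraction, but never fewer than one candidate.
        survivors = ranked[: max(1, len(ranked) // eta)]
        # Survivors get a larger budget in the next round.
        budget *= eta
    return survivors[0]


def toy_score(config, budget):
    # Toy objective for the example: best at config == 0.3; budget is ignored.
    return -(config - 0.3) ** 2


best = successive_halving([0.1, 0.2, 0.3, 0.5, 0.8, 1.0, 1.5, 2.0], toy_score)
```

In a real search the budget would typically map to something like training iterations or sample size, so that weak candidates are discarded cheaply.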
API Changes
Added a parameter called 'decom_state' in UnifiedExponentialSmoothing for the control of reason code display.
Added a parameter called 'lang' in WordCloud for language selection.
Version 2.15.230217
Bug Fixes
Fixed cmap issues in eda visualizer.
Fixed FFM label bug.
Fixed missing WordCloud module issue.
Version 2.15.230111
Bug Fixes
Fixed the blank chart issue in DatasetReportBuilder.
Fixed dataset report crash due to empty column.
Version 2.15.221223
Bug Fixes
Fixed detected season change-points missing error of BCPD.
Fixed Changepoints Chart in TimeSeriesReport.
Version 2.15.221216
New Functions
Added nullif function in dataframe.
Added long-term time series forecast algorithm LTSF.
Added plot_change_points() to plot change points.
Added plot_psd() to plot power spectral density.
Added periodogram() to perform power spectral density estimate of the input signal.
Added change_point detection item to TimeSeriesReport.
Enhancements
Enhanced OutlierDetectionTS with IsolationForest and DBSCAN.
Enhanced Data Type Timestamp support in BCPD.
Enhanced UnifiedClustering with massive mode.
Enhanced UnifiedReport with scoring report.
Enhanced ModelStorage with model export/import function.
Enhanced MLFlow integration with pipeline module and model export.
Enhanced UnifiedReport with model debriefing and pipeline report.
Enhanced force_plot in ShapleyExplainer to handle missing features.
Enhanced HybridGradientBoostingClassifier and HybridGradientBoostingRegressor with early_stop option.
Enhanced pipeline and auto-ml module (AutomaticClassification, AutomaticRegression and AutomaticTimeSeries) with predefined output tables.
Enhanced PAL functions to generate SQL without HANA execution.
API Changes
Changed the "JSON" column to NCLOB table type in ModelStorage.
Version 2.14.221208
Bug Fixes
Fixed dependency issues in dataset report.
Fixed documentation link in PyPI portal.
Fixed replace function to support NULL replacement.
Version 2.14.221201
Bug Fixes
- Fixed dataset report display issue:
wrong binning method for distribution plot.
NA handling issue in scatter matrix.
Fixed SQL generation issue in pipeline module.
Fixed model state creation in KNN.
Fixed missing parameters in AutomaticTimeSeries.
Fixed wrong type of split_method in AutomaticTimeSeries.
Version 2.14.221028
Bug Fixes
Fixed pipeline monitor when password contains ','.
Fixed message not defined error in auto-ml.
Fixed pipeline error for PCA, DT and FN when HANA execution is disabled.
Fixed json pipeline generation for HGBT and RDT.
Fixed parameter name typos for DT and RDT in unified classification.
Fixed execute_statement parser when parameters contain special characters.
Version 2.14.221014
Bug Fixes
Fixed legend issues in forecast_line_plot.
Fixed duplicated outputs issue in artifact generator.
Fixed the best pipeline report where points exceeded the chart.
Fixed progress bar counter issue for auto-ml time series.
Fixed predefined partition in unified API.
Version 2.14.220923
Bug Fixes
Fixed degree_values issue in unified regression.
Fixed legend order in seasonal_plot.
Fixed cross validation parameters in automatic time series forecast.
Version 2.14.220918
New Functions
Added replace function in dataframe.
Added the time series outlier detection algorithm called ts_outlier_detection().
Added AutoML Time Series.
Added make_future_dataframe.
Added force_plot for SHAP explainer.
Added time series imputer.
Added time series data report.
Added KS test.
Added create_dataframe_from_spark.
Added set_model_state function.
Added outlier profiling.
Added outlier plot in EDA.
Enhancements
- Enhanced model storage
support pipeline in auto-ml.
support model report.
MLFlow integration for auto-ml.
Pipeline module enhancement with PAL_PIPELINE_FIT and PAL_PIPELINE_PREDICT.
Enhanced dataframe function with enable_abap_sql.
Successive halving for HGBT, KNN, SVM and MLR.
Enhanced auto-ml with lightweight config dict option.
Enhanced JSON model support in Multi-class LogisticRegression and LinearRegression.
Enhanced the support of French and Russian in tf_analysis, text_classification, get_related_doc, get_related_term, get_relevant_doc, get_relevant_term, get_suggested_term functions.
Enhanced the support of pre-defined period setting in seasonal_decompose with a new parameter 'periods'.
API Changes
Added 'handling_missing' and 'json_export' in LinearRegression.
Added 'json_export', 'precompute_lms_sketch', 'stable_sketch_alg', 'sparse_sketch_alg' in Multi-class LinearRegression.
Added 'periods' in seasonal_decompose.
Added parameters for APL segmented modeling, segmented forecast and parallel apply: 'max_tasks' and 'segment_column_name' (see APL 2209 and APL 2211 release notes).
Version 2.13.220722
Bug Fixes
Fixed early_stop in auto-ml.
Fixed display issue in unified report for APL.
Version 2.13.220715
Bug Fixes
Fixed class_map0, class_map1 issue in UnifiedClassification.
Fixed early_stop parameter missing in AutomaticClassification and AutomaticRegression.
Fixed divide-by-zero issue in binary_classification_debriefing.
Version 2.13.220701
Bug Fixes
Fixed table name too long in model storage save function.
Fixed mlflow autologging with additional fit parameters.
Fixed no mlflow model info display issue.
Fixed metric sampling for model report.
Fixed wrong schedule template in model storage.
Version 2.13.220608
Bug Fixes
Fixed identifier length too long issue for function outputs.
Version 2.13.220511
New Functions
Added upsert/update streams data in dataframe function.
Added stationarity_test function.
Added CrostonTSB function.
Added get_temporary_tables and clean_up_temporary_tables functions.
Added Pipeline class json outputs for auto-ml pipeline_fit.
- Added EDA for time series data.
Added plot_pacf, plot_acf
Added plot_moving_average
Added plot_rolling_stddev
Added seasonal_plot
Added timeseries_box_plot
Added plot_seasonal_decompose
Added quarter_plot
Added rolling window in generate_feature function.
Added get_connection_id, restart_session and cancel_session_operation in dataframe function.
Enhancements
Added support of the following collection of new parameters for HGBT in UnifiedClassification and UnifiedRegression: 'replacemissing', 'default_missing_direction', 'feature_grouping', 'tol_rate', 'compression', 'max_bin_num'.
Improved the performance of box_plot.
Enhanced the massive mode support of UnifiedClassification, UnifiedRegression, ARIMA, AutoARIMA, AdditiveModelForecast.
Enhanced MLFlow autologging for unified classification and regression.
API Changes
Added 'interpret' in predict() of KNNClassifier and KNNRegressor for enabling procedure PAL_KNN_INTERPRET.
Added 'sample_size', 'top_k_attributions', 'random_state' in predict() of KNNClassifier and KNNRegressor for generating local interpretation result.
Enabled missing value handling for input data by adding imputation related parameters in fit(), predict() and score() functions of both UnifiedClassification and UnifiedRegression.
Added 'model_type' in GARCH initialization for allowing variant GARCH models.
Bug Fixes
Fixed key error bug for parameter 'param_values' in DecisionTreeClassifier/Regressor.
Fixed the encoding error of imputation strategy of NONE type in Imputer.
Fixed the key error bug when enabling AFL states for clustering algorithms.
Version 2.12.220428
Bug Fixes
Adapted the auto-ml logging according to the PAL function changes.
Version 2.12.220425
Bug Fixes
Fixed the display issue for the pipeline report.
Fixed the missing ptype issue in automl evaluate function.
Fixed the transform issue in pipeline fit_predict function.
Version 2.12.220408
Bug Fixes
Fixed cancellation button in auto_ml.
Fixed pivot_table for handling NULL values.
Fixed tree debriefing dot visualizer for decision trees.
Fixed the display issue for dataset report with NULL values.
Version 2.12.220325
New Functions
Added IsolationForest.
Added auto_ml including AutomaticClassification, AutomaticRegression and Preprocessing.
Added progress monitor called PipelineProgressStatusMonitor for AutomaticClassification and AutomaticRegression.
Added best pipeline report called BestPipelineReport for AutomaticClassification and AutomaticRegression.
Added to_datetime(), to_tail() in hanaml.dataframe.
Enhancements
Added validation procedure for n_components in CATPCA.
Improved display name in pivot_table.
Added compression and thresholding in wavelet transform.
Moved generate_feature to dataframe function.
Enhanced create_dataframe_from_pandas() with upsert option.
Added ignore_scatter_matrix option in dataset report.
Exposed APL variable selection parameters.
Enhanced text mining with German support.
Support more loss functions in HybridGradientBoostingClassifier and HybridGradientBoostingRegressor.
Enhanced white_noise_test() with an option: the degree of freedom, model_df.
Enhanced Attention with local interpretability of model.
Enhanced integer index support for TimeSeriesExplainer.explain_arima_model() for ARIMA and AutoARIMA.
Added precomputed affinity for AgglomerateHierarchicalClustering.
Added model compression related parameters for HybridGradientBoostingClassifier and HybridGradientBoostingRegressor.
Bug Fixes
Fixed M4 sampling with lowercase column name.
Fixed inconsistent IDs assigned to solvers between LOGR and M_LOGR.
Fixed a parameter naming error in fft(): flattop_model -> flattop_mode.
Fixed a validation error for endog parameter in Attention predict().
API Changes
Added 'model_df' in the white_noise_test() for selecting the degree of freedom.
Added 'explain_mode' in predict() of GRUAttention for selecting the mechanism for generating the reason code for inference results.
Version 2.11.220209
Bug Fixes
Fixed wrong arg check for 'histogram' in HGBT split method.
Fixed bug in deploy_class with transport_request.
Version 2.11.220107
Bug Fixes
Fixed box plot with lower case column name.
Fixed add_id when the rel_col input is list type.
Fixed shortest_path and shortest_path_one_to_all type cast error.
Fixed fast DTW alignment error.
Position correction for random search times in LOGR.
Fixed HANA hint script generation for resource restriction.
Version 2.11.211211
New Functions
Added FeatureSelection.
Added BSTS.
Added Word Cloud.
Added hdbprocedure generation in pal_base and applied to all functions.
Added GARCH.
APL classification, regression, clustering: a new method, 'export_apply_code', generates code which can be used to apply a trained model outside APL.
Enhancements
Enhanced Preprocessing with FeatureSelection.
Enhanced the model storage with fit parameters in json format.
Enhanced PCA categorical support.
Enhanced model storage with fit parameters info.
Enhanced UnifiedExponentialSmoothing with massive mode.
Enhanced AMDP generation as a function in unified_classification.
Enhanced ARIMA with an explainer in the predict function.
Enhanced additive_model_forecast with an explainer in the predict function.
Enhanced HybridGradientBoostingClassifier with continue training of a trained HybridGradientBoostingClassifier model.
Enhanced APL AutoTimeSeries with advanced predict outputs: the 'APL/ApplyExtraMode' parameter can be set in 'extra_applyout_settings'.
Enhanced the stored procedure information retrieval.
Enhanced fillna to support non-numeric columns.
Enhanced dataset report to convert PAL unsupported type.
API Changes
Added 'background_size' in the init() and 'thread_ratio', 'top_k_attributions', 'trend_mod', 'trend_width', 'seasonal_width' in the predict() function of ARIMA() and AutoARIMA().
Added 'show_explainer', 'decompose_seasonality', 'decompose_holiday' in the predict() function of additive_model_forecast().
Added 'warm_start' in the fit() function of HybridGradientBoostingClassifier() and HybridGradientBoostingRegressor() for continuing training with existing model.
Bug Fixes
Fixed index creation bug in on-premise text_classification api.
Fixed multi-class logistic regression init check bug.
Fixed has_table with local temporary tables.
Version 2.10.210918
New Functions
Added dtw() for generic dynamic time warping with predefined and custom defined step pattern.
Added wavedec() for multi-level discrete wavelet transformation, and waverec() for the corresponding inverse transformation.
Added wpdec() and wprec() for multi-level (discrete) wavelet packet transformation and inverse.
Added OnlineMultiLogisticRegression() which is the online version of Multi-Class Logistic Regression.
Added spectral clustering.
Added LSTM with attention.
Added OneHotEncoding.
Added unified preprocessor.
Added Pipeline plot.
Added UnifiedExponentialSmoothing().
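Note on dtw(): for readers unfamiliar with dynamic time warping, the following plain-Python sketch computes a DTW distance with the basic symmetric step pattern (match, insertion, deletion) and an absolute-difference cost. It is only an illustration of the technique and says nothing about the signature or internals of dtw() in this package. The name dtw_distance is hypothetical.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping distance between two numeric sequences.

    Illustrative sketch: uses the basic three-step pattern and |x - y| as
    the local cost. Not the package's dtw() implementation.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] holds the minimal accumulated cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible predecessor cells.
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b repeats
                cost[i][j - 1],      # b advances, a repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


# The second series repeats one sample, which warping absorbs at no cost.
stretched = dtw_distance([1, 2, 3], [1, 2, 2, 3])
shifted = dtw_distance([0, 0], [1, 1])
```

A custom step pattern would change which predecessor cells are allowed in the min() above and how each step is weighted.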
Enhancements
Enhanced the model storage support for OnlineLinearRegression().
Enhanced multi-threading in tm functions.
Enhanced HDL container option.
Enhanced timestamp support for ARIMA(), AutoARIMA(), VectorARIMA(), OnlineARIMA(), SingleExponentialSmoothing(), DoubleExponentialSmoothing(), TripleExponentialSmoothing(), AutoExponentialSmoothing(), BrownExponentialSmoothing(), Croston(), LR_seasonal_adjust().
Enhanced new distributions for MCMC sampling.
Support multiple accuracy_measure methods in Single/Double/Triple ExponentialSmoothing, BrownExponentialSmoothing, Croston and LR_seasonal_adjust.
Added plotly support.
API Changes
Added 'key', 'endog', 'exog', 'categorical_variable' in the fit() function of AdditiveModelForecast().
Added 'prediction_confidence_1' and 'prediction_confidence_2' in BrownExponentialSmoothing().
Version 2.9.210726
Bug Fixes
Fixed load_model initialized error in model storage service.
Fixed bad link in pypi portal.
Version 2.9.210709
Bug Fixes
Fixed missing WeaklyConnectedComponents in hana_ml.graph.algorithms.
Fixed missing statistics in hana_ml.graph.Graph.describe.
Fixed a bug where Graph object creation, discover_graph_workspace() and Graph.describe() did not work on an on-premise system.
Version 2.9.210630
Bug Fixes
Fixed accuracy_measure issue in Single/Double/Triple/Auto Exponential Smoothing().
Fixed empty input table error in Croston().
Fixed class_map error for multiclass logistic regression in UnifiedClassification().
Version 2.9.210619
Enhancements
Constants for directions used in graph functions can be found in hana_ml.graph.constants.DIRECTION_*
Following functions and objects are now available in hana_ml.graph for import
Graph object
create_graph_from_dataframes and create_graph_from_hana_dataframes factory methods
discover_graph_workspaces
discover_graph_workspace
The geometries no longer need to be specified when creating a DataFrame instance. The geometries are analyzed automatically.
Support list of targets and trans_param in feature_tool.
Enhanced unified report for unified_regression to view feature importance.
Enhanced join() to support list of DataFrame.
Enhanced union() to support list of DataFrame.
Streamlined the create_dataframe_from_pandas geo parameters. Now there is only one list of geo_cols, which supports column references as well as (lon, lat) tuples, and one SRID parameter for all columns
When you 'create_dataframe_from_pandas' and pass a GeoPandas DataFrame, the geometry column will be detected automatically and processed as a geometry. You don't need to add it manually to geo_cols
The Graph constructor is simplified. You can instantiate a graph simply by the workspace name.
Enhanced ModelStorage for APL to support HANA Data Lake.
New Functions
Introduced hana_ml.graph.algorithms, which will contain all graph algorithms in the future. The package provides an AlgorithmBase class which can be used to build additional algorithms for a graph.
Add hana_ml.graph.algorithms.ShortestPath, which replaces Graph.shortest_path
Add hana_ml.graph.algorithms.Neighbors, which replaces Graph.neighbors
Add hana_ml.graph.algorithms.NeighborsSubgraph, which replaces Graph.neighbors_with_edges
Add hana_ml.graph.algorithms.KShortestPaths
Add hana_ml.graph.algorithms.ShortestPathsOneToAll
Add hana_ml.graph.discovery.discover_graph_workspace, which reads the metadata of a graph
Add hana_ml.graph.create_graph_from_edges_dataframe
Add hana_ml.graph.Graph.has_vertices, to check if a list of vertices exist in a graph
Add hana_ml.graph.Graph.subgraph, to create a vertices or edges induced subgraph
Add hana_ml.graph.Graph.describe, to get some statistics
Add hana_ml.graph.Graph.degree_distribution
Add hana_ml.DataFrame.srids, which returns the SRS of each geometry column
Add hana_ml.DataFrame.geometries, which returns the geometry columns if there are any
Add hana_ml.spatial package, that contains
create_predefined_srs
is_srs_created
get_created_srses
Add hana_ml.docstore package, that contains
create_collection_from_elements
Added BCPD() for Bayesian change point detection.
Added shape in dataframe.
Added sort_values, sort_index in dataframe.
Added scheduler for model renew in model_storage.
Added min, max, mean, median, sum, value_counts in dataframe.
Added SHAP support for unified regression.
Added data lake support in model_storage.
Added data lake support in dataframe functions.
Added line plot for time series forecast.
Added split_column().
Added concat_columns().
Added outlier_detection_kmeans(), which detects outliers in datasets based on the result of k-means clustering.
Added intermittent_forecast() for forecasting intermittent demand data(time-series).
Added OnlineLinearRegression() which is an online version of the Linear Regression.
API Changes
Removed geo_cols from dataframe.create_dataframe_from_shapefile
Removed geo_cols from ConnectionContext.sql()
Removed geo_cols from ConnectionContext.table()
Removed Graph.neighbors and Graph.neighbors_with_edges
Removed Graph.shortest_path
Removed hana_ml.graph.Path. This is not used anymore
Removed hana_ml.graph.create_hana_graph_from_existing_workspace. This is replaced by a simplified Graph object constructor.
Renamed hana_ml.graph.create_hana_graph_from_vertex_and_edge_frames to create_graph_from_dataframes
Changed the type of geo_cols in create_dataframe_from_pandas to list, which supports direct column references or (lon, lat) tuples for generating POINT geometries
Bug Fixes
Fixed inflexible default locations of selected columns of input data, e.g. key, features and endog.
Fixed model report's feature importance when it has 0 importance.
Version 2.8.210421
Version 2.8.210421 supports SAP HANA SPS05 and SAP HANA Cloud
Bug Fixes
Fixed model report's feature importance when it has 0 importance.
Fixed pivot_table with multiple index issue.
Fixed the verbose missing for RDT regressor.
Fixed the shap display for categorical columns.
Version 2.8.210321
Version 2.8.210321 supports SAP HANA SPS05 and SAP HANA Cloud
Enhancements
Enhanced sql() to enable multiline execution.
Enhanced save() to add append option.
Enhanced diff() to enable negative input.
Enhanced model report functionality of UnifiedClassification with added model and data visualization.
Enhanced dataset_report module with an optimized process of report generation and better user experience.
Enhanced UnifiedClustering to support 'distance_level' in AgglomerateHierarchicalClustering and DBSCAN functions. Please refer to documentation for details.
Enhanced model storage to support unified report.
New Functions
Added generate_html_report() and generate_notebook_iframe_report() functions for UnifiedRegression which could display the output, e.g. statistic and model.
APL Gradient Boosting: the other_params parameter is now supported.
APL all models: a new method, get_model_info, is created, allowing users to retrieve the summary and the performance metrics of a saved model.
APL all models: users can now specify the weight of explanatory variables via the weight parameter.
Added LSTM.
Added Text Mining functions support for both SAP HANA on-premise and cloud version.
tf_analysis
text_classification
get_related_doc
get_related_term
get_relevant_doc
get_relevant_term
get_suggested_term
Added unified report.
- New dependency:
Added new dependency 'htmlmin' for generating dataset and model report.
API Changes
KMeans with two added parameters 'use_fast_library' and 'use_float'.
UnifiedRegression with one added parameter 'build_report'.
Added a parameter 'distance_level' in UnifiedClustering when 'func' is AgglomerateHierarchicalClustering and DBSCAN. Please refer to documentation for details.
Renamed 'batch_size' with 'chunk_size' in create_dataframe_from_pandas.
OnlineARIMA has two added parameters 'random_state', 'random_initialization' and its partial_fit() function supports two parameters 'learning_rate' and 'epsilon' for updating the values in the input model.
Bug Fixes
Fixed onlineARIMA model storage support.
Fixed inflexible default locations of selected columns of input data, e.g. key, features and endog.
Fixed accuracy_measure issue in AutoExponentialSmoothing.
Version 2.6.210126
Version 2.6.210126 supports SAP HANA SPS05 and SAP HANA Cloud
Bug Fixes
Fixed uuid issue for Python 3.8.
Fixed wrong legend for unified classification model report.
Fixed dataset report to handle the dataset with missing value.
Version 2.6.210113
Version 2.6.210113 supports SAP HANA SPS05 and SAP HANA Cloud
Bug Fixes
Fixed load_model issue for KMeans clustering.
Removed pypi installation of Shapely for windows user.
Fixed duplicate rows bug in save() function.
Fixed loading issue in model report.
Replaced the option 'batch_size' with 'chunk_size' in create_dataframe_from_pandas.
Version 2.6.201209
Version 2.6.201209 supports SAP HANA SPS05 and SAP HANA Cloud
Bug Fixes
Remove shap from installation.
Fixed bugs in dataframe functions when autocommit=False.
Fixed font properties bugs in eda functions.
APL Documentation: other_train_apl_aliases is now documented.
APL Gradient Boosting Classification: the target variable won't be displayed in prediction if it is not given in input.
APL Gradient Boosting: the default parameter values are now set in the APL backend level. They won't be set in the Python API level.
Fixed handling of geometry columns in the context of Dataframe.collect calls.
Fixed shapely not being a required dependency.
Fixed the displacement of parameter 'dispersion' in CPD.
Version 2.6.201106
Version 2.6.201116 supports SAP HANA SPS05 and SAP HANA Cloud
Enhancements
Enhanced collect() performance for large datasets.
Enhanced create_dataframe_from_pandas performance for large datasets.
New Functions
Added kdeplot() for 1D and 2D kde plotting.
Added SHAPLEY visualization.
Bug Fixes
Fixed incompatibility issue with matplotlib>=3.3.0.
Version 2.6.201016(2.6.200928)
Version 2.6.201016 supports SAP HANA SPS05 and SAP HANA Cloud
API Changes
HybridGradientBoostingClassifier, HybridGradientBoostingRegressor: added a parameter 'adopt_prior' to indicate whether to adopt the prior distribution as the initial point.
SVC, SVR, OneClassSVM, SVRanking: added parameters 'compression', 'max_bits', 'max_quantization_iter' for model compression.
RDTClassifier: added parameters 'compression', 'max_bits', 'quantize_rate' for model compression.
RDTRegressor: added parameters 'compression', 'max_bits', 'quantize_rate', 'fittings_quantization' for model compression.
In prediction function ARIMA and AutoARIMA, new value 'truncation_algorithm' of forecast_method is introduced to improve the prediction performance.
New parameters 'string_variable', 'variable_weight' are added in KNNClassifier, KNNRegressor and DBSCAN to enable distance calculation based on String distance.
New parameters 'extrapolation', 'smooth_width', 'auxiliary_normalitytest' are added in seasonal_decompose function.
- New functions:
Added dataset manager.
Added graph and spatial modules.
Added dataset report.
Added clustering function: SlightSilhouette.
Added native storage support in model storage service and dataset manager.
Added vector ARIMA.
Added unified regression.
Added unified clustering.
Bug Fixes
Fixed ROC curve display in model report with disordered points.
Fixed load_model for unified_classification in model storage service.
Fixed model_selection for unified_classification.
Version 2.5.200626
Version 2.5.200626 supports SAP HANA SPS05 and SAP HANA Cloud
API Changes
Removed parameter ConnectionContext in PAL functions.
Updated parameter algorithm from mandatory to optional in DecisionTreeClassifier/Regressor(), with default value 'cart'.
Added parameter key in fit() function of tsa.ARIMA() and tsa.AutoARIMA().
Added parameter decompose_type in tsa.seasonal_decompose().
Added parameter save_alignment and a new output statistic table in tsa.fast_dtw().
Added parameter table_structure in create_dataframe_from_pandas().
Added parameter resampling_method and param_search_strategy in HybridGradientBoostingClassifier/Regressor().
New Functions
Added functions in dataframe.py: melt(), read_pickle().
Added unified classification function. Especially, generate_html_report() and generate_notebook_iframe_report() are provided to visualize the output, e.g. confusion matrix and ROC curve.
Added mcmc function.
Added model selection services.
Added visualizers (model Debriefing).
Enhancements
Enhanced smart sampling for visualizers.
Enhanced import function to SAP HANA.
Enhanced bytes, TIMESTAMP and BIGINT support in create_dataframe_from_pandas() in dataframe.py.
Enhanced TIMESTAMP and DATE support in describe() in dataframe.py.
Predictions made with APL gradient boosting can now be complemented with the reasons that led to these predictions: number of top or bottom explanatory variables, strength values, etc.
Supported more data types, SMALLINT, DECIMAL, TINYINT, BIGINT, CLOB and BLOB in DataFrame.dtypes(), generate_table_type() and is_numeric().
Enhanced the missing value handling in hana_ml.visualizers.eda bar/box/pie plot in the groupby column by creating a new class for missing values.
APL gradient boosting can provide metrics about feature interactions strength.
The connection parameter is no longer required for APL model creation.
Bug Fixes
Fixed wrong ID issue in fit function by adding key option in tsa.ARIMA() and tsa.AutoARIMA().
Fixed CLOB type issue in create_dataframe_from_pandas() by adding table_structure and drop_exit_tab options.
Fixed pivot_table() index naming bug.
Fixed temporary view from temporary table issue in APL time series function by adding sort_data and get_horizon_wide_metric.
Fixed bugs in create_dataframe_from_pandas() if the table is temporary.
Fixed bugs for data type of init centers in GMM().
Fixed bugs when some data types, e.g. SMALLINT, DECIMAL or TINYINT are not supported in DataFrame.dtypes(), generate_table_type() and is_numeric().
Fixed bugs when data types, e.g. DATE and TIMESTAMP, are not supported in DataFrame.describe().
Fixed the table overwrite bug in DataFrame.save() if the table name is duplicate.
Fixed missing quotation mark in column name bugs in hana_ml.visualizers.eda.
Users can set 'Cutting Strategy' in APL Gradient Boosting.
APL models are saved correctly.
- Deprecated Functions:
GradientBoostingClassifier.
GradientBoostingRegressor.
Version 1.0.8
Version 1.0.8 supports SAP HANA SPS04 (100% coverage for SAP HANA SPS04 PAL algorithms)
- New Functions in the PAL package:
preprocessing : Multidimensional Scaling(MDS), Synthetic Minority Over-Sampling Technique(SMOTE, only supported in SAP HANA SPS05), Sampling, Variance Test.
statistics : condition index, Cumulative Distribution Function(cdf), Distribution fitting, Distribution Quantile, Entropy, Equal Variance Test, Factor Analysis, Grubbs' Test, Kaplan-Meier Survival Analysis, Kernel Density, One-Sample Median Test, Wilcox Signed Rank Test.
time series : Linear Regression with Damped Trend and Seasonal Adjust, Additive Model Forecast, Hierarchical Forecast, Correlation Function, online algorithms and dynamic time warping(fast DTW).
miscellaneous : ABC Analysis, T-distributed Stochastic Neighbour Embedding(TSNE), Weighted Score Table.
Added functions in dataframe.py: data_manipulation().
Added cross-validation options to SAP HANA PAL functions.
Added visualizers (EDA profiler).
Added model storage services.