Changelog
Version 2.22.241011
Bug Fixes
Fixed get_best_pipeline function sorting issue in massive AutoML.
Fixed error message display issue in massive AutoML.
Fixed the key display issue in new version of dataset report.
Fixed importing issue in AutoML progress monitor and configuration.
Version 2.22.240917
New Functions
Added Outlier Detection for regression OutlierDetectionRegression.
Added Text Classification with model TextClassificationWithModel.
Added Multi-task MLP Classification model MLPMultiTaskClassifier.
Added Multi-task MLP Regression model MLPMultiTaskRegressor.
Added Model Card creation function create_model_card().
Added Model Card parser parse_model_card().
Added to_sqlalchemy function in ConnectionContext.
Added Massive AutoML models MassiveAutomaticClassification, MassiveAutomaticRegression and MassiveAutomaticTimeSeries.
Enhancements
Enhanced dataset report with new style.
Enhanced model storage with model card integration.
Enhanced make_future_dataframe with increment_type option.
Enhanced hana_scheduler to support the schema of the output tables.
Enhanced json pipeline input in Pipeline Module.
Bug Fixes
Fixed numpy 2.0 incompatibility issue.
Version 2.21.240909
Bug Fixes
Fixed the group id INT issue with special group id.
Fixed display issue in dataset report when y-axis starts from 0.
Fixed the massive model issue with empty params.
Fixed cross join issue in dataframe function.
Version 2.21.240726
Bug Fixes
Fixed the display order of hana_scheduler job.
Fixed the missing schema support of output tables in hana_scheduler.
Fixed the missing meta table issue in model storage import_model function.
Fixed the timezone warning in hana_scheduler.
Improved the Weibull distribution fitting with the location parameter.
Version 2.21.240712
Bug Fixes
Fixed the AutoML config issue related to different runtime_platform.
Fixed the model storage issue due to missing SQL training script.
Fixed the AutoML Time Series parameter registration issue.
Version 2.21.240628
Bug Fixes
Fixed melt function with multiple keys.
Fixed dtypes issue to support REAL_VECTOR for hdbcli 2.21.
Version 2.21.240624
New Functions
Added Hull-White Simulation function hull_white_simulate().
Added Finetune Best Pipeline in AutoML AutomaticClassification and AutomaticRegression.
Added Massive Text Mining in methods get_related_doc(), get_related_term(), get_relevant_doc(), get_relevant_term(), get_suggested_term().
Added Benford Analysis benford_analysis().
Added AutoML config Dict Visualizer AutoMLConfig.
Added text classification with model TextClassificationWithModel.
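As background for the Benford Analysis entry above: Benford's law predicts that the leading digit d of many naturally occurring numbers appears with probability log10(1 + 1/d). The following is a minimal pure-Python sketch of comparing observed and expected leading-digit frequencies. It does not call hana_ml, and the helper names leading_digit, benford_expected and benford_table are hypothetical, not part of the library.

```python
import math
from collections import Counter


def leading_digit(value):
    """Return the first non-zero digit of a non-zero number."""
    text = f"{abs(value):.15e}"  # scientific notation, e.g. '3.140...e-03'
    return int(text[0])


def benford_expected(digit):
    """Expected frequency of a leading digit (1..9) under Benford's law."""
    return math.log10(1 + 1 / digit)


def benford_table(values):
    """Map each digit 1..9 to (observed frequency, expected frequency)."""
    digits = [leading_digit(v) for v in values if v != 0]
    if not digits:
        raise ValueError("need at least one non-zero value")
    counts = Counter(digits)
    total = len(digits)
    return {d: (counts.get(d, 0) / total, benford_expected(d)) for d in range(1, 10)}


# Powers of 2 are a classic sequence that follows Benford's law closely.
table = benford_table([2 ** k for k in range(1, 200)])
for digit, (observed, expected) in table.items():
    print(digit, round(observed, 3), round(expected, 3))
```

Large gaps between the observed and expected columns are what such an analysis is typically used to flag.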
Enhancements
Enhanced DataFrame functions to support real vector type.
Optimized default values for AutoML.
Enhanced AutoML connections visualization with zoom in/out.
Enhanced pipeline module with explainability.
Enabled get_permutation_importance() method for time series in ARIMA, AutoARIMA, AdditiveModelForecast, LTSF and BSTS.
Bug Fixes
Fixed SHAP display issues in RDT/DT/HGBT.
Fixed EDA display issues.
Version 2.20.240426
Bug Fixes
Fixed the placement issue of confusion matrix in model report.
Fixed the shapely dependency installation issue.
Fixed the concat function issue in hana_graph.
Fixed the wrong thread_ratio parameter in LinkPredict.
Fixed the pipeline module in model storage.
Version 2.20.240319
New Functions
Added Bubble Plot hana_ml.visualizers.eda.bubble_plot() and Parallel Co-ordinate Plot hana_ml.visualizers.eda.parallel_coordinates() in EDA.
Added time series permutation feature importance permutation_importance().
Added MLP Recommender hana_ml.algorithms.pal.recommender.MLPRecommender.
Enhancements
Enhanced progress monitor for AutoML to display at any time, especially for scheduled jobs.
Supported algorithm-specific parameters in automl/pipeline predict, in both the pipeline and automl modules.
Enhanced AutoML Auto SQL Content Integration for progress log management.
Enhanced pipeline report to display connection scores.
Enhanced AutoML config_dict templates with new operators.
Enhanced AutoML with connection constraints option.
Enhanced AutoML with different logging levels.
Enhanced AutoML with random/grid search option.
Enhanced outlier_detection with voting logic.
Enhanced online BCPD in massive mode.
Version 2.19.240221
Bug Fixes
Fixed incorrect shap plot.
Fixed incorrect dependence plot.
Fixed wrong SQL generation result in fairml.
Version 2.19.240131
Default Value Changes
Changed the default value of enable_plotly to True in all visualization modules.
Removed the matplotlib dependency.
Version 2.19.240124
Bug Fixes
Fixed missing fairml metrics in binary_classification_debriefing.
Fixed missing parameter issues in fairml.
Version 2.19.240115
Bug Fixes
Fixed model_debriefing issue for RDT and HGBT.
Version 2.19.240104
Bug Fixes
Fixed fairml prediction issue with fixed uuid.
Fixed fairml issue when submodel is used.
Fixed box_plot issue when plotly is enabled and groupby is None.
Fixed unified regression issue when key is not specified.
Fixed massive IsolationForest without key issue.
Fixed variable importance issue in model report.
Fixed model storage issue when odbc is used.
Version 2.19.231207
New Functions
Added FairMLClassification and FairMLRegression to provide fairness in machine learning.
Added auto_cast() in dataframe module.
Added description and description_ext properties of dataframe.
Added QuantileTransform.
Added summary().
Added score function of AutoML module.
Added fit(), transform() and score() functions in Pipeline module.
Enhancements
Support STL decomposition method in seasonal_decompose() by offering new parameters decompose_method, stl_robust and stl_seasonal_average.
Provide explainability support in LTSF.
Offer ignore zero calculation support in UnifiedRegression when calculating MPE or MAPE.
Provide top N verbose classes in predict() of UnifiedClassification by offering a new parameter verbose_top_n.
Support Stock keeping oriented Prediction Error Costs (SPEC) in AutomaticTimeSeries.
Enhanced AutoML logging with Auto SQL Content.
Enhanced operation, mutate and filter in DataFrame.
Enhanced reset_config_dict by allowing the custom config_dict.
Boost dataset report with new PAL describe function.
Enhanced proximal gradient support in HGBT including unified classification.
Enhanced tf_analysis with enable_stopwords and keep_numeric parameters.
Enhanced indicator for model deletion in model storage.
Enhanced OneHotEncoding with onehot_min_frequency and onehot_max_categories parameters.
Enhanced AutoML with connection optimization.
Enhanced model list in model storage by adding partial info display option.
Enhanced AutoML progress monitor with evaluating tab.
API Changes
Added parameters decompose_method, stl_robust, stl_seasonal_average and smooth_method_non_seasonal in seasonal_decompose().
Added parameters show_explainer and reference_dict to provide explainability support in LTSF.
Added a parameter ignore_zero to offer ignore zero calculation support in UnifiedRegression when calculating MPE or MAPE.
Added a parameter verbose_top_n in predict() of UnifiedClassification to present top N verbose classes.
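The ignore_zero entry exists because percentage errors such as MAPE divide by the actual value and are therefore undefined when that value is zero. The sketch below illustrates the idea in plain Python, under the assumption that "ignore zero" means dropping rows whose actual value is 0 before averaging. The exact PAL behaviour is not stated in this changelog, and the function mape with its signature is hypothetical.

```python
def mape(actual, predicted, ignore_zero=False):
    """Mean absolute percentage error as a fraction (0.1 means 10%)."""
    pairs = list(zip(actual, predicted))
    if ignore_zero:
        # Assumption: rows with a zero actual value are skipped entirely.
        pairs = [(a, p) for a, p in pairs if a != 0]
    if not pairs:
        raise ValueError("no rows left to evaluate")
    # Without ignore_zero, a zero actual value raises ZeroDivisionError here.
    return sum(abs((a - p) / a) for a, p in pairs) / len(pairs)


actual = [100.0, 0.0, 50.0]
predicted = [110.0, 5.0, 45.0]
print(mape(actual, predicted, ignore_zero=True))
```

Calling mape(actual, predicted) without ignore_zero on the same data fails on the second row, which is the situation the new parameter is meant to avoid.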
Version 2.18.231114
Bug Fixes
Fixed pipeline score issue.
Fixed model table error in automl fit with reason code.
Version 2.18.231103
Bug Fixes
Fixed key issue in ARIMA explainer.
Fixed hana-ml parameter registration issue.
Fixed PCA issue in pipeline fit.
Fixed missing attributes issue in time series SHAP.
Fixed NULL value issue in time series report.
Fixed NULL value issue in SHAP visualizer.
Fixed describe function issue with duplicate "unique" column.
Fixed key issue in timeseries_box_plot when plotly is enabled.
Fixed incompatibility issue with pandas 2.0.
Fixed dataframe issue when hint clause is used.
Fixed csrf token issue in amdp deployer.
Version 2.18.230927
Bug Fixes
Fixed cancellation issue in AutoML progress monitor.
Fixed log cleanup issue in AutoML.
Fixed usage of concat in Graph.describe().
Fixed temporary table issue by replacing with table variable.
Fixed missing parameter issue in HGBT regression.
Fixed syntax error in louvain.
Version 2.18.230914
New Functions
Added make_future_dataframe().
Added interval_quality().
Added drop_view().
Added hana_ml.graph.algorithms.CommunitiesLouvain.
Enhancements
Enhanced the support of Portuguese in tf_analysis(), text_classification(), get_related_doc(), get_related_term(), get_relevant_doc(), get_relevant_term(), get_suggested_term() functions.
Enhanced the support of different types of network like NLinear, DLinear, XLinear, SCINet in LTSF.
Support the massive mode of accuracy_measure().
Support the massive mode of IsolationForest().
Enhanced model storage method list_models with display option.
Enhanced model selection with range support.
Enhanced ShapleyExplainer with dependence plot.
Enhanced AutoML with explanation visualization.
Enhanced UnifiedRegression with predict interval and visualization.
Enhanced UnifiedClassification with feature importance support.
Enhanced OutlierDetectionTS with 'auto' mode.
Simplified the AutoML fit with background_size.
Enhanced AutoML and Pipeline modules with score function.
Enhanced progress monitor in AutoML with the evaluating tab.
API Changes
Added a parameter called network_type in LTSF for network selection.
Enhanced HANA scheduler by removing manual parameters input.
Bug Fixes
Fixed scatter plot error from ax.scatter c to cmap.
Fixed BAS incompatibility issue.
Fixed time diff error when creating new timeframe.
Fixed date type issue in dataset report.
Fixed time series report issue in changepoints_item.
Version 2.17.230808
Bug Fixes
Fixed default cmap value in scatter_plot().
Fixed display error for monthly data in tsa functions.
Fixed index display error in seasonal_decompose().
Fixed TimeSeriesReport index sort issue.
Version 2.17.230727
Bug Fixes
Fixed Decimal issue in Explainer item and other related items in TimeSeriesReport.
Fixed shadow option in EDA pie_plot().
Fixed enable_stopwords issue in tf_analysis() and the same issue in wordcloud plot.
Fixed year legend sort issue in TimeSeriesReport.
Version 2.17.230714
Bug Fixes
Fixed wrong error message in HANAScheduler.
Fixed corr issue where the column name was missing quotes.
Fixed front-end connection reset issue in AutoML to avoid too many queries from progress table.
Fixed cron missing issue by adding NULL check.
Fixed the Decimal issue in TimeSeriesReport.
Fixed the x-axis order issue in TimeSeriesReport.
Version 2.17.230628
Bug Fixes
Fixed CAP generation issues for APL.
Fixed duplicated prefix for predict artifact in CAP generation.
Fixed parameter checking for APL.
Version 2.17.230622
New Functions
Added set_scale_out() to enable APL functions execution in scaling out environment.
Added OnlineBCPD.
Added HANAGeneratorForCAP.
Added PowerTransform.
Added HANAScheduler.
Enhancements
Enhanced the support of plotly for eda functions like quarter_plot(), seasonal_plot(), etc.
Enhanced the support of spectral clustering in UnifiedClustering.
Enhanced HANA artifacts generation for pipeline module.
Enhanced AutoML with reason code option.
Enhanced TimeSeriesReport with confidence interval.
Enhanced RDT with prediction interval in UnifiedRegression.
Enhanced ModelStorage with server-side scheduler.
Enhanced unified API for pivoted input data.
Enhanced diff() to support datetime column.
Version 2.16.230601
Bug Fixes
Fixed model load issue for Pipeline module in model storage.
Fixed parameter missing in HybridGradientBoostingClassifier and HybridGradientBoostingRegressor.
Version 2.16.230526
Bug Fixes
Fixed pipeline missing evaluation function.
Fixed tips and chart width for model report.
Fixed built-in operation missing in pipeline module.
Fixed WordCloud issues to disable stopwords.
Version 2.16.230519
Bug Fixes
Fixed AutomaticTimeSeries config_dict template.
Fixed progress logging in auto-ml module.
Fixed the progress monitor GeneralProgressStatusMonitor issue when early_stop is enabled.
Fixed KNN NaN issue due to the pandas new changes.
Fixed describe() function to support SMALLINT.
Version 2.16.230508
Bug Fixes
Fixed pipeline module for model storage.
Fixed progress monitor getting stuck when progress table is not empty.
Fixed quote issue in serializing pipeline object.
Fixed parameter missing in FACCM.
Fixed dependency issue for pydotplus.
Fixed tail() function with default rel_col.
Fixed NaN in KNN optimal parameter collect.
Version 2.16.230413
Bug Fixes
Fixed unsupported issue of precomputed distance matrix in predict() of UnifiedClustering.
Fixed wrong error message when PAL functions are executed in HANA SPS07.
Fixed no timestamp support in plot_time_series_outlier().
Fixed timestamp error in progress monitor in AutomaticClassification, AutomaticRegression, AutomaticTimeSeries.
Fixed polynomial feature generation category undefined issue.
Version 2.16.230323
Bug Fixes
Fixed wrong error message when "hint" has been used.
Fixed load_model issue for APL model.
Version 2.16.230316
New Functions
Added model report items to TimeSeriesReport.
Added time-series report to UnifiedReport.
Added TimeSeriesClassification.
Added TUDF code generation function.
Added AMDP generator create_amdp_class() for Pipeline.
Added import_csv_from() function for importing csv file from the cloud storage locations like Azure, Amazon (AWS), Google Cloud, SAP HANA Cloud, Data Lake Files (HDLFS).
Added set_scale_out() to enable APL functions execution in scaling out environment.
Enhancements
Enhanced model report with new framework.
Enhanced LinearRegression with prediction/confidence interval.
Enhanced accuracy_measure() with SPEC measure.
Enhanced UnifiedExponentialSmoothing with reason code control parameter.
Enhanced precalculated distances matrix input for KMEDOIDS in UnifiedClustering.
Enhanced AutomaticClassification and AutomaticRegression with successive halving.
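For readers unfamiliar with successive halving, mentioned in the last enhancement above, the general technique evaluates many candidates on a small budget, keeps the better half, and repeats with a larger budget until one candidate remains. The following is a minimal plain-Python sketch of that general idea only. It is not the AutoML implementation, and successive_halving, toy_score and the budget-doubling rule are illustrative assumptions.

```python
def successive_halving(candidates, evaluate, min_budget=1):
    """Return the candidate that survives repeated halving.

    evaluate(candidate, budget) must return a score where higher is better.
    """
    survivors = list(candidates)
    if not survivors:
        raise ValueError("need at least one candidate")
    budget = min_budget
    while len(survivors) > 1:
        # Score every survivor with the current (cheap) budget.
        ranked = sorted(survivors, key=lambda c: evaluate(c, budget), reverse=True)
        # Keep the better half and double the budget for the next round.
        survivors = ranked[: max(1, len(ranked) // 2)]
        budget *= 2
    return survivors[0]


def toy_score(learning_rate, budget):
    """Hypothetical score that peaks at 0.3; budget is unused in this toy."""
    return -abs(learning_rate - 0.3)


best = successive_halving([0.1, 0.2, 0.3, 0.4, 0.5, 0.8, 0.9, 1.0], toy_score)
print(best)
```

In a real search the evaluate callback would train with more data or iterations as the budget grows, so weak candidates are discarded cheaply.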
API Changes
Added a parameter called decom_state in UnifiedExponentialSmoothing for the control of reason code display.
Added a parameter called lang in WordCloud for language selection.
Version 2.15.230217
Bug Fixes
Fixed cmap issues in eda visualizer.
Fixed FFM label bug.
Fixed missing word_cloud module issue.
Version 2.15.230111
Bug Fixes
Fixed the blank chart issue in dataset report.
Fixed dataset report crash due to empty column.
Version 2.15.221223
Bug Fixes
Fixed detected season change-points missing error of BCPD.
Fixed Change points Chart in TimeSeriesReport.
Version 2.15.221216
New Functions
Added nullif() function in dataframe.
Added long-term time series forecast algorithm LTSF.
Added plot_change_points() to plot change points.
Added plot_psd() to plot power spectral density.
Added periodogram() to perform power spectral density estimate of the input signal.
Added change_point detection item to TimeSeriesReport.
Enhancements
Enhanced OutlierDetectionTS with IsolationForest and DBSCAN.
Enhanced Data Type Timestamp support in BCPD.
Enhanced UnifiedClustering with massive mode.
Enhanced UnifiedReport with scoring report.
Enhanced ModelStorage with model export/import function.
Enhanced MLFlow integration with pipeline module and model export.
Enhanced UnifiedReport with model debriefing and pipeline report.
Enhanced force_plot in ShapleyExplainer to handle missing features.
Enhanced HybridGradientBoostingClassifier and HybridGradientBoostingRegressor with early_stop option.
Enhanced pipeline and auto-ml module (AutomaticClassification, AutomaticRegression and AutomaticTimeSeries) with predefined output tables.
API Changes
Changed the "JSON" column to NCLOB table type in ModelStorage.
Version 2.14.221208
Bug Fixes
Fixed dependency issues in dataset report.
Fixed documentation link in PyPI portal.
Fixed replace() function to support NULL replacement.
Version 2.14.221201
Bug Fixes
- Fixed dataset report display issue:
wrong binning method for distribution plot.
NA handling issue in scatter matrix.
Fixed SQL generation issue in hana_ml.algorithms.pal.pipeline.Pipeline module.
Fixed model state creation in KNN.
Fixed missing parameters in AutomaticTimeSeries.
Fixed wrong type of split_method in AutomaticTimeSeries.
Version 2.14.221028
Bug Fixes
Fixed pipeline monitor when password contains ','.
Fixed message not defined error in auto-ml.
Fixed pipeline error for PCA, DT and FN when disable_hana_execution() is executed.
Fixed json pipeline generation for HGBT and RDT.
Fixed parameter name typos for DT and RDT in UnifiedClassification.
Fixed execute_statement parser when parameters contain special characters.
Version 2.14.221014
Bug Fixes
Fixed legend issues in forecast_line_plot().
Fixed duplicated outputs issue in artifact generator.
Fixed best pipeline report issue where points exceed the chart.
Fixed progress bar counter issue for AutomaticTimeSeries.
Fixed predefined partition in unified API.
Version 2.14.220923
Bug Fixes
Fixed degree_values issue of Polynomial regression in UnifiedRegression.
Fixed legend order in seasonal_plot().
Fixed cross validation parameters in AutomaticTimeSeries forecast.
Version 2.14.220918
New Functions
Added replace() function in dataframe.
Added the class OutlierDetectionTS for time series outlier detection.
Added the class AutomaticTimeSeries.
Added function make_future_dataframe().
Added force_plot() for SHAP explainer.
Added class ImputeTS.
Added time series data report.
Added function ks_test().
Added function create_dataframe_from_spark().
Added set_model_state function.
Added outlier profiling.
Added plot_time_series_outlier() in EDA.
Enhancements
- Enhanced model storage
support pipeline in auto-ml.
support model report.
MLFlow integration for auto-ml.
Pipeline module enhancement with PAL_PIPELINE_FIT and PAL_PIPELINE_PREDICT.
Enhanced DataFrame function with enable_abap_sql().
Successive halving for HGBT, KNN, SVM and MLR.
Enhanced auto-ml with lightweight config dict option.
Enhanced JSON model support in Multi-class LogisticRegression and LinearRegression.
Enhanced the support of French and Russian in tf_analysis(), hana_ml.text.tm.text_classification(), get_related_doc(), get_related_term(), get_relevant_doc(), get_relevant_term(), get_suggested_term() functions.
Enhanced the support of pre-defined period setting in seasonal_decompose() with a new parameter periods.
API Changes
Added initialization parameters handling_missing and json_export in LinearRegression.
Added initialization parameters json_export, precompute_lms_sketch, stable_sketch_alg, sparse_sketch_alg in Multi-class LogisticRegression.
Added parameter periods in seasonal_decompose().
Added parameters for APL segmented modeling, segmented forecast and parallel apply: max_tasks and segment_column_name (see APL 2209 and APL 2211 release notes).
Version 2.13.220722
Bug Fixes
Fixed early_stop in auto-ml.
Fixed display issue in unified report for APL.
Version 2.13.220715
Bug Fixes
Fixed class_map0, class_map1 issue in UnifiedClassification.
Fixed early_stop parameter missing in AutomaticClassification and AutomaticRegression.
Fixed divide-by-zero issue in binary_classification_debriefing().
Version 2.13.220701
Bug Fixes
Fixed table name too long in model storage save_model() function.
Fixed mlflow autologging with additional fit parameters.
Fixed no mlflow model info display issue.
Fixed metric sampling for model report.
Fixed wrong schedule template in ModelStorage.
Version 2.13.220608
Bug Fixes
Fixed identifier length too long issue for function outputs.
Version 2.13.220511
New Functions
Added upsert_streams_data() and update_streams_data() for DataFrame.
Added stationary_test() function.
Added class CrostonTSB.
Added get_temporary_tables() and clean_up_temporary_tables() functions.
Added Pipeline class json outputs for auto-ml pipeline_fit.
- Added EDA for time series data.
Added plot_pacf() and plot_acf().
Added plot_moving_average().
Added plot_rolling_stddev().
Added seasonal_plot().
Added timeseries_box_plot().
Added quarter_plot().
Added rolling window in generate_feature() function.
Added get_connection_id(), restart_session() and cancel_session_operation() for class ConnectionContext.
Enhancements
Added support of the following collection of new parameters for HGBT in UnifiedClassification and UnifiedRegression: replace_missing, default_missing_direction, feature_grouping, tol_rate, compression, max_bin_num.
Improved the performance of box_plot().
Enhanced the massive mode support of UnifiedClassification, UnifiedRegression, ARIMA, AutoARIMA, AdditiveModelForecast.
Enhanced MLFlow autologging for UnifiedClassification and UnifiedRegression.
API Changes
Added parameter interpret in predict() method of KNNClassifier and KNNRegressor for enabling procedure PAL_KNN_INTERPRET.
Added parameters sample_size, top_k_attributions, random_state in predict() method of KNNClassifier and KNNRegressor for generating local interpretation result.
Enabled missing value handling for input data by adding imputation related parameters in fit(), predict() and score() functions of both UnifiedClassification and UnifiedRegression.
Added parameter model_type in GARCH for allowing variant GARCH models.
Bug Fixes
Fixed key error bug for parameter param_values in DecisionTreeClassifier and DecisionTreeRegressor.
Fixed the encoding error of imputation strategy of NONE type in Imputer.
Fixed the key error bug when enabling AFL states for clustering algorithms.
Version 2.12.220428
Bug Fixes
Adapted the auto-ml logging according to the PAL function changes.
Version 2.12.220425
Bug Fixes
Fixed the display issue for the pipeline report.
Fixed the missing ptype issue in AutoML evaluate function.
Fixed the transform issue in pipeline fit_predict() function.
Version 2.12.220408
Bug Fixes
Fixed cancellation button in auto_ml.
Fixed pivot_table() for handling NULL values.
Fixed TreeModelDebriefing dot visualizer for Decision Trees.
Fixed the display issue for dataset report with NULL values.
Version 2.12.220325
New Functions
Added IsolationForest.
Added auto_ml including AutomaticClassification, AutomaticRegression and Preprocessing.
Added progress monitor called PipelineProgressStatusMonitor for AutomaticClassification and AutomaticRegression.
Added best pipeline report called BestPipelineReport for AutomaticClassification and AutomaticRegression.
Added to_datetime(), to_tail() methods in DataFrame.
Enhancements
Added validation procedure for n_components in CATPCA.
Improved display name in pivot_table().
Added compression and thresholding in hana_ml.algorithms.pal.tsa.wavelet.wavedec().
Moved generate_feature() to DataFrame.
Enhanced create_dataframe_from_pandas() with upsert option.
Added ignore_scatter_matrix option in build() in dataset report.
Exposed APL variable selection parameters.
Enhanced text mining module tm with German support.
Support more objective functions (see obj_func) in HybridGradientBoostingClassifier and HybridGradientBoostingRegressor.
Enhanced white_noise_test() with an option: the degree of freedom, model_df.
Enhanced GRUAttention with local interpretability of model.
Enhanced integer index support for explain_arima_model() for ARIMA and AutoARIMA.
Added precomputed affinity for AgglomerateHierarchicalClustering.
Added model compression related parameters for HybridGradientBoostingClassifier and HybridGradientBoostingRegressor.
Bug Fixes
Fixed m4_sampling() with lowercase column name.
Fixed inconsistent IDs assigned to solvers in LogisticRegression() between LOGR and M_LOGR.
Fixed a parameter naming error in fft(): flattop_model --> flattop_mode.
Fixed a validation error for endog parameter in predict() in Attention.
API Changes
Added parameter model_df in the white_noise_test() for selecting the degree of freedom.
Added parameter explain_mode in predict() of GRUAttention for selecting the mechanism for generating the reason code for inference results.
Version 2.11.220209
Bug Fixes
Fixed wrong arg check for 'histogram' in parameter split_method of HybridGradientBoostingClassifier and HybridGradientBoostingRegressor.
Fixed bug in deploy_class with transport_request.
Version 2.11.220107
Bug Fixes
Fixed box_plot() with lower case column name.
Fixed add_id() when the rel_col input is of list type.
Fixed shortest_path and shortest_path_one_to_all type cast error.
Fixed the alignment error in fast_dtw().
Position correction for random search times in LogisticRegression.
Fixed HANA hint script generation for resource restriction.
Version 2.11.211211
New Functions
Added FeatureSelection.
Added BSTS.
Added WordCloud.
Added hdbprocedure generation in PALBase and applied to all functions.
Added GARCH.
APL classification, regression, clustering: a new method, 'export_apply_code', generates code which can be used to apply a trained model outside APL.
Enhancements
Enhanced Preprocessing with FeatureSelection.
Enhanced the ModelStorage with fit parameters in json format.
Enhanced PCA categorical support.
Enhanced ModelStorage with fit parameters info.
Enhanced UnifiedExponentialSmoothing with massive mode.
Enhanced UnifiedClassification with AMDP generation as a function.
Enhanced AdditiveModelForecast with an explainer in the predict() function.
Enhanced UnifiedClassification with continued training of a trained HybridGradientBoostingClassifier model.
Enhanced APL AutoTimeSeries with advanced predict outputs: the 'APL/ApplyExtraMode' parameter can be set in 'extra_applyout_settings'.
Enhanced the stored procedure information retrieval.
Enhanced fillna() to support non-numeric columns.
Enhanced dataset report to convert PAL unsupported type.
API Changes
Added initialization parameter background_size, and parameters thread_ratio, top_k_attributions, trend_mod, trend_width, seasonal_width in the predict() method of ARIMA and AutoARIMA.
Added parameters show_explainer, decompose_seasonality, decompose_holiday in the predict() function of AdditiveModelForecast.
Added warm_start in the fit() method of HybridGradientBoostingClassifier as well as the fit() method of HybridGradientBoostingRegressor for continued training with existing model.
Bug Fixes
Fixed index creation bug in on-premise text_classification api.
Fixed multi-class LogisticRegression init check bug.
Fixed has_table() error for local temporary tables.
Version 2.10.210918
New Functions
Added dtw() for generic dynamic time warping with predefined and custom defined step pattern.
Added wavedec() for multi-level discrete wavelet transformation, and waverec() for the corresponding inverse transformation.
Added wpdec() and wprec() for multi-level (discrete) wavelet packet transformation and inverse.
Added OnlineMultiLogisticRegression which is the online version of Multi-Class Logistic Regression.
Added SpectralClustering.
Added LSTM with attention.
Added OneHotEncoding.
Added unified preprocessor.
Added UnifiedExponentialSmoothing.
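As background for the dtw() entry above, the following is a minimal sketch of classic dynamic time warping using the standard three-move step pattern (diagonal, vertical, horizontal) and absolute difference as the local cost. It is plain Python rather than the PAL implementation, the function name dtw_distance is hypothetical, and the custom step patterns mentioned in the entry are not modelled here.

```python
import math


def dtw_distance(a, b):
    """Classic DTW cost between two numeric sequences."""
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # advance in a only
                cost[i][j - 1],      # advance in b only
                cost[i - 1][j - 1],  # advance in both (diagonal step)
            )
    return cost[n][m]


# The repeated 2 in the second series is absorbed by the warping path.
print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))
```

Because one point may be matched to several points of the other series, sequences of different lengths can still be compared, which is the main difference from a pointwise distance.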
Enhancements
Enhanced the model storage support for OnlineLinearRegression.
Enhanced multi-threading in tm functions.
Enhanced HDL container option.
Enhanced timestamp support for ARIMA, AutoARIMA, VectorARIMA, OnlineARIMA, SingleExponentialSmoothing, DoubleExponentialSmoothing, TripleExponentialSmoothing, AutoExponentialSmoothing, BrownExponentialSmoothing, Croston, LR_seasonal_adjust.
Enhanced new distributions for mcmc() sampling.
Support multiple accuracy_measure methods in SingleExponentialSmoothing, DoubleExponentialSmoothing, TripleExponentialSmoothing, AutoExponentialSmoothing, BrownExponentialSmoothing, Croston, and LR_seasonal_adjust.
Added plotly support.
API Changes
Added parameters key, endog, exog, categorical_variable in the fit() function of AdditiveModelForecast.
Added initialization parameters prediction_confidence_1 and prediction_confidence_2 in BrownExponentialSmoothing.
Version 2.9.210726
Bug Fixes
Fixed load model initialized error in model storage service.
Fixed bad link in pypi portal.
Version 2.9.210709
Bug Fixes
Fixed missing WeaklyConnectedComponents in hana_ml.graph.algorithms.
Fixed missing statistics in hana_ml.graph.Graph.describe.
Fixed a bug where Graph object creation, discover_graph_workspace() and Graph.describe() did not work on an on-premise system.
Version 2.9.210630
Bug Fixes
Fixed accuracy_measure issue in SingleExponentialSmoothing, DoubleExponentialSmoothing, TripleExponentialSmoothing, and AutoExponentialSmoothing.
Fixed empty input table error in Croston.
Fixed class_map error for multiclass logistic regression in UnifiedClassification.
Version 2.9.210619
Enhancements
Constants for directions used in graph functions can be found in hana_ml.graph.constants.DIRECTION_*
Following functions and objects are now available in hana_ml.graph for import
Graph object
create_graph_from_dataframes and create_graph_from_hana_dataframes factory methods
discover_graph_workspaces
discover_graph_workspace
The geometries no longer need to be specified when creating a DataFrame instance. The geometries are analyzed automatically.
Support list of targets and trans_param in feature_tool.
Enhanced unified report for UnifiedRegression to view feature importance.
Enhanced join() to support list of DataFrame.
Enhanced union() to support list of DataFrame.
Streamlined the create_dataframe_from_pandas() geo parameters. Now there is only one list of geo_cols, which supports column references as well as (lon, lat) tuples, and one SRID parameter for all columns.
When you call create_dataframe_from_pandas() and pass a GeoPandas DataFrame, the geometry column will be detected automatically and processed as a geometry. You don't need to add it manually to geo_cols.
The Graph constructor is simplified. You can instantiate a graph simply by the workspace name.
Enhanced ModelStorage for APL to support HANA Data Lake.
New Functions
Introduced hana_ml.graph.algorithms which will contain all graph algorithms in the future. The package provides an AlgorithmBase class which can be used to build additional algorithms for a graph.
Add hana_ml.graph.algorithms.ShortestPath, which replaces Graph.shortest_path
Add hana_ml.graph.algorithms.Neighbors, which replaces Graph.neighbors
Add hana_ml.graph.algorithms.NeighborsSubgraph, which replaces Graph.neighbors_with_edges
Add hana_ml.graph.algorithms.KShortestPaths
Add hana_ml.graph.algorithms.ShortestPathsOneToAll
Add hana_ml.graph.discovery.discover_graph_workspace, which reads the metadata of a graph
Add hana_ml.graph.create_graph_from_edges_dataframe
Add hana_ml.graph.Graph.has_vertices, to check if a list of vertices exist in a graph
Add hana_ml.graph.Graph.subgraph, to create a vertices or edges induced subgraph
Add hana_ml.graph.Graph.describe, to get some statistics
Add hana_ml.graph.Graph.degree_distribution
Add hana_ml.DataFrame.srids, which returns the SRS of each geometry column
Add hana_ml.DataFrame.geometries, which returns the geometry columns if there are any
Add hana_ml.spatial package, that contains
create_predefined_srs
is_srs_created
get_created_srses
Add hana_ml.docstore package, that contains
create_collection_from_elements
Added BCPD for Bayesian change point detection.
Added sort_values(), sort_index() in DataFrame.
Added scheduler for model renew in ModelStorage.
Added min(), max(), mean(), median(), sum(), value_counts() in DataFrame.
Added SHAP support for UnifiedClassification.
Added data lake support in model_storage.
Added data lake support in dataframe functions.
Added line plot for time series forecast.
Added split_column() method in DataFrame.
Added concat_columns() method in DataFrame.
Added outlier_detection_kmeans(), which detects outliers in datasets based on the result of k-means clustering.
Added intermittent_forecast() for forecasting intermittent demand data (time-series).
Added OnlineLinearRegression which is an online version of the Linear Regression.
API Changes
Removed geo_cols from dataframe.create_dataframe_from_shapefile
Removed geo_cols from ConnectionContext.sql()
Removed geo_cols from ConnectionContext.table()
Removed Graph.neighbors and Graph.neighbors_with_edges
Removed Graph.shortest_path
Removed hana_ml.graph.Path. This is not used anymore
Removed hana_ml.graph.create_hana_graph_from_existing_workspace. This is replaced by a simplified Graph object constructor.
Renamed hana_ml.graph.create_hana_graph_from_vertex_and_edge_frames to create_graph_from_dataframes
Changed the type of geo_cols in create_dataframe_from_pandas to list, which supports direct column references or (lon, lat) tuples for generating POINT geometries
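The list form of geo_cols described above mixes two kinds of entries: a column name that already holds a geometry, and a (lon, lat) tuple from which a POINT is generated. A minimal sketch of how such a list could be interpreted follows. The helper resolve_geo_cols, the dictionary row format, and the "<lon>_<lat>_GEO" name for generated columns are hypothetical and are not taken from hana_ml.

```python
def resolve_geo_cols(row, geo_cols):
    """Return {geometry_column_name: WKT string} for one row (hypothetical helper)."""
    geometries = {}
    for entry in geo_cols:
        if isinstance(entry, tuple):
            # A (lon, lat) tuple: build a POINT from two numeric columns.
            lon_col, lat_col = entry
            name = f"{lon_col}_{lat_col}_GEO"  # naming is an assumption of this sketch
            # WKT uses "x y" order, so longitude comes first.
            geometries[name] = f"POINT ({row[lon_col]} {row[lat_col]})"
        else:
            # A direct column reference: the column already holds a geometry.
            geometries[entry] = row[entry]
    return geometries


row = {"id": 1, "lon": 8.64, "lat": 49.29, "shape": "POINT (1 2)"}
result = resolve_geo_cols(row, ["shape", ("lon", "lat")])
```

In this sketch, result holds the existing geometry under "shape" and a generated point under "lon_lat_GEO".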
Bug Fixes
Fixed inflexible default locations of selected columns of input data, e.g. key, features and endog.
Fixed model report's feature importance when it has 0 importance.
Version 2.8.210421
Version 2.8.210421 supports SAP HANA SPS05 and SAP HANA Cloud
Bug Fixes
Fixed model report's feature importance when it has 0 importance.
Fixed pivot_table() with multiple index issue.
Fixed the SHAP display for categorical columns.
Version 2.8.210321
Version 2.8.210321 supports SAP HANA SPS05 and SAP HANA Cloud
Enhancements
Enhanced sql() to enable multiline execution.
Enhanced save() to add append option.
Enhanced diff() to enable negative input.
Enhanced model report functionality of UnifiedClassification with added model and data visualization.
Enhanced dataset report module with an optimized process of report generation and better user experience.
Enhanced UnifiedClustering() to support parameter distance_level in AgglomerateHierarchicalClustering and DBSCAN functions. Please refer to documentation for details.
Enhanced model storage to support unified report.
New Functions
Added generate_html_report() and generate_notebook_iframe_report() functions for UnifiedRegression, which can display the output, e.g. statistics and model.
APL Gradient Boosting: the other_params parameter is now supported.
APL all models: a new method, get_model_info, is created, allowing users to retrieve the summary and the performance metrics of a saved model.
APL all models: users can now specify the weight of explanatory variables via the weight parameter.
Added LSTM.
Added Text Mining functions support for both SAP HANA on-premise and cloud version.
Added UnifiedReport.
New Dependency
Added new dependency 'htmlmin' for generating dataset and model report.
API Changes
Added parameters use_fast_library and use_float to KMeans.
Added parameter build_report to UnifiedRegression.
Added parameter distance_level in UnifiedClustering when func is AgglomerateHierarchicalClustering or DBSCAN. Please refer to documentation for details.
Renamed batch_size to chunk_size in create_dataframe_from_pandas().
Added initialization parameters random_state and random_initialization for OnlineARIMA, and its partial_fit() method now supports two additional parameters, learning_rate and epsilon, for updating the values in the input model.
Bug Fixes
Fixed model storage support issue for OnlineARIMA.
Fixed inflexible default locations of selected columns of input data, e.g. key, features and endog.
Fixed accuracy_measure issue in AutoExponentialSmoothing.
Version 2.6.210126
Version 2.6.210126 supports SAP HANA SPS05 and SAP HANA Cloud
Bug Fixes
Fixed uuid issue for Python 3.8.
Fixed wrong legend for the model report of UnifiedClassification.
Fixed dataset report to handle datasets with missing values.
Version 2.6.210113
Version 2.6.210113 supports SAP HANA SPS05 and SAP HANA Cloud
Bug Fixes
Fixed load_model issue for KMeans clustering.
Removed PyPI installation of Shapely for Windows users.
Fixed duplicate rows bug in save() function.
Fixed loading issue in model report.
Replaced the option batch_size with chunk_size in create_dataframe_from_pandas().
Version 2.6.201209
Version 2.6.201209 supports SAP HANA SPS05 and SAP HANA Cloud
Bug Fixes
Removed shap from installation.
Fixed bugs in ConnectionContext when autocommit=False.
Fixed font properties bugs in hana_ml.visualizers.eda functions.
APL Documentation: other_train_apl_aliases is now documented.
APL Gradient Boosting Classification: the target variable won't be displayed in prediction if it is not given in input.
APL Gradient Boosting: the default parameter values are now set in the APL backend level. They won't be set in the Python API level.
Fixed handling of geometry columns in the context of collect() calls.
Fixed shapely not being a required dependency.
Fixed the displacement of parameter dispersion in CPD.
Version 2.6.201106
Version 2.6.201116 supports SAP HANA SPS05 and SAP HANA Cloud
Enhancements
Enhanced the performance of collect() method for large datasets.
Enhanced the performance of create_dataframe_from_pandas() for large datasets.
New Functions
Bug Fixes
Fixed incompatibility issue with matplotlib>=3.3.0.
Version 2.6.201016(2.6.200928)
Version 2.6.201016 supports SAP HANA SPS05 and SAP HANA Cloud
API Changes
HybridGradientBoostingClassifier and HybridGradientBoostingRegressor: added a parameter adopt_prior to indicate whether to adopt the prior distribution of the target as the initial point.
Added parameters compression, max_bits, max_quantization_iter for the following SVM classes:
RDTClassifier: added parameters compression, max_bits, quantize_rate for model compression.
RDTRegressor: added parameters compression, max_bits, quantize_rate, fittings_quantization for model compression.
In the predict() method of ARIMA and AutoARIMA, a new value 'truncation_algorithm' of parameter forecast_method is introduced to improve the prediction performance.
New initialization parameters string_variable, variable_weight are added to KNNClassifier, KNNRegressor and DBSCAN to enable distance calculation based on String distance.
New parameters extrapolation, smooth_width, auxiliary_normalitytest are added to seasonal_decompose() function.
New Functions
Added dataset manager.
Added graph and spatial modules.
Added dataset report.
Added clustering function: SlightSilhouette().
Added native storage support in model storage service and dataset manager.
Added VectorARIMA.
Added UnifiedRegression.
Added UnifiedClustering.
Bug Fixes
Fixed ROC curve display in model report with disordered points.
Fixed load_model() for UnifiedClassification in model storage service.
Fixed model_selection for UnifiedClassification.
Version 2.5.200626
Version 2.5.200626 supports SAP HANA SPS05 and SAP HANA Cloud
API Changes
Removed parameter ConnectionContext in PAL functions.
Updated parameter algorithm from mandatory to optional in DecisionTreeClassifier and DecisionTreeRegressor, with default value 'cart'.
Added parameter decompose_type in seasonal_decompose().
Added parameter save_alignment and a new output statistic table in fast_dtw().
Added parameter table_structure in create_dataframe_from_pandas().
Added parameters resampling_method and param_search_strategy for HybridGradientBoostingClassifier and HybridGradientBoostingRegressor.
New Functions
Added functions melt() and read_pickle().
Added functionalities for UnifiedClassification. In particular, generate_html_report() and generate_notebook_iframe_report() are provided to visualize the output, e.g. confusion matrix and ROC curve.
Added mcmc().
Added model selection services as follows:
Added visualizers for model debriefing.
Enhancements
Enhanced smart sampling for visualizers.
Enhanced import function to SAP HANA.
Enhanced bytes, TIMESTAMP and BIGINT support in create_dataframe_from_pandas().
Enhanced TIMESTAMP and DATE support in describe().
Predictions made with APL gradient boosting can now be complemented with the reasons that led to these predictions: number of top or bottom explanatory variables, strength values, etc.
Supported more data types, SMALLINT, DECIMAL, TINYINT, BIGINT, CLOB and BLOB in dtypes(), generate_table_type() and is_numeric().
Enhanced the missing value handling ability in the groupby column by creating a new class for missing values for the following EDAVisualizer functions:
APL gradient boosting can provide metrics about feature interactions strength.
The connection parameter is no longer required for APL model creation.
Bug Fixes
Fixed wrong ID issue in fit function by adding key initialization parameter in ARIMA and AutoARIMA.
Fixed CLOB type issue in create_dataframe_from_pandas() by adding parameters table_structure and drop_exist_tab.
Fixed pivot_table() index naming bug.
Fixed temporary view from temporary table issue in APL time series function by adding sort_data and get_horizon_wide_metric.
Fixed bugs in create_dataframe_from_pandas() if the table is temporary.
Fixed bugs for data type of init centers in GaussianMixture.
Fixed bugs when some data types, e.g. SMALLINT, DECIMAL or TINYINT, are not supported in dtypes(), generate_table_type() and is_numeric().
Fixed bugs when data types, e.g. DATE and TIMESTAMP, are not supported in describe().
Fixed the table overwrite bug in save() if the table name is duplicate.
Fixed missing quotation mark in column name bugs in EDA.
Users can set 'Cutting Strategy' in APL Gradient Boosting.
APL models are saved correctly.
Deprecated Functions
GradientBoostingClassifier.
GradientBoostingRegressor.
Version 1.0.8
Version 1.0.8 supports SAP HANA SPS04 (100% coverage for SAP HANA SPS04 PAL algorithms)
New Functions
Preprocessing: MDS (Multidimensional Scaling), SMOTE (Synthetic Minority Over-Sampling Technique, only supported in SAP HANA SPS05), Sampling, variance_test().
Statistics: condition_index(), cdf() (Cumulative Distribution Function), distribution_fit(), quantile() (Distribution Quantile), entropy(), ftest_equal_var() (Equal Variance Test), factor_analysis(), grubbs_test(), kaplan_meier_survival_analysis(), KDE (Kernel Density), median_test_1samp() (One-Sample Median Test), wilcoxon() (Wilcoxon Signed Rank Test).
Time-Series Analysis: LR_seasonal_adjust (Linear Regression with Damped Trend and Seasonal Adjust), AdditiveModelForecast, Hierarchical_Forecast, correlation() (Correlation Function), online algorithms and fast_dtw() (fast dynamic-time-warping).
Miscellaneous: abc_analysis(), TSNE (T-distributed Stochastic Neighbour Embedding), weighted_score_table().
Added functions in dataframe.py: data_manipulation().
Added cross-validation options to SAP HANA PAL functions.
Added visualizers (EDA profiler).
Added model storage services.