The hana_ml.algorithms.pal package consists of many algorithms. Grouped by category, they are listed as follows:

This module contains supported PAL algorithms.

PAL Base


Subclass for PAL-specific functionality.

Auto ML

auto_ml.AutomaticClassification([scorings, ...])

AutomaticClassification offers an intelligent search amongst machine learning pipelines for supervised classification tasks.

auto_ml.AutomaticRegression([scorings, ...])

AutomaticRegression offers an intelligent search amongst machine learning pipelines for supervised regression tasks.

auto_ml.AutomaticTimeSeries([scorings, ...])

AutomaticTimeSeries offers an intelligent search amongst machine learning pipelines for time series tasks.

auto_ml.Preprocessing(name, **kwargs)

Preprocessing class.

Unified Interface


The Python wrapper for SAP HANA PAL unified-classification function.

unified_regression.UnifiedRegression(func[, ...])

The Python wrapper for SAP HANA PAL unified-regression function.

unified_clustering.UnifiedClustering(func[, ...])

The Python wrapper for SAP HANA PAL Unified Clustering function.


The Python wrapper for SAP HANA PAL Unified Exponential Smoothing function.


clustering.AffinityPropagation(affinity, ...)

Affinity Propagation is an algorithm that identifies exemplars among data points and forms clusters of data points around these exemplars.


Agglomerate Hierarchical Clustering is a widely used clustering method which can find natural groups within a set of data.

clustering.DBSCAN([minpts, eps, ...])

DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is a density-based data clustering algorithm that finds a number of clusters starting from the estimated density distribution of corresponding nodes.

clustering.GeometryDBSCAN([minpts, eps, ...])

GeometryDBSCAN is a geometry version of DBSCAN, which only accepts geometry points as input data.

clustering.KMeans([n_clusters, ...])

K-means is one of the simplest and most commonly used unsupervised machine learning algorithms for partitioning a dataset into K distinct, non-overlapping clusters based on the distances between the center of the cluster (centroid) and the data points.

clustering.KMedians(n_clusters[, init, ...])

K-Medians clustering algorithm that partitions n observations into K clusters according to their nearest cluster center.

clustering.KMedoids(n_clusters[, init, ...])

K-Medoids clustering algorithm that partitions n observations into K clusters according to their nearest cluster center.

clustering.SpectralClustering(n_clusters[, ...])

Spectral clustering is an algorithm evolved from graph theory, and has been widely used in clustering.

clustering.KMeansOutlier([n_clusters, ...])

Outlier detection based on k-means clustering.

mixture.GaussianMixture(init_param[, ...])

Gaussian Mixture Model (GMM) is a probabilistic model used for modeling data points that are assumed to be generated from a mixture of Gaussian distributions.

som.SOM([convergence_criterion, ...])

Self-organizing feature maps (SOMs) are one of the most popular neural network methods for cluster analysis.

clustering.SlightSilhouette(data[, ...])

Silhouette refers to a method used to validate clusters of data, which provides a succinct graphical representation of how well each object lies within its cluster.

clustering.outlier_detection_kmeans(data[, ...])

Outlier detection based on k-means clustering.



Linear Discriminant Analysis is a supervised learning technique used for classification problems.


Logistic regression models the relationship between a dichotomous dependent variable (also known as explained variable) and one or more continuous or categorical independent variables (also known as explanatory variables).


This algorithm is the online version of Multi-Class Logistic Regression, whereas Multi-Class Logistic Regression itself is the offline (batch) version.

naive_bayes.NaiveBayes([alpha, ...])

Naive Bayes is a classification algorithm based on Bayes theorem.

neighbors.KNNClassifier([n_neighbors, ...])

K-Nearest Neighbor (KNN) is a memory-based classification or regression method with no explicit training phase.

neural_network.MLPClassifier([activation, ...])

Multi-layer perceptron (MLP) Classifier.

svm.SVC([c, kernel, degree, gamma, ...])

Support Vector Machines (SVMs) refer to a family of supervised learning models using the concept of support vector.

svm.OneClassSVM([c, kernel, degree, gamma, ...])

Support Vector Machines (SVMs) refer to a family of supervised learning models using the concept of support vector.

trees.DecisionTreeClassifier([algorithm, ...])

A decision tree is used as a classifier for determining an appropriate action or decision among a predetermined set of actions for a given case.

trees.RDTClassifier([n_estimators, ...])

The random decision trees algorithm is an ensemble learning method for classification and regression.


Hybrid Gradient Boosting trees model for classification.


linear_model.LinearRegression([solver, ...])

Linear regression is an approach to model the linear relationship between a variable, usually referred to as dependent variable, and one or more variables, usually referred to as independent variables, denoted as predictor vector.


Online linear regression (Stateless) is an online version of linear regression, used when the training data are obtained in multiple rounds.

neighbors.KNNRegressor([n_neighbors, ...])

K-Nearest Neighbor (KNN) is a memory-based classification or regression method with no explicit training phase.

neural_network.MLPRegressor([activation, ...])

Multi-layer perceptron (MLP) Regressor.

regression.PolynomialRegression([degree, ...])

Polynomial regression is an approach to model the relationship between a scalar variable y and a variable denoted X.

regression.GLM([family, link, solver, ...])

Generalized linear models (GLM) are used to regress responses that follow exponential-family distributions, for example Normal, Poisson, Binomial, Gamma, inverse Gaussian (IG), and negative binomial (NB).


Exponential regression is an approach to modeling the relationship between a scalar variable y and one or more variables denoted X.


Geometric regression is an approach used to model the relationship between a scalar variable y and a variable denoted X.


Bi-variate natural logarithmic regression is an approach to modeling the relationship between a scalar variable y and one variable denoted X.


Cox proportional hazard model (CoxPHM) is a special generalized linear model.

svm.SVR([c, kernel, degree, gamma, ...])

Support Vector Machines (SVMs) refer to a family of supervised learning models using the concept of support vector.

trees.DecisionTreeRegressor([algorithm, ...])

DecisionTreeRegressor is a decision tree-based machine learning model used for regression tasks, which predicts continuous output values by learning simple decision rules inferred from the data features.

trees.RDTRegressor([n_estimators, ...])

The random decision trees algorithm is an ensemble learning method for classification and regression.


Hybrid Gradient Boosting model for regression.


preprocessing.FeatureNormalizer([method, ...])

Normalize a DataFrame.

preprocessing.FeatureSelection(fs_method[, ...])

Feature selection (FS) is a dimensionality reduction technique that selects a subset of relevant features for model construction, reducing memory usage and improving computational efficiency while avoiding significant loss of information.


Isolation Forest generates anomaly score of each sample.

preprocessing.KBinsDiscretizer(strategy, ...)

Bin continuous data into a number of intervals and perform local smoothing.

preprocessing.Imputer([strategy, ...])

Missing value imputation for DataFrame.

preprocessing.Discretize(strategy[, n_bins, ...])

An enhanced version of the binning function that can be applied to a table with multiple columns.

preprocessing.MDS(matrix_type[, ...])

This class serves as a tool for dimensional reduction or data visualization.

preprocessing.SMOTE([smote_amount, ...])

This class handles imbalanced datasets.

preprocessing.SMOTETomek([smote_amount, ...])

This class combines over-sampling using SMOTE and cleaning (under-sampling) using Tomek links.

preprocessing.TomekLinks([distance_level, ...])

This class is for performing under-sampling by removing Tomek's links.

preprocessing.Sampling(method[, interval, ...])

This class is used to choose a small portion of the records as representatives.

preprocessing.ImputeTS([imputation_type, ...])

Imputation of multi-dimensional time-series data.

preprocessing.PowerTransform([method, ...])

This class implements a Python interface for the power transform algorithm in PAL.


Python wrapper for PAL Quantile Transformer.

decomposition.PCA([scaling, thread_ratio, ...])

Principal component analysis (PCA) aims at reducing the dimensionality of multivariate data while accounting for as much of the variation in the original dataset as possible.

decomposition.CATPCA([scaling, ...])

Principal components analysis algorithm that supports categorical features.

partition.train_test_val_split(data[, ...])

The algorithm partitions an input dataset randomly into three disjoint subsets called training, testing and validation.
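As a rough illustration of what a random three-way partition does, here is a plain-Python sketch. It does not use or reproduce the PAL implementation; the helper name `split_three_ways` and its parameters are hypothetical.

```python
import random


def split_three_ways(rows, train_frac=0.8, test_frac=0.1, seed=None):
    """Randomly partition rows into disjoint training, testing and validation lists.

    Hypothetical helper for illustration only; whatever is left after the
    training and testing shares goes to the validation subset.
    """
    if train_frac < 0 or test_frac < 0 or train_frac + test_frac > 1:
        raise ValueError("fractions must be non-negative and sum to at most 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded shuffle for reproducibility
    n_train = int(len(shuffled) * train_frac)
    n_test = int(len(shuffled) * test_frac)
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    valid = shuffled[n_train + n_test:]
    return train, test, valid
```

Every input row lands in exactly one of the three lists, which is the "disjoint subsets" property the description refers to.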

preprocessing.variance_test(data, sigma_num)

Variance test identifies outliers among n numeric values {xi}, where 0 < i < n+1, using the mean and the standard deviation of those values.
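The idea behind the variance test can be sketched in a few lines of plain Python. This is an illustration only, not the PAL procedure: the function name is hypothetical, and the use of the sample standard deviation is an assumption.

```python
from statistics import mean, stdev


def variance_test_sketch(values, sigma_num):
    """Flag each value that lies more than sigma_num standard deviations from the mean."""
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation (assumption, PAL may differ)
    return [abs(x - mu) > sigma_num * sigma for x in values]
```

A larger `sigma_num` widens the accepted band around the mean, so fewer values are flagged.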

Time Series


Additive Model Time Series Analysis (AMTSA) uses an additive model to forecast time series data.

tsa.arima.ARIMA([order, seasonal_order, ...])

ARIMA, which stands for Autoregressive Integrated Moving Average, is a commonly used statistical method for forecasting and predicting time series data.

tsa.auto_arima.AutoARIMA([seasonal_period, ...])

The ARIMA model, a potent tool in time series analysis, can be challenging due to the difficulty in selecting suitable parameters.

tsa.changepoint.CPD([cost, penalty, solver, ...])

Change-point detection (CPDetection) methods aim at detecting multiple abrupt changes, such as changes in mean, variance or distribution, in observed time-series data.

tsa.changepoint.BCPD(max_tcp, max_scp[, ...])

Bayesian Change-point detection (BCPD) detects abrupt changes in the time series.

tsa.changepoint.OnlineBCPD([alpha, beta, ...])

Online Bayesian Change-point detection.

tsa.bsts.BSTS([burn, niter, ...])

Bayesian structural time series (BSTS) model is for time series analysis including forecasting, decomposition and feature selection.


Time series classification.


Single exponential smoothing is suitable to model the time series without trend and seasonality.
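For orientation, the textbook recurrence behind single exponential smoothing is s_t = alpha * x_t + (1 - alpha) * s_(t-1). The sketch below implements that recurrence in plain Python; it is not the PAL class, the function name is hypothetical, and initializing with the first observation is an assumption.

```python
def single_exponential_smoothing(series, alpha):
    """Return the smoothed series s_t = alpha * x_t + (1 - alpha) * s_(t-1)."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    if not series:
        raise ValueError("series must not be empty")
    smoothed = [series[0]]  # initialize with the first observation (assumption)
    for x in series[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed
```

Because the recurrence keeps only a single level and no trend or seasonal term, the last smoothed value serves as a flat forecast for all future steps, which is why the method suits series without trend and seasonality.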


Double exponential smoothing is suitable to model the time series with trend but without seasonality.


Triple exponential smoothing is used to handle the time series data containing a seasonal component.


Auto exponential smoothing is used to calculate optimal parameters of a set of smoothing functions including Single Exponential Smoothing, Double Exponential Smoothing, and Triple Exponential Smoothing.


Brown exponential smoothing is suitable to model the time series with trend but without seasonality.

tsa.exponential_smoothing.Croston([alpha, ...])

Croston method is a forecast strategy for products with intermittent demand.


Croston TSB method (for Teunter, Syntetos & Babai) is a forecast strategy for products with intermittent demand.

tsa.garch.GARCH([p, q, model_type])

Generalized AutoRegressive Conditional Heteroskedasticity (GARCH) is a statistical model used to analyze the variance of the error (innovation or residual) term in time series.


The hierarchical forecast algorithm forecasts across the hierarchy (that is, it ensures the forecasts sum appropriately across the levels).


Linear regression with damped trend and seasonal adjustment is an approach for forecasting when a time series presents a trend.

tsa.lstm.LSTM([learning_rate, gru, ...])

Long short-term memory (LSTM) is one of the most famous modules of Recurrent Neural Networks (RNN).

tsa.ltsf.LTSF([batch_size, num_epochs, ...])

Long-term time series forecasting (LTSF) is a specialized approach within predictive analysis that focuses on predictions over extended future horizons.

tsa.online_algorithms.OnlineARIMA([order, ...])

Online ARIMA implements an online learning method to estimate the parameters of ARIMA models by reformulating the problem as a full information online optimization task (without random noise terms), which removes the dependence on noise terms and the need to access the entire large-scale dataset in advance.


Outlier detection for time-series.

tsa.rnn.GRUAttention([learning_rate, ...])

Gated Recurrent Units (GRU) based encoder-decoder model with an attention mechanism for time series prediction.

tsa.rocket.ROCKET([method, num_features, ...])

RandOm Convolutional KErnel Transform (ROCKET) is an exceptionally efficient algorithm for time series classification.

tsa.vector_arima.VectorARIMA([order, ...])

The vector autoregressive moving average (VARMA) model is a vector form of the autoregressive integrated moving average (ARIMA) model that can be used to examine the relationships among several variables in multivariate time series analysis, whereas ARIMA is used for univariate time series.

tsa.wavelet.DWT(wavelet[, boundary, level, ...])

A class for the discrete wavelet transform and the wavelet packet transform.


Evaluates the forecast accuracy using measures such as:

tsa.correlation_function.correlation(data[, ...])

This correlation function gives the statistical correlation between random variables.

tsa.fft.fft(data[, num_type, inverse, ...])

Fast Fourier Transform (FFT) decomposes a function of time (a signal) into the frequencies that make it up.

tsa.dtw.dtw(query_data, ref_data[, radius, ...])

DTW is an abbreviation for Dynamic Time Warping.

tsa.fast_dtw.fast_dtw(data, radius[, ...])

Dynamic time warping (DTW) calculates the distance or similarity between two time series.
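The classic dynamic-programming formulation of DTW can be sketched as follows. This is a minimal, unconstrained version for one-dimensional series using absolute difference as the local cost; it is not the PAL implementation, it ignores windowing options such as `radius`, and the function name is hypothetical.

```python
def dtw_distance(a, b):
    """Return the unconstrained DTW distance between two numeric sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] is the best alignment cost of a[:i] against b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b repeats its point
                cost[i][j - 1],      # b advances, a repeats its point
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]
```

Two series that differ only in how long they dwell on a value, such as `[1, 2, 3]` and `[1, 2, 2, 3]`, align with zero cost, which is the behaviour that distinguishes DTW from a point-by-point distance.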


Intermittent Time Series Forecast (ITSF) is a forecast strategy for products with intermittent demand.

tsa.hull_white.hull_white_simulate(data[, ...])

The Hull-White model, as implemented in PAL, is a single-factor interest rate model that plays a crucial role in financial mathematics and risk management.

tsa.periodogram.periodogram(data[, key, ...])

Periodogram is an estimate of the spectral density of a signal or time series.


Permutation importance for time series is an exogenous regressor evaluation method that measures the increase in the model score when randomly shuffling the exogenous regressor's values.


A stationarity test is a statistical test used in time series analysis to determine whether a given time series is stationary or non-stationary.


The seasonal_decompose function tests whether a time series has seasonality.

tsa.trend_test.trend_test(data[, key, ...])

Trend test is a statistical method used in time series analysis to determine whether there is a consistent upward or downward movement over time, and calculate the de-trended time series.

tsa.wavelet.wavedec(data, wavelet[, key, ...])

Python wrapper for PAL multi-level discrete wavelet transform.

tsa.wavelet.waverec(dwt[, wavelet, boundary])

Python wrapper for PAL multi-level inverse discrete wavelet transform.

tsa.wavelet.wpdec(data, wavelet[, key, col, ...])

Python wrapper for PAL multi-level (discrete) wavelet packet transformation.

tsa.wavelet.wprec(dwt[, wavelet, boundary])

Python wrapper for PAL multi-level inverse discrete wavelet transform.


This algorithm is used to identify whether a time series is a white noise series.


random.bernoulli(conn_context[, p, ...])

Draw samples from a Bernoulli distribution.

random.beta(conn_context[, a, b, ...])

Draw samples from a Beta distribution.

random.binomial(conn_context[, n, p, ...])

Draw samples from a binomial distribution.

random.cauchy(conn_context[, location, ...])

Draw samples from a Cauchy distribution.

random.chi_squared(conn_context[, dof, ...])

Draw samples from a chi-squared distribution.

random.exponential(conn_context[, lamb, ...])

Draw samples from an exponential distribution.

random.gumbel(conn_context[, location, ...])

Draw samples from a Gumbel distribution, which is one of a class of Generalized Extreme Value (GEV) distributions used in modeling extreme value problems.

random.f(conn_context[, dof1, dof2, ...])

Draw samples from an F distribution.

random.gamma(conn_context[, shape, scale, ...])

Draw samples from a gamma distribution.

random.geometric(conn_context[, p, ...])

Draw samples from a geometric distribution.

random.lognormal(conn_context[, mean, ...])

Draw samples from a lognormal distribution.

random.negative_binomial(conn_context[, n, ...])

Draw samples from a negative binomial distribution.

random.normal(conn_context[, mean, sigma, ...])

Draw samples from a normal distribution.

random.pert(conn_context[, minimum, mode, ...])

Draw samples from a PERT distribution.

random.poisson(conn_context[, theta, ...])

Draw samples from a Poisson distribution.

random.student_t(conn_context[, dof, ...])

Draw samples from a Student's t-distribution.

random.uniform(conn_context[, low, high, ...])

Draw samples from a uniform distribution.

random.weibull(conn_context[, shape, scale, ...])

Draw samples from a Weibull distribution.

random.multinomial(conn_context, n, pvals[, ...])

Draw samples from a multinomial distribution.

random.mcmc(conn_context, distribution[, ...])

Given a distribution, this function generates samples of the distribution using Markov chain Monte Carlo simulation.

stats.chi_squared_goodness_of_fit(data, key)

Performs the chi-squared goodness-of-fit test to tell whether or not an observed distribution differs from an expected distribution.

stats.chi_squared_independence(data, key[, ...])

Performs the chi-squared test of independence to tell whether observations of two variables are independent from each other.

stats.ttest_1samp(data[, col, mu, ...])

Performs the t-test to determine whether a sample of observations could have been generated by a process with a specific mean.

stats.ttest_ind(data[, col1, col2, mu, ...])

Performs the t-test for the mean difference of two independent samples.

stats.ttest_paired(data[, col1, col2, mu, ...])

Performs the t-test for the mean difference of two sets of paired samples.

stats.f_oneway(data[, group, sample, ...])

Performs a 1-way ANOVA.

stats.f_oneway_repeated(data, subject_id[, ...])

Performs one-way repeated measures analysis of variance, along with Mauchly's Test of Sphericity and post hoc multiple comparison tests.

stats.univariate_analysis(data[, key, cols, ...])

Provides an overview of the dataset.

stats.covariance_matrix(data[, cols])

Computes the covariance matrix.

stats.pearsonr_matrix(data[, cols])

Computes a correlation matrix using Pearson's correlation coefficient.

stats.iqr(data, key[, col, multiplier])

Performs the inter-quartile range (IQR) test to find the outliers of the data.
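The IQR rule flags values that fall below Q1 - multiplier * IQR or above Q3 + multiplier * IQR. The plain-Python sketch below illustrates this; it is not the PAL function, the helper name is hypothetical, and the quartile convention (`method="inclusive"` in the standard library) is an assumption that may differ from the one PAL uses.

```python
from statistics import quantiles


def iqr_outliers(values, multiplier=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR] with k = multiplier."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")  # quartile convention is an assumption
    spread = q3 - q1
    lower = q1 - multiplier * spread
    upper = q3 + multiplier * spread
    return [x for x in values if x < lower or x > upper]
```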

stats.wilcoxon(data[, col, mu, test_type, ...])

Performs a one-sample or paired two-sample non-parametric test to check whether the median of the data is different from a specific value.

stats.median_test_1samp(data[, col, mu, ...])

Performs one-sample non-parametric test to check whether the median of the data is different from a user specified one.

stats.grubbs_test(data, key[, col, method, ...])

Performs Grubbs' test to detect outliers from a given univariate dataset.

stats.entropy(data[, col, ...])

Calculates the information entropy of attributes.
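For a single attribute, information entropy is H = -sum(p_i * log(p_i)) over the relative frequencies p_i of its distinct values. A minimal plain-Python sketch follows; it is not the PAL function, the helper name is hypothetical, and the base-2 logarithm is an assumption.

```python
from collections import Counter
from math import log2


def information_entropy(values):
    """Return the Shannon entropy (in bits) of the empirical distribution of values."""
    counts = Counter(values)
    total = len(values)
    # Base-2 logarithm is an assumption; another base only rescales the result.
    return -sum((c / total) * log2(c / total) for c in counts.values())
```

An attribute with a single distinct value has zero entropy, while two equally frequent values give exactly one bit.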

stats.condition_index(data[, key, col, ...])

Detects collinearity problems between independent variables that are later used as predictors in a multiple linear regression model.

stats.cdf(data, distr_info[, col, complementary])

Evaluates the probability of a variable x from the cumulative distribution function (CDF) or complementary cumulative distribution function (CCDF) for a given probability distribution.

stats.ftest_equal_var(data_x, data_y[, ...])

Tests the equality of two random variances using F-test.

stats.factor_analysis(data, key, factor_num)

Factor analysis is a statistical method that tries to extract a low number of unobserved variables, i.e. factors, that can best describe the covariance pattern of a larger set of observed variables.

stats.kaplan_meier_survival_analysis(data[, ...])

The Kaplan-Meier estimator is a non-parametric statistic used to estimate the survival function from lifetime data.

stats.quantile(data, distr_info[, col, ...])

Evaluates the inverse of the cumulative distribution function (CDF) or the inverse of the complementary cumulative distribution function (CCDF) for a given probability p and probability distribution.

stats.distribution_fit(data, distr_type[, ...])

Fits a probability distribution to a variable based on a series of measurements of that variable.

stats.ks_test(data[, distribution_name, ...])

Performs one-sample or two-sample Kolmogorov-Smirnov test for goodness of fit.

stats.interval_quality(data, significance_level)

Provides a method to evaluate the quality of interval forecasts, which is defined as:

stats.benford_analysis(data[, key, ...])

Benford analysis is a data mining tool based on Benford's law (Frank Benford, 1938).

kernel_density.KDE([thread_ratio, ...])

Performs kernel density estimation, which is analogous to a histogram but avoids its defects.


association.Apriori(min_support, min_confidence)

Apriori is a classic algorithm used in machine learning for mining frequent itemsets and relevant association rules.

association.AprioriLite(min_support, ...[, ...])

This function runs a lightweight version of the Apriori algorithm for association rule mining.

association.FPGrowth([min_support, ...])

The Frequent Pattern Growth (FP-Growth) algorithm is a technique used for finding frequent patterns in a transaction dataset without generating a candidate itemset.

association.KORD([k, measure, min_support, ...])

The K-Optimal Rule Discovery (KORD) algorithm is a machine learning tool used for generating top-K association rules based on a user-defined measure.

association.SPM(min_support[, relational, ...])

The Sequential Pattern Mining (SPM) algorithm is a method in data mining developed to determine frequent patterns that occur in sequential data.

Recommender System

recommender.ALS([random_state, max_iter, ...])

Alternating least squares (ALS) is a powerful matrix factorization algorithm for building both explicit and implicit feedback based recommender systems.

recommender.FRM([solver, factor_num, init, ...])

Factorized Polynomial Regression Models or Factorization Machines approach.

recommender.FFMClassifier([ordering, ...])

Field-Aware Factorization Machine with the task of classification.

recommender.FFMRegressor([ordering, ...])

Field-Aware Factorization Machine with the task of Regression.

recommender.FFMRanker([ordering, normalise, ...])

Field-Aware Factorization Machine with the task of ranking using ordinal regression.

recommender.MLPRecommender([batch_size, ...])

The Python interface for an MLP-based recommender system method in PAL.

Social Network Analysis

linkpred.LinkPrediction(method[, beta, ...])

Link predictor for calculating, in a network, proximity scores between nodes that are not directly linked, which is helpful for predicting missing links (the higher the proximity score is, the more likely the two nodes are to be linked).

pagerank.PageRank([damping, max_iter, tol, ...])

A page rank model.


svm.SVRanking([c, kernel, degree, gamma, ...])

Support Vector Machines (SVMs) refer to a family of supervised learning models using the concept of support vector.


abc_analysis.abc_analysis(data[, key, ...])

ABC analysis is used to classify objects (such as customers, employees, or products) based on a particular measure (such as revenue or profit).
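One common way to carry out such a classification is to rank objects by the measure and assign classes by cumulative share. The sketch below shows that approach in plain Python. It is not the PAL function: the helper name, the `share_a` and `share_b` parameters, their default values, and the handling of objects that straddle a boundary are all assumptions made for illustration.

```python
def abc_classify(measures, share_a=0.7, share_b=0.2):
    """Label each object 'A', 'B' or 'C' by its cumulative share of the total measure.

    measures maps an object name to a non-negative number (e.g. revenue).
    """
    total = sum(measures.values())
    if total <= 0:
        raise ValueError("total measure must be positive")
    ranked = sorted(measures.items(), key=lambda item: item[1], reverse=True)
    labels = {}
    running = 0.0
    for name, value in ranked:
        running += value
        share = running / total  # cumulative share including this object
        if share <= share_a:
            labels[name] = "A"
        elif share <= share_a + share_b:
            labels[name] = "B"
        else:
            labels[name] = "C"
    return labels
```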

wst.weighted_score_table(data, maps, ...[, ...])

A weighted score table is a method of evaluating alternatives when the importance of each criterion differs.
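At its core, a weighted score is the sum of each criterion's score multiplied by that criterion's weight. The plain-Python sketch below shows only this arithmetic; it is not the PAL function, it skips the step of mapping raw values to scores, and the helper name and data layout are hypothetical.

```python
def weighted_scores(alternatives, weights):
    """Return the weighted total score for each alternative.

    alternatives maps a name to a dict of {criterion: score};
    weights maps each criterion to its importance.
    """
    return {
        name: sum(weights[criterion] * score for criterion, score in scores.items())
        for name, scores in alternatives.items()
    }
```

With weights of 2 for price and 3 for quality, an alternative scoring 1 on price and 4 on quality (total 14) outranks one scoring 4 on price and 1 on quality (total 11), because quality counts for more.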

tsne.TSNE([n_iter, learning_rate, ...])

t-Distributed Stochastic Neighbor Embedding (t-SNE) is a non-linear dimensionality reduction technique that is particularly well-suited for visualizing high-dimensional datasets by reducing them to lower dimensions (typically 2D or 3D) for effective visualization.


FairMLClassification aims to mitigate the unfairness of a prediction model caused by possible "bias" in the dataset regarding features such as sex, race, age, etc.

fair_ml.FairMLRegression(fair_bound[, ...])

FairMLRegression aims to mitigate the unfairness of a prediction model caused by possible "bias" in the dataset regarding features such as sex, race, age, etc.


metrics.accuracy_score(data, label_true, ...)

Compute mean accuracy score for classification results.

metrics.auc(data[, positive_label, ...])

Computes area under curve (AUC) to evaluate the performance of binary-class classification algorithms.

metrics.confusion_matrix(data, key[, ...])

Computes confusion matrix to evaluate the accuracy of a classification.

metrics.multiclass_auc(data_original, ...)

Computes area under curve (AUC) to evaluate the performance of multi-class classification algorithms.

metrics.r2_score(data, label_true, label_pred)

Computes coefficient of determination for regression results.
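The coefficient of determination is conventionally defined as R^2 = 1 - SS_res / SS_tot, where SS_res is the sum of squared residuals and SS_tot is the total sum of squares around the mean of the true values. The sketch below computes it on plain Python lists; it is an illustration with a hypothetical function name, not the PAL function, which operates on a DataFrame.

```python
def r2_score_sketch(y_true, y_pred):
    """Return R^2 = 1 - SS_res / SS_tot for two equal-length numeric sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when y_true is constant")
    return 1 - ss_res / ss_tot
```

A perfect prediction gives 1, and always predicting the mean of the true values gives 0.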


Computes debriefing coefficients for binary classification results.

Model and Pipeline

model_selection.ParamSearchCV(estimator, ...)

Exhaustive or random search over specified parameter values for an estimator with cross-validation (CV).

model_selection.GridSearchCV(estimator, ...)

Exhaustive search over specified parameter values for an estimator with cross-validation (CV).

model_selection.RandomSearchCV(estimator, ...)

Random search over specified parameter values for an estimator with cross-validation (CV).


Pipeline construction to run transformers and estimators sequentially.

Text Processing

crf.CRF([lamb, epsilon, max_iter, lbfgs_m, ...])

Conditional random fields (CRFs) are a probabilistic framework for labeling and segmenting structured data, such as sequences.


Latent Dirichlet allocation (LDA) is a generative model in which each item (word) of a collection (document) is generated from a finite mixture over several latent groups (topics).

For other text processing methods, such as text mining, see the text mining module.