LinearDiscriminantAnalysis
- class hana_ml.algorithms.pal.discriminant_analysis.LinearDiscriminantAnalysis(regularization_type=None, regularization_amount=None, projection=None)
Linear Discriminant Analysis is a supervised learning technique used for classification problems. It is particularly useful when the classes are well-separated and the dataset features follow a Gaussian distribution. LDA works by projecting high-dimensional data onto a lower-dimensional space where class separation is maximized. The goal is to find a linear combination of features that best separates the classes. This makes LDA a dimensionality reduction technique as well, similar to Principal Component Analysis (PCA), but with the distinction that LDA takes class labels into account.
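The idea of finding a linear combination of features that best separates the classes can be sketched for the two-class case with plain NumPy. This is only an illustration of the classical Fisher criterion, not the PAL implementation; the function name `fisher_direction` and the toy data are assumptions made for this example.

```python
import numpy as np

def fisher_direction(X, y):
    """Two-class Fisher direction: w is proportional to Sw^{-1} (mu1 - mu0).

    Hypothetical helper for illustration; not part of hana_ml.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    if len(classes) != 2:
        raise ValueError("this sketch handles exactly two classes")
    X0, X1 = X[y == classes[0]], X[y == classes[1]]
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    # Pooled within-class scatter matrix
    Sw = (X0 - mu0).T @ (X0 - mu0) + (X1 - mu1).T @ (X1 - mu1)
    w = np.linalg.solve(Sw, mu1 - mu0)
    return w / np.linalg.norm(w)

# Toy data: two well-separated classes in two dimensions
X = np.array([[1.0, 2.0], [2.0, 3.0], [3.0, 3.5],
              [6.0, 1.0], [7.0, 2.5], [8.0, 2.0]])
y = np.array([0, 0, 0, 1, 1, 1])

w = fisher_direction(X, y)
z = X @ w  # one-dimensional projection of every sample
```

On this toy data the projected values of the two classes do not overlap, which is the separation that LDA tries to maximize.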
- Parameters:
- regularization_type : {'mixing', 'diag', 'pseudo'}, optional
The strategy for handling ill-conditioning or rank-deficiency of the empirical covariance matrix.
Defaults to 'mixing'.
- regularization_amount : float, optional
The convex mixing weight assigned to the diagonal matrix obtained from the diagonal of the empirical covariance matrix. The valid range is [0, 1]. Valid only when regularization_type is 'mixing'.
Defaults to the smallest number in [0, 1] that makes the regularized empirical covariance matrix invertible.
- projection : bool, optional
Whether or not to compute the projection model.
Defaults to True.
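One plausible reading of the 'mixing' strategy is a convex combination of the empirical covariance matrix and its diagonal. The sketch below assumes that form, (1 - amount) * S + amount * diag(S); the exact formula used by PAL is not stated here, and `mix_covariance` is a hypothetical helper, not a hana_ml function.

```python
import numpy as np

def mix_covariance(cov, amount):
    """Assumed form of 'mixing': (1 - amount) * cov + amount * diag(cov)."""
    if not 0.0 <= amount <= 1.0:
        raise ValueError("amount must lie in [0, 1]")
    cov = np.asarray(cov, dtype=float)
    # np.diag(np.diag(cov)) keeps only the diagonal entries of cov
    return (1.0 - amount) * cov + amount * np.diag(np.diag(cov))

# Rank-deficient covariance: two perfectly correlated features
cov = np.array([[1.0, 1.0],
                [1.0, 1.0]])
reg = mix_covariance(cov, 0.5)  # off-diagonal entries shrink to 0.5
```

Under this assumption, any positive amount makes the singular matrix above invertible, which is consistent with the documented default of the smallest amount that achieves invertibility.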
Examples
>>> lda = LinearDiscriminantAnalysis(regularization_type='mixing', projection=True)
Perform fit():
>>> lda.fit(data=df, features=['X1', 'X2'], label='CLASS')
>>> lda.coef_.collect()
>>> lda.proj_model_.collect()
Perform predict():
>>> res = lda.predict(data=df_pred, key='ID', features=['X1', 'X2'], verbose=False)
>>> res.collect()
Perform project():
>>> res_proj = lda.project(data=df_proj, key='ID', features=['X1', 'X2'], proj_dim=2)
>>> res_proj.collect()
- Attributes:
- basic_info_ : DataFrame
Basic information of the training data for linear discriminant analysis.
- priors_ : DataFrame
The empirical priors for each class in the training data.
- coef_ : DataFrame
Coefficients (inclusive of intercepts) of each class' linear score function for the training data.
- proj_info_ : DataFrame
Projection-related information, such as the standard deviations of the discriminants and the proportion of the total variance explained by each discriminant.
- proj_model_ : DataFrame
The projection matrix and overall means for features.
Methods
fit(data[, key, features, label]): Fit the model to the given dataset.
predict(data[, key, features, verbose, ...]): Predict class labels using fitted linear discriminators.
project(data[, key, features, proj_dim]): Project data into lower dimensional spaces using the fitted LDA projection model.
- fit(data, key=None, features=None, label=None)
Fit the model to the given dataset.
- Parameters:
- data : DataFrame
Training data.
- key : str, optional
Name of the ID column. If not provided, then:
if data is indexed by a single column, then key defaults to that index column;
otherwise, it is assumed that data contains no ID column.
- features : list of str, optional
Names of the feature columns.
If not provided, it defaults to all non-ID, non-label columns.
- label : str, optional
Name of the class label.
If not provided, it defaults to the last non-ID column.
- Returns:
- A fitted object of class "LinearDiscriminantAnalysis".
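The default rules for label and features described above can be mimicked in plain Python. This is a minimal sketch of the documented behaviour only; `resolve_columns` is a hypothetical helper, and it does not model the case where key is inferred from the index of data.

```python
def resolve_columns(columns, key=None, features=None, label=None):
    """Apply the documented defaults for label and features.

    Hypothetical helper for illustration; not part of hana_ml.
    """
    # Every column except the ID column (if one is given)
    non_id = [c for c in columns if c != key]
    if label is None:
        # label defaults to the last non-ID column
        label = non_id[-1]
    if features is None:
        # features default to all non-ID, non-label columns
        features = [c for c in non_id if c != label]
    return features, label

features, label = resolve_columns(['ID', 'X1', 'X2', 'CLASS'], key='ID')
# features == ['X1', 'X2'], label == 'CLASS'
```

Passing features and label explicitly, as in the fit() example earlier, avoids any dependence on column order.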
- predict(data, key=None, features=None, verbose=None, verbose_top_n=None)
Predict class labels using fitted linear discriminators.
- Parameters:
- data : DataFrame
Data for predicting the class labels.
- key : str, optional
Name of the ID column. Mandatory if data is not indexed, or if the index of data contains multiple columns.
Defaults to the single index column of data if not provided.
- features : list of str, optional
Names of the feature columns.
If not provided, defaults to all non-ID columns.
- verbose : bool, optional
Whether or not to output the scores of all classes. If False, only the score of the predicted class is output.
Defaults to False.
- verbose_top_n : int, optional
Specifies the number of top-ranked classes to present after sorting by confidence. It cannot exceed the number of classes in the label column of the training data. A value of 0 means that the confidences of all classes are output. Effective only when verbose is set to True.
Defaults to 0.
- Returns:
- DataFrame
Predicted class labels and the corresponding scores.
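The interplay of verbose and verbose_top_n can be illustrated with a small pure-Python sketch. It only mirrors the selection rule described in the parameter list; `top_n_scores` and the score values are assumptions for this example, and the layout of the actual result DataFrame is not reproduced.

```python
def top_n_scores(scores, verbose=False, verbose_top_n=0):
    """Select which class scores to report for one sample.

    Hypothetical helper for illustration; not part of hana_ml.
    """
    # Sort classes by score, highest first
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    if not verbose:
        return ranked[:1]          # only the predicted class
    if verbose_top_n == 0:
        return ranked              # 0 means all classes
    return ranked[:verbose_top_n]  # the n best classes

scores = {'A': 0.7, 'B': 0.2, 'C': 0.1}  # made-up scores for one sample

only_best = top_n_scores(scores)
all_classes = top_n_scores(scores, verbose=True)
best_two = top_n_scores(scores, verbose=True, verbose_top_n=2)
```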
- project(data, key=None, features=None, proj_dim=None)
Project data into lower dimensional spaces using the fitted LDA projection model.
- Parameters:
- data : DataFrame
Data for linear discriminant projection.
- key : str, optional
Name of the ID column. Mandatory if data is not indexed, or if the index of data contains multiple columns.
Defaults to the single index column of data if not provided.
- features : list of str, optional
Names of the feature columns.
If not provided, defaults to all non-ID columns.
- proj_dim : int, optional
Dimension of the projected space, equivalent to the number of discriminants used for projection.
Defaults to the number of obtained discriminants.
- Returns:
- DataFrame
Projected data, structured as follows:
1st column: ID, with the same name and data type as in data for projection.
Other columns: named DISCRIMINANT_i, where i iterates from 1 to the number of elements in features, with data type DOUBLE.
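What a linear discriminant projection computes can be sketched with NumPy. The sketch assumes that the features are centered by the overall means before being multiplied by the first proj_dim columns of the projection matrix; that centering step, the function name `project_data`, and the toy values are assumptions for illustration and are not taken from the PAL implementation.

```python
import numpy as np

def project_data(X, means, proj_matrix, proj_dim=None):
    """Project rows of X onto the first proj_dim discriminants.

    Hypothetical helper for illustration; not part of hana_ml.
    Assumes centering by the overall feature means.
    """
    X = np.asarray(X, dtype=float)
    W = np.asarray(proj_matrix, dtype=float)
    if proj_dim is None:
        # Default: use every available discriminant
        proj_dim = W.shape[1]
    Z = (X - np.asarray(means, dtype=float)) @ W[:, :proj_dim]
    # Output columns follow the documented DISCRIMINANT_i naming
    names = [f"DISCRIMINANT_{i}" for i in range(1, proj_dim + 1)]
    return names, Z

X = np.array([[1.0, 2.0], [3.0, 4.0]])
means = np.array([2.0, 3.0])
W = np.eye(2)  # toy projection matrix

names, Z = project_data(X, means, W, proj_dim=1)
```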
Inherited Methods from PALBase
Besides the methods mentioned above, the LinearDiscriminantAnalysis class also inherits methods from the PALBase class. Please refer to PAL Base for more details.