pearsonr_matrix
- hana_ml.algorithms.pal.stats.pearsonr_matrix(data, cols=None)
Computes a correlation matrix using Pearson's correlation coefficient.
- Parameters:
- data : DataFrame
DataFrame containing the data.
- cols : list of str, optional
List of column names to analyze.
If not provided, it defaults to all columns.
- Returns:
- DataFrame
Pearson's correlation coefficient between any two data samples (columns), structured as follows:
  - ID, type NVARCHAR. The values of this column are the column names from cols.
  - Correlation coefficient columns, type DOUBLE, named after the columns in cols. The correlation coefficient between variables X and Y is in column X, in the row with ID value Y.
Examples
Dataset to be analyzed:
>>> df.collect()
    X     Y
0   1   2.4
1   5   3.5
2   3   8.9
3  10  -1.4
4  -4  -3.5
5  11  32.8
Compute the Pearson correlation coefficient matrix:
>>> result = pearsonr_matrix(data=df)
>>> result.collect()
  ID               X               Y
0  X               1  0.592707653621
1  Y  0.592707653621               1
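As a rough cross-check, the same coefficient can be recomputed locally from the example data, without a HANA connection. The sketch below is not how PAL computes the result. It applies the textbook formula for Pearson's correlation coefficient in plain Python and arranges the values in the layout described under Returns. The names `pearson_r`, `columns`, and `matrix` are hypothetical helpers introduced only for this illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means (unnormalised covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (unnormalised variances).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Example dataset from above, held as plain Python lists (assumption:
# the values were copied by hand from the df.collect() output).
columns = {
    "X": [1, 5, 3, 10, -4, 11],
    "Y": [2.4, 3.5, 8.9, -1.4, -3.5, 32.8],
}

# Mimic the documented layout: one row per ID value, one entry per column.
matrix = [
    {"ID": row_name,
     **{col: pearson_r(columns[col], columns[row_name]) for col in columns}}
    for row_name in columns
]

# Coefficient between X and Y: column "X", in the row whose ID is "Y".
r_xy = next(row for row in matrix if row["ID"] == "Y")["X"]
print(r_xy)
```

The printed value should agree with the off-diagonal entry in the result above up to floating-point rounding. Because the matrix is symmetric, reading column "Y" in the row with ID "X" gives the same number.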