covariance_matrix
- hana_ml.algorithms.pal.stats.covariance_matrix(data, cols=None)
Computes the covariance matrix.
- Parameters:
- data : DataFrame
DataFrame containing the data.
- cols : list of str, optional
List of column names to analyze.
If not provided, it defaults to all columns.
- Returns:
- DataFrame
Covariance between any two data samples (columns).
- ID, type NVARCHAR. The values of this column are the column names from cols.
- Covariance columns, type DOUBLE, named after the columns in cols. The covariance between variables X and Y is in column X, in the row with ID value Y.
Examples
Dataset to be analyzed:
>>> df.collect()
    X     Y
0   1   2.4
1   5   3.5
2   3   8.9
3  10  -1.4
4  -4  -3.5
5  11  32.8
Compute the covariance matrix:
>>> result = covariance_matrix(data=df)
Output:
>>> result.collect()
  ID          X           Y
0  X  31.866667   44.473333
1  Y  44.473333  176.677667
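The example values can be cross-checked without a HANA connection. The sketch below is plain Python, not part of hana_ml. It assumes the sample covariance with an n - 1 denominator, an assumption inferred from the numbers in the example rather than stated in the text above. The helper `sample_cov` and the nested dictionary `cov` are hypothetical names used only for illustration.

```python
# Plain-Python cross-check of the example; no HANA connection is used.
X = [1, 5, 3, 10, -4, 11]
Y = [2.4, 3.5, 8.9, -1.4, -3.5, 32.8]


def sample_cov(a, b):
    """Sample covariance of two equal-length sequences (n - 1 denominator, assumed)."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (n - 1)


columns = {"X": X, "Y": Y}

# Mirror the documented layout: the outer key plays the role of the ID column,
# the inner keys play the role of the covariance columns.
cov = {
    row: {col: sample_cov(columns[row], columns[col]) for col in columns}
    for row in columns
}

for row_id, values in cov.items():
    print(row_id, {name: round(value, 6) for name, value in values.items()})
```

Rounded to six decimals, these values agree with the example output, and `cov["Y"]["X"]` equals `cov["X"]["Y"]` because the covariance matrix is symmetric.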