covariance_matrix

hana_ml.algorithms.pal.stats.covariance_matrix(data, cols=None)

Computes the covariance matrix for the columns of a DataFrame.

Parameters:

data : DataFrame

DataFrame containing the data.

cols : list of str, optional

List of column names to analyze.

If not provided, all columns of data are used.

Returns:

DataFrame

Covariance between every pair of analyzed columns, structured as follows:

ID, type NVARCHAR. The values of this column are the column names from cols.

Covariance columns, type DOUBLE, named after the columns in cols. The covariance between variables X and Y is in column X, in the row with ID value Y.
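As a minimal sketch of how to read this layout, the snippet below models the collected result as a list of row dictionaries (the shape that pandas' `to_dict("records")` produces) and looks up a single entry. The helper name `lookup_cov` is hypothetical and not part of hana_ml; the numbers are copied from the example output in this section.

```python
# Rows as they might look after result.collect().to_dict("records").
# Values are copied from the example output in this section.
rows = [
    {"ID": "X", "X": 31.866667, "Y": 44.473333},
    {"ID": "Y", "X": 44.473333, "Y": 176.677667},
]


def lookup_cov(rows, x, y):
    """Return cov(x, y): the value in column `x` of the row whose ID is `y`.

    Hypothetical helper for illustration only.
    """
    for row in rows:
        if row["ID"] == y:
            return row[x]
    raise KeyError(f"no row with ID {y!r}")


cov_xy = lookup_cov(rows, "X", "Y")  # column X, row with ID "Y"
cov_yx = lookup_cov(rows, "Y", "X")  # column Y, row with ID "X"
print(cov_xy, cov_yx)
```

Because a covariance matrix is symmetric, swapping the column and the row ID returns the same value.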

Examples

Dataset to be analyzed:

>>> df.collect()
    X     Y
0   1   2.4
1   5   3.5
2   3   8.9
3  10  -1.4
4  -4  -3.5
5  11  32.8

Compute the covariance matrix:

>>> result = covariance_matrix(data=df)

Output:

>>> result.collect()
  ID          X           Y
0  X  31.866667   44.473333
1  Y  44.473333  176.677667
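The numbers above can be cross-checked locally without a HANA connection. The following is a rough sketch in plain Python, not the PAL implementation. This section does not state which denominator is used, so the sketch assumes the sample convention (n - 1), which is the one that matches the values shown. The helper name `sample_cov` is hypothetical.

```python
def sample_cov(a, b):
    """Sample covariance of two equal-length sequences (n - 1 denominator).

    Hypothetical helper; the n - 1 convention is an assumption.
    """
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    return sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b)) / (n - 1)


# Same data as the example dataset above.
x = [1, 5, 3, 10, -4, 11]
y = [2.4, 3.5, 8.9, -1.4, -3.5, 32.8]

cov_xx = sample_cov(x, x)  # variance of X
cov_xy = sample_cov(x, y)  # covariance of X and Y
cov_yy = sample_cov(y, y)  # variance of Y
print(round(cov_xx, 6), round(cov_xy, 6), round(cov_yy, 6))
```

The diagonal entries are the variances of X and Y, and the single off-diagonal value appears twice in the result because the matrix is symmetric.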