covariance_matrix

hana_ml.algorithms.pal.stats.covariance_matrix(data, cols=None)

Computes the covariance matrix.

Parameters:
data : DataFrame

DataFrame containing the data.

cols : list of str, optional

List of column names to analyze.

If not provided, it defaults to all columns.

Returns:
DataFrame

Covariance between each pair of data samples (columns), structured as follows:

  • ID, type NVARCHAR. The values of this column are the column names from cols.

  • Covariance columns, type DOUBLE, named after the columns in cols. The covariance between variables X and Y is in column X, in the row with ID value Y.

Examples

Dataset to be analyzed:

>>> df.collect()
    X     Y
0   1   2.4
1   5   3.5
2   3   8.9
3  10  -1.4
4  -4  -3.5
5  11  32.8

Compute the covariance matrix:

>>> result = covariance_matrix(data=df)

Outputs:

>>> result.collect()
  ID          X           Y
0  X  31.866667   44.473333
1  Y  44.473333  176.677667
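
The figures above are consistent with the sample covariance, which divides by n - 1. The following is a minimal local cross-check in plain Python, not a description of how the function is implemented. It does not call hana_ml or SAP HANA, and the helper sample_cov and the nested-dict layout are hypothetical names introduced only for this illustration.

```python
# Local cross-check only; this does not call hana_ml or connect to SAP HANA.
# The values are copied from the example dataset above.
x = [1, 5, 3, 10, -4, 11]
y = [2.4, 3.5, 8.9, -1.4, -3.5, 32.8]


def sample_cov(a, b):
    """Sample covariance of two equal-length sequences (divides by n - 1)."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / (n - 1)


columns = {"X": x, "Y": y}

# matrix[row_id][col] mirrors the documented layout:
# one row per ID value, one covariance column per variable.
matrix = {
    row_id: {col: sample_cov(columns[row_id], columns[col]) for col in columns}
    for row_id in columns
}

for row_id, row in matrix.items():
    print(row_id, {name: round(value, 6) for name, value in row.items()})
```

Rounded to six decimal places, the printed values agree with the result shown above, and the matrix is symmetric: the covariance of X and Y appears both in column X of row Y and in column Y of row X.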