factor_analysis

hana_ml.algorithms.pal.stats.factor_analysis(data, key, factor_num, col=None, method=None, rotation=None, score=None, matrix=None, kappa=None)

Factor analysis is a statistical method that extracts a small number of unobserved variables, called factors, that best describe the covariance pattern of a larger set of observed variables.
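The idea can be illustrated locally with NumPy. The following is a minimal sketch of principal-component factor extraction, not the PAL implementation: the helper name `principal_component_factors` and the synthetic data are assumptions made for illustration only.

```python
import numpy as np

def principal_component_factors(X, factor_num):
    """Principal-component factor extraction from the correlation matrix of X.

    Hypothetical helper for illustration; it is not part of hana_ml.
    """
    R = np.corrcoef(X, rowvar=False)             # p x p correlation matrix
    eigvals, eigvecs = np.linalg.eigh(R)         # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]            # reorder to descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Loadings: leading eigenvectors scaled by the square root of their eigenvalues.
    loadings = eigvecs[:, :factor_num] * np.sqrt(eigvals[:factor_num])
    # Communality: share of each variable's variance reproduced by the kept factors.
    communalities = np.sum(loadings ** 2, axis=1)
    return eigvals, loadings, communalities

# Synthetic data: four observed variables driven by two hidden factors plus noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
mixing = np.array([[0.9, 0.0], [0.8, 0.1], [0.1, 0.9], [0.0, 0.8]])
X = latent @ mixing.T + 0.3 * rng.normal(size=(200, 4))

eigvals, loadings, communalities = principal_component_factors(X, factor_num=2)
print(eigvals)
print(communalities)
```

Because the correlation matrix has ones on its diagonal, the eigenvalues sum to the number of observed variables, and keeping all factors reproduces every variable completely (all communalities equal 1).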

Parameters:

data : DataFrame

DataFrame containing the data.

key : str

Name of the ID column.

factor_num : int

Number of factors.

col : str or a list of str, optional

Name of the feature columns.

method : {'principle_component'}, optional

Specifies the method used for factor analysis.

Currently SAP HANA PAL only supports the principal component method.

rotation : {'non', 'varimax', 'promax'}, optional

Specifies the rotation applied to the factor loadings.

Defaults to 'varimax'.

score : {'non', 'regression'}, optional

Specifies the method to compute factor scores.

Defaults to 'regression'.

matrix : {'covariance', 'correlation'}, optional

Specifies whether the covariance matrix or the correlation matrix is used to perform factor analysis.

Defaults to 'correlation'.
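The practical difference between the two choices can be seen with a small NumPy experiment. This is only a sketch on made-up data: it shows that rescaling the observed variables (for example, changing units) changes the eigenvalues of the covariance matrix but leaves those of the correlation matrix unchanged.

```python
import numpy as np

# Made-up data: three variables, then the same variables in different units.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
X_scaled = X * np.array([1.0, 10.0, 100.0])

# Eigenvalues of the covariance matrix depend on the units of each column.
cov_eig = np.linalg.eigvalsh(np.cov(X, rowvar=False))
cov_eig_scaled = np.linalg.eigvalsh(np.cov(X_scaled, rowvar=False))

# Eigenvalues of the correlation matrix do not.
cor_eig = np.linalg.eigvalsh(np.corrcoef(X, rowvar=False))
cor_eig_scaled = np.linalg.eigvalsh(np.corrcoef(X_scaled, rowvar=False))

# The correlation matrix is the covariance matrix of the standardized columns.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
cov_of_standardized = np.cov(Z, rowvar=False)

print(cov_eig, cov_eig_scaled)
print(cor_eig, cor_eig_scaled)
```

The correlation matrix is therefore the usual choice when the observed variables are measured on different scales.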

kappa : float, optional

Power of the promax rotation. Only valid when rotation is 'promax'.

Defaults to 4.

Returns:
DataFrames

DataFrame 1: Eigenvalues, structured as follows:

• FACTOR_ID: factor id.

• EIGENVALUE: Eigenvalue (i.e. variance explained).

• VAR_PROP: Variance proportion to the total variance explained.

• CUM_VAR_PROP: Cumulative variance proportion to the total variance explained.
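The relationship between these columns can be checked with a few lines of NumPy. The sketch below assumes that VAR_PROP is each eigenvalue divided by the sum of all eigenvalues and that CUM_VAR_PROP is its running total. The eigenvalues are copied from the example output in the Examples section of this page.

```python
import numpy as np

# Eigenvalues copied from the EIGENVALUE column of the example output on this page.
eigenvalues = np.array([3.696031, 1.073114, 1.000774, 0.161003, 0.040961, 0.028116])

# Assumed definitions: proportion of the total, and its cumulative sum.
var_prop = eigenvalues / eigenvalues.sum()
cum_var_prop = np.cumsum(var_prop)

for i, (ev, vp, cvp) in enumerate(zip(eigenvalues, var_prop, cum_var_prop), start=1):
    print(f"FACTOR_{i}  {ev:.6f}  {vp:.6f}  {cvp:.6f}")
```

Under these assumptions, the computed proportions agree with the VAR_PROP and CUM_VAR_PROP columns of the example to the printed precision.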

DataFrame 2: Variance explanation, structured as follows:

• FACTOR_ID: factor id.

• VAR: Variance explained without rotation.

• VAR_PROP: Variance proportion to the total variance explained without rotation.

• CUM_VAR_PROP: Cumulative variance proportion to the total variance explained without rotation.

• ROT_VAR: Variance explained with rotation.

• ROT_VAR_PROP: Variance proportion to the total variance explained with rotation. Note that there is no rotated variance proportion when performing oblique rotation since the rotated factors are correlated.

• ROT_CUM_VAR_PROP: Cumulative variance proportion to the total variance explained with rotation.
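To see how the unrotated and rotated columns relate, the following sketch uses the standard definition of the variance explained by a factor as the sum of its squared loadings. The loading values and the rotation angle are hypothetical, and the code only covers an orthogonal rotation. It is not taken from the PAL implementation.

```python
import numpy as np

# Hypothetical unrotated loadings: 4 observed variables, 2 factors.
loadings = np.array([[0.7, 0.5],
                     [0.8, 0.4],
                     [0.6, -0.6],
                     [0.5, -0.7]])

# An arbitrary orthogonal rotation (40 degrees), chosen only for illustration.
theta = np.deg2rad(40.0)
rotation = np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])
rotated = loadings @ rotation

# Variance explained per factor: column sums of squared loadings.
var = np.sum(loadings ** 2, axis=0)      # analogous to VAR
rot_var = np.sum(rotated ** 2, axis=0)   # analogous to ROT_VAR

print(var, var.sum())
print(rot_var, rot_var.sum())
```

An orthogonal rotation redistributes the explained variance among the factors while leaving the total unchanged. With an oblique rotation the factors are correlated, so their contributions can no longer be separated this way.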

DataFrame 3: Communalities, structured as follows:

• NAME: variable name.

• OBSERVED_VARS: Communalities of observed variables.

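DataFrame 4: Loadings, structured as follows:

• FACTOR_ID: Factor id.

DataFrame 5: Rotated loadings, structured as follows:

• FACTOR_ID: Factor id.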

DataFrame 6: Structure, structured as follows:

• FACTOR_ID: Factor id.

• STRUCTURE+OBSERVED_VARS: Structure matrix. It is empty when rotation is not oblique.

DataFrame 7: Rotation, structured as follows:

• ROTATION: rotation

• ROTATION_ + i (i ranges from 1 to the number of columns in OBSERVED_VARS in the input table): Rotation matrix.

DataFrame 8: Factor correlation, structured as follows:

• FACTOR_ID: Factor id.

• FACTOR_ + i (i ranges from 1 to the number of columns in OBSERVED_VARS in the input table): Factor correlation matrix. It is empty when rotation is not oblique.

DataFrame 9: Score model, structured as follows:

• NAME: Factor id, MEAN, SD.

• OBSERVED_VARS (in input table) column name: Score coefficients, means and standard deviations of observed variables.
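A score model of this shape (coefficients plus per-variable means and standard deviations) can be applied by standardizing the observations and multiplying by the coefficients. The sketch below assumes the textbook regression method, in which the coefficients are the inverse correlation matrix times the loadings. Whether PAL computes them in exactly this way is not stated on this page, and the helper name `regression_score_model` is hypothetical.

```python
import numpy as np

def regression_score_model(X, factor_num):
    """Build a regression-method score model and apply it to X (illustrative only)."""
    mean = X.mean(axis=0)
    sd = X.std(axis=0, ddof=1)
    Z = (X - mean) / sd                              # standardized observations
    R = np.corrcoef(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1][:factor_num]   # indices of the leading factors
    loadings = eigvecs[:, order] * np.sqrt(eigvals[order])
    coef = np.linalg.solve(R, loadings)              # R^{-1} @ loadings, shape (p, k)
    scores = Z @ coef                                # one row of scores per observation
    return coef, mean, sd, scores

# Made-up correlated data: 50 observations of 4 variables.
rng = np.random.default_rng(2)
X = rng.normal(size=(50, 4)) @ rng.normal(size=(4, 4))

coef, mean, sd, scores = regression_score_model(X, factor_num=2)
print(coef)
print(scores[:3])
```

With unrotated principal-component loadings, the resulting scores have zero mean and unit variance and are mutually uncorrelated.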

DataFrame 10: Scores, structured as follows:

• FACTOR_ID: Factor id.

• FACTOR_ + i (i ranges from 1 to the number of columns in OBSERVED_VARS in the input table): Scores.

DataFrame 11: Statistics, placeholder for future features, structured as follows:

• STAT_NAME: statistic name.

• STAT_VALUE: statistic value.

Examples

Original data:

```
>>> df.collect()
    ID   X1   X2   X3   X4   X5   X6
0    1  1.0  1.0  3.0  3.0  1.0  1.0
1    2  1.0  2.0  3.0  3.0  1.0  1.0
2    3  1.0  1.0  3.0  4.0  1.0  1.0
3    4  1.0  1.0  3.0  3.0  1.0  2.0
4    5  1.0  1.0  3.0  3.0  1.0  1.0
5    6  1.0  1.0  1.0  1.0  3.0  3.0
6    7  1.0  2.0  1.0  1.0  3.0  3.0
7    8  1.0  1.0  1.0  2.0  3.0  3.0
8    9  1.0  2.0  1.0  1.0  3.0  4.0
9   10  1.0  1.0  1.0  1.0  3.0  3.0
10  11  3.0  3.0  1.0  1.0  1.0  1.0
11  12  3.0  4.0  1.0  1.0  1.0  1.0
12  13  3.0  3.0  1.0  2.0  1.0  1.0
13  14  3.0  3.0  1.0  1.0  1.0  2.0
14  15  3.0  3.0  1.0  1.0  1.0  1.0
15  16  4.0  4.0  5.0  5.0  6.0  6.0
16  17  5.0  6.0  4.0  6.0  4.0  5.0
17  18  6.0  5.0  6.0  4.0  5.0  4.0
```

Apply the factor_analysis function:

```
>>> from hana_ml.algorithms.pal.stats import factor_analysis
>>> res = factor_analysis(data=df, key='ID', factor_num=2,
...                       rotation='promax',
...                       matrix='correlation')
>>> res[0].collect()
  FACTOR_ID  EIGENVALUE  VAR_PROP  CUM_VAR_PROP
0  FACTOR_1    3.696031  0.616005      0.616005
1  FACTOR_2    1.073114  0.178852      0.794858
2  FACTOR_3    1.000774  0.166796      0.961653
3  FACTOR_4    0.161003  0.026834      0.988487
4  FACTOR_5    0.040961  0.006827      0.995314
5  FACTOR_6    0.028116  0.004686      1.000000
```