factor_analysis
- hana_ml.algorithms.pal.stats.factor_analysis(data, key, factor_num, col=None, method=None, rotation=None, score=None, matrix=None, kappa=None)
Factor analysis is a statistical method that tries to extract a small number of unobserved variables, i.e. factors, that best describe the covariance pattern of a larger set of observed variables.
- Parameters:
- data : DataFrame
DataFrame containing the data.
- key : str
Name of the ID column.
- factor_num : int
Number of factors.
- col : str or list of str, optional
Names of the feature columns.
- method : {'principle_component'}, optional
Specifies the method used for factor analysis.
Currently SAP HANA PAL only supports the principal component method.
- rotation : {'non', 'varimax', 'promax'}, optional
Specifies the rotation to be performed on loadings.
Defaults to 'varimax'.
- score : {'non', 'regression'}, optional
Specifies the method to compute factor scores.
Defaults to 'regression'.
- matrix : {'covariance', 'correlation'}, optional
Specifies whether the covariance or correlation matrix is used to perform factor analysis.
Defaults to 'correlation'.
- kappa : float, optional
Power of the promax rotation. Only valid when rotation is 'promax'.
Defaults to 4.
- Returns:
- DataFrames
DataFrame 1: Eigenvalues, structured as follows:
FACTOR_ID: factor id.
EIGENVALUE: Eigenvalue (i.e. variance explained).
VAR_PROP: Variance proportion to the total variance explained.
CUM_VAR_PROP: Cumulative variance proportion to the total variance explained.
DataFrame 2: Variance explanation, structured as follows:
FACTOR_ID: factor id.
VAR: Variance explained without rotation.
VAR_PROP: Variance proportion to the total variance explained without rotation.
CUM_VAR_PROP: Cumulative variance proportion to the total variance explained without rotation.
ROT_VAR: Variance explained with rotation.
ROT_VAR_PROP: Variance proportion to the total variance explained with rotation. Note that there is no rotated variance proportion when performing oblique rotation since the rotated factors are correlated.
ROT_CUM_VAR_PROP: Cumulative variance proportion to the total variance explained with rotation.
DataFrame 3: Communalities, structured as follows:
NAME: variable name.
OBSERVED_VARS: Communalities of the observed variables.
DataFrame 4: Loadings, structured as follows:
FACTOR_ID: Factor id.
LOADINGS_ + OBSERVED_VARS: Loadings.
DataFrame 5: Rotated loadings, structured as follows:
FACTOR_ID: Factor id.
ROT_LOADINGS_ + OBSERVED_VARS: Rotated loadings.
DataFrame 6: Structure, structured as follows:
FACTOR_ID: Factor id.
STRUCTURE+OBSERVED_VARS: Structure matrix. It is empty when rotation is not oblique.
DataFrame 7: Rotation, structured as follows:
ROTATION: rotation
ROTATION_ + i (i ranges from 1 to the number of columns in OBSERVED_VARS in the input table): Rotation matrix.
DataFrame 8: Factor correlation, structured as follows:
FACTOR_ID: Factor id.
FACTOR_ + i (i ranges from 1 to the number of columns in OBSERVED_VARS in the input table): Factor correlation matrix. It is empty when rotation is not oblique.
DataFrame 9: Score model, structured as follows:
NAME: Factor id, MEAN, SD.
OBSERVED_VARS (in input table) column name: Score coefficients, means and standard deviations of observed variables.
DataFrame 10: Scores, structured as follows:
FACTOR_ID: Factor id.
FACTOR_ + i (i ranges from 1 to the number of columns in OBSERVED_VARS in the input table): Scores.
DataFrame 11: Statistics.
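To make the returned tables more concrete, the following is a minimal NumPy sketch of unrotated principal-component factor extraction on the correlation matrix. It is not PAL's implementation, and PAL's results may differ in sign conventions, rotation, and numerical details. The function name `pc_factor_sketch` and the synthetic demo data are hypothetical and serve illustration only. The sketch computes quantities analogous to the eigenvalue table (EIGENVALUE, VAR_PROP, CUM_VAR_PROP), the loadings, the communalities, and regression-type scores.

```python
import numpy as np

def pc_factor_sketch(X, factor_num):
    """Unrotated principal-component factor extraction (illustrative only)."""
    X = np.asarray(X, dtype=float)
    # Standardize each column using the sample standard deviation.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Correlation matrix of the observed variables.
    R = np.corrcoef(X, rowvar=False)
    # eigh returns eigenvalues in ascending order, so sort them descending.
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Proportion and cumulative proportion of total variance per factor.
    var_prop = eigvals / eigvals.sum()
    cum_var_prop = np.cumsum(var_prop)
    # Loadings: leading eigenvectors scaled by the square root of their eigenvalues.
    loadings = eigvecs[:, :factor_num] * np.sqrt(eigvals[:factor_num])
    # Communality of a variable: sum of its squared loadings over the kept factors.
    communalities = (loadings ** 2).sum(axis=1)
    # Regression-type score coefficients: solve R @ coef = loadings.
    score_coef = np.linalg.solve(R, loadings)
    scores = Z @ score_coef
    return {
        "eigenvalues": eigvals,
        "var_prop": var_prop,
        "cum_var_prop": cum_var_prop,
        "loadings": loadings,
        "communalities": communalities,
        "scores": scores,
    }

# Hypothetical data: 50 rows, 6 observed variables driven by 2 latent factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(50, 2))
mixing = rng.normal(size=(2, 6))
X_demo = latent @ mixing + 0.3 * rng.normal(size=(50, 6))

out = pc_factor_sketch(X_demo, factor_num=2)
```

With a correlation matrix the eigenvalues sum to the number of observed variables, which is why each variance proportion equals the eigenvalue divided by that count.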
Examples
Original data:
>>> df.collect()
    ID   X1   X2   X3   X4   X5   X6
0    1  1.0  1.0  3.0  3.0  1.0  1.0
1    2  1.0  2.0  3.0  3.0  1.0  1.0
...
17  18  6.0  5.0  6.0  4.0  5.0  4.0
>>> res = factor_analysis(data=df, key='ID', factor_num=2, rotation='promax', matrix='correlation')
>>> res[0].collect()
  FACTOR_ID  EIGENVALUE  VAR_PROP  CUM_VAR_PROP
0  FACTOR_1    3.696031  0.616005      0.616005
1  FACTOR_2    1.073114  0.178852      0.794858
...
5  FACTOR_6    0.028116  0.004686      1.000000
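As a quick sanity check on the eigenvalue table, the sketch below rebuilds only the three rows visible in the output above as a local pandas DataFrame; the elided rows are not reconstructed. It assumes the six feature columns X1 to X6 and the correlation matrix, in which case VAR_PROP should equal EIGENVALUE divided by 6. It also applies the common "eigenvalue greater than 1" heuristic (the Kaiser criterion), which is a rule of thumb and not something the function itself enforces. The variable names are illustrative.

```python
import pandas as pd

# Only the rows shown in the example output; FACTOR_3 to FACTOR_5 are elided there.
eig = pd.DataFrame({
    "FACTOR_ID": ["FACTOR_1", "FACTOR_2", "FACTOR_6"],
    "EIGENVALUE": [3.696031, 1.073114, 0.028116],
    "VAR_PROP": [0.616005, 0.178852, 0.004686],
})

# Assumption: six observed variables (X1..X6) analyzed via the correlation matrix,
# so the eigenvalues sum to 6.
n_vars = 6
recomputed_prop = eig["EIGENVALUE"] / n_vars
max_abs_diff = (recomputed_prop - eig["VAR_PROP"]).abs().max()

# Kaiser heuristic: keep factors whose eigenvalue exceeds 1.
kaiser_factors = eig.loc[eig["EIGENVALUE"] > 1.0, "FACTOR_ID"].tolist()

print(max_abs_diff)
print(kaiser_factors)
```

Provided the table is sorted by decreasing eigenvalue, the elided rows fall below FACTOR_2's eigenvalue, and the heuristic result is consistent with the choice of factor_num=2 in the example.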