hanaml.FactorAnalysis is an R wrapper for SAP HANA PAL Factor Analysis.

hanaml.FactorAnalysis(
  data,
  key,
  factor.num,
  cols = NULL,
  method = NULL,
  rotation = NULL,
  score = NULL,
  matrix = NULL,
  kappa = NULL
)

Arguments

data

DataFrame
DataFrame containing the data.

key

character
Name of the ID column.

factor.num

integer
Number of factors used to explain the covariance structure of the dataset. It should be chosen between 1 and the number of variables.

cols

list/vector of characters, optional
Name of data columns that need to be analyzed.
If it is not provided, it defaults to all non-key columns of data.

method

{"pcm"}, optional
Specifies method used for factor analysis. Currently PAL only supports the principal component method.
Defaults to "pcm".

rotation

{"none", "varimax", "promax"}, optional
Specifies the method used to rotate the loadings:

  • "none"

  • "varimax"

  • "promax"

Defaults to "varimax".

score

{"none", "regression"}, optional
Specifies method to compute factor scores:

  • "none"

  • "regression"

Defaults to "regression".

matrix

{"covariance", "correlation"}, optional

  • "covariance" use covariance matrix to perform factor analysis

  • "correlation" use correlation matrix to perform factor analysis

Defaults to "correlation".

kappa

double, optional
Power of the promax rotation. Only valid when rotation = "promax".
Defaults to 4.
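
To get a feel for what the rotation and kappa arguments control, the following sketch uses base R rather than PAL. It assumes that kappa plays the same role as the power argument m of stats::promax, which this page does not confirm. The mtcars columns are only a stand-in data set, and PAL results may differ in sign or scaling.

```r
# Stand-in data: four numeric columns from the built-in mtcars data set.
X <- mtcars[, c("mpg", "disp", "hp", "wt")]

# Unrotated loadings from the principal component method with 2 factors.
eig <- eigen(cor(X), symmetric = TRUE)
k <- 2
L <- eig$vectors[, 1:k] %*% diag(sqrt(eig$values[1:k]), k)
rownames(L) <- colnames(X)

# Orthogonal rotation: the resulting rotation matrix is orthonormal.
vm <- stats::varimax(L)

# Oblique rotation; m is the promax power (assumed analogue of kappa).
pm <- stats::promax(L, m = 4)

print(vm$rotmat)
print(pm$rotmat)
```

With an oblique rotation such as promax the rotated factors are allowed to correlate, which is why the structure matrix and the factor correlation matrix are only populated in that case.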

Value

Returns a list of DataFrames:

  • DataFrame 1
    Eigenvalues, structured as follows:

    • FACTOR_ID: factor id.

    • EIGENVALUE: Eigenvalue (i.e. variance explained).

    • VAR_PROP: Variance proportion to the total variance explained.

    • CUM_VAR_PROP: Cumulative variance proportion to the total variance explained.

  • DataFrame 2
    Variance explanation, structured as follows:

    • FACTOR_ID: factor id.

    • VAR: Variance explained without rotation.

    • VAR_PROP: Variance proportion to the total variance explained without rotation.

    • CUM_VAR_PROP: Cumulative variance proportion to the total variance explained without rotation.

    • ROT_VAR: Variance explained with rotation.

    • ROT_VAR_PROP: Variance proportion to the total variance explained with rotation. Note that there is no rotated variance proportion when performing oblique rotation since the rotated factors are correlated.

    • ROT_CUM_VAR_PROP: Cumulative variance proportion to the total variance explained with rotation.

  • DataFrame 3

    • NAME: Variable name.

    • OBSERVED_VARS: Communalities of observed variable.

  • DataFrame 4

    • FACTOR_ID: Factor id.

    • LOADINGS_ + OBSERVED_VARS: Loadings.

  • DataFrame 5

    • FACTOR_ID: Factor id.

    • ROT_LOADINGS_ + OBSERVED_VARS: Rotated loadings.

  • DataFrame 6

    • FACTOR_ID: Factor id.

    • STRUCTURE+OBSERVED_VARS: Structure matrix. It is empty when rotation is not oblique.

  • DataFrame 7

    • ROTATION: rotation

    • ROTATION_ + i (i sequences from 1 to number of columns in OBSERVED_VARS (in input table)): Rotation matrix.

  • DataFrame 8

    • FACTOR_ID: Factor id

    • FACTOR_ + i (i sequences from 1 to number of columns in OBSERVED_VARS (in input table)): Factor correlation matrix. It is empty when rotation is not oblique.

  • DataFrame 9

    • NAME: Factor id, MEAN, SD

    • OBSERVED_VARS (in input table) column name: Score coefficients, means and standard deviations of observed variables.

  • DataFrame 10

    • FACTOR_ID: Factor id

    • FACTOR_ + i (i sequences from 1 to number of columns in OBSERVED_VARS (in input table)): scores

  • DataFrame 11 Placeholder for future features:

    • STAT_NAME: statistic name.

    • STAT_VALUE: statistic value.

Details

Factor Analysis is a statistical method used to extract a low number of latent factors that can best describe the correlations of a large set of observed variables.
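
As a rough illustration of the principal component method mentioned under the method argument, the sketch below computes unrotated loadings and communalities in base R. It is not the PAL implementation: the mtcars columns are a stand-in data set, and the variable names are chosen here for illustration only.

```r
# Stand-in data: four numeric columns from the built-in mtcars data set.
X <- mtcars[, c("mpg", "disp", "hp", "wt")]

# Eigen-decomposition of the correlation matrix (matrix = "correlation").
R <- cor(X)
eig <- eigen(R, symmetric = TRUE)

# Keep the first k factors (the role played by factor.num).
k <- 2
loadings <- eig$vectors[, 1:k] %*% diag(sqrt(eig$values[1:k]), k)
rownames(loadings) <- colnames(X)
colnames(loadings) <- paste0("FACTOR_", 1:k)

# Communality of each variable: the share of its variance
# reproduced by the k retained factors.
communalities <- rowSums(loadings^2)

print(loadings)
print(communalities)
```

Because the correlation matrix has ones on its diagonal, each communality lies between 0 and 1, and the eigenvalues sum to the number of variables.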

Examples

Input DataFrame data:


> data$Head(6)$Collect()
  ID  X1  X2  X3  X4  X5  X6
1  1   1   1   3   3   1   1
2  2   1   2   3   3   1   1
3  3   1   1   3   4   1   1
4  4   1   1   3   3   1   2

Call the function:


> fa <- hanaml.FactorAnalysis(data=data,
                              key="ID",
                              factor.num=2,
                              method="pcm",
                              rotation="promax",
                              score="regression",
                              matrix="correlation",
                              kappa=4)

Output:


> fa[[1]]$Collect()
  FACTOR_ID EIGENVALUE    VAR_PROP CUM_VAR_PROP
1  FACTOR_1 3.69603077 0.616005129    0.6160051
2  FACTOR_2 1.07311448 0.178852413    0.7948575
3  FACTOR_3 1.00077409 0.166795682    0.9616532
4  FACTOR_4 0.16100348 0.026833913    0.9884871
5  FACTOR_5 0.04096116 0.006826860    0.9953140
6  FACTOR_6 0.02811601 0.004686002    1.0000000
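
The relationship between the columns of this output can be checked locally in base R. The sketch below copies the EIGENVALUE column from the output above and recomputes the two proportion columns, assuming VAR_PROP is each eigenvalue divided by the sum of all eigenvalues.

```r
# EIGENVALUE column copied from the output above.
eigenvalue <- c(3.69603077, 1.07311448, 1.00077409,
                0.16100348, 0.04096116, 0.02811601)

# With matrix = "correlation" the eigenvalues sum to the number
# of variables (6 here), up to rounding in the printed values.
total <- sum(eigenvalue)

var_prop <- eigenvalue / total      # compare with VAR_PROP
cum_var_prop <- cumsum(var_prop)    # compare with CUM_VAR_PROP

print(round(var_prop, 6))
print(round(cum_var_prop, 6))
```

The first two factors together account for roughly 79% of the total variance, which is the kind of figure typically consulted when choosing factor.num.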