Performs the chi-squared test of independence to determine whether two variables are independent of each other.

hanaml.ChisqIndependence(data, key, observed.data = NULL, correction = NULL)

Arguments

data

DataFrame
DataFrame containing the data.

key

character
Name of the ID column.

observed.data

character, optional
Names of the observed data columns. If not given, it defaults to all non-ID columns.

correction

logical, optional
If TRUE and the test has 1 degree of freedom, applies Yates's correction for continuity, which adjusts each observed value by 0.5 towards the corresponding expected value.
Defaults to FALSE.
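
The effect of the correction can be illustrated outside of HANA with base R. The following is a minimal sketch, not the implementation used by hanaml.ChisqIndependence: it assumes the textbook form of Yates's correction and, to obtain a table with 1 degree of freedom, uses only the X1 and X2 columns of the example data from the Examples section. The variable names are illustrative.

```r
# 2x2 table (1 degree of freedom) built from the X1 and X2 columns
# of the example data; matrix() fills column by column.
observed <- matrix(c(25, 41, 23, 20), nrow = 2,
                   dimnames = list(c("male", "female"), c("X1", "X2")))

# Expected counts under independence:
# row total * column total / grand total
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Chi-squared statistic without the correction
chisq_plain <- sum((observed - expected)^2 / expected)

# Yates's correction: shrink each |observed - expected| by 0.5
# before squaring (textbook form, assumed here)
chisq_yates <- sum((abs(observed - expected) - 0.5)^2 / expected)

# Base R applies the same correction to 2x2 tables
base_yates <- chisq.test(observed, correct = TRUE)$statistic
```

Because every deviation is reduced before squaring, the corrected statistic is smaller than the uncorrected one, which makes the test more conservative.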

Value

Returns a list of two DataFrames:

  • DataFrame 1
    The expected count table, structured as follows:

    • ID column, with same name and type as data's ID column.

    • Expected count columns of type DOUBLE, named by prepending Expected_ to each observed.data column name (the example output below shows the prefix in upper case, EXPECTED_). There is one expected count column per observed.data column.

  • DataFrame 2
    Statistical outputs including the calculated chi-squared value, degrees of freedom and p-value, structured as follows:

    • STAT_NAME: type NVARCHAR(100), name of the statistic

    • STAT_VALUE: type DOUBLE, value of the statistic

Examples

Input DataFrame data:


> data$Collect()
       ID  X1    X2  X3    X4
1    male  25  23.0  11  14.0
2  female  41  20.0  18   6.0

Call the function:


> result <- hanaml.ChisqIndependence(data, key="ID")

Expected output:


> result[[1]]$Collect()
       ID  EXPECTED_X1  EXPECTED_X2  EXPECTED_X3  EXPECTED_X4
1    male    30.493671    19.867089    13.398734     9.240506
2  female    35.506329    23.132911    15.601266    10.759494
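
The expected counts above can be cross-checked locally with base R, without a HANA connection. This is a sketch for verification only and does not call hanaml.ChisqIndependence; the matrix is typed in by hand from the input data, and the variable names are illustrative.

```r
# Observed counts from the input DataFrame, entered row by row
observed <- matrix(c(25, 23, 11, 14,
                     41, 20, 18,  6),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(c("male", "female"),
                                   c("X1", "X2", "X3", "X4")))

# Expected counts under independence:
# row total * column total / grand total
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Base R's chisq.test() computes the same table, along with the
# chi-squared statistic, degrees of freedom and p-value
test <- chisq.test(observed)

expected_base <- test$expected   # matches `expected`
chisq_value   <- test$statistic  # chi-squared value
dof           <- test$parameter  # (2 - 1) * (4 - 1) = 3
p_value       <- test$p.value
```

The entries of expected agree with the EXPECTED_ columns of result[[1]]; for example, the male/X1 entry is 73 * 66 / 158, which rounds to 30.493671.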