hanaml.ChisqIndependence.Rd
Performs the chi-squared test of independence to determine whether two variables are independent of each other.
hanaml.ChisqIndependence(data, key, observed.data = NULL, correction = NULL)
DataFrame
DataFrame containing the data.
character
Name of the ID column.
character, optional
Names of the observed data columns.
If not given, it defaults to all non-ID columns.
logical, optional
If TRUE and the test has 1 degree of freedom, applies
Yates's correction for continuity. The correction
adjusts each observed value by 0.5
towards the corresponding expected value.
Defaults to FALSE.
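The effect of the continuity correction can be illustrated locally in base R. The following is a minimal sketch, not the routine that runs on the database side: the 2x2 table is hypothetical, and the variable names (observed, expected, chisq_plain, chisq_yates) are chosen for illustration only. It assumes every |observed - expected| difference is at least 0.5, which holds for this table.

```r
# Hypothetical 2x2 table of observed counts (not taken from this documentation).
observed <- matrix(c(12, 5,
                      7, 9), nrow = 2, byrow = TRUE)

# Expected counts under independence: row total * column total / grand total.
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Uncorrected chi-squared statistic.
chisq_plain <- sum((observed - expected)^2 / expected)

# Yates-corrected statistic: move each observed value 0.5 towards its
# expected value, i.e. shrink |O - E| by 0.5 before squaring.
chisq_yates <- sum((abs(observed - expected) - 0.5)^2 / expected)

# A 2x2 table has (2 - 1) * (2 - 1) = 1 degree of freedom.
df <- (nrow(observed) - 1) * (ncol(observed) - 1)
p_yates <- pchisq(chisq_yates, df = df, lower.tail = FALSE)
```

The corrected statistic is smaller than the uncorrected one, so the corrected test is more conservative. For this table the value agrees with base R's chisq.test(observed, correct = TRUE).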
Returns a list of two DataFrames:
DataFrame 1
The expected count table, structured as follows:
ID column, with same name and type as data's ID column.
Expected count columns, named by prepending Expected_ to each observed.data column name, type DOUBLE. There will be as many columns here as there are observed.data columns.
DataFrame 2
Statistical outputs including the calculated chi-squared value,
degrees of freedom and p-value, structured as follows:
STAT_NAME: type NVARCHAR(100), name of the statistic
STAT_VALUE: type DOUBLE, value of the statistic
Input DataFrame data:
> data$Collect()
      ID X1   X2 X3   X4
1   male 25 23.0 11 14.0
2 female 41 20.0 18  6.0
Call the function:
> result <- hanaml.ChisqIndependence(data, key="ID")
Expected output:
> result[[1]]$Collect()
      ID EXPECTED_X1 EXPECTED_X2 EXPECTED_X3 EXPECTED_X4
1   male   30.493671   19.867089   13.398734    9.240506
2 female   35.506329   23.132911   15.601266   10.759494
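The expected counts above can be cross-checked locally in base R. This is a sketch under the assumption that the example data is re-entered by hand as a plain matrix; it does not call hanaml.ChisqIndependence or connect to a database, and the names observed, expected, chisq_value and p_value are illustrative.

```r
# Observed counts from the example, entered by hand as a local matrix.
observed <- matrix(
  c(25, 23, 11, 14,
    41, 20, 18,  6),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("male", "female"), c("X1", "X2", "X3", "X4"))
)

# Expected count for each cell: row total * column total / grand total.
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Chi-squared statistic, degrees of freedom and p-value (no correction).
chisq_value <- sum((observed - expected)^2 / expected)
df <- (nrow(observed) - 1) * (ncol(observed) - 1)
p_value <- pchisq(chisq_value, df = df, lower.tail = FALSE)

print(round(expected, 6))
```

The printed matrix should match the expected count table shown above. The values chisq_value, df and p_value correspond to the kind of statistics described for the second returned DataFrame.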