Chi-squared Goodness-of-fit(GoF) — hanaml.ChisqGoF • hana.ml.r

Performs the chi-squared goodness-of-fit (GoF) test to determine whether an observed distribution differs from an expected distribution.

hanaml.ChisqGoF(data, key, observed.data = NULL, expected.freq = NULL)

Arguments

data: DataFrame
DataFrame containing the data.
key: character
Name of the ID column.
observed.data: character, optional
Name of column for counts of actual observations belonging to each category. If not given, it defaults to the first non-ID column of data.
expected.freq: character, optional
Name of the expected frequency column. If not given, it defaults to the first column that is neither the ID column nor the observed.data column.
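
The default column resolution described above can be sketched in base R. This is only an illustration of the documented rule: resolve_columns is a hypothetical helper, not part of hana.ml.r, and it works on a plain character vector of column names rather than on a HANA DataFrame.

```r
# Hypothetical helper (not part of hana.ml.r): mimics the documented defaults
# for observed.data and expected.freq, given the column names of the input.
resolve_columns <- function(columns, key,
                            observed.data = NULL, expected.freq = NULL) {
  # Columns other than the ID column, in their original order.
  non_key <- setdiff(columns, key)
  # observed.data defaults to the first non-ID column.
  if (is.null(observed.data)) observed.data <- non_key[1]
  # expected.freq defaults to the first column that is neither the ID
  # column nor the observed.data column.
  if (is.null(expected.freq)) {
    expected.freq <- setdiff(non_key, observed.data)[1]
  }
  list(observed.data = observed.data, expected.freq = expected.freq)
}

cols <- resolve_columns(c("ID", "OBSERVED", "P"), key = "ID")
print(cols$observed.data)
print(cols$expected.freq)
```

With the column layout used in the Examples section (ID, OBSERVED, P), calling the function with only key = "ID" would therefore pick OBSERVED as the observed counts and P as the expected frequencies.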

Value

Returns a list of two DataFrames:

Comparison between the actual counts and the expected counts : DataFrame
structured as follows:
- ID column, with same name and type as data's ID column.
- Observed data column, with same name as data's observed.data column, but always with type DOUBLE.
- EXPECTED column, type DOUBLE, expected count in each category.
- RESIDUAL column, type DOUBLE, the difference between the observed counts and the expected counts.
Statistical outputs : DataFrame
including the calculated chi-squared value, degrees of freedom and p-value, structured as follows:
- STAT_NAME: type NVARCHAR(100), name of statistics.
- STAT_VALUE: type DOUBLE, value of statistics.
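
For orientation, the three statistics can be reproduced locally with base R, using the counts and frequencies from the Examples section. This is a sketch of the standard Pearson chi-squared computation, not the library's implementation. The STAT_NAME labels below are illustrative placeholders, and the degrees of freedom are assumed to be the number of categories minus one.

```r
# Observed counts and expected frequencies from the Examples section.
observed <- c(519, 364, 363, 200, 212, 193)
p <- c(0.3, 0.2, 0.2, 0.1, 0.1, 0.1)

# Expected counts: the total number of observations split according to p.
expected <- sum(observed) * p / sum(p)

# Pearson chi-squared statistic, degrees of freedom, and upper-tail p-value.
chi_sq <- sum((observed - expected)^2 / expected)
df <- length(observed) - 1
p_value <- pchisq(chi_sq, df = df, lower.tail = FALSE)

# Same two-column layout as the statistics DataFrame.
# The STAT_NAME strings are placeholders, not the names returned by HANA.
stats <- data.frame(
  STAT_NAME = c("chi-squared value", "degrees of freedom", "p-value"),
  STAT_VALUE = c(chi_sq, df, p_value)
)
print(stats)

# Cross-check against the chi-squared test shipped with base R.
check <- chisq.test(x = observed, p = p)
print(all.equal(unname(check$statistic), chi_sq))
```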

Examples

Input DataFrame data:


 > data$Collect()
    ID  OBSERVED    P
 1   0     519.0  0.3
 2   1     364.0  0.2
 3   2     363.0  0.2
 4   3     200.0  0.1
 5   4     212.0  0.1
 6   5     193.0  0.1

Call hanaml.ChisqGoF to run the test:


 > result <- hanaml.ChisqGoF(data, key = "ID")

Output (first DataFrame, the comparison table):


  > result[[1]]$Collect()
     ID  OBSERVED  EXPECTED  RESIDUAL
  1   0     519.0     555.3     -36.3
  2   1     364.0     370.2      -6.2
  3   2     363.0     370.2      -7.2
  4   3     200.0     185.1      14.9
  5   4     212.0     185.1      26.9
  6   5     193.0     185.1       7.9
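
The EXPECTED and RESIDUAL columns above can be checked by hand in base R, without a HANA connection. The sketch below assumes that each expected count is the total number of observations multiplied by the corresponding value of P, which is consistent with the output shown (1851 * 0.3 = 555.3). How the function treats frequencies that do not sum to 1 is not stated here, so that case is not covered.

```r
# Local copy of the example input as a plain R data.frame.
data <- data.frame(
  ID = 0:5,
  OBSERVED = c(519, 364, 363, 200, 212, 193),
  P = c(0.3, 0.2, 0.2, 0.1, 0.1, 0.1)
)

# Assumption: expected count = total observations * expected frequency.
total <- sum(data$OBSERVED)
comparison <- data.frame(
  ID = data$ID,
  OBSERVED = data$OBSERVED,
  EXPECTED = total * data$P
)

# Residual = observed count minus expected count.
comparison$RESIDUAL <- comparison$OBSERVED - comparison$EXPECTED
print(comparison)
```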