Perform the chi-squared goodness-of-fit (GoF) test to tell whether or not an observed distribution differs from an expected distribution.

hanaml.ChisqGoF(data, key, observed.data = NULL, expected.freq = NULL)

Arguments

data

DataFrame
DataFrame containing the data.

key

character
Name of the ID column.

observed.data

character, optional
Name of the column holding the counts of actual observations in each category. If not given, it defaults to the first non-ID column of data.

expected.freq

character, optional
Name of the expected frequency column. If not given, it defaults to the first column of data that is neither the ID column nor the observed.data column.

Value

Returns a list of two DataFrames:

  • Comparison between the actual counts and the expected counts : DataFrame
    structured as follows:

    • ID column, with same name and type as data's ID column.

    • Observed data column, with same name as data's observed.data column, but always with type DOUBLE.

    • EXPECTED column, type DOUBLE, expected count in each category.

    • RESIDUAL column, type DOUBLE, the difference between the observed counts and the expected counts.

  • Statistical outputs : DataFrame
    including the calculated chi-squared value, degrees of freedom and p-value, structured as follows:

    • STAT_NAME: type NVARCHAR(100), name of statistics.

    • STAT_VALUE: type DOUBLE, value of statistics.
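The quantities in the two DataFrames can be read as the standard Pearson goodness-of-fit definitions, written out below. This is a sketch rather than a statement of the internal implementation: the symbols are introduced here for illustration (O_i is the observed count in category i, p_i its expected frequency, k the number of categories), and it assumes the expected frequencies sum to 1.

```latex
E_i = p_i \sum_{j=1}^{k} O_j, \qquad
R_i = O_i - E_i, \qquad
\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}, \qquad
\nu = k - 1
```

Here E_i corresponds to the EXPECTED column, R_i to the RESIDUAL column, and χ² and ν to the chi-squared value and degrees of freedom reported in the statistical outputs. The p-value is the upper-tail probability of χ² under a chi-squared distribution with ν degrees of freedom.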

Examples

Input DataFrame data:


 > data$Collect()
    ID  OBSERVED    P
 1   0     519.0  0.3
 2   1     364.0  0.2
 3   2     363.0  0.2
 4   3     200.0  0.1
 5   4     212.0  0.1
 6   5     193.0  0.1

Call the function:


 > result <- hanaml.ChisqGoF(data, key = "ID")
 

Output:


  > result[[1]]$Collect()
     ID  OBSERVED  EXPECTED  RESIDUAL
  1   0     519.0     555.3     -36.3
  2   1     364.0     370.2      -6.2
  3   2     363.0     370.2      -7.2
  4   3     200.0     185.1      14.9
  5   4     212.0     185.1      26.9
  6   5     193.0     185.1       7.9
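
The comparison table above can be cross-checked locally. The following is a minimal sketch in base R, not the function's own implementation: it runs without a HANA connection, copies the six rows of the example input by hand, and assumes the standard Pearson definitions (expected count = total observations × expected frequency). The variable names are illustrative only.

```r
# Local check in base R; no HANA connection is involved.
# Values copied from the example input DataFrame.
observed <- c(519, 364, 363, 200, 212, 193)
p        <- c(0.3, 0.2, 0.2, 0.1, 0.1, 0.1)

# Expected count per category: total observations times expected frequency.
expected <- sum(observed) * p
residual <- observed - expected

# Same layout as the first returned DataFrame.
comparison <- data.frame(ID = 0:5, OBSERVED = observed,
                         EXPECTED = expected, RESIDUAL = residual)
print(comparison)

# Pearson chi-squared statistic, degrees of freedom, upper-tail p-value.
chi.squared <- sum(residual^2 / expected)
dof         <- length(observed) - 1
p.value     <- pchisq(chi.squared, df = dof, lower.tail = FALSE)

# Cross-check the statistic against base R's built-in test.
check <- chisq.test(x = observed, p = p)
stopifnot(isTRUE(all.equal(chi.squared, unname(check$statistic))))
```

For this data the sketch reproduces the EXPECTED and RESIDUAL columns shown above and gives a chi-squared value of about 8.06 on 5 degrees of freedom, with a p-value above 0.1. If the assumptions hold, the same three statistics should appear in result[[2]].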