f_oneway_repeated
- hana_ml.algorithms.pal.stats.f_oneway_repeated(data, subject_id, measures=None, multcomp_method=None, significance_level=None, se_type=None)
Performs one-way repeated measures analysis of variance, along with Mauchly's Test of Sphericity and post hoc multiple comparison tests.
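The ANOVA table returned by this function splits the total variability into group, subject and error components. The following minimal local sketch of that split uses NumPy and SciPy. It assumes the data fits in memory as an array with one row per subject and one column per measure. `rm_anova` is a hypothetical helper: it is not part of hana_ml and is not claimed to mirror PAL's implementation. On the example data used on this page, it should reproduce the sums of squares, F ratio and p-value reported in the ANOVA output.

```python
import numpy as np
from scipy import stats

# Example data from this page: 8 subjects (rows) x 4 measures (columns).
X = np.array([
    [8.0, 7.0, 1.0, 6.0],
    [9.0, 5.0, 2.0, 5.0],
    [6.0, 2.0, 3.0, 8.0],
    [5.0, 3.0, 1.0, 9.0],
    [8.0, 4.0, 5.0, 8.0],
    [7.0, 5.0, 6.0, 7.0],
    [10.0, 2.0, 7.0, 2.0],
    [12.0, 6.0, 8.0, 1.0],
])


def rm_anova(X):
    """Hypothetical helper (not part of hana_ml): one-way repeated measures ANOVA.

    X is an (n_subjects, k_measures) array with one row per subject.
    """
    n, k = X.shape
    grand = X.mean()
    # Variability between the group (measure) means.
    ss_group = n * np.sum((X.mean(axis=0) - grand) ** 2)
    # Variability between subjects, which is removed from the error term.
    ss_subject = k * np.sum((X.mean(axis=1) - grand) ** 2)
    # Error is whatever remains of the total sum of squares.
    ss_error = np.sum((X - grand) ** 2) - ss_group - ss_subject
    df_group, df_subject = k - 1, n - 1
    df_error = df_group * df_subject
    # F ratio: mean square of groups divided by mean square of error.
    f_ratio = (ss_group / df_group) / (ss_error / df_error)
    p_value = stats.f.sf(f_ratio, df_group, df_error)
    return {
        "ss_group": ss_group, "ss_subject": ss_subject, "ss_error": ss_error,
        "df_group": df_group, "df_subject": df_subject, "df_error": df_error,
        "f_ratio": f_ratio, "p_value": p_value,
    }


result = rm_anova(X)
print(result["f_ratio"], result["p_value"])
```

Removing the subject component from the error term is what distinguishes this test from an ordinary one-way ANOVA, where the subject sum of squares would remain part of the error.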
- Parameters
- data : DataFrame
DataFrame containing the data.
- subject_id : str
Name of the subject ID column. The algorithm treats each row of the data table as a different subject, so this column must not contain duplicate subject IDs.
- measures : list of str, optional
Names of the groups (measures).
If not provided, defaults to all non-subject_id columns.
- multcomp_method : {'tukey-kramer', 'bonferroni', 'dunn-sidak', 'scheffe', 'fisher-lsd'}, optional
Method used to perform multiple comparison tests.
Defaults to 'bonferroni'.
- significance_level : float, optional
Significance level used to compute the confidence intervals in the multiple comparison tests.
Values must be greater than 0 and less than 1.
Defaults to 0.05.
- se_type : {'all-data', 'two-group'}, optional
Type of standard error used in multiple comparison tests.
'all-data': computes the standard error from all data. It has more power if the assumption of sphericity holds, especially with small data sets.
'two-group': computes the standard error from only the two groups being compared. It does not assume sphericity.
Defaults to 'two-group'.
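To illustrate what `multcomp_method='bonferroni'` combined with `se_type='two-group'` computes, the sketch below runs a paired t-test for every pair of measures. The standard error uses only the per-subject differences between the two compared groups, and both the p-value and the confidence interval are Bonferroni-adjusted over the k(k-1)/2 comparisons. This is an assumed reconstruction, not PAL's code, and `bonferroni_two_group` is a hypothetical helper. On the example data used on this page, it agrees with the MEASURE1 vs. MEASURE2 row of the documented multiple comparison output.

```python
import itertools

import numpy as np
from scipy import stats

# Example data from this page: 8 subjects (rows) x 4 measures (columns).
NAMES = ["MEASURE1", "MEASURE2", "MEASURE3", "MEASURE4"]
X = np.array([
    [8.0, 7.0, 1.0, 6.0],
    [9.0, 5.0, 2.0, 5.0],
    [6.0, 2.0, 3.0, 8.0],
    [5.0, 3.0, 1.0, 9.0],
    [8.0, 4.0, 5.0, 8.0],
    [7.0, 5.0, 6.0, 7.0],
    [10.0, 2.0, 7.0, 2.0],
    [12.0, 6.0, 8.0, 1.0],
])


def bonferroni_two_group(X, names, significance_level=0.05):
    """Hypothetical helper: Bonferroni pairwise tests with a 'two-group' SE."""
    n, k = X.shape
    m = k * (k - 1) // 2  # number of pairwise comparisons
    dof = n - 1
    # Critical t value for Bonferroni-adjusted two-sided confidence intervals.
    t_crit = stats.t.ppf(1 - significance_level / (2 * m), dof)
    rows = []
    for i, j in itertools.combinations(range(k), 2):
        diff = X[:, i] - X[:, j]  # per-subject differences
        mean_diff = diff.mean()
        # Standard error from the two compared groups only (paired design).
        se = diff.std(ddof=1) / np.sqrt(n)
        t_stat = mean_diff / se
        # Two-sided p-value, multiplied by m and capped at 1.
        p_value = min(1.0, m * 2 * stats.t.sf(abs(t_stat), dof))
        rows.append({
            "FIRST_GROUP": names[i], "SECOND_GROUP": names[j],
            "MEAN_DIFFERENCE": mean_diff, "SE": se, "P_VALUE": p_value,
            "CI_LOWER": mean_diff - t_crit * se,
            "CI_UPPER": mean_diff + t_crit * se,
        })
    return rows


for row in bonferroni_two_group(X, NAMES):
    print(row)
```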
- Returns
- DataFrames
Four DataFrames, returned in the following order:
Statistics for each group, structured as follows:
GROUP, type NVARCHAR(256), group name.
VALID_SAMPLES, type INTEGER, number of valid samples.
MEAN, type DOUBLE, group mean.
SD, type DOUBLE, group standard deviation.
Mauchly test results, structured as follows:
STAT_NAME, type NVARCHAR(100), names of test result quantities.
STAT_VALUE, type DOUBLE, values of test result quantities.
Computed results, structured as follows:
VARIABILITY_SOURCE, type NVARCHAR(100), source of variability, divided into group, error and subject portions.
SUM_OF_SQUARES, type DOUBLE, sum of squares.
DEGREES_OF_FREEDOM, type DOUBLE, degrees of freedom.
MEAN_SQUARES, type DOUBLE, mean squares.
F_RATIO, type DOUBLE, calculated as mean square between groups divided by mean square of error.
P_VALUE, type DOUBLE, associated p-value from the F-distribution.
P_VALUE_GG, type DOUBLE, p-value of Greenhouse-Geisser correction.
P_VALUE_HF, type DOUBLE, p-value of Huynh-Feldt correction.
P_VALUE_LB, type DOUBLE, p-value of lower bound correction.
Multiple comparison results, structured as follows:
FIRST_GROUP, type NVARCHAR(256), name of the first group in the pairwise comparison.
SECOND_GROUP, type NVARCHAR(256), name of the second group in the pairwise comparison.
MEAN_DIFFERENCE, type DOUBLE, mean difference between the two groups.
SE, type DOUBLE, standard error computed from all data or from only the two compared groups, depending on se_type.
P_VALUE, type DOUBLE, p-value.
CI_LOWER, type DOUBLE, the lower limit of the confidence interval.
CI_UPPER, type DOUBLE, the upper limit of the confidence interval.
Examples
Data df:
>>> df.collect()
   ID  MEASURE1  MEASURE2  MEASURE3  MEASURE4
0   1       8.0       7.0       1.0       6.0
1   2       9.0       5.0       2.0       5.0
2   3       6.0       2.0       3.0       8.0
3   4       5.0       3.0       1.0       9.0
4   5       8.0       4.0       5.0       8.0
5   6       7.0       5.0       6.0       7.0
6   7      10.0       2.0       7.0       2.0
7   8      12.0       6.0       8.0       1.0
Perform the function:
>>> stats, mtest, anova, mult_comp = f_oneway_repeated(
...     data=df,
...     subject_id='ID',
...     multcomp_method='bonferroni',
...     significance_level=0.05,
...     se_type='two-group')
Outputs:
>>> stats.collect()
      GROUP  VALID_SAMPLES   MEAN        SD
0  MEASURE1              8  8.125  2.232071
1  MEASURE2              8  4.250  1.832251
2  MEASURE3              8  4.125  2.748376
3  MEASURE4              8  5.750  2.915476
>>> mtest.collect()
                    STAT_NAME  STAT_VALUE
0                 Mauchly's W    0.136248
1                  Chi-Square   11.405981
2                          df    5.000000
3                      pValue    0.046773
4  Greenhouse-Geisser Epsilon    0.532846
5         Huynh-Feldt Epsilon    0.665764
6         Lower bound Epsilon    0.333333
>>> anova.collect()
   VARIABILITY_SOURCE  SUM_OF_SQUARES  DEGREES_OF_FREEDOM  MEAN_SQUARES
0               Group          83.125                 3.0     27.708333
1             Subject          17.375                 7.0      2.482143
2               Error         153.375                21.0      7.303571
    F_RATIO  P_VALUE  P_VALUE_GG  P_VALUE_HF  P_VALUE_LB
0  3.793806  0.02557    0.062584    0.048331    0.092471
1       NaN      NaN         NaN         NaN         NaN
2       NaN      NaN         NaN         NaN         NaN
>>> mult_comp.collect()
   FIRST_GROUP  SECOND_GROUP  MEAN_DIFFERENCE        SE   P_VALUE   CI_LOWER
0     MEASURE1      MEASURE2            3.875  0.811469  0.012140   0.924655
1     MEASURE1      MEASURE3            4.000  0.731925  0.005645   1.338861
2     MEASURE1      MEASURE4            2.375  1.792220  1.000000  -4.141168
3     MEASURE2      MEASURE3            0.125  1.201747  1.000000  -4.244322
4     MEASURE2      MEASURE4           -1.500  1.336306  1.000000  -6.358552
5     MEASURE3      MEASURE4           -1.625  1.821866  1.000000  -8.248955
   CI_UPPER
0  6.825345
1  6.661139
2  8.891168
3  4.494322
4  3.358552
5  4.998955