ttest_paired

hana_ml.algorithms.pal.stats.ttest_paired(data, col1=None, col2=None, mu=0, test_type='two_sides', conf_level=0.95)

Performs the t-test for the mean difference of two sets of paired samples.

Parameters:
data : DataFrame

DataFrame containing the data.

col1 : str, optional

Name of the column for sample1.

If not given, defaults to the first column.

col2 : str, optional

Name of the column for sample2.

If not given, defaults to the first non-col1 column.

mu : float, optional

Hypothesized difference between two underlying population means.

Defaults to 0.

test_type : {'two_sides', 'less', 'greater'}, optional

The alternative hypothesis type.

Defaults to 'two_sides'.

conf_level : float, optional

Confidence level of the confidence interval computed for the mean difference.

Defaults to 0.95.
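
The defaulting rule for col1 and col2 described above can be sketched in plain Python. This is only an illustration of the documented behavior, not the library's implementation; the helper name resolve_columns is hypothetical.

```python
def resolve_columns(columns, col1=None, col2=None):
    """Hypothetical helper mirroring the documented defaults for col1 and col2."""
    # col1 falls back to the first column of the DataFrame.
    if col1 is None:
        col1 = columns[0]
    # col2 falls back to the first column that is not col1.
    if col2 is None:
        col2 = next(c for c in columns if c != col1)
    return col1, col2


# With two columns, the defaults pick them in order.
print(resolve_columns(["X1", "X2"]))
# With an extra leading column, passing only col1 selects that leading column as col2.
print(resolve_columns(["ID", "X1", "X2"], col1="X1"))
```

If the rule works as documented, a DataFrame with a leading ID column needs both col1 and col2 passed explicitly, otherwise the ID column would be used as one of the samples.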

Returns:
DataFrame

Statistics results, one row per statistic, in the columns STAT_NAME and STAT_VALUE.

Examples

Original data:

>>> df.collect()
    X1    X2
0  1.0  10.0
1  2.0  12.0
2  4.0  11.0
3  7.0  15.0
4  3.0  10.0

Perform the paired-sample t-test:

>>> ttest_paired(data=df).collect()
                STAT_NAME  STAT_VALUE
0                 t-value  -14.062884
1       degree of freedom    4.000000
2                 p-value    0.000148
3  _PAL_MEAN_DIFFERENCES_   -8.200000
4        confidence level    0.950000
5              lowerLimit   -9.818932
6              upperLimit   -6.581068
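
As a sanity check, the t-value, degrees of freedom, mean difference, and confidence limits above can be reproduced locally with the standard paired t-test formulas. The sketch below uses only the Python standard library and does not call hana_ml. It assumes the differences are taken as X1 - X2, which matches the sign of the reported mean difference, and it hard-codes 2.776445 as the tabulated two-sided 95% critical value of Student's t distribution with 4 degrees of freedom. The p-value is left out because the standard library has no t-distribution CDF.

```python
import math
import statistics

# Paired samples copied from the example data above.
x1 = [1.0, 2.0, 4.0, 7.0, 3.0]
x2 = [10.0, 12.0, 11.0, 15.0, 10.0]
mu = 0.0  # hypothesized mean difference (the default)

# Assumption: differences are sample1 - sample2.
diffs = [a - b for a, b in zip(x1, x2)]
n = len(diffs)

mean_diff = statistics.mean(diffs)
# Standard error of the mean difference, using the sample standard deviation.
std_err = statistics.stdev(diffs) / math.sqrt(n)

t_value = (mean_diff - mu) / std_err
dof = n - 1

# Assumption: tabulated critical value t(0.975, dof=4) for conf_level=0.95.
t_crit = 2.776445
lower = mean_diff - t_crit * std_err
upper = mean_diff + t_crit * std_err

print("t-value:", round(t_value, 6))
print("degree of freedom:", dof)
print("mean difference:", round(mean_diff, 6))
print("limits:", round(lower, 6), round(upper, 6))
```

The printed values should agree with the corresponding rows of the result table up to rounding.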