ttest_ind

hana_ml.algorithms.pal.stats.ttest_ind(data, col1=None, col2=None, mu=0, test_type='two_sides', var_equal=False, conf_level=0.95)

Perform a t-test on the difference between the means of two independent samples.

Parameters:

data : DataFrame

DataFrame containing the data.

col1 : str, optional

Name of the column for sample1.

If not given, it defaults to the first column.

col2 : str, optional

Name of the column for sample2.

If not given, it defaults to the first column other than col1.

mu : float, optional

Hypothesized difference between the two underlying population means.

Defaults to 0.
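For reference, the conventional statistic behind this test is shown here, assuming PAL follows the textbook definition (this page does not state it, although it reproduces the t-value in the example). Here mu_0 is the mu argument, the null hypothesis is mu_1 - mu_2 = mu_0, and SE is the standard error of the difference in sample means, whose form depends on var_equal:

```latex
t = \frac{\bar{x}_1 - \bar{x}_2 - \mu_0}{\mathrm{SE}},
\qquad H_0:\ \mu_1 - \mu_2 = \mu_0
```

With the example data and mu_0 = 0, the numerator is 3.5 - 11.6 = -8.1.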

test_type : {'two_sides', 'less', 'greater'}, optional

The alternative hypothesis type.

Defaults to 'two_sides'.
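Under the usual conventions (assumed here, not spelled out on this page), each test_type maps to a p-value as follows, where t is the observed statistic and T follows a t distribution with the reported degrees of freedom. 'less' and 'greater' correspond to the alternatives mu_1 - mu_2 < mu_0 and mu_1 - mu_2 > mu_0:

```latex
p =
\begin{cases}
2\,P(T \ge |t|) & \text{test\_type = 'two\_sides'} \\
P(T \le t)      & \text{test\_type = 'less'} \\
P(T \ge t)      & \text{test\_type = 'greater'}
\end{cases}
```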

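var_equal : bool, optional

Controls whether to assume that the two samples have equal variance. If True, a pooled variance is used (Student's t-test); if False, the variances are estimated separately (Welch's t-test, which gives the non-integer degrees of freedom seen in the example).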

Defaults to False.
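The effect of var_equal can be sketched in plain Python. This is a minimal illustration, not PAL's implementation: the function name two_sample_t is hypothetical, and dropping NaN values per column is an assumption that happens to reproduce the t-value and degrees of freedom in the example output.

```python
import math

def two_sample_t(x1, x2, mu=0.0, var_equal=False):
    """Hypothetical helper: return (t, df) for a two-sample t-test.

    Assumption: NaN values are dropped from each sample independently.
    """
    a = [v for v in x1 if not math.isnan(v)]
    b = [v for v in x2 if not math.isnan(v)]
    n1, n2 = len(a), len(b)
    m1, m2 = sum(a) / n1, sum(b) / n2
    # Unbiased sample variances (denominator n - 1)
    v1 = sum((v - m1) ** 2 for v in a) / (n1 - 1)
    v2 = sum((v - m2) ** 2 for v in b) / (n2 - 1)
    if var_equal:
        # Student's t-test: pooled variance, df = n1 + n2 - 2
        sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
        se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
        df = n1 + n2 - 2
    else:
        # Welch's t-test: separate variances, Welch-Satterthwaite df
        q1, q2 = v1 / n1, v2 / n2
        se = math.sqrt(q1 + q2)
        df = (q1 + q2) ** 2 / (q1 ** 2 / (n1 - 1) + q2 ** 2 / (n2 - 1))
    return (m1 - m2 - mu) / se, df

# Same data as the example below
x1 = [1.0, 2.0, 4.0, 7.0, float("nan")]
x2 = [10.0, 12.0, 11.0, 15.0, 10.0]
t, df = two_sample_t(x1, x2)
print(f"{t:.6f} {df:.6f}")  # -5.013774 5.649757
```

With var_equal=True the same data give df = 7 and a slightly larger |t|, because the pooled estimate weights the two sample variances by their degrees of freedom.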

conf_level : float, optional

Confidence level of the confidence interval for the difference between the two population means.

Defaults to 0.95.
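The p-value and confidence limits can be reproduced from the reported t-value and degrees of freedom. The sketch below is an assumption-laden illustration, not PAL's code: it evaluates the Student's t CDF with Simpson's rule and inverts it by bisection (a numerical stand-in for a library routine), and it covers only the two-sided interval shown in the example. The helper names are hypothetical.

```python
import math

def t_cdf(x, df, steps=2000):
    """P(T <= x) for Student's t with df degrees of freedom (df may be non-integer)."""
    if x < 0:
        # The t density is symmetric about zero
        return 1.0 - t_cdf(-x, df, steps)
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def pdf(u):
        return c * (1 + u * u / df) ** (-(df + 1) / 2)

    # Composite Simpson's rule on [0, x]; steps must be even
    h = x / steps
    s = pdf(0.0) + pdf(x)
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * pdf(i * h)
    return 0.5 + s * h / 3

def t_ppf(p, df, lo=-50.0, hi=50.0):
    """Inverse CDF by bisection (assumes the quantile lies in [lo, hi])."""
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def p_value(t, df, test_type="two_sides"):
    if test_type == "less":
        return t_cdf(t, df)
    if test_type == "greater":
        return 1.0 - t_cdf(t, df)
    return 2.0 * (1.0 - t_cdf(abs(t), df))

def conf_interval(diff, se, df, conf_level=0.95):
    """Two-sided interval: diff +/- t_{(1 + conf_level) / 2, df} * se."""
    q = t_ppf((1 + conf_level) / 2, df)
    return diff - q * se, diff + q * se

# Values taken from the example output (mu = 0)
t, df = -5.013774, 5.649757
diff = 3.5 - 11.6   # _PAL_MEAN_X1_ - _PAL_MEAN_X2_
se = diff / t       # standard error implied by the reported t-value
print(p_value(t, df))                     # close to 0.002875
print(conf_interval(diff, se, df, 0.95))  # close to (-12.113278, -4.086722)
```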

Returns:

DataFrame: Statistics results, one row per statistic, with columns STAT_NAME and STAT_VALUE.
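Because the result is in long format, it is often convenient to turn the collected pandas DataFrame into a dictionary keyed by STAT_NAME. In this sketch a hand-built DataFrame stands in for ttest_ind(data=df).collect(), with values copied from the example; the 0.05 significance threshold is an illustrative choice, not something the function returns.

```python
import pandas as pd

# Stand-in for ttest_ind(data=df).collect(); values copied from the example output
res = pd.DataFrame({
    "STAT_NAME": ["t-value", "degree of freedom", "p-value", "_PAL_MEAN_X1_",
                  "_PAL_MEAN_X2_", "confidence level", "lowerLimit", "upperLimit"],
    "STAT_VALUE": [-5.013774, 5.649757, 0.002875, 3.5,
                   11.6, 0.95, -12.113278, -4.086722],
})

# Map each statistic name to its value
stats = dict(zip(res["STAT_NAME"], res["STAT_VALUE"]))

alpha = 0.05  # illustrative significance level
reject = stats["p-value"] < alpha
print(stats["t-value"], bool(reject))
```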

Examples

Original data:

>>> df.collect()
    X1    X2
   1.0  10.0
   2.0  12.0
   4.0  11.0
   7.0  15.0
   NaN  10.0

Perform the independent two-sample t-test with the default settings:

>>> ttest_ind(data=df).collect()
           STAT_NAME  STAT_VALUE
             t-value   -5.013774
   degree of freedom    5.649757
             p-value    0.002875
       _PAL_MEAN_X1_    3.500000
       _PAL_MEAN_X2_   11.600000
    confidence level    0.950000
          lowerLimit  -12.113278
          upperLimit   -4.086722