biostats.two_sample_t_test

biostats.two_sample_t_test(data, variable, between, group, kind='equal variances')

Test whether the mean values of a variable are different in two groups.

Parameters:
data : pandas.DataFrame

The input data. Must contain at least one numeric column and one categorical column.

variable : str

The numeric variable whose mean values are compared.

between : str

The categorical variable that specifies which group the samples belong to. Maximum 20 groups.

group : list

List of the two groups to be compared.

kind : str
  • “equal variances” : The standard two-sample t-test, which assumes that the variances of the two groups are equal.

  • “unequal variances” : The variant that allows the variances of the two groups to be unequal. Also called Welch’s t-test.
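The difference between the two kinds can be sketched in plain Python. This is a minimal illustration of the textbook formulas, not the biostats implementation; the function name `t_statistic`, its `equal_var` flag, and the toy samples are hypothetical and chosen only for this example.

```python
import math
from statistics import mean, variance


def t_statistic(a, b, equal_var=True):
    """Return (t, degrees_of_freedom) for two independent samples."""
    na, nb = len(a), len(b)
    va, vb = variance(a), variance(b)  # sample variances (n - 1 denominator)
    diff = mean(a) - mean(b)
    if equal_var:
        # Pooled variance: the two sample variances weighted by their d.f.
        pooled = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
        se = math.sqrt(pooled * (1 / na + 1 / nb))
        df = na + nb - 2
    else:
        # Welch: each group keeps its own variance; the degrees of freedom
        # come from the Welch-Satterthwaite approximation (usually non-integer).
        se = math.sqrt(va / na + vb / nb)
        df = (va / na + vb / nb) ** 2 / (
            (va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1)
        )
    return diff / se, df


# Hypothetical toy samples, not taken from the biostats dataset.
a = [1.0, 2.0, 3.0, 4.0]
b = [2.0, 4.0, 6.0, 8.0, 10.0]

print(t_statistic(a, b, equal_var=True))
print(t_statistic(a, b, equal_var=False))
```

When the group variances differ noticeably, as in the toy samples, the Welch variant yields a different statistic and fewer effective degrees of freedom than the pooled test.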

Returns:
summary : pandas.DataFrame

The estimates, standard errors, and confidence intervals of the mean values in the two groups, as well as of the difference between them.

result : pandas.DataFrame

The degrees of freedom, t statistic, and p-value of the test.

See also

paired_t_test

Compare the mean values between two paired groups.

one_way_anova

Compare the mean values between more than two groups.

wilcoxon_rank_sum_test

The non-parametric counterpart of the two-sample t-test.

Examples

>>> import biostats as bs
>>> data = bs.dataset("two_sample_t_test.csv")
>>> data
    Value Time
0      69  2pm
1      70  2pm
2      66  2pm
3      63  2pm
4      68  2pm
5      70  2pm
6      69  2pm
7      67  2pm
8      62  2pm
9      63  2pm
10     76  2pm
11     59  2pm
12     62  2pm
13     62  2pm
14     75  2pm
15     62  2pm
16     72  2pm
17     63  2pm
18     68  5pm
19     62  5pm
20     67  5pm
21     68  5pm
22     69  5pm
23     67  5pm
24     61  5pm
25     59  5pm
26     62  5pm
27     61  5pm
28     69  5pm
29     66  5pm
30     62  5pm
31     62  5pm
32     61  5pm
33     70  5pm

We want to test whether the mean of Value differs between 2pm and 5pm.

>>> summary, result = bs.two_sample_t_test(data=data, variable="Value", between="Time", group=["2pm", "5pm"], kind="equal variances")
>>> summary
             Estimate  Std. Error  95% CI: Lower  95% CI: Upper
2pm         66.555556    1.152497      64.123999      68.987112
5pm         64.625000    0.916856      62.670768      66.579232
Difference   1.930556    1.497923      -1.120613       4.981725

The mean values of the two groups and the difference between them are given.
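The rows of this table can be reproduced by hand. The sketch below is a reconstruction under stated assumptions, not biostats' own code: it assumes each group's interval uses that group's own standard error with n - 1 degrees of freedom, and that the Difference row uses the pooled standard error with n1 + n2 - 2 degrees of freedom. The helper names `group_row` and `difference_row` are hypothetical.

```python
import numpy as np
from scipy import stats

# Values copied from the dataset printed above.
two_pm = np.array([69, 70, 66, 63, 68, 70, 69, 67, 62,
                   63, 76, 59, 62, 62, 75, 62, 72, 63])
five_pm = np.array([68, 62, 67, 68, 69, 67, 61, 59,
                    62, 61, 69, 66, 62, 62, 61, 70])


def group_row(x, level=0.95):
    """Estimate, standard error, and CI bounds for one group's mean."""
    n = len(x)
    est = x.mean()
    se = x.std(ddof=1) / np.sqrt(n)
    half = stats.t.ppf(0.5 + level / 2, df=n - 1) * se
    return est, se, est - half, est + half


def difference_row(x, y, level=0.95):
    """Estimate, pooled standard error, and CI bounds for mean(x) - mean(y)."""
    nx, ny = len(x), len(y)
    df = nx + ny - 2
    pooled_var = ((nx - 1) * x.var(ddof=1) + (ny - 1) * y.var(ddof=1)) / df
    est = x.mean() - y.mean()
    se = np.sqrt(pooled_var * (1 / nx + 1 / ny))
    half = stats.t.ppf(0.5 + level / 2, df=df) * se
    return est, se, est - half, est + half


rows = {
    "2pm": group_row(two_pm),
    "5pm": group_row(five_pm),
    "Difference": difference_row(two_pm, five_pm),
}
for label, row in rows.items():
    print(label, [round(float(v), 6) for v in row])
```

Under these assumptions the computed values agree with the summary table to the printed precision.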

>>> result
       D.F.  t Statistic  p-value      
Model    32     1.288822   0.2067  <NA>

The p-value is greater than 0.05, so the difference between the two groups is not statistically significant at the 5% level.
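As an independent cross-check, the same test can be run with `scipy.stats.ttest_ind`. This is a sketch that assumes SciPy is installed and that the values are copied correctly from the dataset above; it does not call biostats at all.

```python
from scipy import stats

# Values copied from the dataset printed above.
two_pm = [69, 70, 66, 63, 68, 70, 69, 67, 62,
          63, 76, 59, 62, 62, 75, 62, 72, 63]
five_pm = [68, 62, 67, 68, 69, 67, 61, 59,
           62, 61, 69, 66, 62, 62, 61, 70]

# equal_var=True is the pooled-variance test (kind="equal variances").
pooled = stats.ttest_ind(two_pm, five_pm, equal_var=True)
print(f"pooled: t = {pooled.statistic:.6f}, p = {pooled.pvalue:.4f}")

# equal_var=False is Welch's t-test (kind="unequal variances").
welch = stats.ttest_ind(two_pm, five_pm, equal_var=False)
print(f"welch:  t = {welch.statistic:.6f}, p = {welch.pvalue:.4f}")
```

The pooled result should agree with the t statistic and p-value reported in the result table.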