biostats.one_way_anova

biostats.one_way_anova(data, variable, between)

Test whether the mean values of a variable are different between several groups.
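For reference, the standard one-way ANOVA tests the hypotheses and computes the F statistic below. Group i has n_i observations y_ij with mean ȳ_i, ȳ is the grand mean, k is the number of groups, and N is the total sample size. This is the textbook formulation written in our own notation, not quoted from the biostats source.

```latex
H_0:\ \mu_1 = \mu_2 = \cdots = \mu_k
\qquad
H_1:\ \text{at least one } \mu_i \text{ differs}

SS_B = \sum_{i=1}^{k} n_i \, (\bar{y}_i - \bar{y})^2,
\qquad
SS_W = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_i)^2

F = \frac{SS_B / (k - 1)}{SS_W / (N - k)} \sim F_{k-1,\, N-k} \quad \text{under } H_0
```

In the example below, k = 5 and N = 39, which gives the 4 and 34 degrees of freedom reported in the `result` table.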

Parameters:
data : pandas.DataFrame

The input data. Must contain at least one numeric column and one categorical column.

variable : str

The name of the numeric column whose mean values are compared between groups.

between : str

The name of the categorical column that specifies which group each sample belongs to. At most 20 groups are allowed.

Returns:
summary : pandas.DataFrame

The counts, mean values, standard deviations, and confidence intervals of each group.

result : pandas.DataFrame

The degrees of freedom, sums of squares, mean squares, F statistic, and p-value of the test.
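To make the columns of `result` concrete, here is a minimal standard-library sketch of how the degrees of freedom, sums of squares, mean squares, and F statistic of a one-way ANOVA are conventionally computed. It illustrates the textbook formulas and is not the biostats implementation; the helper `anova_table` and its dict-of-lists input are hypothetical. The p-value is left out because it needs the F distribution.

```python
from statistics import fmean


def anova_table(groups):
    """Hypothetical helper: one-way ANOVA table from {label: [values, ...]}."""
    values = [x for xs in groups.values() for x in xs]
    grand_mean = fmean(values)
    means = {g: fmean(xs) for g, xs in groups.items()}
    k, n = len(groups), len(values)

    # Between groups: size-weighted squared deviations of group means from the grand mean.
    ss_between = sum(len(xs) * (means[g] - grand_mean) ** 2 for g, xs in groups.items())
    # Residual: squared deviations of each observation from its own group mean.
    ss_resid = sum((x - means[g]) ** 2 for g, xs in groups.items() for x in xs)

    df_between, df_resid = k - 1, n - k
    ms_between = ss_between / df_between
    ms_resid = ss_resid / df_resid
    return {
        "Between": {"D.F.": df_between, "Sum Square": ss_between, "Mean Square": ms_between},
        "Residual": {"D.F.": df_resid, "Sum Square": ss_resid, "Mean Square": ms_resid},
        "F Statistic": ms_between / ms_resid,
    }


# Length by Location, copied from the example dataset.
lengths = {
    "Tillamook": [0.0571, 0.0813, 0.0831, 0.0976, 0.0817, 0.0859, 0.0735, 0.0659, 0.0923, 0.0836],
    "Newport": [0.0873, 0.0662, 0.0672, 0.0819, 0.0749, 0.0649, 0.0835, 0.0725],
    "Petersburg": [0.0974, 0.1352, 0.0817, 0.1016, 0.0968, 0.1064, 0.1050],
    "Magadan": [0.1033, 0.0915, 0.0781, 0.0685, 0.0677, 0.0697, 0.0764, 0.0689],
    "Tvarminne": [0.0703, 0.1026, 0.0956, 0.0973, 0.1039, 0.1045],
}
table = anova_table(lengths)
```

On the example data this gives 4 and 34 degrees of freedom and an F statistic of about 7.12, in line with the `result` table in the Examples.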

See also

one_way_ancova

Test whether the mean values differ between groups while controlling for another variable (a covariate).

two_way_anova

Test whether the mean values differ between groups when the samples are classified by two factors.

kruskal_wallis_test

The non-parametric version of one-way ANOVA.

Examples

>>> import biostats as bs
>>> data = bs.dataset("one_way_anova.csv")
>>> data
    Length    Location
0   0.0571   Tillamook
1   0.0813   Tillamook
2   0.0831   Tillamook
3   0.0976   Tillamook
4   0.0817   Tillamook
5   0.0859   Tillamook
6   0.0735   Tillamook
7   0.0659   Tillamook
8   0.0923   Tillamook
9   0.0836   Tillamook
10  0.0873     Newport
11  0.0662     Newport
12  0.0672     Newport
13  0.0819     Newport
14  0.0749     Newport
15  0.0649     Newport
16  0.0835     Newport
17  0.0725     Newport
18  0.0974  Petersburg
19  0.1352  Petersburg
20  0.0817  Petersburg
21  0.1016  Petersburg
22  0.0968  Petersburg
23  0.1064  Petersburg
24  0.1050  Petersburg
25  0.1033     Magadan
26  0.0915     Magadan
27  0.0781     Magadan
28  0.0685     Magadan
29  0.0677     Magadan
30  0.0697     Magadan
31  0.0764     Magadan
32  0.0689     Magadan
33  0.0703   Tvarminne
34  0.1026   Tvarminne
35  0.0956   Tvarminne
36  0.0973   Tvarminne
37  0.1039   Tvarminne
38  0.1045   Tvarminne

We want to test whether the mean Length differs between Locations.

>>> summary, result = bs.one_way_anova(data=data, variable="Length", between="Location")
>>> summary
     Location  Count      Mean  Std. Deviation  95% CI: Lower  95% CI: Upper
1   Tillamook     10  0.080200        0.011963       0.071642       0.088758
2     Newport      8  0.074800        0.008597       0.067613       0.081987
3  Petersburg      7  0.103443        0.016209       0.088452       0.118434
4     Magadan      8  0.078012        0.012945       0.067190       0.088835
5   Tvarminne      6  0.095700        0.012962       0.082098       0.109302

The mean values of Length and their 95% confidence intervals in each group are given.
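The confidence intervals in `summary` appear to be ordinary per-group t intervals, mean ± t(0.975, n - 1) · s / √n. For example, Tillamook's 0.0802 ± 2.262 × 0.011963 / √10 reproduces the printed bounds 0.071642 and 0.088758. The sketch below recomputes these columns with the standard library only, finding the t critical value by bisection on the two-sided Student t tail probability, which is expressed through the regularized incomplete beta function. This is a reconstruction from the printed numbers, not the library's code, and all function names are hypothetical. With SciPy installed, `scipy.stats.t.ppf(0.975, n - 1)` gives the same critical value directly.

```python
from math import exp, lgamma, log, sqrt
from statistics import fmean, stdev


def _beta_cf(a, b, x, max_iter=300, eps=1e-14):
    # Continued fraction for the incomplete beta function (modified Lentz method).
    tiny = 1e-300
    c = 1.0
    d = 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        # Each iteration applies one even and one odd term of the fraction.
        even = m * (b - m) * x / ((a + 2 * m - 1) * (a + 2 * m))
        odd = -(a + m) * (a + b + m) * x / ((a + 2 * m) * (a + 2 * m + 1))
        for coef in (even, odd):
            d = 1.0 + coef * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + coef / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b) for 0 <= x <= 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = exp(lgamma(a + b) - lgamma(a) - lgamma(b) + a * log(x) + b * log(1.0 - x))
    # Evaluate directly or via I_x(a, b) = 1 - I_{1-x}(b, a), whichever converges faster.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def t_critical(df, conf=0.95):
    """Two-sided Student t critical value, found by bisection."""
    alpha = 1.0 - conf
    lo, hi = 0.0, 1e3
    for _ in range(100):
        mid = (lo + hi) / 2.0
        # P(|T| > mid) for T ~ t(df), written as an incomplete beta function.
        tail = reg_inc_beta(df / 2.0, 0.5, df / (df + mid * mid))
        if tail > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def group_summary(groups, conf=0.95):
    """Rows of (label, count, mean, SD, CI lower, CI upper), one per group."""
    rows = []
    for label, xs in groups.items():
        n, m, s = len(xs), fmean(xs), stdev(xs)  # stdev uses the n - 1 denominator
        half = t_critical(n - 1, conf) * s / sqrt(n)
        rows.append((label, n, m, s, m - half, m + half))
    return rows


rows = group_summary(
    {"Tillamook": [0.0571, 0.0813, 0.0831, 0.0976, 0.0817, 0.0859, 0.0735, 0.0659, 0.0923, 0.0836]}
)
```

Each group gets its own standard deviation and its own n - 1 degrees of freedom, which is why the interval widths in `summary` differ from group to group.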

>>> result
          D.F.  Sum Square  Mean Square  F Statistic   p-value     
Location     4    0.004520     0.001130     7.121019  0.000281  ***
Residual    34    0.005395     0.000159          NaN       NaN  NaN

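Since the p-value is below 0.001, we reject the null hypothesis that all Locations share the same mean Length: at least one group mean differs significantly from the others (the test does not say which).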
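The p-value in `result` is the upper-tail probability of the F distribution with (k - 1, N - k) = (4, 34) degrees of freedom, evaluated at the observed F statistic. A standard-library sketch of that calculation follows, using the identity P(F > f) = I_{d2/(d2 + d1·f)}(d2/2, d1/2) with a continued-fraction evaluation of the regularized incomplete beta function I. The helper names are hypothetical, and this is not necessarily how biostats computes the value; with SciPy installed, `scipy.stats.f.sf(f, d1, d2)` is the usual one-liner.

```python
from math import exp, lgamma, log


def _beta_cf(a, b, x, max_iter=300, eps=1e-14):
    # Continued fraction for the incomplete beta function (modified Lentz method).
    tiny = 1e-300
    c = 1.0
    d = 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        even = m * (b - m) * x / ((a + 2 * m - 1) * (a + 2 * m))
        odd = -(a + m) * (a + b + m) * x / ((a + 2 * m) * (a + 2 * m + 1))
        for coef in (even, odd):
            d = 1.0 + coef * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + coef / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b) for 0 <= x <= 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = exp(lgamma(a + b) - lgamma(a) - lgamma(b) + a * log(x) + b * log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def f_pvalue(f_stat, df1, df2):
    """Upper-tail probability P(F > f_stat) for F ~ F(df1, df2)."""
    return reg_inc_beta(df2 / 2.0, df1 / 2.0, df2 / (df2 + df1 * f_stat))


# F statistic and degrees of freedom taken from the result table above.
p = f_pvalue(7.121019, 4, 34)
```

For the example, `p` comes out at about 0.000281, agreeing with the p-value column of `result`.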