biostats.one_way_anova

biostats.one_way_anova(data, variable, between)

Test whether the mean values of a variable are different between several groups.
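For reference, the textbook one-way ANOVA decomposition is given below, where group i (one of k groups) has n_i observations y_ij and N is the total sample size. The null hypothesis is that all k group means are equal. Reading the example output, the "Location" row of the result table appears to hold the between-group terms and the "Residual" row the within-group terms. That mapping is our interpretation of the output, not something this page states.

```latex
\begin{aligned}
\bar{y}_i &= \frac{1}{n_i}\sum_{j=1}^{n_i} y_{ij}, \qquad
\bar{y} = \frac{1}{N}\sum_{i=1}^{k}\sum_{j=1}^{n_i} y_{ij}, \qquad
N = \sum_{i=1}^{k} n_i \\
SS_{\text{between}} &= \sum_{i=1}^{k} n_i\,(\bar{y}_i - \bar{y})^2,
\qquad \mathrm{df}_{\text{between}} = k - 1 \\
SS_{\text{within}} &= \sum_{i=1}^{k}\sum_{j=1}^{n_i} (y_{ij} - \bar{y}_i)^2,
\qquad \mathrm{df}_{\text{within}} = N - k \\
F &= \frac{SS_{\text{between}} / (k - 1)}{SS_{\text{within}} / (N - k)},
\qquad p = \Pr\!\left(F_{k-1,\,N-k} \ge F\right)
\end{aligned}
```

With k = 5 Locations and N = 39 samples, these degrees of freedom are 4 and 34. Those are the D.F. values in the example result table.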

Parameters:

data (pandas.DataFrame): The input data. Must contain at least one numeric column and one categorical column.
variable (str): The numeric variable whose group means are compared.
between (str): The categorical variable that specifies which group each sample belongs to. At most 20 groups are allowed.
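Measurements are sometimes stored as one list per group. The sketch below uses plain pandas to reshape them into the long layout the parameters describe: one row per sample, with one numeric column and one categorical column. The column names `Value` and `Group` and all numbers are hypothetical.

```python
import pandas as pd

# Hypothetical measurements, stored "wide": one list of values per group.
groups = {
    "A": [4.1, 3.9, 4.4],
    "B": [5.0, 5.2],
    "C": [3.1, 3.3, 2.9, 3.0],
}

# Reshape to "long" format: one row per sample, with a numeric
# column (Value) and a categorical column (Group).
data = pd.DataFrame(
    [(value, name) for name, values in groups.items() for value in values],
    columns=["Value", "Group"],
)
print(data)
```

The resulting frame could then be passed as `bs.one_way_anova(data=data, variable="Value", between="Group")`.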

Returns:

summary (pandas.DataFrame): The count, mean, standard deviation, and 95% confidence interval of the variable in each group.
result (pandas.DataFrame): The degrees of freedom, sum of squares, mean square, F statistic, and p-value of the test.
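The sketch below computes the numeric columns of the result table from first principles with pandas and SciPy, using the standard between-group/within-group decomposition. `anova_table` is a hypothetical helper written for illustration. It is not part of biostats and may differ from how the library computes its output. The column labels simply mirror the example output.

```python
import numpy as np
import pandas as pd
from scipy import stats


def anova_table(data: pd.DataFrame, variable: str, between: str) -> pd.DataFrame:
    """Hypothetical helper: one-way ANOVA table computed from first principles."""
    # Keep only the two relevant columns and drop incomplete rows.
    data = data[[variable, between]].dropna()
    y = data[variable].astype(float)
    grouped = y.groupby(data[between], sort=False)

    n = grouped.count()      # samples per group
    means = grouped.mean()   # group means
    grand_mean = y.mean()
    k, N = len(n), len(y)

    # Between-group sum of squares: size-weighted squared deviations of group means.
    ss_between = float((n * (means - grand_mean) ** 2).sum())
    # Within-group (residual) sum of squares: deviations from each group's own mean.
    ss_within = float(((y - grouped.transform("mean")) ** 2).sum())

    df_between, df_within = k - 1, N - k
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    f_stat = ms_between / ms_within
    # p-value: upper-tail probability of the F(k - 1, N - k) distribution.
    p_value = float(stats.f.sf(f_stat, df_between, df_within))

    return pd.DataFrame(
        {
            "D.F.": [df_between, df_within],
            "Sum Square": [ss_between, ss_within],
            "Mean Square": [ms_between, ms_within],
            "F Statistic": [f_stat, np.nan],
            "p-value": [p_value, np.nan],
        },
        index=[between, "Residual"],
    )
```

On the example dataset, `anova_table(data, "Length", "Location")` should match the numeric columns of the result table up to rounding. The significance-star column is not reproduced.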

See also

one_way_ancova: Test whether the mean values differ between groups while controlling for another variable (a covariate).
two_way_anova: Test whether the mean values differ between groups classified by two factors.
kruskal_wallis_test: The non-parametric counterpart of one-way ANOVA.

Examples

>>> import biostats as bs
>>> data = bs.dataset("one_way_anova.csv")
>>> data
    Length    Location
0   0.0571   Tillamook
1   0.0813   Tillamook
2   0.0831   Tillamook
3   0.0976   Tillamook
4   0.0817   Tillamook
5   0.0859   Tillamook
6   0.0735   Tillamook
7   0.0659   Tillamook
8   0.0923   Tillamook
9   0.0836   Tillamook
10  0.0873     Newport
11  0.0662     Newport
12  0.0672     Newport
13  0.0819     Newport
14  0.0749     Newport
15  0.0649     Newport
16  0.0835     Newport
17  0.0725     Newport
18  0.0974  Petersburg
19  0.1352  Petersburg
20  0.0817  Petersburg
21  0.1016  Petersburg
22  0.0968  Petersburg
23  0.1064  Petersburg
24  0.1050  Petersburg
25  0.1033     Magadan
26  0.0915     Magadan
27  0.0781     Magadan
28  0.0685     Magadan
29  0.0677     Magadan
30  0.0697     Magadan
31  0.0764     Magadan
32  0.0689     Magadan
33  0.0703   Tvarminne
34  0.1026   Tvarminne
35  0.0956   Tvarminne
36  0.0973   Tvarminne
37  0.1039   Tvarminne
38  0.1045   Tvarminne

We want to test whether the mean Length differs among the five Locations.

>>> summary, result = bs.one_way_anova(data=data, variable="Length", between="Location")
>>> summary
     Location  Count      Mean  Std. Deviation  95% CI: Lower  95% CI: Upper
1   Tillamook     10  0.080200        0.011963       0.071642       0.088758
2     Newport      8  0.074800        0.008597       0.067613       0.081987
3  Petersburg      7  0.103443        0.016209       0.088452       0.118434
4     Magadan      8  0.078012        0.012945       0.067190       0.088835
5   Tvarminne      6  0.095700        0.012962       0.082098       0.109302

The summary gives the count, mean, standard deviation, and 95% confidence interval of Length for each Location.
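The 95% intervals in the summary are consistent with a per-group Student's t interval, mean ± t(0.975, n − 1) × SD / √n, where each group uses its own sample standard deviation. Recomputing the Tillamook row this way gives 0.071642 and 0.088758. The formula is our inference from the numbers, not something this page states. In the sketch below, `group_summary` is a hypothetical helper.

```python
import numpy as np
import pandas as pd
from scipy import stats


def group_summary(data: pd.DataFrame, variable: str, between: str,
                  level: float = 0.95) -> pd.DataFrame:
    """Hypothetical helper: per-group count, mean, SD, and t-based confidence interval."""
    grouped = data.groupby(between, sort=False)[variable]
    out = grouped.agg(["count", "mean", "std"])  # "std" is the sample SD (ddof=1)
    se = out["std"] / np.sqrt(out["count"])      # standard error of each group mean
    # Two-sided critical value of Student's t with n - 1 degrees of freedom.
    t_crit = stats.t.ppf((1 + level) / 2, out["count"] - 1)
    out["lower"] = out["mean"] - t_crit * se
    out["upper"] = out["mean"] + t_crit * se
    return out
```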

>>> result
          D.F.  Sum Square  Mean Square  F Statistic   p-value     
Location     4    0.004520     0.001130     7.121019  0.000281  ***
Residual    34    0.005395     0.000159          NaN       NaN  NaN

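The p-value is below 0.001, so we reject the null hypothesis that all Locations share the same mean Length: at least one Location's mean differs significantly from the others. ANOVA alone does not identify which groups differ.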
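As an independent sanity check, the same F test can be run with SciPy's `scipy.stats.f_oneway` on the Length values listed above, grouped by Location. This uses a different library to cross-check the result and does not show how biostats computes it internally. The F statistic and p-value should agree with the result table up to rounding.

```python
from scipy import stats

# Length values from the example dataset above, grouped by Location.
tillamook = [0.0571, 0.0813, 0.0831, 0.0976, 0.0817, 0.0859, 0.0735, 0.0659, 0.0923, 0.0836]
newport = [0.0873, 0.0662, 0.0672, 0.0819, 0.0749, 0.0649, 0.0835, 0.0725]
petersburg = [0.0974, 0.1352, 0.0817, 0.1016, 0.0968, 0.1064, 0.1050]
magadan = [0.1033, 0.0915, 0.0781, 0.0685, 0.0677, 0.0697, 0.0764, 0.0689]
tvarminne = [0.0703, 0.1026, 0.0956, 0.0973, 0.1039, 0.1045]

# One-way ANOVA across the five groups.
f_stat, p_value = stats.f_oneway(tillamook, newport, petersburg, magadan, tvarminne)
print(f"F = {f_stat:.4f}, p = {p_value:.6f}")
```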