biostats.kruskal_wallis_test

biostats.kruskal_wallis_test(data, variable, between)

Test whether the distribution of a variable differs between several groups, using a nonparametric (rank-based) method.

Parameters:
data : pandas.DataFrame

The input data. Must contain at least one numeric column and one categorical column.

variable : str

The numeric variable to compare between groups.

between : str

The categorical variable that specifies which group each sample belongs to. At most 20 groups are allowed.

Returns:
summary : pandas.DataFrame

The counts, mean values, standard deviations, minimums, first quartiles, medians, third quartiles, and maximums of the variable in each group.

result : pandas.DataFrame

The degrees of freedom, chi-square statistic, and p-value of the test.

See also

one_way_anova

The parametric counterpart of the Kruskal-Wallis test.

Examples

>>> import biostats as bs
>>> data = bs.dataset("kruskal_wallis_test.csv")
>>> data
    Value    Group
0       1  Group.1
1       2  Group.1
2       3  Group.1
3       4  Group.1
4       5  Group.1
5       6  Group.1
6       7  Group.1
7       8  Group.1
8       9  Group.1
9      46  Group.1
10     47  Group.1
11     48  Group.1
12     49  Group.1
13     50  Group.1
14     51  Group.1
15     52  Group.1
16     53  Group.1
17    342  Group.1
18     10  Group.2
19     11  Group.2
20     12  Group.2
21     13  Group.2
22     14  Group.2
23     15  Group.2
24     16  Group.2
25     17  Group.2
26     18  Group.2
27     37  Group.2
28     58  Group.2
29     59  Group.2
30     60  Group.2
31     61  Group.2
32     62  Group.2
33     63  Group.2
34     64  Group.2
35    193  Group.2
36     19  Group.3
37     20  Group.3
38     21  Group.3
39     22  Group.3
40     23  Group.3
41     24  Group.3
42     25  Group.3
43     26  Group.3
44     27  Group.3
45     28  Group.3
46     65  Group.3
47     66  Group.3
48     67  Group.3
49     68  Group.3
50     69  Group.3
51     70  Group.3
52     71  Group.3
53     72  Group.3

We want to test whether the distribution of Value differs between the groups in Group.

>>> summary, result = bs.kruskal_wallis_test(data=data, variable="Value", between="Group")
>>> summary
     Group  Count  Mean  Std. Deviation  Minimum  1st Quartile  Median  3rd Quartile  Maximum
1  Group.1     18  43.5       77.775128        1          5.25    27.5         49.75      342
2  Group.2     18  43.5       43.694461       10         14.25    27.5         60.75      193
3  Group.3     18  43.5       23.167548       19         23.25    27.5         67.75       72

Descriptive statistics for each group are given. Note that the means and medians are identical across the three groups.
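
The Kruskal-Wallis test works on ranks rather than on the raw values. As a minimal sketch using plain pandas (not part of biostats, with the values retyped from the listing above), the mean rank of each group in the pooled sample can be computed like this:

```python
import pandas as pd

# Values retyped from the example dataset shown above (18 per group).
group_1 = list(range(1, 10)) + list(range(46, 54)) + [342]
group_2 = list(range(10, 19)) + [37] + list(range(58, 65)) + [193]
group_3 = list(range(19, 29)) + list(range(65, 73))

data = pd.DataFrame({
    "Value": group_1 + group_2 + group_3,
    "Group": ["Group.1"] * 18 + ["Group.2"] * 18 + ["Group.3"] * 18,
})

# Rank all 54 observations together, then average the ranks within each group.
data["Rank"] = data["Value"].rank()
mean_ranks = data.groupby("Group")["Rank"].mean()
print(mean_ranks)
```

The mean ranks differ between the groups even where the means and medians do not, and this is the difference the test measures.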

>>> result
       D.F.  Chi Square   p-value   
Model     2    7.355331  0.025282  *

The p-value < 0.05, so the distribution of Value differs significantly between the groups, even though the group means and medians are identical.
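
As a cross-check outside biostats, `scipy.stats.kruskal` can be run on the same values. This sketch assumes SciPy is installed and retypes the values from the listing above. Because the data contain no ties, it should give approximately the same chi-square statistic and p-value as the result table.

```python
from scipy import stats

# Values retyped from the example dataset shown above (18 per group).
group_1 = list(range(1, 10)) + list(range(46, 54)) + [342]
group_2 = list(range(10, 19)) + [37] + list(range(58, 65)) + [193]
group_3 = list(range(19, 29)) + list(range(65, 73))

# kruskal returns the H statistic (chi-square distributed with k - 1
# degrees of freedom under the null hypothesis) and its p-value.
h_statistic, p_value = stats.kruskal(group_1, group_2, group_3)
degrees_of_freedom = 3 - 1

print(degrees_of_freedom, round(h_statistic, 6), round(p_value, 6))
```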