biostats.wilcoxon_rank_sum_test

biostats.wilcoxon_rank_sum_test(data, variable, between, group)

Test whether the values of a variable differ between two groups, using a nonparametric (rank-based) method.

Parameters:
data : pandas.DataFrame

The input data. Must contain at least one numeric column and one categorical column.

variable : str

The numeric variable to compare between the two groups.

between : str

The categorical variable that specifies which group the samples belong to. Maximum 20 groups.

group : list

List of the two groups to be compared.

Returns:
summary : pandas.DataFrame

The counts, mean values, standard deviations, minimums, first quartiles, medians, third quartiles, and maximums of the variable in the two groups.

result : pandas.DataFrame

The rank sums, z statistic, and p-values of the normal and exact tests.

See also

wilcoxon_signed_rank_test

Compare two paired groups with nonparametric methods.

kruskal_wallis_test

Compare more than two groups with nonparametric methods.

two_sample_t_test

The parametric counterpart of the Wilcoxon rank-sum test.

Examples

>>> import biostats as bs
>>> data = bs.dataset("wilcoxon_rank_sum_test.csv")
>>> data
    Value Time
0      69  2pm
1      70  2pm
2      66  2pm
3      63  2pm
4      68  2pm
5      70  2pm
6      69  2pm
7      67  2pm
8      62  2pm
9      63  2pm
10     76  2pm
11     59  2pm
12     62  2pm
13     62  2pm
14     75  2pm
15     62  2pm
16     72  2pm
17     63  2pm
18     68  5pm
19     62  5pm
20     67  5pm
21     68  5pm
22     69  5pm
23     67  5pm
24     61  5pm
25     59  5pm
26     62  5pm
27     61  5pm
28     69  5pm
29     66  5pm
30     62  5pm
31     62  5pm
32     61  5pm
33     70  5pm

We want to test whether Value differs between the 2pm and 5pm groups, using a nonparametric method.

>>> summary, result = bs.wilcoxon_rank_sum_test(data=data, variable="Value", between="Time", group=["2pm", "5pm"])
>>> summary
     Count       Mean  Std. Deviation  Minimum  1st Quartile  Median  3rd Quartile  Maximum
2pm     18  66.555556        4.889632       59         62.25    66.5         69.75       76
5pm     16  64.625000        3.667424       59         61.75    64.0         68.00       70

The summary gives the count, mean, standard deviation, and five-number summary of Value for each group.
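The figures in the summary table can be cross-checked by hand. The sketch below uses only the Python standard library and is not how biostats computes them internally; it assumes the sample standard deviation (n - 1 denominator) and linearly interpolated quartiles, which is what the printed values are consistent with. The helper name `describe` is hypothetical.

```python
import statistics

# Values copied from the example data set above.
two_pm = [69, 70, 66, 63, 68, 70, 69, 67, 62, 63, 76, 59, 62, 62, 75, 62, 72, 63]
five_pm = [68, 62, 67, 68, 69, 67, 61, 59, 62, 61, 69, 66, 62, 62, 61, 70]

def describe(values):
    """Hypothetical helper: descriptive statistics for one group."""
    # method="inclusive" interpolates linearly between order statistics.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "Count": len(values),
        "Mean": statistics.mean(values),
        "Std. Deviation": statistics.stdev(values),  # sample std (n - 1)
        "Minimum": min(values),
        "1st Quartile": q1,
        "Median": median,
        "3rd Quartile": q3,
        "Maximum": max(values),
    }

summary_2pm = describe(two_pm)
summary_5pm = describe(five_pm)

for name, row in (("2pm", summary_2pm), ("5pm", summary_5pm)):
    print(name, row)
```

Under these assumptions the printed rows agree with the `summary` output shown above.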

>>> result
        Rank Sum  z Statistic   p-value      
Normal       357     1.444746  0.148529  <NA>
Exact        357          NaN       NaN  <NA>

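The p-value of the normal approximation (0.149) is greater than 0.05, so there is no significant difference between the two groups at the 5% level. The exact test is reported as NaN for this data set.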
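To see where the numbers in the Normal row come from, the calculation can be sketched with the standard library alone. This is not taken from the biostats source: it assumes mid-ranks for tied values, a tie-corrected variance, and a continuity correction of 0.5, a combination that is consistent with the rank sum, z statistic, and p-value printed above. The helper name `mid_ranks` is hypothetical.

```python
import math
from collections import Counter

# Values copied from the example data set above.
two_pm = [69, 70, 66, 63, 68, 70, 69, 67, 62, 63, 76, 59, 62, 62, 75, 62, 72, 63]
five_pm = [68, 62, 67, 68, 69, 67, 61, 59, 62, 61, 69, 66, 62, 62, 61, 70]

def mid_ranks(values):
    """Hypothetical helper: map each distinct value to its average rank."""
    counts = Counter(values)
    ranks, start = {}, 1
    for value in sorted(counts):
        ties = counts[value]
        ranks[value] = start + (ties - 1) / 2  # mean of the tied positions
        start += ties
    return ranks, counts

ranks, counts = mid_ranks(two_pm + five_pm)
n1, n2 = len(two_pm), len(five_pm)
n = n1 + n2

# Rank sum of the first group (2pm) in the pooled ranking.
rank_sum = sum(ranks[v] for v in two_pm)

# Mean and tie-corrected variance of the rank sum under the null hypothesis.
expected = n1 * (n + 1) / 2
tie_term = sum(t**3 - t for t in counts.values()) / (n * (n - 1))
variance = n1 * n2 / 12 * ((n + 1) - tie_term)

# Continuity correction: shrink the deviation by 0.5 toward zero.
deviation = rank_sum - expected
z = (deviation - math.copysign(0.5, deviation)) / math.sqrt(variance)

# Two-sided p-value from the standard normal distribution.
p_value = math.erfc(abs(z) / math.sqrt(2))

print(rank_sum, round(z, 6), round(p_value, 6))
```

With these assumptions the sketch gives a rank sum of 357 for the 2pm group, with z and p-value agreeing with the Normal row to the printed precision.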