biostats.wilcoxon_rank_sum_test

biostats.wilcoxon_rank_sum_test(data, variable, between, group)

Test whether the values of a variable tend to differ between two independent groups, using a nonparametric (rank-based) method.

Parameters:

data (pandas.DataFrame): The input data. Must contain at least one numeric column and one categorical column.
variable (str): The numeric variable to compare between the two groups.
between (str): The categorical variable that specifies which group each sample belongs to. Maximum 20 groups.
group (list): The names of the two groups to be compared.

Returns:

summary (pandas.DataFrame): The counts, mean values, standard deviations, minimums, first quartiles, medians, third quartiles, and maximums of the variable in the two groups.
result (pandas.DataFrame): The rank sums, z statistic, and p-values of the normal and exact tests.

See also

wilcoxon_signed_rank_test: Compare two paired groups with nonparametric methods.
kruskal_wallis_test: Compare more than two groups with nonparametric methods.
two_sample_t_test: The parametric counterpart of the Wilcoxon rank-sum test.

Examples

>>> import biostats as bs
>>> data = bs.dataset("wilcoxon_rank_sum_test.csv")
>>> data
    Value Time
       69  2pm
       70  2pm
       66  2pm
       63  2pm
       68  2pm
       70  2pm
       69  2pm
       67  2pm
       62  2pm
       63  2pm
       76  2pm
       59  2pm
       62  2pm
       62  2pm
       75  2pm
       62  2pm
       72  2pm
       63  2pm
       68  5pm
       62  5pm
       67  5pm
       68  5pm
       69  5pm
       67  5pm
       61  5pm
       59  5pm
       62  5pm
       61  5pm
       69  5pm
       66  5pm
       62  5pm
       62  5pm
       61  5pm
       70  5pm

We want to test whether Value differs between the 2pm and 5pm groups with a nonparametric method.

>>> summary, result = bs.wilcoxon_rank_sum_test(data=data, variable="Value", between="Time", group=["2pm", "5pm"])
>>> summary
     Count       Mean  Std. Deviation  Minimum  1st Quartile  Median  3rd Quartile  Maximum
2pm     18  66.555556        4.889632       59         62.25    66.5         69.75       76
5pm     16  64.625000        3.667424       59         61.75    64.0         68.00       70

The summary table gives the count, mean, and other descriptive statistics of Value for each of the two groups.

>>> result
        Rank Sum  z Statistic   p-value      
Normal       357     1.444746  0.148529  <NA>
Exact        357          NaN       NaN  <NA>

The p-value of the normal test (0.149) is greater than 0.05, so there is no significant difference between the two groups at the 5% level.
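
To make the Normal row of the result concrete, the following is a minimal sketch of a rank-sum test with the normal approximation, written with the Python standard library only. The helper name rank_sum_normal_test is hypothetical and not part of biostats. The use of mid-ranks for ties, a tie-corrected variance, and a 0.5 continuity correction is an assumption about how the figures are obtained, chosen because it reproduces the rank sum, z statistic, and p-value shown above.

```python
import math
from statistics import NormalDist

# Values copied from the example dataset above.
values_2pm = [69, 70, 66, 63, 68, 70, 69, 67, 62, 63, 76, 59, 62, 62, 75, 62, 72, 63]
values_5pm = [68, 62, 67, 68, 69, 67, 61, 59, 62, 61, 69, 66, 62, 62, 61, 70]


def rank_sum_normal_test(x, y):
    """Hypothetical helper: rank sum of x, z statistic, two-sided p-value."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = sorted(x + y)

    # Assign mid-ranks: tied values share the average of the ranks they occupy.
    mid_rank = {}
    tie_term = 0  # sum of t^3 - t over all groups of tied values
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        t = j - i
        mid_rank[pooled[i]] = (i + 1 + j) / 2  # average of ranks i+1 .. j
        tie_term += t**3 - t
        i = j

    rank_sum = sum(mid_rank[v] for v in x)
    expected = n1 * (n + 1) / 2
    # Assumption: variance of the rank sum with the usual correction for ties.
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))

    # Assumption: continuity correction, shrinking the deviation by 0.5 toward zero.
    deviation = rank_sum - expected
    corrected = math.copysign(max(abs(deviation) - 0.5, 0.0), deviation)
    z = corrected / math.sqrt(variance)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return rank_sum, z, p_value


rank_sum, z, p_value = rank_sum_normal_test(values_2pm, values_5pm)
print(rank_sum, round(z, 4), round(p_value, 4))  # 357.0 1.4447 0.1485
```

The rank sum here is that of the first group passed in (2pm). Because the ranks 1 to 34 sum to 595, the rank sum of the 5pm group is 595 - 357 = 238.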