biostats.fisher_exact_test
- biostats.fisher_exact_test(data, variable_1, variable_2, kind='count')
Test whether there is an association between two categorical variables.
- Parameters:
- data
pandas.DataFrame
The input data. Must contain at least two categorical columns.
- variable_1
str
The first categorical variable. Maximum 10 groups.
- variable_2
str
The second categorical variable. Switching the two variables does not change the result of the Fisher exact test. Maximum 10 groups.
- kind
str
The way to summarize the contingency table.
“count” : Count the frequencies of occurrence.
“vertical” : Calculate proportions vertically, so that the sum of each column equals 1.
“horizontal” : Calculate proportions horizontally, so that the sum of each row equals 1.
“overall” : Calculate overall proportions, so that the sum of the whole table equals 1.
- Returns:
- summary
pandas.DataFrame
The contingency table of the two categorical variables.
- result
pandas.DataFrame
The p-value of the test.
See also
chi_square_test
A large-sample approximation to the Fisher exact test.
binomial_test
Test the difference between the observed and expected proportion of a variable.
Notes
Warning
The Fisher exact test calculates the exact p-value by iterating through all possible distributions, so it may take a long time when the data set is large. For larger data, chi_square_test() is recommended.
Examples
>>> import biostats as bs
>>> data = bs.dataset("fisher_exact_test.csv")
>>> data
    Frequency     Result
0     Monthly  Undamaged
1     Monthly    Damaged
2     Monthly    Damaged
3     Monthly    Damaged
4     Monthly  Undamaged
..        ...        ...
95    Monthly  Undamaged
96     Weekly  Undamaged
97    Monthly    Damaged
98  Quarterly  Undamaged
99    Monthly  Undamaged
We want to test whether there is an association between Frequency and Result.
>>> summary, result = bs.fisher_exact_test(data=data, variable_1="Frequency", variable_2="Result", kind="horizontal")
>>> summary
           Damaged  Undamaged
Daily         0.04       0.96
Monthly       0.56       0.44
Quarterly     0.44       0.56
Weekly        0.20       0.80
The summary gives the proportion of Damaged within each Frequency group. Each row sums to 1 because kind="horizontal".
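To make the four kind options concrete, here is a minimal sketch that computes each summary by hand from a small made-up set of observations. It uses only the standard library. The toy data and the helper name summarize are assumptions for illustration, and this is not necessarily how biostats builds its table internally.

```python
from collections import Counter

# Made-up toy observations as (Frequency, Result) pairs; not the dataset above.
pairs = [
    ("Daily", "Damaged"), ("Daily", "Undamaged"), ("Daily", "Undamaged"),
    ("Weekly", "Damaged"), ("Weekly", "Damaged"), ("Weekly", "Undamaged"),
]

counts = Counter(pairs)
rows = sorted({r for r, _ in pairs})
cols = sorted({c for _, c in pairs})
row_totals = {r: sum(counts[(r, c)] for c in cols) for r in rows}
col_totals = {c: sum(counts[(r, c)] for r in rows) for c in cols}
total = len(pairs)


def summarize(kind):
    """Return {(row, col): value} for one of the four summaries (hypothetical helper)."""
    if kind == "count":
        # Raw frequencies.
        return {(r, c): counts[(r, c)] for r in rows for c in cols}
    if kind == "vertical":
        # Divide by the column total, so each column sums to 1.
        return {(r, c): counts[(r, c)] / col_totals[c] for r in rows for c in cols}
    if kind == "horizontal":
        # Divide by the row total, so each row sums to 1.
        return {(r, c): counts[(r, c)] / row_totals[r] for r in rows for c in cols}
    if kind == "overall":
        # Divide by the grand total, so the whole table sums to 1.
        return {(r, c): counts[(r, c)] / total for r in rows for c in cols}
    raise ValueError(f"unknown kind: {kind!r}")


horizontal = summarize("horizontal")
vertical = summarize("vertical")
overall = summarize("overall")
```

With pandas, the same tables can be obtained from pandas.crosstab by passing normalize="columns", normalize="index", or normalize="all" for the vertical, horizontal, and overall summaries respectively.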
>>> result
        p-value
Model  0.000123  ***
The p-value is less than 0.001, so there is a significant association between Frequency and Result. That is, the proportion of Damaged differs among the four Frequency groups.
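The enumeration described in the Notes can be sketched for the simplest case, a 2x2 table. The example below is an illustration under stated assumptions, not the biostats implementation: the function name fisher_exact_2x2 is hypothetical, it handles only 2x2 tables of counts, and it uses the common two-sided rule of summing the probabilities of all tables that are no more likely than the observed one. With the margins fixed, a 2x2 table is determined by its top-left cell, so the loop runs over that single value.

```python
from math import comb


def fisher_exact_2x2(table):
    """Two-sided Fisher exact p-value for a 2x2 table of counts (illustrative sketch)."""
    (a, b), (c, d) = table
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def prob(k):
        # Hypergeometric probability that the top-left cell equals k
        # when all row and column totals are held fixed.
        return comb(row1, k) * comb(row2, col1 - k) / comb(n, col1)

    p_observed = prob(a)
    lo = max(0, col1 - row2)   # smallest feasible top-left count
    hi = min(row1, col1)       # largest feasible top-left count
    # Sum over every feasible table that is no more likely than the observed one.
    # The small relative tolerance guards against floating-point ties.
    return sum(
        prob(k) for k in range(lo, hi + 1) if prob(k) <= p_observed * (1 + 1e-9)
    )


# Classic "lady tasting tea" layout: 34 of the 70 equally weighted outcomes
# are at least as extreme, so the p-value is 34/70.
p_tea = fisher_exact_2x2([[3, 1], [1, 3]])
print(round(p_tea, 4))  # 0.4857

# Swapping rows and columns (transposing the table) leaves the p-value unchanged.
p_original = fisher_exact_2x2([[8, 2], [1, 5]])
p_transposed = fisher_exact_2x2([[8, 1], [2, 5]])
```

For tables with more rows or columns, the number of tables sharing the same margins grows quickly with both the table dimensions and the sample size, which is why the exact test becomes slow on large data.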