biostats.chi_square_test
- biostats.chi_square_test(data, variable_1, variable_2, kind='count')
Test whether there is an association between two categorical variables.
- Parameters:
- data
pandas.DataFrame
The input data. Must contain at least two categorical columns.
- variable_1
str
The first categorical variable. Maximum 20 groups.
- variable_2
str
The second categorical variable. Maximum 20 groups. Switching the two variables does not change the result of the chi-square test.
- kind
str
The way to summarize the contingency table.
“count” : Count the frequencies of occurrence.
“vertical” : Calculate proportions vertically, so that the sum of each column equals 1.
“horizontal” : Calculate proportions horizontally, so that the sum of each row equals 1.
“overall” : Calculate overall proportions, so that the sum of the whole table equals 1.
- Returns:
- summary
pandas.DataFrame
The contingency table of the two categorical variables.
- result
pandas.DataFrame
The degrees of freedom, chi-square statistic, and p-value of the test.
See also
fisher_exact_test
The exact version of the chi-square test.
chi_square_test_fit
Test the difference between the observed and expected proportions of a variable.
mantel_haenszel_test
Test the association between two categorical variables in stratified data.
Examples
>>> import biostats as bs
>>> data = bs.dataset("chi_square_test.csv")
>>> data
     Genotype      Health
0     ins-del     disease
1     ins-ins     disease
2     ins-del     disease
3     ins-ins     disease
4     ins-del  no_disease
...       ...         ...
2254  ins-ins  no_disease
2255  ins-del     disease
2256  ins-del     disease
2257  ins-ins     disease
2258  ins-ins  no_disease
We want to test whether there is an association between Genotype and Health.
>>> summary, result = bs.chi_square_test(data=data, variable_1="Genotype", variable_2="Health", kind="horizontal")
>>> summary
          disease  no_disease
del-del  0.814159    0.185841
ins-del  0.792276    0.207724
ins-ins  0.750698    0.249302
Because kind="horizontal", each row sums to 1: the summary gives the proportions of disease and no_disease within each Genotype.
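The four kind options correspond to the usual ways of normalizing a contingency table. As a rough illustration, similar tables can be built directly with pandas.crosstab and its normalize argument. This is a sketch on a small made-up DataFrame (the name toy and its six rows are assumptions for illustration, not the example dataset), and it is not necessarily how biostats builds its summary internally.

```python
import pandas as pd

# Toy data for illustration only; NOT the biostats example dataset.
toy = pd.DataFrame({
    "Genotype": ["ins-ins", "ins-ins", "ins-del", "ins-del", "ins-del", "del-del"],
    "Health": ["disease", "no_disease", "disease", "disease", "no_disease", "disease"],
})

rows, cols = toy["Genotype"], toy["Health"]

# Raw frequencies, comparable to kind="count".
counts = pd.crosstab(rows, cols)

# Each row sums to 1, comparable to kind="horizontal".
horizontal = pd.crosstab(rows, cols, normalize="index")

# Each column sums to 1, comparable to kind="vertical".
vertical = pd.crosstab(rows, cols, normalize="columns")

# The whole table sums to 1, comparable to kind="overall".
overall = pd.crosstab(rows, cols, normalize="all")

print(horizontal)
```

Only the presentation of the table changes between these options; the test itself is based on the underlying counts.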
>>> result
        D.F.  Chi Square   p-value
Normal     2    7.259386  0.026524  *
The p-value is below 0.05, so there is a significant association between Genotype and Health. That is, the proportion of disease differs among the three Genotype groups.
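To make the reported quantities concrete, here is a minimal sketch of the textbook Pearson chi-square computation on a table of counts. The helper chi_square_statistic and the counts in toy_counts are hypothetical and not part of biostats, and the library's internal implementation may differ. The last lines use the fact that a chi-square distribution with 2 degrees of freedom has upper-tail probability exp(-x/2), which lets the p-value reported above be checked from the statistic alone.

```python
import math

def chi_square_statistic(table):
    """Return (statistic, degrees_of_freedom) for a table of observed counts.

    Hypothetical helper for illustration; assumes every row and column
    total is positive so that no expected count is zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)

    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected

    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

# Made-up 3x2 table of counts (rows: groups, columns: outcomes).
toy_counts = [[30, 10], [45, 15], [20, 20]]
stat, df = chi_square_statistic(toy_counts)
print(df, stat)

# With D.F. = 2 the p-value is exp(-statistic / 2). Applying this to the
# statistic reported in the example (7.259386):
p_reported = math.exp(-7.259386 / 2)
print(round(p_reported, 6))  # agrees with the reported p-value 0.026524
```

The degrees of freedom are (rows - 1) x (columns - 1), which is why a 3 x 2 table such as Genotype by Health gives D.F. = 2.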