biostats.contingency
- biostats.contingency(data, variable_1, variable_2, kind='count')
Compute the contingency table of two categorical variables.
- Parameters:
- data
pandas.DataFrame
The input data. Must contain at least two categorical columns.
- variable_1
str
Name of the column holding the first categorical variable. It may have at most 20 groups.
- variable_2
str
Name of the column holding the second categorical variable. It may have at most 20 groups.
- kind
str
How to summarize the contingency table (default “count”):
“count” : Count the frequencies of occurrence.
“vertical” : Calculate proportions vertically, so that the sum of each column equals 1.
“horizontal” : Calculate proportions horizontally, so that the sum of each row equals 1.
“overall” : Calculate overall proportions, so that the sum of the whole table equals 1.
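The four values of kind correspond to the four normalization modes of pandas.crosstab. The sketch below is only an illustration under that assumption; it is not the biostats source. The function contingency_sketch, the NORMALIZE mapping and the toy rows are hypothetical, and the column names are borrowed from the Examples section.

```python
import pandas as pd

# Hypothetical toy data; column names mirror the Examples section.
data = pd.DataFrame({
    "Genotype": ["ins-del", "ins-ins", "ins-del", "del-del", "ins-ins", "ins-del"],
    "Health": ["disease", "disease", "no_disease", "disease", "no_disease", "disease"],
})

# Assumed mapping from each `kind` value to pandas.crosstab's `normalize` argument.
NORMALIZE = {
    "count": False,         # raw frequencies
    "vertical": "columns",  # each column sums to 1
    "horizontal": "index",  # each row sums to 1
    "overall": "all",       # the whole table sums to 1
}


def contingency_sketch(data, variable_1, variable_2, kind="count"):
    """Hypothetical stand-in for bs.contingency, built on pandas.crosstab."""
    if kind not in NORMALIZE:
        raise ValueError(f"unknown kind: {kind!r}")
    return pd.crosstab(data[variable_1], data[variable_2], normalize=NORMALIZE[kind])


for kind in NORMALIZE:
    print(kind)
    print(contingency_sketch(data, "Genotype", "Health", kind), end="\n\n")
```

Combinations that never occur (here del-del with no_disease) appear as 0 in the count table and as 0.0 in the normalized tables.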
- Returns:
- result
pandas.DataFrame
The contingency table of the two categorical variables.
See also
categorical
Compute descriptive statistics of a categorical variable.
chi_square_test
Test whether there is an association between two categorical variables.
Examples
>>> import biostats as bs
>>> data = bs.dataset("contingency.csv")
>>> data
      Genotype      Health
0      ins-del     disease
1      ins-ins     disease
2      ins-del     disease
3      ins-ins     disease
4      ins-del  no_disease
...        ...         ...
2254   ins-ins  no_disease
2255   ins-del     disease
2256   ins-del     disease
2257   ins-ins     disease
2258   ins-ins  no_disease
We want to compute the contingency table of Genotype and Health.
>>> result = bs.contingency(data=data, variable_1="Genotype", variable_2="Health", kind="count")
>>> result
         disease  no_disease
del-del      184          42
ins-del      759         199
ins-ins      807         268
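The proportion tables can be derived by hand from the counts above. The sketch below uses plain pandas arithmetic and assumes the kind definitions given under Parameters. It does not reproduce biostats output for the other kinds, and the variable names are hypothetical.

```python
import pandas as pd

# Counts copied from the kind="count" output above.
counts = pd.DataFrame(
    {"disease": [184, 759, 807], "no_disease": [42, 199, 268]},
    index=["del-del", "ins-del", "ins-ins"],
)

# kind="vertical": divide each column by its column total.
vertical = counts / counts.sum(axis=0)
# kind="horizontal": divide each row by its row total.
horizontal = counts.div(counts.sum(axis=1), axis=0)
# kind="overall": divide every cell by the grand total.
overall = counts / counts.to_numpy().sum()

print(vertical.round(3))
print(horizontal.round(3))
print(overall.round(3))
```

The column totals are 1750 (disease) and 509 (no_disease). The grand total is 2259, which equals the number of rows in the dataset (index 0 to 2258).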