biostats.contingency

biostats.contingency(data, variable_1, variable_2, kind='count')

Compute the contingency table of two categorical variables.

Parameters:
data : pandas.DataFrame

The input data. Must contain at least two categorical columns.

variable_1 : str

The name of the column holding the first categorical variable. The column may contain at most 20 groups.

variable_2 : str

The name of the column holding the second categorical variable. The column may contain at most 20 groups.

kind : str

How the cells of the contingency table are summarized. Defaults to “count”.

  • “count” : Count the frequencies of occurrence.

  • “vertical” : Calculate proportions vertically, so that the sum of each column equals 1.

  • “horizontal” : Calculate proportions horizontally, so that the sum of each row equals 1.

  • “overall” : Calculate overall proportions, so that the sum of the whole table equals 1.
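The four options can be illustrated with plain pandas. The sketch below is not the biostats implementation: `crosstab_by_kind`, the `NORMALIZE` mapping, and the `toy` data are hypothetical names introduced here, and the assumption is that each `kind` corresponds to one setting of the `normalize` argument of `pandas.crosstab`.

```python
import pandas as pd

# Small made-up data set, for illustration only (not the biostats example data).
toy = pd.DataFrame({
    "Genotype": ["ins-ins", "ins-del", "ins-del", "del-del", "ins-ins", "ins-del"],
    "Health": ["disease", "disease", "no_disease", "disease", "no_disease", "disease"],
})

# Assumed correspondence between `kind` and the `normalize` argument of pandas.crosstab.
NORMALIZE = {
    "count": False,         # raw frequencies
    "vertical": "columns",  # each column sums to 1
    "horizontal": "index",  # each row sums to 1
    "overall": "all",       # the whole table sums to 1
}


def crosstab_by_kind(data, variable_1, variable_2, kind="count"):
    """Hypothetical helper that mimics the documented `kind` options with pandas."""
    if kind not in NORMALIZE:
        raise ValueError(f"unknown kind: {kind!r}")
    return pd.crosstab(data[variable_1], data[variable_2], normalize=NORMALIZE[kind])


# One table per option, keyed by the option name.
tables = {kind: crosstab_by_kind(toy, "Genotype", "Health", kind) for kind in NORMALIZE}
print(tables["horizontal"])
```

Summing each table along the relevant axis is a quick way to confirm which option was applied: columns for “vertical”, rows for “horizontal”, and all cells for “overall”.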

Returns:
result : pandas.DataFrame

The contingency table of the two categorical variables.

See also

categorical

Compute descriptive statistics of a categorical variable.

chi_square_test

Test whether there is an association between two categorical variables.

Examples

>>> import biostats as bs
>>> data = bs.dataset("contingency.csv")
>>> data
     Genotype      Health
0     ins-del     disease
1     ins-ins     disease
2     ins-del     disease
3     ins-ins     disease
4     ins-del  no_disease
...       ...         ...
2254  ins-ins  no_disease
2255  ins-del     disease
2256  ins-del     disease
2257  ins-ins     disease
2258  ins-ins  no_disease

We want to compute the contingency table of Genotype and Health.

>>> result = bs.contingency(data=data, variable_1="Genotype", variable_2="Health", kind="count")
>>> result
         disease  no_disease
del-del      184          42
ins-del      759         199
ins-ins      807         268
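
The proportion-based options can be checked by hand from the counts above. The following is a minimal sketch in plain pandas, not output produced by biostats: the counts are copied from the table above, and the variable names are chosen here for illustration.

```python
import pandas as pd

# Counts copied from the "count" table shown above.
counts = pd.DataFrame(
    {"disease": [184, 759, 807], "no_disease": [42, 199, 268]},
    index=["del-del", "ins-del", "ins-ins"],
)

# Total number of observations: 2259, matching the rows indexed 0 to 2258.
total = int(counts.to_numpy().sum())

# Row-wise proportions: divide each row by its row total.
horizontal = counts.div(counts.sum(axis=1), axis=0)

# Column-wise proportions: divide each column by its column total.
vertical = counts.div(counts.sum(axis=0), axis=1)

# Overall proportions: divide every cell by the grand total.
overall = counts / total

print(horizontal.round(3))
```

Under the descriptions given for `kind`, these three tables are what “horizontal”, “vertical”, and “overall” would be expected to contain for this data.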