biostats.epidemiologic_study

biostats.epidemiologic_study(data, disease, disease_target, factor, factor_target)

Compute common measures of association for an epidemiologic study: risk difference, risk ratio, odds ratio, and attributable risk.

Parameters:

data (pandas.DataFrame): The input data. Must contain at least two categorical columns.
disease (str): The name of the column that specifies the disease. It may have at most 20 groups.
disease_target (str): The group of the disease variable that is considered “positive”.
factor (str): The name of the column that specifies the factor. It may have at most 20 groups.
factor_target (str): The group of the factor variable that is considered “positive”.

Returns:

summary (pandas.DataFrame): The contingency table of the factor (rows) and the disease (columns).
result (pandas.DataFrame): The point estimates and 95% confidence intervals of the risk difference, risk ratio (relative risk), odds ratio, and attributable risk.
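For reference, label the four cells of the contingency table as a (factor +, disease +), b (factor +, disease -), c (factor -, disease +), and d (factor -, disease -). The usual definitions of the measures are given below. This page does not say which definition of attributable risk the function uses. Reading it as the population attributable fraction (PAF) is an assumption, although that reading agrees with the value in the example output.

```latex
\begin{aligned}
p_1 &= \frac{a}{a+b}, \qquad p_0 = \frac{c}{c+d}, \qquad p = \frac{a+c}{a+b+c+d} \\
\mathrm{RD} &= p_1 - p_0 \\
\mathrm{RR} &= \frac{p_1}{p_0} \\
\mathrm{OR} &= \frac{ad}{bc} \\
\mathrm{PAF} &= \frac{p - p_0}{p}
\end{aligned}
```

Here p_1 and p_0 are the risks of disease with and without the factor, and p is the overall risk.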

See also

screening_test: Compute some common statistics of a screening test.
contingency: Compute the contingency table of two categorical variables.

Examples

>>> import biostats as bs
>>> data = bs.dataset("epidemiologic_study.csv")
>>> data
             MI Diabetes
0     Not occur       No
1     Not occur       No
2     Not occur       No
3     Not occur       No
4     Not occur       No
...         ...      ...
2993  Not occur       No
2994  Not occur       No
2995  Not occur       No
2996  Not occur       No
2997  Not occur       No

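We want to compute the risk difference, risk ratio, odds ratio, and attributable risk for a study of the relation between MI (myocardial infarction) and diabetes. MI is the disease, with "Occur" as its positive group, and Diabetes is the factor, with "Yes" as its positive group.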

>>> summary, result = bs.epidemiologic_study(data=data, disease="MI", disease_target="Occur", factor="Diabetes", factor_target="Yes")
>>> summary
              MI (+)  MI (-)
Diabetes (+)      48     183
Diabetes (-)     210    2557

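summary is the contingency table of Diabetes (rows) and MI (columns), where (+) marks the target group of each variable. The four counts add up to the 2998 rows of data.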
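Here is a minimal standard-library sketch of how disease_target and factor_target could map the raw columns onto the four cells. two_by_two is a hypothetical helper and is not part of biostats. It assumes that every non-target group is pooled into the (-) level, because the docs only say that a variable may have up to 20 groups. The rows are synthetic and were built to reproduce the counts in the table above.

```python
from collections import Counter


def two_by_two(rows, disease_target, factor_target):
    """Count the cells (a, b, c, d) from (disease, factor) pairs.

    Hypothetical helper: any group other than the target is pooled into
    the negative level, which is an assumption about how biostats treats
    non-target groups.
    """
    counts = Counter(
        (factor == factor_target, disease == disease_target)
        for disease, factor in rows
    )
    a = counts[(True, True)]    # factor (+), disease (+)
    b = counts[(True, False)]   # factor (+), disease (-)
    c = counts[(False, True)]   # factor (-), disease (+)
    d = counts[(False, False)]  # factor (-), disease (-)
    return a, b, c, d


# Synthetic rows that reproduce the cell counts of the summary table.
rows = (
    [("Occur", "Yes")] * 48
    + [("Not occur", "Yes")] * 183
    + [("Occur", "No")] * 210
    + [("Not occur", "No")] * 2557
)
print(two_by_two(rows, disease_target="Occur", factor_target="Yes"))
# (48, 183, 210, 2557)
```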

>>> result
                   Estimation  95% CI: Lower  95% CI: Upper
Risk Difference      0.131898       0.078654       0.185141
Risk Ratio           2.737910       2.062282       3.634880
Odds Ratio           3.193755       2.256038       4.521233
Attributable Risk    0.118094       0.078925       0.173051

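result gives the point estimate and 95% confidence interval of each measure. For instance, the risk of MI among people with diabetes is about 2.74 times the risk among people without it (95% CI: 2.06 to 3.63).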
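As a cross-check, the sketch below recomputes the estimates from the four cell counts in summary. epi_measures is a hypothetical helper and is not part of biostats. Two choices in it are assumptions and are not taken from the biostats source: the interval methods (a Wald interval for the risk difference and log-scale intervals for the risk and odds ratios) and the reading of "attributable risk" as the population attributable fraction.

```python
import math
from statistics import NormalDist


def epi_measures(a, b, c, d, level=0.95):
    """Point estimates and CIs from 2x2 counts (a, b / c, d).

    Assumed methods (not confirmed by the biostats docs): Wald CI for the
    risk difference, log-scale CIs for the risk ratio and odds ratio.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    n1, n0 = a + b, c + d
    p1, p0 = a / n1, c / n0  # risk of disease in factor (+) and factor (-)

    rd = p1 - p0
    se_rd = math.sqrt(p1 * (1 - p1) / n1 + p0 * (1 - p0) / n0)

    rr = p1 / p0
    se_log_rr = math.sqrt(1 / a - 1 / n1 + 1 / c - 1 / n0)

    odds = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)

    p = (a + c) / (n1 + n0)  # overall risk of disease
    paf = (p - p0) / p       # population attributable fraction (point estimate only)

    return {
        "Risk Difference": (rd, rd - z * se_rd, rd + z * se_rd),
        "Risk Ratio": (rr, rr * math.exp(-z * se_log_rr), rr * math.exp(z * se_log_rr)),
        "Odds Ratio": (odds, odds * math.exp(-z * se_log_or), odds * math.exp(z * se_log_or)),
        "Attributable Risk": (paf,),
    }


for name, values in epi_measures(48, 183, 210, 2557).items():
    print(name, [round(v, 4) for v in values])
```

With the counts from the example, these formulas reproduce the point estimates and the first three intervals in result to at least three decimal places. The sketch leaves out the attributable-risk interval because this page does not document its method.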