biostats.epidemiologic_study
- biostats.epidemiologic_study(data, disease, disease_target, factor, factor_target)
Compute some common statistics of an epidemiologic study.
- Parameters:
- data
pandas.DataFrame
The input data. Must contain at least two categorical columns.
- disease
str
The variable specifying the disease. Maximum 20 groups.
- disease_target
str
The group of the disease variable that is considered “positive”.
- factor
str
The variable specifying the factor. Maximum 20 groups.
- factor_target
str
The group of the factor variable that is considered “positive”.
- Returns:
- summary
pandas.DataFrame
The contingency table of the disease and factor.
- result
pandas.DataFrame
The values and confidence intervals of risk difference, risk ratio (relative risk), odds ratio, and attributable risk.
See also
screening_test
Compute some common statistics of a screening test.
contingency
Compute the contingency table of two categorical variables.
Examples
>>> import biostats as bs
>>> data = bs.dataset("epidemiologic_study.csv")
>>> data
             MI Diabetes
0     Not occur       No
1     Not occur       No
2     Not occur       No
3     Not occur       No
4     Not occur       No
...         ...      ...
2993  Not occur       No
2994  Not occur       No
2995  Not occur       No
2996  Not occur       No
2997  Not occur       No
We want to estimate the risk difference, risk ratio, odds ratio, and attributable risk for an epidemiologic study of the relation between MI and Diabetes.
>>> summary, result = bs.epidemiologic_study(data=data, disease="MI", disease_target="Occur", factor="Diabetes", factor_target="Yes")
>>> summary
              MI (+)  MI (-)
Diabetes (+)      48     183
Diabetes (-)     210    2557
The contingency table of MI and Diabetes is given.
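To illustrate how such a table is laid out, here is a minimal standard-library sketch that tallies (disease, factor) pairs into the same 2x2 arrangement. The helper name `two_by_two` and the five sample records are hypothetical and are not part of biostats or its bundled dataset. Treating every non-target group as "negative" is also an assumption made for this sketch.

```python
from collections import Counter


def two_by_two(rows, disease_target, factor_target):
    """Tally (disease, factor) pairs into a 2x2 table.

    Returns [[a, b], [c, d]] where the rows are factor (+), factor (-)
    and the columns are disease (+), disease (-).
    Hypothetical helper, not a biostats function.
    """
    counts = Counter(
        (factor == factor_target, disease == disease_target)
        for disease, factor in rows
    )
    return [
        [counts[(True, True)], counts[(True, False)]],
        [counts[(False, True)], counts[(False, False)]],
    ]


# Hypothetical (MI, Diabetes) records, not the real dataset.
rows = [
    ("Occur", "Yes"),
    ("Not occur", "Yes"),
    ("Occur", "No"),
    ("Not occur", "No"),
    ("Not occur", "No"),
]

table = two_by_two(rows, disease_target="Occur", factor_target="Yes")
print(table)
```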
>>> result
                   Estimation  95% CI: Lower  95% CI: Upper
Risk Difference      0.131898       0.078654       0.185141
Risk Ratio           2.737910       2.062282       3.634880
Odds Ratio           3.193755       2.256038       4.521233
Attributable Risk    0.118094       0.078925       0.173051
The estimates and 95% confidence intervals of the risk difference, risk ratio, odds ratio, and attributable risk are computed.
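The point estimates in the result table can be checked by hand from the four cell counts of the contingency table. The sketch below uses textbook formulas and is not taken from the biostats source. It assumes a Wald interval for the risk difference and log-scale intervals for the risk ratio and odds ratio. It also assumes that "attributable risk" means the population attributable fraction, (overall risk - unexposed risk) / overall risk. The function name `epi_estimates` is hypothetical, and the interval for the attributable risk is not reproduced.

```python
import math


def epi_estimates(a, b, c, d, z=1.959964):
    """Textbook estimates from a 2x2 table (hypothetical helper).

    a, b: factor (+) with / without disease
    c, d: factor (-) with / without disease
    z:    normal quantile for a 95% interval
    Returns {name: (estimate, lower, upper)}.
    """
    n1, n0 = a + b, c + d          # exposed and unexposed group sizes
    p1, p0 = a / n1, c / n0        # risk in each group

    # Risk difference with a Wald interval.
    rd = p1 - p0
    rd_se = math.sqrt(p1 * (1 - p1) / n1 + p0 * (1 - p0) / n0)

    # Risk ratio with an interval built on the log scale.
    rr = p1 / p0
    rr_se = math.sqrt(1 / a - 1 / n1 + 1 / c - 1 / n0)

    # Odds ratio with an interval built on the log scale.
    odds_ratio = (a * d) / (b * c)
    or_se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)

    # Population attributable fraction (assumed meaning of
    # "attributable risk"); its interval is not computed here.
    p_all = (a + c) / (n1 + n0)
    ar = (p_all - p0) / p_all

    return {
        "Risk Difference": (rd, rd - z * rd_se, rd + z * rd_se),
        "Risk Ratio": (rr, rr * math.exp(-z * rr_se), rr * math.exp(z * rr_se)),
        "Odds Ratio": (
            odds_ratio,
            odds_ratio * math.exp(-z * or_se),
            odds_ratio * math.exp(z * or_se),
        ),
        "Attributable Risk": (ar, None, None),
    }


# Cell counts from the summary table: Diabetes (+) row, then Diabetes (-) row.
estimates = epi_estimates(48, 183, 210, 2557)
for name, (value, lower, upper) in estimates.items():
    print(name, round(value, 6), lower, upper)
```

With the counts from the summary table, the four point estimates round to the values in the Estimation column, which suggests (but does not confirm) that the library uses the same definitions.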