biostats.screening_test

biostats.screening_test(data, disease, disease_target, test, test_target)

Compute common statistics of a screening test, with 95% confidence intervals.

Parameters:

data (pandas.DataFrame): The input data. Must contain at least two categorical columns.
disease (str): The name of the column specifying the disease (or condition). Maximum 20 groups.
disease_target (str): The group of the disease variable that is considered “positive”.
test (str): The name of the column specifying the test (or symptom). Maximum 20 groups.
test_target (str): The group of the test variable that is considered “positive”.
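How `disease_target` and `test_target` map each row onto one of the four cells can be sketched in plain Python. This sketch assumes that every group other than the target is treated as negative (the parameters above only say which group is positive), and it uses plain lists in place of DataFrame columns. `count_cells` is a hypothetical helper for illustration, not part of biostats.

```python
from collections import Counter
from collections.abc import Hashable, Sequence


def count_cells(disease: Sequence[Hashable], test: Sequence[Hashable],
                disease_target: Hashable, test_target: Hashable) -> dict[str, int]:
    """Count TP, FP, FN and TN after collapsing each variable to positive/negative."""
    if len(disease) != len(test):
        raise ValueError("disease and test must have the same length")
    # Assumption: any group other than the target is counted as negative.
    pairs = Counter((d == disease_target, t == test_target)
                    for d, t in zip(disease, test))
    return {
        "TP": pairs[(True, True)],    # diseased, test positive
        "FP": pairs[(False, True)],   # not diseased, test positive
        "FN": pairs[(True, False)],   # diseased, test negative
        "TN": pairs[(False, False)],  # not diseased, test negative
    }
```

Applied to the first five rows of the example dataset, `count_cells(["Present", "Absent", "Present", "Absent", "Present"], ["Negative", "Negative", "Positive", "Negative", "Positive"], "Present", "Positive")` gives 2 TP, 0 FP, 1 FN and 2 TN.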

Returns:

summary (pandas.DataFrame): The 2×2 contingency table of true positive, false positive, false negative, and true negative counts.
result (pandas.DataFrame): The point estimates and 95% confidence intervals of sensitivity (recall), specificity (selectivity), positive predictive value (precision), negative predictive value, accuracy, and prevalence.
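Each point estimate in `result` is a ratio of cells of the 2×2 table. The sketch below uses the standard textbook definitions; `screening_point_estimates` is a hypothetical helper for illustration, not part of biostats, and it computes only the estimates, not the confidence intervals.

```python
def screening_point_estimates(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Point estimates of common screening-test statistics from 2x2 counts."""
    total = tp + fp + fn + tn
    return {
        "Sensitivity": tp / (tp + fn),    # P(test + | disease +)
        "Specificity": tn / (tn + fp),    # P(test - | disease -)
        "Positive PV": tp / (tp + fp),    # P(disease + | test +)
        "Negative PV": tn / (tn + fn),    # P(disease - | test -)
        "Accuracy": (tp + tn) / total,    # fraction classified correctly
        "Prevalence": (tp + fn) / total,  # fraction of subjects with the disease
    }
```

With the counts from the example (TP = 92, FP = 27, FN = 46, TN = 72), this reproduces the Estimation column of `result`, e.g. sensitivity = 92 / 138 ≈ 0.667.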

See also

epidemiologic_study: Compute some common statistics of an epidemiologic study.
contingency: Compute the contingency table of two categorical variables.

Examples

>>> import biostats as bs
>>> data = bs.dataset("screening_test.csv")
>>> data
      Cancer  PSA Test
0    Present  Negative
1     Absent  Negative
2    Present  Positive
3     Absent  Negative
4    Present  Positive
..       ...       ...
232  Present  Negative
233  Present  Positive
234  Present  Negative
235   Absent  Negative
236  Present  Positive

We want to compute the sensitivity, specificity, and related statistics of the PSA Test as a screening test for Cancer.

>>> summary, result = bs.screening_test(data=data, disease="Cancer", disease_target="Present", test="PSA Test", test_target="Positive")
>>> summary
              Cancer (+)  Cancer (-)
PSA Test (+)          92          27
PSA Test (-)          46          72

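The contingency table gives the counts of true positives (TP = 92), false positives (FP = 27), false negatives (FN = 46), and true negatives (TN = 72). Rows are test results and columns are disease status.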

>>> result
             Estimation  95% CI: Lower  95% CI: Upper
Sensitivity    0.666667       0.584443       0.739862
Specificity    0.727273       0.632291       0.805276
Positive PV    0.773109       0.690014       0.839123
Negative PV    0.610169       0.520027       0.693365
Accuracy       0.691983       0.630534       0.747308
Prevalence     0.582278       0.518666       0.643266

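Each row gives the point estimate and 95% confidence interval of one statistic. For example, sensitivity is TP / (TP + FN) = 92 / 138 ≈ 0.667, and prevalence is (TP + FN) / total = 138 / 237 ≈ 0.582.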
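The interval method is not stated here. The reported intervals appear consistent with the Wilson score interval for a binomial proportion, so the sketch below assumes that method; `wilson_interval` is a hypothetical helper, not part of biostats.

```python
import math


def wilson_interval(successes: int, n: int,
                    z: float = 1.959963984540054) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (95% by default)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    # Center is shifted toward 0.5 relative to the raw proportion p.
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width
```

Each statistic is a proportion with its own numerator and denominator: sensitivity uses (92, 138), specificity (72, 99), positive PV (92, 119), negative PV (72, 118), accuracy (164, 237), and prevalence (138, 237). For instance, `wilson_interval(92, 138)` gives approximately (0.5844, 0.7399), matching the Sensitivity row.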