biostats.screening_test

biostats.screening_test(data, disease, disease_target, test, test_target)

Compute some common statistics of a screening test.

Parameters:
data : pandas.DataFrame

The input data. Must contain at least two categorical columns.

disease : str

The variable specifying the disease (or condition). Maximum 20 groups.

disease_target : str

The group of the disease variable that is considered “positive”.

test : str

The variable specifying the test (or symptom). Maximum 20 groups.

test_target : str

The group of the test variable that is considered “positive”.

Returns:
summary : pandas.DataFrame

The contingency table of true positive, false positive, false negative, and true negative counts.

result : pandas.DataFrame

The values and confidence intervals of sensitivity (recall), specificity (selectivity), positive predictive value (precision), negative predictive value, accuracy, and prevalence.

See also

epidemiologic_study

Compute some common statistics of an epidemiologic study.

contingency

Compute the contingency table of two categorical variables.

Examples

>>> import biostats as bs
>>> data = bs.dataset("screening_test.csv")
>>> data
      Cancer  PSA Test
0    Present  Negative
1     Absent  Negative
2    Present  Positive
3     Absent  Negative
4    Present  Positive
..       ...       ...
232  Present  Negative
233  Present  Positive
234  Present  Negative
235   Absent  Negative
236  Present  Positive

We want to compute the sensitivity, specificity, and related statistics of a screening test that detects Cancer from PSA Test.

>>> summary, result = bs.screening_test(data=data, disease="Cancer", disease_target="Present", test="PSA Test", test_target="Positive")
>>> summary
              Cancer (+)  Cancer (-)
PSA Test (+)          92          27
PSA Test (-)          46          72

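The contingency table gives the counts of TP (true positive), FP (false positive), FN (false negative), and TN (true negative). Here TP = 92, FP = 27, FN = 46, and TN = 72.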
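To make the layout of such a table concrete, the following is a minimal sketch of how the four cells could be tallied by hand with pandas. The `toy` DataFrame is hypothetical illustration data (not the bundled `screening_test.csv`), and the code is not taken from the biostats source; it only mirrors the row and column labels shown in `summary`.

```python
import pandas as pd

# Hypothetical toy data for illustration only (not the bundled dataset).
toy = pd.DataFrame({
    "Cancer":   ["Present", "Present", "Absent", "Absent", "Present", "Absent"],
    "PSA Test": ["Positive", "Negative", "Positive", "Negative", "Positive", "Negative"],
})

# Boolean masks for the "positive" group of each variable.
test_pos = toy["PSA Test"] == "Positive"
disease_pos = toy["Cancer"] == "Present"

# Count each combination of test result and disease status.
tp = int((test_pos & disease_pos).sum())    # test (+), disease (+)
fp = int((test_pos & ~disease_pos).sum())   # test (+), disease (-)
fn = int((~test_pos & disease_pos).sum())   # test (-), disease (+)
tn = int((~test_pos & ~disease_pos).sum())  # test (-), disease (-)

# Arrange the counts with test results as rows and disease status as columns.
table = pd.DataFrame(
    [[tp, fp], [fn, tn]],
    index=["PSA Test (+)", "PSA Test (-)"],
    columns=["Cancer (+)", "Cancer (-)"],
)
print(table)
```

Rows correspond to the test result and columns to the disease status, so the top-left cell is TP and the bottom-right cell is TN.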

>>> result
             Estimation  95% CI: Lower  95% CI: Upper
Sensitivity    0.666667       0.584443       0.739862
Specificity    0.727273       0.632291       0.805276
Positive PV    0.773109       0.690014       0.839123
Negative PV    0.610169       0.520027       0.693365
Accuracy       0.691983       0.630534       0.747308
Prevalence     0.582278       0.518666       0.643266

Each row gives the point estimate and the 95% confidence interval of one statistic.
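As a cross-check, the point estimates can be recomputed from the four counts in the summary table using the standard definitions. The interval method is not stated on this page; the sketch below assumes a Wilson score interval, which appears consistent with the printed bounds, but that is an inference from the numbers rather than a documented fact. The helper `wilson_interval` is a hypothetical name and not part of biostats.

```python
import math

# Counts copied from the summary table above.
tp, fp, fn, tn = 92, 27, 46, 72
total = tp + fp + fn + tn


def wilson_interval(successes, n, z=1.959964):
    """Wilson score interval for a binomial proportion (z ~ 95% by default).

    Assumption: this is one plausible interval method, not confirmed
    to be the one used by biostats.
    """
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half


# Each statistic is a proportion: (numerator, denominator).
measures = {
    "Sensitivity": (tp, tp + fn),        # TP / all diseased
    "Specificity": (tn, tn + fp),        # TN / all non-diseased
    "Positive PV": (tp, tp + fp),        # TP / all test-positive
    "Negative PV": (tn, tn + fn),        # TN / all test-negative
    "Accuracy":    (tp + tn, total),     # correct results / all subjects
    "Prevalence":  (tp + fn, total),     # diseased / all subjects
}

estimates = {}
for name, (k, n) in measures.items():
    low, high = wilson_interval(k, n)
    estimates[name] = (k / n, low, high)
    print(f"{name:<12} {k / n:.6f}  {low:.6f}  {high:.6f}")
```

Computing every statistic as a (numerator, denominator) pair keeps the interval calculation identical across rows, so only the definitions differ.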