biostats.simple_logistic_regression

biostats.simple_logistic_regression(data, x, y, target)

Fit an equation that predicts a dichotomous categorical variable from a numeric variable.

Parameters:
data : pandas.DataFrame

The input data. Must contain at least one categorical column and one numeric column.

x : str

The predictor variable. Must be numeric.

y : str

The response variable. Must be categorical. Maximum 20 groups.

target : str or int or float

The group of the response variable whose probability the fitted equation predicts.

Returns:
summary : pandas.DataFrame

The coefficients of the fitted equation, along with the confidence intervals, standard errors, z statistics, and p-values.

result : pandas.DataFrame

The pseudo R-squared and p-value of the fitted model.

See also

multiple_logistic_regression

Fit an equation that predicts a dichotomous categorical variable from other variables.

ordered_logistic_regression

Fit an equation that predicts an ordered categorical variable from other variables.

multinomial_logistic_regression

Fit an equation that predicts a multinomial categorical variable from other variables.

Examples

>>> import biostats as bs
>>> data = bs.dataset("simple_logistic_regression.csv")
>>> data
    Continuous Factor
0         62.0      A
1         63.0      A
2         64.0      A
3         65.0      A
4         66.0      A
5         67.0      A
6         68.0      A
7         69.0      A
8         70.0      A
9         71.0      A
10        72.0      A
11        73.0      A
12        74.0      A
13        75.0      A
14        72.5      B
15        73.5      B
16        74.5      B
17        75.0      B
18        76.0      B
19        77.0      B
20        78.0      B
21        79.0      B
22        80.0      B
23        81.0      B
24        82.0      B
25        83.0      B
26        84.0      B
27        85.0      B
28        86.0      B

We want to fit an equation that predicts Factor from Continuous.

>>> summary, result = bs.simple_logistic_regression(data=data, x="Continuous", y="Factor", target="B")
>>> summary
            Coefficient  95% CI: Lower  95% CI: Upper  Std. Error  z Statistic   p-value   
Intercept    -66.498134    -129.959199      -3.037069   32.378689    -2.053762  0.039999  *
Continuous     0.902667       0.042352       1.762982    0.438945     2.056449  0.039739  *

The summary gives the coefficients of the fitted equation, along with their confidence intervals, standard errors, z statistics, and p-values.
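As a minimal sketch of how the fitted equation can be read, the snippet below plugs the coefficients from the summary into the logistic function. It assumes the usual logit form, log-odds = Intercept + coefficient * Continuous, with the target group "B" as the modeled outcome. The helper name `predict_probability` is illustrative and is not part of biostats.

```python
import math

# Coefficients copied from the summary table above (target group "B").
intercept = -66.498134
slope = 0.902667

def predict_probability(x):
    """Estimated probability of group "B" at a given Continuous value.

    Hypothetical helper, assuming a standard logit link.
    """
    log_odds = intercept + slope * x
    return 1.0 / (1.0 + math.exp(-log_odds))

# Continuous value at which both groups are equally likely (log-odds = 0).
midpoint = -intercept / slope

# Multiplicative change in the odds of "B" per one-unit increase in Continuous.
odds_ratio = math.exp(slope)

print(predict_probability(70.0), predict_probability(80.0))
print(midpoint, odds_ratio)
```

Under these assumptions the predicted probability of "B" crosses 0.5 between 73 and 74, which agrees with the region where the two groups overlap in the data.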

>>> result
       Pseudo R-Squared       p-value     
Model          0.697579  1.200433e-07  ***

The model p-value is below 0.001, so there is a significant relationship between the predictor and the response variable.
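The two model-level figures can be cross-checked by hand. The sketch below assumes they correspond to McFadden's pseudo R-squared and a likelihood-ratio test against an intercept-only model. That is an assumption about how biostats computes them, not something the documentation states. The data are transcribed from the table above and the coefficients from the summary, so small rounding differences are expected.

```python
import math

# Data transcribed from the example table: 14 rows of group A, 15 of group B.
group_a = [float(v) for v in range(62, 76)]                              # 62..75
group_b = [72.5, 73.5, 74.5, 75.0] + [float(v) for v in range(76, 87)]   # ..86

# Coefficients copied from the summary table (target group "B").
intercept, slope = -66.498134, 0.902667

def log_likelihood(b0, b1):
    """Bernoulli log-likelihood with "B" coded as 1 and "A" coded as 0."""
    total = 0.0
    for x in group_a:
        total -= math.log1p(math.exp(b0 + b1 * x))       # log(1 - p)
    for x in group_b:
        total -= math.log1p(math.exp(-(b0 + b1 * x)))    # log(p)
    return total

n_a, n_b = len(group_a), len(group_b)
n = n_a + n_b

ll_model = log_likelihood(intercept, slope)
# Intercept-only model: every row is assigned the overall group proportions.
ll_null = n_a * math.log(n_a / n) + n_b * math.log(n_b / n)

# McFadden's pseudo R-squared (assumed definition).
pseudo_r2 = 1.0 - ll_model / ll_null
# Likelihood-ratio statistic, chi-square with 1 degree of freedom.
lr_statistic = 2.0 * (ll_model - ll_null)
# Chi-square survival function for 1 degree of freedom, written with erfc.
p_value = math.erfc(math.sqrt(lr_statistic / 2.0))

print(pseudo_r2, p_value)
```

If the assumptions hold, both printed values should land close to the ones reported in `result`.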