biostats.simple_logistic_regression

biostats.simple_logistic_regression(data, x, y, target)

Fit an equation that predicts a dichotomous categorical variable from a numeric variable.

Parameters:

data (pandas.DataFrame): The input data. Must contain at least one categorical column and one numeric column.
x (str): The predictor variable. Must be numeric.
y (str): The response variable. Must be categorical. Maximum 20 groups.
target (str or int or float): The target group of the categorical variable.
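
The role of target can be illustrated with a minimal sketch. It assumes that the target group is treated as the event (1) and every other group of y as the non-event (0); this reading follows from the parameter descriptions above and is not confirmed as the internal implementation of biostats. The helper name encode_target is hypothetical.

```python
# Hypothetical helper, not part of biostats: encode a categorical column
# as 1 for the target group and 0 for every other group (an assumption
# about how a dichotomous response is formed from `y` and `target`).
def encode_target(values, target):
    return [1 if value == target else 0 for value in values]

factor = ["A", "A", "B", "A", "B"]
encoded = encode_target(factor, target="B")
print(encoded)  # [0, 0, 1, 0, 1]
```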

Returns:

summary (pandas.DataFrame): The coefficients of the fitted equation, along with the confidence intervals, standard errors, z statistics, and p-values.
result (pandas.DataFrame): The pseudo R-squared and p-value of the fitted model.

See also

multiple_logistic_regression: Fit an equation that predicts a dichotomous categorical variable from other variables.
ordered_logistic_regression: Fit an equation that predicts an ordered categorical variable from other variables.
multinomial_logistic_regression: Fit an equation that predicts a multinomial categorical variable from other variables.

Examples

>>> import biostats as bs
>>> data = bs.dataset("simple_logistic_regression.csv")
>>> data
Continuous Factor
      62.0      A
      63.0      A
      64.0      A
      65.0      A
      66.0      A
      67.0      A
      68.0      A
      69.0      A
      70.0      A
      71.0      A
      72.0      A
      73.0      A
      74.0      A
      75.0      A
      72.5      B
      73.5      B
      74.5      B
      75.0      B
      76.0      B
      77.0      B
      78.0      B
      79.0      B
      80.0      B
      81.0      B
      82.0      B
      83.0      B
      84.0      B
      85.0      B
      86.0      B

We want to fit an equation that predicts Factor from Continuous.

>>> summary, result = bs.simple_logistic_regression(data=data, x="Continuous", y="Factor", target="B")
>>> summary
            Coefficient  95% CI: Lower  95% CI: Upper  Std. Error  z Statistic   p-value   
Intercept    -66.498134    -129.959199      -3.037069   32.378689    -2.053762  0.039999  *
Continuous     0.902667       0.042352       1.762982    0.438945     2.056449  0.039739  *

The summary gives the coefficients of the fitted equation, along with their confidence intervals, standard errors, z statistics, and p-values.
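
To show how these coefficients can be read, the sketch below plugs them into the standard logistic form, log-odds = Intercept + Coefficient * Continuous. It assumes the coefficients are on the log-odds scale for the target group "B", which is the usual convention for logistic regression. The function name predict_probability is illustrative and is not part of biostats.

```python
import math

# Coefficients copied from the summary table above.
INTERCEPT = -66.498134
SLOPE = 0.902667

def predict_probability(x):
    """Estimated probability of the target group "B" at predictor value x."""
    log_odds = INTERCEPT + SLOPE * x
    return 1.0 / (1.0 + math.exp(-log_odds))

# Value of Continuous at which both groups are equally likely (log-odds = 0).
midpoint = -INTERCEPT / SLOPE  # about 73.67

# Multiplicative change in the odds of "B" per one-unit increase in Continuous.
odds_ratio = math.exp(SLOPE)

print(round(midpoint, 2))
print(predict_probability(70.0), predict_probability(80.0))
print(odds_ratio)
```

Below the midpoint the equation favors group A, and above it group B, which matches the overlap of the two groups around 72.5 to 75 in the data.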

>>> result
       Pseudo R-Squared       p-value     
Model          0.697579  1.200433e-07  ***

The p-value of the model is less than 0.001, so there is a significant relationship between the predictor and the response variable.
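
For readers who want to see where such numbers can come from, here is a self-contained sketch that refits the model on the rows shown above using only the standard library. It rests on two assumptions that this page does not state: that the pseudo R-squared is McFadden's (1 minus the ratio of model to null log-likelihood) and that the model p-value comes from a likelihood-ratio test with one degree of freedom. The Newton-Raphson fit is a generic textbook method, not necessarily what biostats runs internally.

```python
import math

# Data copied from the example table above (1 marks the target group "B").
x_a = [62.0 + i for i in range(14)]                              # 62.0 ... 75.0
x_b = [72.5, 73.5, 74.5, 75.0] + [76.0 + i for i in range(11)]   # ... 86.0
xs = x_a + x_b
ys = [0] * len(x_a) + [1] * len(x_b)

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def fit_logistic(xs, ys, iterations=50):
    """Newton-Raphson fit; x is centred internally for numerical stability."""
    mean_x = sum(xs) / len(xs)
    zs = [x - mean_x for x in xs]
    a, b = 0.0, 0.0  # intercept on the centred scale, and slope
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for z, y in zip(zs, ys):
            p = sigmoid(a + b * z)
            w = p * (1.0 - p)
            g0 += y - p            # gradient of the log-likelihood
            g1 += (y - p) * z
            h00 += w               # entries of the information matrix
            h01 += w * z
            h11 += w * z * z
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a - b * mean_x, b       # intercept on the original scale, slope

def log_likelihood(b0, b1, xs, ys):
    # Each term is y * t - log(1 + exp(t)), with t the log-odds.
    return sum(y * (b0 + b1 * x) - math.log1p(math.exp(b0 + b1 * x))
               for x, y in zip(xs, ys))

intercept, slope = fit_logistic(xs, ys)
ll_model = log_likelihood(intercept, slope, xs, ys)

# Null model: intercept only, so every row gets the overall proportion of "B".
p_bar = sum(ys) / len(ys)
ll_null = sum(ys) * math.log(p_bar) + (len(ys) - sum(ys)) * math.log(1.0 - p_bar)

pseudo_r2 = 1.0 - ll_model / ll_null               # McFadden (assumed)
lr_stat = 2.0 * (ll_model - ll_null)               # likelihood-ratio statistic
p_value = math.erfc(math.sqrt(lr_stat / 2.0))      # chi-square tail, 1 df

print(intercept, slope)
print(pseudo_r2, p_value)
```

If both assumptions hold, the printed values should be close to the coefficients and model statistics reported above.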