biostats.multiple_logistic_regression

biostats.multiple_logistic_regression(data, x_numeric, x_categorical, y, target)

Fit an equation that predicts a dichotomous categorical variable from other variables.

Parameters:

data (pandas.DataFrame): The input data. Must contain at least one categorical column and several other columns (either numeric or categorical).
x_numeric (list): The list of predictor variables that are numeric.
x_categorical (list): The list of predictor variables that are categorical. Maximum 20 groups.
y (str): The response variable. Must be categorical. Maximum 20 groups.
target (str or int or float): The target group of the response variable.

Returns:

summary (pandas.DataFrame): The coefficients of the fitted equation, along with the confidence intervals, standard errors, z statistics, and p-values.
result (pandas.DataFrame): The pseudo R-squared and p-value of the fitted model.

See also

ordered_logistic_regression: Fit an equation that predicts an ordered categorical variable from other variables.
multinomial_logistic_regression: Fit an equation that predicts a multinomial categorical variable from other variables.
multiple_linear_regression: Fit an equation that predicts a numeric variable from other variables.

Examples

>>> import biostats as bs
>>> data = bs.dataset("multiple_logistic_regression.csv")
>>> data
    Upland  Migr    Mass  Indiv  Insect  Wood  Status
0        0     1  9600.0     29      12     0       1
1        0     1  5000.0     85       0     0       1
2        0     1  3360.0      8       0     0       1
3        0     3  2517.0     10      12     0       0
4        0     3  3170.0      7       0     0       0
..     ...   ...     ...    ...     ...   ...     ...
74       0     1    23.6     29      12     1       1
75       0     1    20.7      9      12     0       0
76       0     3    31.0      2      12     1       0
77       0     2    36.9      2       8     0       0
78       0     2   106.5      2      12     0       0

We want to fit an equation that predicts Status from Upland, Migr, Mass, Indiv, Insect, and Wood.

>>> summary, result = bs.multiple_logistic_regression(data=data, x_numeric=["Upland", "Migr", "Mass", "Indiv", "Insect", "Wood"], x_categorical=[], y="Status", target=1)
>>> summary
           Coefficient  95% CI: Lower  95% CI: Upper  Std. Error  z Statistic   p-value     
Intercept    -3.549648      -7.631768       0.532472    2.082753    -1.704306  0.088324  NaN
Upland       -4.548429      -8.608058      -0.488800    2.071277    -2.195954  0.028095    *
Migr         -1.818405      -3.450219      -0.186591    0.832573    -2.184077  0.028957    *
Mass          0.001903       0.000521       0.003284    0.000705     2.699675  0.006941   **
Indiv         0.013706       0.006120       0.021292    0.003870     3.541316  0.000398  ***
Insect        0.239472      -0.029721       0.508666    0.137346     1.743566  0.081235  NaN
Wood          1.813444      -0.755285       4.382174    1.310601     1.383674  0.166458  NaN

The summary gives the coefficients of the fitted equation, along with their confidence intervals, standard errors, z statistics, and p-values.
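As a rough illustration of how these coefficients can be read, the sketch below plugs them into the standard logistic form, p = 1 / (1 + exp(-(b0 + b1*x1 + ...))). It assumes the fitted equation models the log-odds of the target group (Status = 1), which is the usual convention for logistic regression but is not stated explicitly here. The helper `predict_probability` is a hypothetical name used only for this example and is not part of biostats.

```python
import math

# Coefficients copied from the summary table above.
coef = {
    "Intercept": -3.549648,
    "Upland": -4.548429,
    "Migr": -1.818405,
    "Mass": 0.001903,
    "Indiv": 0.013706,
    "Insect": 0.239472,
    "Wood": 1.813444,
}


def predict_probability(row):
    """Return the estimated probability of the target group for one observation.

    Hypothetical helper: assumes the coefficients are on the log-odds scale.
    """
    log_odds = coef["Intercept"] + sum(coef[name] * value for name, value in row.items())
    return 1.0 / (1.0 + math.exp(-log_odds))


# Predictor values of the first row (index 0) of the example data set.
first_row = {"Upland": 0, "Migr": 1, "Mass": 9600.0, "Indiv": 29, "Insect": 12, "Wood": 0}
probability = predict_probability(first_row)

# Exponentiating a coefficient gives the odds ratio for a one-unit increase
# in that predictor, holding the other predictors fixed.
odds_ratios = {name: math.exp(value) for name, value in coef.items() if name != "Intercept"}
```

Under this assumption, a positive coefficient (for example Indiv) raises the estimated probability of Status = 1, and a negative coefficient (for example Upland) lowers it.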

>>> result
       Pseudo R-Squared       p-value     
Model           0.67443  1.125372e-11  ***

The p-value of the model is less than 0.001, so there is a significant relationship between the predictor variables and the response variable.
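This page does not say how the two model-level numbers are computed. One common choice, assumed here purely for illustration, is McFadden's pseudo R-squared together with a likelihood-ratio test that compares the fitted model against an intercept-only model. The sketch below uses hypothetical log-likelihood values (not taken from the example data), and the function names are made up for this example.

```python
import math


def mcfadden_r2(ll_model, ll_null):
    """McFadden's pseudo R-squared: 1 - (log-likelihood of model / log-likelihood of null)."""
    return 1.0 - ll_model / ll_null


def lr_test_p_value(ll_model, ll_null, df):
    """p-value of the likelihood-ratio test against the intercept-only model.

    Uses the closed-form chi-square survival function, which holds only for
    an even number of degrees of freedom (df = number of predictors).
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch only supports a positive, even df")
    half = (2.0 * (ll_model - ll_null)) / 2.0  # half of the test statistic
    terms = sum(half ** i / math.factorial(i) for i in range(df // 2))
    return math.exp(-half) * terms


# Hypothetical log-likelihoods, chosen only to illustrate the arithmetic.
ll_null = -50.0   # intercept-only model
ll_model = -20.0  # model with six predictors

r2 = mcfadden_r2(ll_model, ll_null)
p_value = lr_test_p_value(ll_model, ll_null, df=6)
```

With these hypothetical inputs, a large gain in log-likelihood yields both a high pseudo R-squared and a very small p-value, which is the same pattern seen in the result above.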