biostats.multiple_logistic_regression

biostats.multiple_logistic_regression(data, x_numeric, x_categorical, y, target)

Fit an equation that predicts a dichotomous categorical variable from other variables.

Parameters:
data : pandas.DataFrame

The input data. Must contain one categorical column for the response and several other columns for the predictors, each either numeric or categorical.

x_numeric : list

The list of predictor variables that are numeric.

x_categorical : list

The list of predictor variables that are categorical. At most 20 groups per variable.

y : str

The response variable. Must be categorical. Maximum 20 groups.

target : str or int or float

The group of the response variable y that the fitted equation predicts.

Returns:
summary : pandas.DataFrame

The coefficients of the fitted equation, along with the confidence intervals, standard errors, z statistics, and p-values.

result : pandas.DataFrame

The pseudo R-squared and p-value of the fitted model.

See also

ordered_logistic_regression

Fit an equation that predicts an ordered categorical variable from other variables.

multinomial_logistic_regression

Fit an equation that predicts a multinomial categorical variable from other variables.

multiple_linear_regression

Fit an equation that predicts a numeric variable from other variables.

Examples

>>> import biostats as bs
>>> data = bs.dataset("multiple_logistic_regression.csv")
>>> data
    Upland  Migr    Mass  Indiv  Insect  Wood  Status
0        0     1  9600.0     29      12     0       1
1        0     1  5000.0     85       0     0       1
2        0     1  3360.0      8       0     0       1
3        0     3  2517.0     10      12     0       0
4        0     3  3170.0      7       0     0       0
..     ...   ...     ...    ...     ...   ...     ...
74       0     1    23.6     29      12     1       1
75       0     1    20.7      9      12     0       0
76       0     3    31.0      2      12     1       0
77       0     2    36.9      2       8     0       0
78       0     2   106.5      2      12     0       0

We want to fit an equation that predicts Status from Upland, Migr, Mass, Indiv, Insect, and Wood.

>>> summary, result = bs.multiple_logistic_regression(data=data, x_numeric=["Upland", "Migr", "Mass", "Indiv", "Insect", "Wood"], x_categorical=[], y="Status", target=1)
>>> summary
           Coefficient  95% CI: Lower  95% CI: Upper  Std. Error  z Statistic   p-value     
Intercept    -3.549648      -7.631768       0.532472    2.082753    -1.704306  0.088324  NaN
Upland       -4.548429      -8.608058      -0.488800    2.071277    -2.195954  0.028095    *
Migr         -1.818405      -3.450219      -0.186591    0.832573    -2.184077  0.028957    *
Mass          0.001903       0.000521       0.003284    0.000705     2.699675  0.006941   **
Indiv         0.013706       0.006120       0.021292    0.003870     3.541316  0.000398  ***
Insect        0.239472      -0.029721       0.508666    0.137346     1.743566  0.081235  NaN
Wood          1.813444      -0.755285       4.382174    1.310601     1.383674  0.166458  NaN

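The summary gives the coefficients of the fitted equation, along with their 95% confidence intervals, standard errors, z statistics, and p-values. Upland, Migr, Mass, and Indiv have p-values below 0.05; Insect and Wood do not.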
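As a minimal sketch of how these coefficients can be read, the snippet below plugs the values from the summary table into the standard logistic form p = 1 / (1 + exp(-z)), where z is the intercept plus each coefficient times its predictor. It assumes the coefficients are on the log-odds scale for the target group (Status = 1), which is the usual convention for logistic regression but is not stated explicitly on this page. The names predict_probability, row0, and row3 are illustrative and not part of biostats.

```python
import math

# Coefficients copied from the summary table in the example.
coef = {
    "Intercept": -3.549648,
    "Upland": -4.548429,
    "Migr": -1.818405,
    "Mass": 0.001903,
    "Indiv": 0.013706,
    "Insect": 0.239472,
    "Wood": 1.813444,
}


def predict_probability(row):
    """Return the predicted probability of the target group for one observation.

    Assumption: coefficients are log-odds for the target group (Status = 1).
    """
    z = coef["Intercept"] + sum(coef[name] * value for name, value in row.items())
    return 1.0 / (1.0 + math.exp(-z))


# Predictor values of rows 0 and 3 of the example dataset.
row0 = {"Upland": 0, "Migr": 1, "Mass": 9600.0, "Indiv": 29, "Insect": 12, "Wood": 0}
row3 = {"Upland": 0, "Migr": 3, "Mass": 2517.0, "Indiv": 10, "Insect": 12, "Wood": 0}

p0 = predict_probability(row0)  # observed Status is 1
p3 = predict_probability(row3)  # observed Status is 0

# Exponentiating a coefficient gives the odds ratio per unit increase.
odds_ratios = {name: math.exp(b) for name, b in coef.items() if name != "Intercept"}

print(f"row 0: {p0:.4f}, row 3: {p3:.4f}")
for name, ratio in odds_ratios.items():
    print(f"{name}: odds ratio {ratio:.3f}")
```

Under this assumption, each additional unit of Migr multiplies the odds of Status = 1 by roughly 0.16, holding the other predictors fixed.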

>>> result
       Pseudo R-Squared       p-value     
Model           0.67443  1.125372e-11  ***

The model p-value is less than 0.001, so the predictors, taken together, are significantly related to the response variable.
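This page does not say which pseudo R-squared is reported. Assuming McFadden's definition, 1 - LL_model / LL_null, where LL_null is the log-likelihood of an intercept-only model, the quantity could be sketched as follows. The outcomes and fitted probabilities are made-up toy values, not taken from the dataset, and log_likelihood and mcfadden_r2 are hypothetical helper names.

```python
import math


def log_likelihood(y, p):
    """Bernoulli log-likelihood of 0/1 outcomes y under probabilities p."""
    return sum(
        math.log(pi) if yi == 1 else math.log(1.0 - pi)
        for yi, pi in zip(y, p)
    )


def mcfadden_r2(y, p_model):
    """McFadden's pseudo R-squared (assumed definition, see lead-in)."""
    # The intercept-only model predicts the overall target proportion for everyone.
    p_null = sum(y) / len(y)
    ll_null = log_likelihood(y, [p_null] * len(y))
    ll_model = log_likelihood(y, p_model)
    return 1.0 - ll_model / ll_null


# Toy outcomes and hypothetical fitted probabilities (illustrative only).
y = [1, 1, 1, 0, 0, 0]
p_model = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]

r2 = mcfadden_r2(y, p_model)
print(f"pseudo R-squared: {r2:.3f}")
```

A value of 0 means the model fits no better than the intercept-only model, and values closer to 1 mean the fitted probabilities lie closer to the observed outcomes.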