biostats.multiple_logistic_regression
- biostats.multiple_logistic_regression(data, x_numeric, x_categorical, y, target)
Fit an equation that predicts a dichotomous categorical variable from other variables.
- Parameters:
- data
pandas.DataFrame
The input data. Must contain at least one categorical column and several other columns, each of which can be either numeric or categorical.
- x_numeric
list
The list of predictor variables that are numeric.
- x_categorical
list
The list of predictor variables that are categorical. Maximum 20 groups.
- y
str
The response variable. Must be categorical. Maximum 20 groups.
- target
str or int or float
The target group of the categorical variable.
- Returns:
- summary
pandas.DataFrame
The coefficients of the fitted equation, along with the confidence intervals, standard errors, z statistics, and p-values.
- result
pandas.DataFrame
The pseudo R-squared and p-value of the fitted model.
See also
ordered_logistic_regression
Fit an equation that predicts an ordered categorical variable from other variables.
multinomial_logistic_regression
Fit an equation that predicts a multinomial categorical variable from other variables.
multiple_linear_regression
Fit an equation that predicts a numeric variable from other variables.
Examples
>>> import biostats as bs
>>> data = bs.dataset("multiple_logistic_regression.csv")
>>> data
    Upland  Migr    Mass  Indiv  Insect  Wood  Status
0        0     1  9600.0     29      12     0       1
1        0     1  5000.0     85       0     0       1
2        0     1  3360.0      8       0     0       1
3        0     3  2517.0     10      12     0       0
4        0     3  3170.0      7       0     0       0
..     ...   ...     ...    ...     ...   ...     ...
74       0     1    23.6     29      12     1       1
75       0     1    20.7      9      12     0       0
76       0     3    31.0      2      12     1       0
77       0     2    36.9      2       8     0       0
78       0     2   106.5      2      12     0       0
We want to fit an equation that predicts Status from Upland, Migr, Mass, Indiv, Insect, and Wood.
>>> summary, result = bs.multiple_logistic_regression(data=data,
...     x_numeric=["Upland", "Migr", "Mass", "Indiv", "Insect", "Wood"],
...     x_categorical=[], y="Status", target=1)
>>> summary
           Coefficient  95% CI: Lower  95% CI: Upper  Std. Error  z Statistic   p-value
Intercept    -3.549648      -7.631768       0.532472    2.082753    -1.704306  0.088324  NaN
Upland       -4.548429      -8.608058      -0.488800    2.071277    -2.195954  0.028095    *
Migr         -1.818405      -3.450219      -0.186591    0.832573    -2.184077  0.028957    *
Mass          0.001903       0.000521       0.003284    0.000705     2.699675  0.006941   **
Indiv         0.013706       0.006120       0.021292    0.003870     3.541316  0.000398  ***
Insect        0.239472      -0.029721       0.508666    0.137346     1.743566  0.081235  NaN
Wood          1.813444      -0.755285       4.382174    1.310601     1.383674  0.166458  NaN
The summary gives the coefficients of the fitted equation, along with their confidence intervals, standard errors, z statistics, and p-values.
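The coefficients are often easier to read as odds ratios. The following is a minimal sketch, assuming the fitted equation is a standard logistic model on the log-odds scale, in which case exponentiating a coefficient and its confidence bounds gives the multiplicative change in the odds of the target group per unit increase in that predictor. The values are copied by hand from the summary above; `summary_rows` and `odds_ratio` are illustrative names and not part of biostats.

```python
import math

# Coefficient and 95% CI bounds copied from the summary table
# (assumed to be on the log-odds scale): (coefficient, lower, upper).
summary_rows = {
    "Upland": (-4.548429, -8.608058, -0.488800),
    "Migr":   (-1.818405, -3.450219, -0.186591),
    "Mass":   (0.001903, 0.000521, 0.003284),
    "Indiv":  (0.013706, 0.006120, 0.021292),
    "Insect": (0.239472, -0.029721, 0.508666),
    "Wood":   (1.813444, -0.755285, 4.382174),
}


def odds_ratio(coef, lower, upper):
    """Exponentiate a log-odds coefficient and its CI bounds."""
    return math.exp(coef), math.exp(lower), math.exp(upper)


# Hypothetical helper output: name -> (odds ratio, CI lower, CI upper).
odds_ratios = {name: odds_ratio(*row) for name, row in summary_rows.items()}

for name, (ratio, ci_low, ci_high) in odds_ratios.items():
    print(f"{name:<7} OR = {ratio:.4g}  (95% CI {ci_low:.4g} to {ci_high:.4g})")
```

Under this assumption, an interval that excludes 1 corresponds to a predictor marked significant at the 0.05 level in the summary, while an interval that contains 1 (as for Insect and Wood) corresponds to a non-significant one.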
>>> result
       Pseudo R-Squared       p-value
Model           0.67443  1.125372e-11  ***
The p-value is less than 0.001, so there is a significant relationship between the predictor variables and the response variable.
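To see how the fitted equation turns predictor values into a prediction, the sketch below applies the logistic function to the linear combination of the coefficients. It assumes the usual logistic form, in which the equation gives the log-odds that Status equals the target group (1). The coefficients are copied by hand from the summary (rounded to six decimals), the two rows are rows 0 and 3 of the data shown above, and `predicted_probability` is an illustrative helper, not a biostats function.

```python
import math

# Coefficients copied from the summary table (rounded as displayed).
coefficients = {
    "Intercept": -3.549648,
    "Upland": -4.548429,
    "Migr": -1.818405,
    "Mass": 0.001903,
    "Indiv": 0.013706,
    "Insect": 0.239472,
    "Wood": 1.813444,
}


def predicted_probability(row):
    """Return P(Status == target), assuming a standard logistic model."""
    # Linear predictor: intercept plus coefficient * value for each predictor.
    linear = coefficients["Intercept"] + sum(
        coefficients[name] * value for name, value in row.items()
    )
    # The logistic function maps the log-odds to a probability in (0, 1).
    return 1.0 / (1.0 + math.exp(-linear))


# Rows 0 and 3 of the example data (Status is 1 and 0, respectively).
row_0 = {"Upland": 0, "Migr": 1, "Mass": 9600.0, "Indiv": 29, "Insect": 12, "Wood": 0}
row_3 = {"Upland": 0, "Migr": 3, "Mass": 2517.0, "Indiv": 10, "Insect": 12, "Wood": 0}

p_row_0 = predicted_probability(row_0)
p_row_3 = predicted_probability(row_3)
print(f"row 0: {p_row_0:.4f}")
print(f"row 3: {p_row_3:.4f}")
```

With these inputs the predicted probability is above 0.5 for row 0 and below 0.5 for row 3, which agrees with the observed Status values of those two rows.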