biostats.multinomial_logistic_regression
- biostats.multinomial_logistic_regression(data, x_numeric, x_categorical, y, baseline)
Fit an equation that predicts a multinomial categorical variable from other variables.
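For reference, the standard baseline-category (multinomial logit) model is sketched below. The page does not spell out the formula, so treat it as an assumption about what the function fits; it is, however, consistent with the example output, which has one block of coefficients per non-baseline group. Here b is the baseline group, j is any other group of the response, and x_1, ..., x_p are the predictor columns (a categorical predictor such as ses enters as one indicator column per non-reference level):

```latex
\begin{aligned}
\eta_j &= \log\frac{P(Y = j \mid x)}{P(Y = b \mid x)}
        = \beta_{j0} + \beta_{j1} x_1 + \cdots + \beta_{jp} x_p, \qquad j \neq b,\\
P(Y = j \mid x) &= \frac{e^{\eta_j}}{1 + \sum_{k \neq b} e^{\eta_k}}, \qquad
P(Y = b \mid x) = \frac{1}{1 + \sum_{k \neq b} e^{\eta_k}}.
\end{aligned}
```

Each non-baseline group therefore gets its own intercept and its own slope for every predictor column.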
- Parameters:
- data
pandas.DataFrame
The input data. Must contain at least one categorical column and several other columns (can be either numeric or categorical).
- x_numeric
list
The list of predictor variables that are numeric.
- x_categorical
list
The list of predictor variables that are categorical. Maximum 20 groups.
- y
str
The response variable. Must be categorical. Maximum 20 groups.
- baseline
str or int or float
The baseline (reference) group of the response variable y. The coefficients for every other group of y are expressed relative to this group.
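Because every coefficient is relative to the baseline, point estimates against a different baseline follow by subtraction: the log-odds of group j versus group k equal the log-odds of j versus b minus the log-odds of k versus b. The sketch below shows that arithmetic with a hypothetical helper, rebase_coefficients (not part of biostats), using the rounded estimates from the example output further down.

```python
def rebase_coefficients(coefs, old_baseline, new_baseline):
    """Re-express baseline-category logit coefficients against a new baseline.

    Hypothetical helper for illustration; it is not part of biostats.
    coefs maps each non-baseline group to {term: coefficient versus old_baseline}.
    """
    if new_baseline not in coefs:
        raise ValueError("new_baseline must be one of the non-baseline groups")
    ref = coefs[new_baseline]
    rebased = {}
    for group, terms in coefs.items():
        if group == new_baseline:
            continue
        # log-odds(j vs k) = log-odds(j vs b) - log-odds(k vs b)
        rebased[group] = {term: value - ref[term] for term, value in terms.items()}
    # The old baseline has all coefficients 0 against itself, so it becomes -ref.
    rebased[old_baseline] = {term: -value for term, value in ref.items()}
    return rebased


# Estimates against the baseline "academic", copied from the example summary table.
coefs = {
    "vocation": {"Intercept": 4.235530, "ses (low)": 0.982670, "ses (middle)": 1.274063, "write": -0.113603},
    "general": {"Intercept": 1.689354, "ses (low)": 1.162832, "ses (middle)": 0.629541, "write": -0.057928},
}
rebased = rebase_coefficients(coefs, old_baseline="academic", new_baseline="vocation")
print(round(rebased["general"]["write"], 6))   # -0.057928 - (-0.113603)
print(round(rebased["academic"]["write"], 6))  # 0 - (-0.113603)
```

Only the point estimates carry over this simply. Standard errors and confidence intervals for the new contrasts also need the covariance between the two coefficient blocks, which the summary table does not report, so refitting with a different baseline is the simpler route.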
- Returns:
- summary
pandas.DataFrame
The coefficients of the fitted equation, along with the confidence intervals, standard errors, z statistics, and p-values.
- result
pandas.DataFrame
The pseudo R-squared and p-value of the fitted model.
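The numbers in summary are consistent with Wald inference: the z statistic is the coefficient divided by its standard error, the two-sided p-value comes from the standard normal distribution, and the 95% interval is the coefficient plus or minus about 1.96 standard errors. The sketch below checks this on the write row of the vocation block in the example; it illustrates the arithmetic and is not the code biostats runs. The name wald_summary is hypothetical.

```python
import math
from statistics import NormalDist


def wald_summary(coef, se, level=0.95):
    """Hypothetical helper: Wald z statistic, two-sided p-value and CI for one coefficient."""
    z = coef / se
    # Two-sided p-value: 2 * P(Z > |z|) = erfc(|z| / sqrt(2)) for a standard normal Z.
    p = math.erfc(abs(z) / math.sqrt(2))
    # Normal critical value, about 1.959964 when level=0.95.
    crit = NormalDist().inv_cdf(0.5 + level / 2)
    return z, p, (coef - crit * se, coef + crit * se)


# The write row of the vocation block in the example summary.
z, p, (lower, upper) = wald_summary(-0.113603, 0.022220)
print(f"z = {z:.4f}, p = {p:.3e}, 95% CI = ({lower:.6f}, {upper:.6f})")
```

Because the table's inputs are rounded to six decimals, recomputed values agree with the printed ones only up to rounding.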
See also
multiple_logistic_regression
Fit an equation that predicts a dichotomous categorical variable from other variables.
ordered_logistic_regression
Fit an equation that predicts an ordered categorical variable from other variables.
multiple_linear_regression
Fit an equation that predicts a numeric variable from other variables.
Examples
>>> import biostats as bs
>>> data = bs.dataset("multinomial_logistic_regression.csv")
>>> data
     write     ses      prog
0       35     low  vocation
1       33  middle   general
2       39    high  vocation
3       37     low  vocation
4       31  middle  vocation
..     ...     ...       ...
195     65    high  academic
196     63  middle  vocation
197     67  middle  academic
198     65  middle  academic
199     62  middle  academic
We want to fit an equation that predicts prog from write and ses.
>>> summary, result = bs.multinomial_logistic_regression(data=data, x_numeric=["write"],
...     x_categorical=["ses"], y="prog", baseline="academic")
>>> summary
              Coefficient  95% CI: Lower  95% CI: Upper  Std. Error  z Statistic       p-value
vocation              NaN            NaN            NaN         NaN          NaN           NaN  NaN
Intercept        4.235530       1.874390       6.596670    1.204685     3.515881  4.382977e-04  ***
ses (low)        0.982670      -0.184619       2.149960    0.595567     1.649975  9.894813e-02  NaN
ses (middle)     1.274063       0.272309       2.275818    0.511109     2.492744  1.267601e-02    *
write           -0.113603      -0.157153      -0.070052    0.022220    -5.112653  3.176650e-07  ***
                      NaN            NaN            NaN         NaN          NaN           NaN  NaN
general               NaN            NaN            NaN         NaN          NaN           NaN  NaN
Intercept        1.689354      -0.715399       4.094108    1.226938     1.376887  1.685473e-01  NaN
ses (low)        1.162832       0.154980       2.170684    0.514219     2.261354  2.373737e-02    *
ses (middle)     0.629541      -0.281897       1.540979    0.465028     1.353770  1.758099e-01  NaN
write           -0.057928      -0.099893      -0.015964    0.021411    -2.705551  6.819115e-03   **
                      NaN            NaN            NaN         NaN          NaN           NaN  NaN
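The coefficients of the fitted equation, along with their confidence intervals and p-values, are given. There is one block of coefficients for each response group other than the baseline (here vocation and general), and each block compares that group with academic.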
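To turn the coefficients into predicted probabilities, compute one linear predictor per non-baseline group, give the baseline a linear predictor of 0, and apply the softmax function. A minimal sketch using the rounded estimates printed in the summary; predict_proba is a hypothetical helper, and treating high as the reference level of ses is an inference from the fact that only ses (low) and ses (middle) appear in the table.

```python
import math

# Estimates against the baseline "academic", copied from the summary table.
# Assumption: "high" is the reference level of ses, since only ses (low) and
# ses (middle) appear as terms.
COEFS = {
    "vocation": {"Intercept": 4.235530, "ses (low)": 0.982670, "ses (middle)": 1.274063, "write": -0.113603},
    "general": {"Intercept": 1.689354, "ses (low)": 1.162832, "ses (middle)": 0.629541, "write": -0.057928},
}
BASELINE = "academic"


def predict_proba(write, ses):
    """Hypothetical helper: predicted probability of each prog group."""
    if ses not in ("low", "middle", "high"):
        raise ValueError(f"unknown ses level: {ses!r}")
    # Linear predictor (log-odds versus the baseline); the baseline's own is 0.
    eta = {BASELINE: 0.0}
    for group, b in COEFS.items():
        value = b["Intercept"] + b["write"] * write
        if ses != "high":  # the reference level adds no indicator term
            value += b[f"ses ({ses})"]
        eta[group] = value
    # Softmax, shifted by the maximum for numerical stability.
    top = max(eta.values())
    weights = {g: math.exp(v - top) for g, v in eta.items()}
    total = sum(weights.values())
    return {g: w / total for g, w in weights.items()}


for write, ses in [(50, "middle"), (30, "low")]:
    probs = predict_proba(write, ses)
    print(write, ses, {g: round(p, 3) for g, p in probs.items()})
```

Since the printed estimates are rounded to six decimals, probabilities computed this way may differ very slightly from those of the fitted model itself.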
>>> result
       Pseudo R-Squared       p-value
Model          0.118155  1.063001e-08  ***
The p-value is < 0.001, so there is a significant relation between the predictor variables (taken together) and the response variable.
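The page does not say how the pseudo R-squared and the model p-value are computed. A common choice, assumed here, is McFadden's pseudo R-squared, 1 - llf / llnull, together with a likelihood-ratio test of the fitted model against an intercept-only model: the statistic 2 * (llf - llnull) is compared with a chi-squared distribution whose degrees of freedom equal the number of slope coefficients, (number of response groups - 1) * (number of predictor columns), which would be 2 * 3 = 6 in the example. The sketch uses only the standard library; chi2_sf and model_fit_stats are hypothetical names, and the log-likelihoods in the usage line are made up.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-squared variable with integer df >= 1."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{r=0}^{df/2 - 1} (x/2)^r / r!
        term, total = 1.0, 1.0
        for r in range(1, df // 2):
            term *= (x / 2) / r
            total += term
        return math.exp(-x / 2) * total
    # Odd df: erfc(chi / sqrt(2)) + 2 * phi(chi) * sum_{r=1}^{(df-1)/2} chi^(2r-1) / (1*3*...*(2r-1)),
    # where chi = sqrt(x) and phi is the standard normal density.
    chi = math.sqrt(x)
    pdf = math.exp(-x / 2) / math.sqrt(2 * math.pi)
    term, total = 1.0 / chi, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        term *= x / (2 * r - 1)  # turns chi^(2r-3)/(1*...*(2r-3)) into chi^(2r-1)/(1*...*(2r-1))
        total += term
    return math.erfc(chi / math.sqrt(2)) + 2 * pdf * total


def model_fit_stats(llf, llnull, df):
    """Hypothetical helper: McFadden's pseudo R-squared and likelihood-ratio p-value.

    llf: log-likelihood of the fitted model; llnull: log-likelihood of the
    intercept-only model; df: number of slope coefficients being tested.
    """
    pseudo_r2 = 1.0 - llf / llnull
    lr_stat = 2.0 * (llf - llnull)
    return pseudo_r2, chi2_sf(lr_stat, df)


# Made-up log-likelihoods for illustration, not taken from the example data.
print(model_fit_stats(llf=-180.0, llnull=-200.0, df=6))
```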