biostats.multinomial_logistic_regression

biostats.multinomial_logistic_regression(data, x_numeric, x_categorical, y, baseline)

Fit an equation that predicts a multinomial categorical variable from other variables.

Parameters:

data (pandas.DataFrame): The input data. Must contain at least one categorical column and several other columns, each of which can be either numeric or categorical.
x_numeric (list): The list of predictor variables that are numeric.
x_categorical (list): The list of predictor variables that are categorical. Maximum 20 groups.
y (str): The response variable. Must be categorical. Maximum 20 groups.
baseline (str or int or float): The baseline group of the response variable y. Coefficients for every other group are reported relative to this group.

Returns:

summary (pandas.DataFrame): The coefficients of the fitted equation, along with the confidence intervals, standard errors, z statistics, and p-values.
result (pandas.DataFrame): The pseudo R-squared and p-value of the fitted model.

See also

multiple_logistic_regression: Fit an equation that predicts a dichotomous categorical variable from other variables.
ordered_logistic_regression: Fit an equation that predicts an ordered categorical variable from other variables.
multiple_linear_regression: Fit an equation that predicts a numeric variable from other variables.

Examples

>>> import biostats as bs
>>> data = bs.dataset("multinomial_logistic_regression.csv")
>>> data
     write     ses      prog
0       35     low  vocation
1       33  middle   general
2       39    high  vocation
3       37     low  vocation
4       31  middle  vocation
..     ...     ...       ...
195     65    high  academic
196     63  middle  vocation
197     67  middle  academic
198     65  middle  academic
199     62  middle  academic

We want to fit an equation that predicts prog from write and ses.

>>> summary, result = bs.multinomial_logistic_regression(data=data, x_numeric=["write"], x_categorical=["ses"], y="prog", baseline="academic")
>>> summary
              Coefficient  95% CI: Lower  95% CI: Upper  Std. Error  z Statistic       p-value     
vocation              NaN            NaN            NaN         NaN          NaN           NaN  NaN
Intercept        4.235530       1.874390       6.596670    1.204685     3.515881  4.382977e-04  ***
ses (low)        0.982670      -0.184619       2.149960    0.595567     1.649975  9.894813e-02  NaN
ses (middle)     1.274063       0.272309       2.275818    0.511109     2.492744  1.267601e-02    *
write           -0.113603      -0.157153      -0.070052    0.022220    -5.112653  3.176650e-07  ***
                      NaN            NaN            NaN         NaN          NaN           NaN  NaN
general               NaN            NaN            NaN         NaN          NaN           NaN  NaN
Intercept        1.689354      -0.715399       4.094108    1.226938     1.376887  1.685473e-01  NaN
ses (low)        1.162832       0.154980       2.170684    0.514219     2.261354  2.373737e-02    *
ses (middle)     0.629541      -0.281897       1.540979    0.465028     1.353770  1.758099e-01  NaN
write           -0.057928      -0.099893      -0.015964    0.021411    -2.705551  6.819115e-03   **
                      NaN            NaN            NaN         NaN          NaN           NaN  NaN

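The summary gives the coefficients of the fitted equation, along with their confidence intervals, standard errors, z statistics, and p-values. There is one block of coefficients for each non-baseline group (vocation and general), and each coefficient is a change in log-odds relative to the baseline group, academic.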
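
As a rough illustration of how these coefficients can be read, the sketch below converts the write coefficients to odds ratios and computes predicted probabilities for one student. It uses only the Python standard library and the values copied from the table above. The helper predict_probabilities is hypothetical and not part of biostats, and the sketch assumes that "high" is the reference level of ses, since only the low and middle levels appear in the output.

```python
import math

# Coefficients copied from the summary table above (baseline group: "academic").
coefficients = {
    "vocation": {"Intercept": 4.235530, "ses (low)": 0.982670,
                 "ses (middle)": 1.274063, "write": -0.113603},
    "general": {"Intercept": 1.689354, "ses (low)": 1.162832,
                "ses (middle)": 0.629541, "write": -0.057928},
}

# Odds ratio (relative to "academic") for a one-point increase in write.
odds_ratios = {group: math.exp(terms["write"])
               for group, terms in coefficients.items()}


def predict_probabilities(write, ses):
    """Hypothetical helper: predicted probability of each prog group."""
    # The baseline group always has a linear predictor of 0.
    scores = {"academic": 0.0}
    for group, terms in coefficients.items():
        score = terms["Intercept"] + terms["write"] * write
        # Assumption: "high" is the reference level of ses, so it adds nothing.
        score += terms.get(f"ses ({ses})", 0.0)
        scores[group] = score
    # Softmax over the linear predictors gives probabilities that sum to 1.
    total = sum(math.exp(score) for score in scores.values())
    return {group: math.exp(score) / total for group, score in scores.items()}


probabilities = predict_probabilities(write=50, ses="high")
print(odds_ratios)
print(probabilities)
```

An odds ratio below 1 for write means that a higher writing score lowers the odds of the vocation or general program relative to academic, which matches the negative coefficients in the table.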

>>> result
       Pseudo R-Squared       p-value     
Model          0.118155  1.063001e-08  ***

The p-value is less than 0.001, so there is a significant relationship between the predictor variables and the response variable.
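
The unnamed last column of both tables marks significance with stars. The sketch below reproduces that marking under the conventional thresholds of 0.001, 0.01, and 0.05. These thresholds are an assumption: they agree with every row shown above, but the source does not state them. The function significance_stars is a hypothetical helper, not part of biostats, and it returns an empty string where the tables show NaN.

```python
def significance_stars(p_value):
    """Hypothetical helper: map a p-value to a significance marker."""
    # Assumed conventional thresholds; not confirmed by the biostats source.
    if p_value < 0.001:
        return "***"
    if p_value < 0.01:
        return "**"
    if p_value < 0.05:
        return "*"
    return ""  # the tables above show NaN for non-significant rows


# p-values copied from the summary and result tables above.
p_values = {
    "vocation: write": 3.176650e-07,
    "general: write": 6.819115e-03,
    "vocation: ses (middle)": 1.267601e-02,
    "vocation: ses (low)": 9.894813e-02,
    "Model": 1.063001e-08,
}

stars = {name: significance_stars(p) for name, p in p_values.items()}
for name, marker in stars.items():
    print(f"{name:<24} {marker}")
```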