biostats.ordered_logistic_regression

biostats.ordered_logistic_regression(data, x_numeric, x_categorical, y, order)

Fit an equation that predicts an ordered categorical variable from other variables.
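For reference, here is a sketch of the model being fitted. It assumes the common cumulative-logit (proportional-odds) parameterization used by, for example, R's MASS::polr and statsmodels' OrderedModel. The exact sign convention biostats uses in front of the coefficients is an assumption here.

```latex
\operatorname{logit} P(Y \le k \mid x)
  = \ln \frac{P(Y \le k \mid x)}{1 - P(Y \le k \mid x)}
  = \theta_k - (\beta_1 x_1 + \dots + \beta_p x_p),
  \qquad k = 1, \dots, K - 1, \quad \theta_1 < \dots < \theta_{K-1}
```

With K ordered groups there are K - 1 thresholds θ_k, one per boundary between adjacent groups. Each predictor has a single coefficient β_j that is shared across all thresholds.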

Parameters:

data (pandas.DataFrame): The input data. Must contain at least one categorical column (the response) and the predictor columns, each of which can be numeric or categorical.
x_numeric (list): The list of predictor variables that are numeric.
x_categorical (list): The list of predictor variables that are categorical. Each may have at most 20 groups.
y (str): The response variable. Must be categorical, with at most 20 groups.
order (dict): The order of the groups in the response variable y, given as a mapping from each group to its rank.
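The sketch below illustrates what the order mapping expresses. It uses pandas and is not biostats' internal code. The mapping is turned into an ordered categorical whose categories run from the lowest to the highest rank, and the toy response Series is hypothetical.

```python
import pandas as pd

# Hypothetical toy response column (not the example dataset)
apply = pd.Series(["very likely", "somewhat likely", "unlikely", "somewhat likely"])

# The `order` argument maps each group of the response to its rank
order = {"unlikely": 1, "somewhat likely": 2, "very likely": 3}

# Sort the groups by rank, then build an ordered categorical from them
ranked_groups = sorted(order, key=order.get)
y = pd.Categorical(apply, categories=ranked_groups, ordered=True)

print(list(y.categories))  # groups from lowest to highest rank
print(y.codes.tolist())    # 0-based position of each observation's group
```

Because the categorical is ordered, comparisons such as `y > "unlikely"` respect the ranking rather than alphabetical order.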

Returns:

summary (pandas.DataFrame): The coefficients of the fitted equation with their 95% confidence intervals, standard errors, z statistics, and p-values, followed by the thresholds between adjacent groups of the response (with confidence intervals only).
result (pandas.DataFrame): The pseudo R-squared and p-value of the fitted model.

See also

multiple_logistic_regression: Fit an equation that predicts a dichotomous categorical variable from other variables.
multinomial_logistic_regression: Fit an equation that predicts a multinomial categorical variable from other variables.
multiple_linear_regression: Fit an equation that predicts a numeric variable from other variables.

Examples

>>> import biostats as bs
>>> data = bs.dataset("ordered_logistic_regression.csv")
>>> data
     pared  public   gpa            apply
0        0       0  3.26      very likely
1        1       0  3.21  somewhat likely
2        1       1  3.94         unlikely
3        0       0  2.81  somewhat likely
4        0       0  2.53  somewhat likely
..     ...     ...   ...              ...
395      0       0  3.70         unlikely
396      0       0  2.63         unlikely
397      0       0  2.25  somewhat likely
398      0       0  3.26  somewhat likely
399      0       0  3.52      very likely

We want to fit an equation that predicts apply from pared, public, and gpa.

>>> summary, result = bs.ordered_logistic_regression(data=data, x_numeric=["pared", "public", "gpa"], x_categorical=[], y="apply", 
...     order={"unlikely":1, "somewhat likely":2, "very likely":3})
>>> summary
                               Coefficient  95% CI: Lower  95% CI: Upper  Std. Error  z Statistic   p-value     
pared                             1.047678       0.526740       1.568616    0.265789     3.941761  0.000081  ***
public                           -0.058675      -0.642471       0.525121    0.297861    -0.196987  0.843838  NaN
gpa                               0.615740       0.104912       1.126568    0.260631     2.362495  0.018152    *
unlikely / somewhat likely        2.203303       0.675441       3.731164         NaN          NaN       NaN  NaN
somewhat likely / very likely     4.298752       2.466471       6.182776         NaN          NaN       NaN  NaN

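The coefficients of the fitted equation are given, along with their confidence intervals and p-values. pared (p < 0.001) and gpa (p < 0.05) are significant predictors of apply, while public is not (p = 0.84).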
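The interval, z statistic, and p-value columns appear to be standard Wald quantities computed from each coefficient and its standard error. The sketch below assumes this (normal critical value of about 1.96, two-sided test) and recomputes the pared row from its coefficient and standard error. It agrees with the table up to rounding of the printed inputs.

```python
import math

# Estimates for `pared`, copied from the summary table above
coef = 1.047678
se = 0.265789

# 97.5th percentile of the standard normal distribution
z_crit = 1.959963984540054

z = coef / se
lower = coef - z_crit * se
upper = coef + z_crit * se
# Two-sided p-value: P(|Z| > |z|) = erfc(|z| / sqrt(2)) for standard normal Z
p = math.erfc(abs(z) / math.sqrt(2))

print(f"z = {z:.6f}, 95% CI = ({lower:.6f}, {upper:.6f}), p = {p:.6f}")
```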

>>> result
       Pseudo R-Squared       p-value     
Model           0.67443  1.125372e-11  ***

The p-value is less than 0.001, so the model is significant: there is a significant relation between the predictor variables and the response variable.
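The fitted coefficients and thresholds can also be turned into predicted probabilities for each group of apply. The sketch below assumes the proportional-odds parameterization logit P(Y ≤ k) = θ_k − x·β. Under that assumption, a positive coefficient such as the one for pared shifts probability toward the higher-ranked groups. The helper names (logistic, predict_proba) and the example observation are hypothetical.

```python
import math

# Estimates copied from the example summary table
beta = {"pared": 1.047678, "public": -0.058675, "gpa": 0.615740}
# Thresholds: unlikely / somewhat likely, then somewhat likely / very likely
thresholds = [2.203303, 4.298752]
groups = ["unlikely", "somewhat likely", "very likely"]


def logistic(t):
    """Inverse of the logit function."""
    return 1.0 / (1.0 + math.exp(-t))


def predict_proba(x):
    """Predicted probability of each ordered group for one observation.

    Assumes logit P(Y <= k) = theta_k - x.beta (proportional-odds model).
    """
    eta = sum(beta[name] * value for name, value in x.items())
    # Cumulative probabilities P(Y <= k); the last group has P(Y <= K) = 1
    cumulative = [logistic(theta - eta) for theta in thresholds] + [1.0]
    # P(Y = k) = P(Y <= k) - P(Y <= k - 1)
    probs = [cumulative[0]] + [
        cumulative[k] - cumulative[k - 1] for k in range(1, len(cumulative))
    ]
    return dict(zip(groups, probs))


# Hypothetical observation: pared = 0, public = 0, gpa = 3.0
student = {"pared": 0, "public": 0, "gpa": 3.0}
for group, prob in predict_proba(student).items():
    print(f"{group:>15}: {prob:.3f}")
```

Changing pared from 0 to 1 for the same observation raises the predicted probability of "very likely", which matches the positive sign of its coefficient.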