biostats.ordered_logistic_regression

biostats.ordered_logistic_regression(data, x_numeric, x_categorical, y, order)

Fit an equation that predicts an ordered categorical variable from other variables.

Parameters:
data : pandas.DataFrame

The input data. Must contain at least one categorical column and several other columns (can be either numeric or categorical).

x_numeric : list

The list of predictor variables that are numeric.

x_categorical : list

The list of predictor variables that are categorical. Maximum 20 groups.

y : str

The response variable. Must be categorical. Maximum 20 groups.

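order : dict

The order of groups in the response variable, given as a dict that maps each group to its position in the ordering, e.g. {"unlikely": 1, "somewhat likely": 2, "very likely": 3}.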

Returns:
summary : pandas.DataFrame

The coefficients of the fitted equation, along with the confidence intervals, standard errors, z statistics, and p-values.

result : pandas.DataFrame

The pseudo R-squared and p-value of the fitted model.

See also

multiple_logistic_regression

Fit an equation that predicts a dichotomous categorical variable from other variables.

multinomial_logistic_regression

Fit an equation that predicts a multinomial categorical variable from other variables.

multiple_linear_regression

Fit an equation that predicts a numeric variable from other variables.

Examples

>>> import biostats as bs
>>> data = bs.dataset("ordered_logistic_regression.csv")
>>> data
     pared  public   gpa            apply
0        0       0  3.26      very likely
1        1       0  3.21  somewhat likely
2        1       1  3.94         unlikely
3        0       0  2.81  somewhat likely
4        0       0  2.53  somewhat likely
..     ...     ...   ...              ...
395      0       0  3.70         unlikely
396      0       0  2.63         unlikely
397      0       0  2.25  somewhat likely
398      0       0  3.26  somewhat likely
399      0       0  3.52      very likely

We want to fit an equation that predicts apply from pared, public, and gpa.

>>> summary, result = bs.ordered_logistic_regression(data=data, x_numeric=["pared", "public", "gpa"], x_categorical=[], y="apply", 
...     order={"unlikely":1, "somewhat likely":2, "very likely":3})
>>> summary
                               Coefficient  95% CI: Lower  95% CI: Upper  Std. Error  z Statistic   p-value     
pared                             1.047678       0.526740       1.568616    0.265789     3.941761  0.000081  ***
public                           -0.058675      -0.642471       0.525121    0.297861    -0.196987  0.843838  NaN
gpa                               0.615740       0.104912       1.126568    0.260631     2.362495  0.018152    *
unlikely / somewhat likely        2.203303       0.675441       3.731164         NaN          NaN       NaN  NaN
somewhat likely / very likely     4.298752       2.466471       6.182776         NaN          NaN       NaN  NaN

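The coefficients of the fitted equation are given, along with their confidence intervals, standard errors, z statistics, and p-values. The last two rows are the cut points between adjacent groups of the response variable.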
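Coefficients of a logistic model are usually read as odds ratios. The sketch below exponentiates the values printed in the summary table. It assumes the coefficients are on the log-odds scale, as is standard for logistic regression; the helper odds_ratio is hypothetical and not part of biostats.

```python
import math

# Values copied by hand from the `summary` table above.
# Assumption: coefficients are reported on the log-odds scale.
coefficients = {"pared": 1.047678, "public": -0.058675, "gpa": 0.615740}
ci_bounds = {
    "pared": (0.526740, 1.568616),
    "public": (-0.642471, 0.525121),
    "gpa": (0.104912, 1.126568),
}


def odds_ratio(log_odds):
    """Convert a log-odds value to an odds ratio (hypothetical helper)."""
    return math.exp(log_odds)


for name, coef in coefficients.items():
    lower, upper = ci_bounds[name]
    print(
        f"{name}: odds ratio = {odds_ratio(coef):.3f} "
        f"(95% CI {odds_ratio(lower):.3f} to {odds_ratio(upper):.3f})"
    )
```

An odds ratio above 1 means that higher values of the predictor go with higher odds of falling in a higher group of apply; a confidence interval that includes 1 corresponds to a non-significant predictor, as for public here.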

>>> result
       Pseudo R-Squared       p-value     
Model           0.67443  1.125372e-11  ***

The p-value of the model is below 0.001, so there is a significant relationship between the predictor variables and the response variable.
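The fitted values in summary can also be turned into predicted probabilities for a single observation. The sketch below is not part of biostats and rests on an assumption the page does not confirm: that the model uses the common proportional-odds form, P(Y <= group j) = logistic(cut_j - (b1*x1 + b2*x2 + ...)), with the last two rows of summary as the cut points. The function names are hypothetical.

```python
import math

# Values copied by hand from the `summary` table above.
coefficients = {"pared": 1.047678, "public": -0.058675, "gpa": 0.615740}
cut_points = [2.203303, 4.298752]  # unlikely / somewhat likely, somewhat likely / very likely
groups = ["unlikely", "somewhat likely", "very likely"]


def logistic(t):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-t))


def predict_probabilities(row):
    """Return P(apply = group) for one observation.

    Assumption: proportional-odds parameterization,
    P(Y <= j) = logistic(cut_j - linear_predictor).
    """
    linear_predictor = sum(coefficients[name] * row[name] for name in coefficients)
    # Cumulative probabilities for each group; the last group always reaches 1.
    cumulative = [logistic(cut - linear_predictor) for cut in cut_points] + [1.0]
    probabilities = []
    previous = 0.0
    for value in cumulative:
        probabilities.append(value - previous)
        previous = value
    return dict(zip(groups, probabilities))


example = predict_probabilities({"pared": 0, "public": 0, "gpa": 3.0})
for group, probability in example.items():
    print(f"{group}: {probability:.3f}")
```

Under this assumption the three probabilities always sum to 1, and raising a predictor with a positive coefficient (such as pared) shifts probability away from "unlikely" toward the higher groups.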