biostats.multiple_linear_regression

biostats.multiple_linear_regression(data, x_numeric, x_categorical, y)

Fit an equation that predicts a numeric variable from other variables.

Parameters:
data : pandas.DataFrame

The input data. Must contain one numeric column for the response and one or more other columns (numeric or categorical) for the predictors.

x_numeric : list

The list of predictor variables that are numeric.

x_categorical : list

The list of predictor variables that are categorical. Maximum 20 groups.

y : str

The response variable. Must be numeric.

Returns:
summary : pandas.DataFrame

The coefficients of the fitted equation, along with the confidence intervals, standard errors, t statistics, and p-values.

result : pandas.DataFrame

The R-squared, adjusted R-squared, F statistic, and p-value of the fitted model.

See also

multiple_logistic_regression

Fit an equation that predicts a categorical variable from other variables.

correlation_matrix

Compute the correlation coefficients between every pair of variables.

Examples

>>> import biostats as bs
>>> data = bs.dataset("multiple_linear_regression.csv")
>>> data
    Acerage  Maxdepth   NO3  Longnose
0      2528        80  2.28        13
1      3333        83  5.34        12
2     19611        96  0.99        54
3      3570        56  5.44        19
4      1722        43  5.66        37
..      ...       ...   ...       ...
63     6311        46  0.64         2
64     1450        60  2.96        26
65     4106        96  2.62        20
66    10274        90  5.45        38
67      510        82  5.25        19

We want to fit an equation that predicts Longnose from Acerage, Maxdepth, and NO3.

>>> summary, result = bs.multiple_linear_regression(data=data, x_numeric=["Acerage", "Maxdepth", "NO3"], x_categorical=[], y="Longnose")
>>> summary
           Coefficient  95% CI: Lower  95% CI: Upper  Std. Error  t Statistic   p-value     
Intercept   -23.829067     -54.342374       6.684240   15.273992    -1.560107  0.123666  NaN
Acerage       0.001988       0.000641       0.003334    0.000674     2.947947  0.004461   **
Maxdepth      0.336605      -0.018134       0.691344    0.177571     1.895610  0.062529  NaN
NO3           8.673044       3.132716      14.213372    2.773312     3.127323  0.002654   **

The summary gives the coefficients of the fitted equation, along with their confidence intervals, standard errors, t statistics, and p-values.
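
To make the fitted equation concrete, the sketch below plugs the coefficients from the summary table into Longnose = Intercept + b1 * Acerage + b2 * Maxdepth + b3 * NO3 and evaluates it for row 0 of the dataset. The helper `predict_longnose` is a hypothetical name used only for illustration and is not part of biostats; the coefficient values are copied by hand from the table above.

```python
# Coefficients copied from the `summary` table in the example above.
coefficients = {
    "Intercept": -23.829067,
    "Acerage": 0.001988,
    "Maxdepth": 0.336605,
    "NO3": 8.673044,
}


def predict_longnose(acerage, maxdepth, no3):
    """Evaluate the fitted equation for one observation (hypothetical helper)."""
    return (
        coefficients["Intercept"]
        + coefficients["Acerage"] * acerage
        + coefficients["Maxdepth"] * maxdepth
        + coefficients["NO3"] * no3
    )


# Row 0 of the dataset: Acerage=2528, Maxdepth=80, NO3=2.28 (observed Longnose is 13).
predicted = predict_longnose(2528, 80, 2.28)
print(round(predicted, 2))  # about 27.9
```

The prediction for row 0 is well above the observed value of 13, which is in line with the modest R-squared reported for this model.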

>>> result
       R-Squared  Adj. R-Squared  F Statistic   p-value     
Model   0.279826        0.246068     8.289157  0.000097  ***

The p-value is less than 0.001, so the predictor variables, taken together, have a significant relationship with the response variable.
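
As a rough consistency check, the adjusted R-squared and F statistic can be recomputed from the R-squared using the textbook ordinary least squares formulas. This sketch assumes that biostats follows those standard definitions, that the dataset has 68 rows (indices 0 to 67 in the printout above), and that the model has 3 predictors; small differences in the last digits come from the rounded R-squared.

```python
n = 68  # number of observations, inferred from row indices 0 to 67
p = 3   # number of predictors: Acerage, Maxdepth, NO3
r_squared = 0.279826  # copied from the `result` table

# Adjusted R-squared penalizes R-squared for the number of predictors.
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - p - 1)

# F statistic: explained variance per predictor over
# unexplained variance per residual degree of freedom.
f_statistic = (r_squared / p) / ((1 - r_squared) / (n - p - 1))

print(round(adj_r_squared, 4))  # close to the reported 0.246068
print(round(f_statistic, 2))    # close to the reported 8.289157
```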