biostats.multiple_linear_regression
- biostats.multiple_linear_regression(data, x_numeric, x_categorical, y)
Fit an equation that predicts a numeric variable from other variables.
- Parameters:
- data
pandas.DataFrame
The input data. Must contain a numeric column for the response and one or more predictor columns, which can be either numeric or categorical.
- x_numeric
list
The list of predictor variables that are numeric.
- x_categorical
list
The list of predictor variables that are categorical. Each categorical predictor may have at most 20 groups. Pass an empty list if there are no categorical predictors.
- y
str
The response variable. Must be numeric.
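A linear equation cannot use a categorical predictor directly. A common approach, assumed here rather than taken from the biostats source, is treatment (dummy) coding: a predictor with k groups is replaced by k - 1 indicator columns, and the omitted group becomes the reference. The sketch below uses `pandas.get_dummies`; the column `Habitat` and its values are hypothetical and not part of the example dataset.

```python
import pandas as pd

# Hypothetical data: one numeric predictor and one categorical predictor ("Habitat").
df = pd.DataFrame({
    "Depth": [80, 83, 96, 56, 43],
    "Habitat": ["pool", "riffle", "run", "pool", "run"],
})

# Assumption: treatment coding; biostats' internal handling may differ.
# k groups -> k - 1 indicator columns. drop_first=True drops the first group
# in sorted order ("pool"), which becomes the reference group.
X = pd.get_dummies(df, columns=["Habitat"], drop_first=True, dtype=float)
print(X.columns.tolist())  # ['Depth', 'Habitat_riffle', 'Habitat_run']
```

With this coding, the coefficient on each indicator column estimates the difference in the response between that group and the reference group, holding the other predictors fixed.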
- Returns:
- summary
pandas.DataFrame
The coefficients of the fitted equation, along with the confidence intervals, standard errors, t statistics, and p-values.
- result
pandas.DataFrame
The R-squared, adjusted R-squared, F statistic, and p-value of the fitted model.
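To show where the columns of summary typically come from, here is a minimal sketch of the textbook ordinary least squares computations with NumPy and SciPy. It is not the biostats implementation; the function name `ols_table` and the synthetic data are illustrative. Confidence intervals and p-values use the t distribution with n - p - 1 residual degrees of freedom, where p is the number of predictor columns.

```python
import numpy as np
from scipy import stats


def ols_table(X, y, alpha=0.05):
    """Textbook OLS fit (illustrative sketch, not the biostats implementation).

    X: (n, p) array of predictor values; an intercept column is added here.
    y: (n,) array of response values.
    Returns coefficients, lower and upper CI bounds, standard errors,
    t statistics, and two-sided p-values, intercept first.
    """
    n = X.shape[0]
    A = np.column_stack([np.ones(n), X])           # design matrix with intercept
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)   # least squares coefficients
    resid = y - A @ beta
    df = n - A.shape[1]                            # residual df = n - p - 1
    sigma2 = resid @ resid / df                    # residual variance estimate
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(A.T @ A)))  # standard errors
    t = beta / se
    pval = 2 * stats.t.sf(np.abs(t), df)           # two-sided p-values
    crit = stats.t.ppf(1 - alpha / 2, df)          # about 2.0 for 95% with 64 df
    return beta, beta - crit * se, beta + crit * se, se, t, pval


# Synthetic data with the same shape as the example (68 rows, 3 predictors).
rng = np.random.default_rng(0)
X = rng.normal(size=(68, 3))
y = 1.0 + X @ np.array([2.0, 0.0, -1.5]) + rng.normal(size=68)
beta, lower, upper, se, t, pval = ols_table(X, y)
```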
See also
multiple_logistic_regression
Fit an equation that predicts a categorical variable from other variables.
correlation_matrix
Compute the correlation coefficients between each pair of variables.
Examples
>>> import biostats as bs
>>> data = bs.dataset("multiple_linear_regression.csv")
>>> data
    Acerage  Maxdepth   NO3  Longnose
0      2528        80  2.28        13
1      3333        83  5.34        12
2     19611        96  0.99        54
3      3570        56  5.44        19
4      1722        43  5.66        37
..      ...       ...   ...       ...
63     6311        46  0.64         2
64     1450        60  2.96        26
65     4106        96  2.62        20
66    10274        90  5.45        38
67      510        82  5.25        19
We want to fit an equation that predicts Longnose from Acerage, Maxdepth, and NO3.
>>> summary, result = bs.multiple_linear_regression(data=data, x_numeric=["Acerage", "Maxdepth", "NO3"], x_categorical=[], y="Longnose")
>>> summary
           Coefficient  95% CI: Lower  95% CI: Upper  Std. Error  t Statistic   p-value
Intercept   -23.829067     -54.342374       6.684240   15.273992    -1.560107  0.123666  NaN
Acerage       0.001988       0.000641       0.003334    0.000674     2.947947  0.004461   **
Maxdepth      0.336605      -0.018134       0.691344    0.177571     1.895610  0.062529  NaN
NO3           8.673044       3.132716      14.213372    2.773312     3.127323  0.002654   **
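The output gives the coefficients of the fitted equation, together with their 95% confidence intervals, standard errors, t statistics, and p-values. The asterisks in the last column flag the significant terms (Acerage and NO3 here). Written out, the fitted equation is Longnose = -23.829 + 0.001988 × Acerage + 0.3366 × Maxdepth + 8.673 × NO3.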
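Once the coefficients are known, a prediction is the intercept plus each coefficient times the corresponding predictor value. The sketch below copies the Coefficient column of the example summary into a pandas Series, assuming (as the example output shows) that terms are indexed by name with an Intercept row. The helper `predict` and the new observation's values are hypothetical.

```python
import pandas as pd

# Coefficients copied from the "Coefficient" column of the example summary.
# Assumes term names as the index and an "Intercept" row, as in the example output.
coef = pd.Series({
    "Intercept": -23.829067,
    "Acerage": 0.001988,
    "Maxdepth": 0.336605,
    "NO3": 8.673044,
})


def predict(coef, obs):
    """Hypothetical helper: intercept + sum of coefficient * predictor value."""
    terms = coef.drop("Intercept")
    return coef["Intercept"] + sum(terms[name] * obs[name] for name in terms.index)


# Hypothetical new site (made-up values)
new_obs = {"Acerage": 5000, "Maxdepth": 70, "NO3": 3.0}
print(round(predict(coef, new_obs), 2))  # 35.69
```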
>>> result
       R-Squared  Adj. R-Squared  F Statistic   p-value
Model   0.279826        0.246068     8.289157  0.000097  ***
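The model p-value is below 0.001, so there is a significant relationship between the predictors, taken together, and the response variable. An R-squared of about 0.28 means the model explains roughly 28% of the variation in Longnose.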
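The quantities in result are linked by standard formulas. With n observations and p predictors, adjusted R-squared = 1 - (1 - R-squared)(n - 1)/(n - p - 1), and F = (R-squared/p) / ((1 - R-squared)/(n - p - 1)), whose p-value is the upper tail of an F distribution with p and n - p - 1 degrees of freedom. The sketch below applies these formulas to the example (n = 68 rows, p = 3 predictors); that biostats computes result exactly this way is an assumption, and the function name `model_stats` is illustrative.

```python
from scipy import stats


def model_stats(r2, n, p):
    """Adjusted R-squared, F statistic, and p-value from R-squared.

    Standard formulas; assumed, not confirmed, to match biostats.
    n: number of observations; p: number of predictors (excluding the intercept).
    """
    df_model, df_resid = p, n - p - 1
    adj_r2 = 1 - (1 - r2) * (n - 1) / df_resid
    f = (r2 / df_model) / ((1 - r2) / df_resid)
    p_value = stats.f.sf(f, df_model, df_resid)  # upper-tail probability
    return adj_r2, f, p_value


# Values from the example: R-squared = 0.279826, 68 rows, 3 predictors
adj_r2, f, p_value = model_stats(0.279826, n=68, p=3)
print(f"{adj_r2:.6f} {f:.3f} {p_value:.6f}")
```

The adjusted R-squared and F statistic agree with the example's result table up to rounding; the small difference in the last digits of F comes from R-squared being reported to only six decimals.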