biostats.multiple_linear_regression
- biostats.multiple_linear_regression(data, x_numeric, x_categorical, y)
- Fit an equation that predicts a numeric variable from other variables.
- Parameters:
- data : pandas.DataFrame
- The input data. Must contain at least one numeric column and several other columns (can be either numeric or categorical).
- x_numeric : list
- The list of predictor variables that are numeric.
- x_categorical : list
- The list of predictor variables that are categorical. Maximum 20 groups.
- y : str
- The response variable. Must be numeric.
 
- Returns:
- summary : pandas.DataFrame
- The coefficients of the fitted equation, along with the confidence intervals, standard errors, t statistics, and p-values.
- result : pandas.DataFrame
- The R-squared, adjusted R-squared, F statistic, and p-value of the fitted model.
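
The quantities listed under Returns are the standard outputs of an ordinary least squares fit. The sketch below shows how they are conventionally computed with NumPy. It is not the biostats implementation: `ols_summary` is a hypothetical helper, the data are made up, and confidence intervals and p-values are left out because they need t and F distribution functions (for example from SciPy).

```python
import numpy as np

def ols_summary(X, y):
    """Textbook ordinary least squares with an intercept (illustrative helper)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape                          # n observations, p predictors
    A = np.column_stack([np.ones(n), X])    # design matrix with an intercept column
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)

    resid = y - A @ beta
    df_resid = n - p - 1                    # residual degrees of freedom
    sse = float(resid @ resid)              # residual sum of squares
    sst = float(((y - y.mean()) ** 2).sum())  # total sum of squares

    # Standard errors from the estimated covariance of the coefficients
    cov = (sse / df_resid) * np.linalg.inv(A.T @ A)
    se = np.sqrt(np.diag(cov))

    r2 = 1.0 - sse / sst
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / df_resid
    f_stat = (r2 / p) / ((1.0 - r2) / df_resid)
    return {"coef": beta, "se": se, "t": beta / se,
            "r2": r2, "adj_r2": adj_r2, "f": f_stat}

# Made-up data for illustration only: y = 1 + 2*x1 - 0.5*x2 + noise
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = 1.0 + 2.0 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.1, size=50)
fit = ols_summary(X, y)
```

Here `fit["coef"]` holds the intercept followed by one coefficient per predictor, which mirrors the row order of the `summary` table.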
 
- See also
- multiple_logistic_regression
- Fit an equation that predicts a categorical variable from other variables. 
- correlation_matrix
- Compute the correlation coefficients between every two variables. 
- Examples

>>> import biostats as bs
>>> data = bs.dataset("multiple_linear_regression.csv")
>>> data
    Acerage  Maxdepth   NO3  Longnose
0      2528        80  2.28        13
1      3333        83  5.34        12
2     19611        96  0.99        54
3      3570        56  5.44        19
4      1722        43  5.66        37
..      ...       ...   ...       ...
63     6311        46  0.64         2
64     1450        60  2.96        26
65     4106        96  2.62        20
66    10274        90  5.45        38
67      510        82  5.25        19

We want to fit an equation that predicts Longnose from Acerage, Maxdepth, and NO3.

>>> summary, result = bs.multiple_linear_regression(data=data, x_numeric=["Acerage", "Maxdepth", "NO3"],
...     x_categorical=[], y="Longnose")
>>> summary
           Coefficient  95% CI: Lower  95% CI: Upper  Std. Error  t Statistic   p-value
Intercept   -23.829067     -54.342374       6.684240   15.273992    -1.560107  0.123666  NaN
Acerage       0.001988       0.000641       0.003334    0.000674     2.947947  0.004461   **
Maxdepth      0.336605      -0.018134       0.691344    0.177571     1.895610  0.062529  NaN
NO3           8.673044       3.132716      14.213372    2.773312     3.127323  0.002654   **

The summary table gives the coefficients of the fitted equation, along with their confidence intervals, standard errors, t statistics, and p-values.

>>> result
       R-Squared  Adj. R-Squared  F Statistic   p-value
Model   0.279826        0.246068     8.289157  0.000097  ***

The model p-value is below 0.001, so there is a significant relationship between the predictor variables and the response variable.
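
As a worked illustration of what the fitted equation means, the coefficients printed in the `summary` table can be plugged in by hand. The sketch below copies those rounded values into a plain dictionary and evaluates the equation for row 0 of the dataset. `predict_longnose` is a hypothetical helper for illustration, not a biostats function.

```python
# Coefficients copied (rounded as printed) from the `summary` table in the example
coef = {
    "Intercept": -23.829067,
    "Acerage": 0.001988,
    "Maxdepth": 0.336605,
    "NO3": 8.673044,
}

def predict_longnose(acerage, maxdepth, no3):
    """Evaluate the fitted linear equation for one observation."""
    return (coef["Intercept"]
            + coef["Acerage"] * acerage
            + coef["Maxdepth"] * maxdepth
            + coef["NO3"] * no3)

# Row 0 of the example data: Acerage=2528, Maxdepth=80, NO3=2.28, Longnose=13
pred = predict_longnose(2528, 80, 2.28)
residual = 13 - pred  # observed minus predicted

print(f"predicted Longnose: {pred:.2f}, residual: {residual:.2f}")
```

The prediction for this row is about 27.9 against an observed value of 13. A residual of that size is in keeping with the modest R-squared of about 0.28 reported in `result`.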