biostats.simple_linear_regression

biostats.simple_linear_regression(data, x, y)

Fit an equation that predicts a numeric variable from another numeric variable.

Parameters:
data : pandas.DataFrame

The input data. Must contain at least two numeric columns.

x : str

The name of the predictor column in data. The column must be numeric.

y : str

The name of the response column in data. The column must be numeric.

Returns:
summary : pandas.DataFrame

The coefficients of the fitted equation, along with the confidence intervals, standard errors, t statistics, and p-values.

result : pandas.DataFrame

The R-squared, adjusted R-squared, F statistic, and p-value of the fitted model.

See also

multiple_linear_regression

Fit an equation that predicts a numeric variable from other variables.

simple_logistic_regression

Fit an equation that predicts a dichotomous categorical variable from a numeric variable.

correlation

Test the correlation between two numeric variables.

Examples

>>> import biostats as bs
>>> data = bs.dataset("simple_linear_regression.csv")
>>> data
    Weight  Eggs
0     5.38    29
1     7.36    23
2     6.13    22
3     4.75    20
4     8.10    25
5     8.62    25
6     6.30    17
7     7.44    24
8     7.26    20
9     7.17    27
10    7.78    24
11    6.23    21
12    5.42    22
13    7.87    22
14    5.25    23
15    7.37    35
16    8.01    27
17    4.92    23
18    7.03    25
19    6.45    24
20    5.06    19
21    6.72    21
22    7.00    20
23    9.39    33
24    6.49    17
25    6.34    21
26    6.16    25
27    5.74    22

We want to fit an equation that predicts Eggs from Weight.

>>> summary, result = bs.simple_linear_regression(data=data, x="Weight", y="Eggs")
>>> summary
           Coefficient  95% CI: Lower  95% CI: Upper  Std. Error  t Statistic   p-value    
Intercept    12.689022       4.054035      21.324009    4.200858     3.020579  0.005598  **
Weight        1.601722       0.332202       2.871243    0.617612     2.593411  0.015401   *

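The summary lists the intercept and slope of the fitted equation, together with their 95% confidence intervals, standard errors, t statistics, and p-values. Here the slope estimates that Eggs increases by about 1.60 for each unit increase in Weight.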
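
As a minimal sketch of how the fitted equation might be used, the snippet below plugs the printed coefficients into Eggs = Intercept + slope × Weight. The helper name predict_eggs and the constants are illustrative assumptions and are not part of the biostats API; the values are copied from the summary table above at the precision shown.

```python
# Coefficients copied from the summary table above (rounded as printed).
INTERCEPT = 12.689022
SLOPE = 1.601722


def predict_eggs(weight):
    """Return the predicted Eggs for a given Weight (hypothetical helper)."""
    return INTERCEPT + SLOPE * weight


# Predict at a weight inside the observed range (4.75 to 9.39).
predicted = predict_eggs(7.0)
print(f"Predicted Eggs at Weight 7.0: {predicted:.2f}")
```

Predictions are most trustworthy inside the range of observed weights; the fitted line says nothing reliable about weights far outside it.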

>>> result
       R-Squared  Adj. R-Squared  F Statistic   p-value   
Model   0.205519        0.174962      6.72578  0.015401  *

The p-value is less than 0.05, so there is a significant linear relationship between Weight and Eggs. The R-squared of about 0.21 indicates that Weight explains roughly 21% of the variation in Eggs.
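
The estimates above can be cross-checked with the textbook closed-form formulas for ordinary least squares. The following is a sketch in plain Python that uses the example data as printed; it is not necessarily how biostats computes its results internally, and all variable names are illustrative.

```python
# Example data copied from the table printed above.
weight = [5.38, 7.36, 6.13, 4.75, 8.10, 8.62, 6.30, 7.44, 7.26, 7.17,
          7.78, 6.23, 5.42, 7.87, 5.25, 7.37, 8.01, 4.92, 7.03, 6.45,
          5.06, 6.72, 7.00, 9.39, 6.49, 6.34, 6.16, 5.74]
eggs = [29, 23, 22, 20, 25, 25, 17, 24, 20, 27,
        24, 21, 22, 22, 23, 35, 27, 23, 25, 24,
        19, 21, 20, 33, 17, 21, 25, 22]

n = len(weight)
mean_x = sum(weight) / n
mean_y = sum(eggs) / n

# Sums of squares and cross-products about the means.
sxx = sum((x - mean_x) ** 2 for x in weight)
syy = sum((y - mean_y) ** 2 for y in eggs)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(weight, eggs))

# Least-squares slope and intercept.
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Goodness of fit for a model with a single predictor.
r_squared = sxy ** 2 / (sxx * syy)
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - 2)
f_statistic = (n - 2) * r_squared / (1 - r_squared)

print(f"slope={slope:.6f} intercept={intercept:.6f}")
print(f"R2={r_squared:.6f} adjR2={adj_r_squared:.6f} F={f_statistic:.5f}")
```

With one predictor, the F statistic equals the square of the slope's t statistic, which is why the model p-value in result matches the p-value for Weight in summary.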