biostats.simple_linear_regression

biostats.simple_linear_regression(data, x, y)

Fit an equation that predicts a numeric variable from another numeric variable.

Parameters:

data (pandas.DataFrame): The input data. Must contain at least two numeric columns.
x (str): The name of the predictor variable. Must be numeric.
y (str): The name of the response variable. Must be numeric.

Returns:

summary (pandas.DataFrame): The coefficients of the fitted equation, along with the confidence intervals, standard errors, t statistics, and p-values.
result (pandas.DataFrame): The R-squared, adjusted R-squared, F statistic, and p-value of the fitted model.

See also

multiple_linear_regression: Fit an equation that predicts a numeric variable from other variables.
simple_logistic_regression: Fit an equation that predicts a dichotomous categorical variable from a numeric variable.
correlation: Test the correlation between two numeric variables.

Examples

>>> import biostats as bs
>>> data = bs.dataset("simple_linear_regression.csv")
>>> data
    Weight  Eggs
0     5.38    29
1     7.36    23
2     6.13    22
3     4.75    20
4     8.10    25
5     8.62    25
6     6.30    17
7     7.44    24
8     7.26    20
9     7.17    27
10    7.78    24
11    6.23    21
12    5.42    22
13    7.87    22
14    5.25    23
15    7.37    35
16    8.01    27
17    4.92    23
18    7.03    25
19    6.45    24
20    5.06    19
21    6.72    21
22    7.00    20
23    9.39    33
24    6.49    17
25    6.34    21
26    6.16    25
27    5.74    22

We want to fit an equation that predicts Eggs from Weight.

>>> summary, result = bs.simple_linear_regression(data=data, x="Weight", y="Eggs")
>>> summary
           Coefficient  95% CI: Lower  95% CI: Upper  Std. Error  t Statistic   p-value    
Intercept    12.689022       4.054035      21.324009    4.200858     3.020579  0.005598  **
Weight        1.601722       0.332202       2.871243    0.617612     2.593411  0.015401   *

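The table gives the coefficients of the fitted equation, along with their 95% confidence intervals, standard errors, t statistics, and p-values. Here the fitted equation is Eggs = 12.689 + 1.602 × Weight.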
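To make the coefficients concrete, here is a minimal sketch that plugs them into the fitted line to predict a value. The helper `predict_eggs` and the constant names are hypothetical and are not part of biostats; the numbers are copied by hand from the summary table above.

```python
# Coefficients copied from the summary table above.
INTERCEPT = 12.689022  # "Intercept" row, "Coefficient" column
SLOPE = 1.601722       # "Weight" row, "Coefficient" column


def predict_eggs(weight):
    """Return the predicted Eggs value for a given Weight (hypothetical helper)."""
    return INTERCEPT + SLOPE * weight


predicted = predict_eggs(7.0)
print(round(predicted, 1))  # 23.9
```

Predictions are most trustworthy for weights inside the observed range (roughly 4.75 to 9.39 in this dataset).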

>>> result
       R-Squared  Adj. R-Squared  F Statistic   p-value   
Model   0.205519        0.174962      6.72578  0.015401  *

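The model p-value (0.015) is below 0.05, so there is a significant linear relationship between the predictor (Weight) and the response (Eggs). The R-squared of 0.206 indicates that Weight explains about 21% of the variance in Eggs.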
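As a rough cross-check, the same quantities can be recomputed with NumPy alone. This is only a sketch of ordinary least squares on the data listed above, not a description of how biostats computes its results internally; the variable names are illustrative.

```python
import numpy as np

# The 28 observations from the example dataset above.
weight = np.array([
    5.38, 7.36, 6.13, 4.75, 8.10, 8.62, 6.30, 7.44, 7.26, 7.17,
    7.78, 6.23, 5.42, 7.87, 5.25, 7.37, 8.01, 4.92, 7.03, 6.45,
    5.06, 6.72, 7.00, 9.39, 6.49, 6.34, 6.16, 5.74,
])
eggs = np.array([
    29, 23, 22, 20, 25, 25, 17, 24, 20, 27,
    24, 21, 22, 22, 23, 35, 27, 23, 25, 24,
    19, 21, 20, 33, 17, 21, 25, 22,
], dtype=float)

# Least-squares line; polyfit returns the highest-degree coefficient first.
slope, intercept = np.polyfit(weight, eggs, 1)

# R-squared: share of the variance in Eggs explained by the fitted line.
fitted = intercept + slope * weight
ss_res = np.sum((eggs - fitted) ** 2)
ss_tot = np.sum((eggs - eggs.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot

# With one predictor, the residual degrees of freedom are n - 2.
n = len(weight)
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - 2)
f_statistic = r_squared / (1 - r_squared) * (n - 2)

print(slope, intercept, r_squared, adj_r_squared, f_statistic)
```

The printed values should agree with the summary and result tables up to rounding. With a single predictor, the F statistic equals the square of the slope's t statistic, which is why the model p-value matches the p-value in the Weight row.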