biostats.correlation

biostats.correlation(data, x, y)

Test whether there is a correlation between two numeric variables.

Parameters:
data : pandas.DataFrame

The input data. Must contain at least two numeric columns.

x : str

The first numeric variable.

y : str

The second numeric variable. Switching the two variables will not change the result.

Returns:
summary : pandas.DataFrame

The correlation coefficient and the confidence interval.

result : pandas.DataFrame

The degrees of freedom, t statistic, and p-value of the test.

See also

correlation_matrix

Compute the correlation coefficients between every pair of variables.

simple_linear_regression

Fit an equation that predicts a numeric variable from another numeric variable.

spearman_rank_correlation

The non-parametric version of the correlation test.

Examples

>>> import biostats as bs
>>> data = bs.dataset("correlation.csv")
>>> data
    Latitude  Species
0     39.217      128
1     38.800      137
2     39.467      108
3     38.958      118
4     38.600      135
5     38.583       94
6     39.733      113
7     38.033      118
8     38.900       96
9     39.533       98
10    39.133      121
11    38.317      152
12    38.333      108
13    38.367      118
14    37.200      157
15    37.967      125
16    37.667      114

We want to test whether there is a correlation between Latitude and Species.

>>> summary, result = bs.correlation(data=data, x="Latitude", y="Species")
>>> summary
             Coefficient  95% CI: Lower  95% CI: Upper
Correlation    -0.462884      -0.771814       0.022842

The summary gives the correlation coefficient and its 95% confidence interval.

>>> result
       D.F.  t Statistic   p-value      
Model    15    -2.022457  0.061336  <NA>

The p-value is greater than 0.05, so the correlation between Latitude and Species is not statistically significant at the 5% level.
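
As a cross-check, the figures in the two tables above can be recomputed by hand. The following is a minimal sketch using only the Python standard library. It assumes the test is Pearson's product-moment correlation and that the interval comes from the Fisher z-transform; neither is stated on this page, and the helper `pearson_summary` is a hypothetical name, not part of biostats. The p-value is omitted because it needs the CDF of a t distribution with 15 degrees of freedom, which the standard library does not provide.

```python
import math
from statistics import NormalDist

# Data copied from the example table above.
latitude = [39.217, 38.800, 39.467, 38.958, 38.600, 38.583, 39.733, 38.033,
            38.900, 39.533, 39.133, 38.317, 38.333, 38.367, 37.200, 37.967,
            37.667]
species = [128, 137, 108, 118, 135, 94, 113, 118, 96, 98, 121, 152, 108,
           118, 157, 125, 114]


def pearson_summary(x, y, level=0.95):
    """Return r, its confidence interval, the degrees of freedom, and t.

    Hypothetical helper for illustration; not the biostats implementation.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n

    # Sums of squares and cross-products around the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    r = sxy / math.sqrt(sxx * syy)

    # Fisher z-transform interval (assumed method): atanh(r) is treated as
    # approximately normal with standard error 1 / sqrt(n - 3).
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z_crit / math.sqrt(n - 3)
    lower = math.tanh(math.atanh(r) - half_width)
    upper = math.tanh(math.atanh(r) + half_width)

    # t statistic for H0: no correlation, with n - 2 degrees of freedom.
    df = n - 2
    t = r * math.sqrt(df / (1 - r ** 2))
    return r, (lower, upper), df, t


r, (lower, upper), df, t = pearson_summary(latitude, species)
print(f"r = {r:.6f}, 95% CI = ({lower:.6f}, {upper:.6f})")
print(f"D.F. = {df}, t = {t:.6f}")
```

Under these assumptions the printed values should agree with the summary and result tables to within rounding. Passing the two lists in the opposite order gives the same coefficient, consistent with the note on the y parameter.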