biostats.correlation
- biostats.correlation(data, x, y)
Test whether there is a correlation between two numeric variables.
- Parameters:
- data
pandas.DataFrame
The input data. Must contain at least two numeric columns.
- x
str
The first numeric variable.
- y
str
The second numeric variable. Switching the two variables will not change the result.
- Returns:
- summary
pandas.DataFrame
The correlation coefficient and the confidence interval.
- result
pandas.DataFrame
The degrees of freedom, t statistic, and p-value of the test.
See also
correlation_matrix
Compute the correlation coefficients between every two variables.
simple_linear_regression
Fit an equation that predicts a numeric variable from another numeric variable.
spearman_rank_correlation
The non-parametric version of the correlation test.
Examples
>>> import biostats as bs
>>> data = bs.dataset("correlation.csv")
>>> data
    Latitude  Species
0     39.217      128
1     38.800      137
2     39.467      108
3     38.958      118
4     38.600      135
5     38.583       94
6     39.733      113
7     38.033      118
8     38.900       96
9     39.533       98
10    39.133      121
11    38.317      152
12    38.333      108
13    38.367      118
14    37.200      157
15    37.967      125
16    37.667      114
We want to test whether there is a correlation between Latitude and Species.
>>> summary, result = bs.correlation(data=data, x="Latitude", y="Species")
>>> summary
             Coefficient  95% CI: Lower  95% CI: Upper
Correlation    -0.462884      -0.771814       0.022842
The correlation coefficient and the confidence interval are given.
>>> result
       D.F.  t Statistic   p-value
Model    15    -2.022457  0.061336  <NA>
The p-value is greater than 0.05, so there is no significant correlation between Latitude and Species at the 5% level.
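To see where these numbers may come from, the following standard-library sketch recomputes the Pearson coefficient, the t statistic, and a 95% confidence interval from the example data. It assumes the interval is built with the Fisher z-transformation and a normal quantile. That assumption is consistent with the printed output, but it is not confirmed to be what biostats does internally. The variable names (`latitude`, `species`, `ci_lower`, `ci_upper`) are illustrative only.

```python
import math
from statistics import NormalDist, mean

# Values copied from the example dataset shown above.
latitude = [39.217, 38.800, 39.467, 38.958, 38.600, 38.583, 39.733, 38.033,
            38.900, 39.533, 39.133, 38.317, 38.333, 38.367, 37.200, 37.967,
            37.667]
species = [128, 137, 108, 118, 135, 94, 113, 118, 96, 98, 121, 152, 108, 118,
           157, 125, 114]

n = len(latitude)
mx, my = mean(latitude), mean(species)

# Pearson correlation coefficient from sums of squares and cross-products.
sxy = sum((a - mx) * (b - my) for a, b in zip(latitude, species))
sxx = sum((a - mx) ** 2 for a in latitude)
syy = sum((b - my) ** 2 for b in species)
r = sxy / math.sqrt(sxx * syy)

# t statistic for H0: no correlation, with n - 2 degrees of freedom.
df = n - 2
t_stat = r * math.sqrt(df / (1 - r ** 2))

# 95% confidence interval via the Fisher z-transformation
# (an assumption about the method, not taken from the biostats source).
z_crit = NormalDist().inv_cdf(0.975)
half_width = z_crit / math.sqrt(n - 3)
ci_lower = math.tanh(math.atanh(r) - half_width)
ci_upper = math.tanh(math.atanh(r) + half_width)

print(f"r = {r:.6f}, df = {df}, t = {t_stat:.6f}")
print(f"95% CI: [{ci_lower:.6f}, {ci_upper:.6f}]")
```

The formula for `r` is symmetric in the two variables, which is why switching `x` and `y` leaves the result unchanged. The p-value is the two-sided tail probability of `t_stat` under a t distribution with `df` degrees of freedom. Computing it needs a t-distribution CDF, for example `scipy.stats.t.sf`, which is left out here to keep the sketch dependency-free.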