biostats.correlation_matrix
- biostats.correlation_matrix(data, variable)
Compute the correlation coefficient between every pair of the given variables.
- Parameters:
- data
pandas.DataFrame
The input data. Must contain at least two numeric columns.
- variable
list
The names of the numeric columns of data to correlate.
- Returns:
- summary
pandas.DataFrame
The correlation matrix: a square, symmetric table with one row and one column per variable and ones on the diagonal.
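This page does not say which correlation coefficient is computed. As a minimal sketch, assuming Pearson's coefficient, the pairwise computation described above can be written in pure Python: each unordered pair of variables is computed once and mirrored, and the diagonal is 1. The helpers `pearson` and `correlation_matrix_sketch`, and the dict-of-lists input standing in for a pandas.DataFrame, are hypothetical and not part of biostats.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations (proportional to the covariance).
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    # Raises ZeroDivisionError if either variable is constant.
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix_sketch(data, variable):
    """Hypothetical stand-in for bs.correlation_matrix.

    data maps column names to lists of numbers; variable lists the names to use.
    Returns a dict of dicts, so matrix[row][col] is one coefficient.
    """
    # Mirrors the documented requirement of at least two numeric columns.
    if len(variable) < 2:
        raise ValueError("at least two numeric variables are required")
    # Start every entry at 1.0; only the diagonal keeps this value,
    # since each variable correlates perfectly with itself.
    matrix = {v: {w: 1.0 for w in variable} for v in variable}
    # Compute each unordered pair once and mirror it, so the matrix is symmetric.
    for v, w in combinations(variable, 2):
        matrix[v][w] = matrix[w][v] = pearson(data[v], data[w])
    return matrix


# Toy data: y rises with x, z falls as x rises.
toy = {
    "x": [1.0, 2.0, 3.0, 4.0],
    "y": [2.0, 4.0, 6.0, 8.0],
    "z": [4.0, 3.0, 2.0, 1.0],
}
m = correlation_matrix_sketch(toy, ["x", "y", "z"])
print(m["x"]["y"], m["x"]["z"])  # 1.0 -1.0
```

Computing only the k(k-1)/2 unordered pairs and copying each value into both cells halves the work and guarantees the symmetry seen in the returned matrix.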
See also
correlation
Test whether there is a correlation between two numeric variables.
multiple_linear_regression
Fit an equation that predicts a numeric variable from other variables.
Examples
>>> import biostats as bs
>>> data = bs.dataset("correlation_matrix.csv")
>>> data
    Longnose  Acerage   DO2  Maxdepth   NO3    SO4  Temp
0         13     2528   9.6        80  2.28  16.75  15.3
1         12     3333   8.5        83  5.34   7.74  19.4
2         54    19611   8.3        96  0.99  10.92  19.5
3         19     3570   9.2        56  5.44  16.53  17.0
4         37     1722   8.1        43  5.66   5.91  19.3
..       ...      ...   ...       ...   ...    ...   ...
63         2     6311   7.6        46  0.64  21.16  18.5
64        26     1450   7.9        60  2.96   8.84  18.6
65        20     4106  10.0        96  2.62   5.45  15.4
66        38    10274   9.3        90  5.45  24.76  15.0
67        19      510   6.7        82  5.25  14.19  26.5
We want to compute the correlation coefficient between every pair of variables in the data.
>>> summary = bs.correlation_matrix(data=data, variable=["Longnose", "Acerage", "DO2", "Maxdepth", "NO3", "SO4", "Temp"])
>>> summary
           Longnose    Acerage       Temp   Maxdepth        DO2        SO4        NO3
Longnose   1.000000   0.346506   0.139865   0.304980   0.136157  -0.017380   0.309233
Acerage    0.346506   1.000000   0.003541   0.258624  -0.022433   0.048776  -0.099528
Temp       0.139865   0.003541   1.000000  -0.004895  -0.318865   0.079792  -0.001596
Maxdepth   0.304980   0.258624  -0.004895   1.000000  -0.057570  -0.049872   0.036269
DO2        0.136157  -0.022433  -0.318865  -0.057570   1.000000  -0.072411   0.273426
SO4       -0.017380   0.048776   0.079792  -0.049872  -0.072411   1.000000  -0.087130
NO3        0.309233  -0.099528  -0.001596   0.036269   0.273426  -0.087130   1.000000
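Each off-diagonal entry is the correlation coefficient between the row variable and the column variable, so the matrix is symmetric with ones on the diagonal. In this data set, the strongest positive correlation is between Longnose and Acerage (0.346506) and the strongest negative correlation is between Temp and DO2 (-0.318865). Note that the rows and columns are not listed in the order given in variable.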
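The printed matrix is easier to scan once the off-diagonal entries are ranked. As a minimal sketch, one option (not a biostats feature) is to sort the unordered pairs by absolute correlation. Here the values from the example output are typed into a plain dict of dicts so the snippet runs without biostats or pandas; `strongest_pairs` is a hypothetical helper name.

```python
from itertools import combinations

# Coefficients copied from the example output above
# (rows and columns in the same order as printed there).
names = ["Longnose", "Acerage", "Temp", "Maxdepth", "DO2", "SO4", "NO3"]
rows = [
    [1.000000, 0.346506, 0.139865, 0.304980, 0.136157, -0.017380, 0.309233],
    [0.346506, 1.000000, 0.003541, 0.258624, -0.022433, 0.048776, -0.099528],
    [0.139865, 0.003541, 1.000000, -0.004895, -0.318865, 0.079792, -0.001596],
    [0.304980, 0.258624, -0.004895, 1.000000, -0.057570, -0.049872, 0.036269],
    [0.136157, -0.022433, -0.318865, -0.057570, 1.000000, -0.072411, 0.273426],
    [-0.017380, 0.048776, 0.079792, -0.049872, -0.072411, 1.000000, -0.087130],
    [0.309233, -0.099528, -0.001596, 0.036269, 0.273426, -0.087130, 1.000000],
]
# matrix[row][col] -> coefficient
matrix = {name: dict(zip(names, row)) for name, row in zip(names, rows)}


def strongest_pairs(matrix, k=3):
    """Return the k off-diagonal pairs with the largest |r|, strongest first."""
    # The matrix is symmetric, so each unordered pair is visited only once.
    pairs = [(a, b, matrix[a][b]) for a, b in combinations(matrix, 2)]
    return sorted(pairs, key=lambda p: abs(p[2]), reverse=True)[:k]


for a, b, r in strongest_pairs(matrix):
    print(f"{a} vs {b}: {r:+.3f}")
```

With the example values, this lists Longnose vs Acerage (+0.347), Temp vs DO2 (-0.319) and Longnose vs NO3 (+0.309). A ranking by |r| says nothing about statistical significance; to test a single pair, the correlation function listed under See also is the relevant tool.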