biostats.correlation_matrix

biostats.correlation_matrix(data, variable)

Compute the correlation coefficient between every pair of variables.

Parameters:
data : pandas.DataFrame

The input data. Must contain at least two numeric columns.

variable : list

The names of the numeric columns in data to include in the correlation matrix.

Returns:
summary : pandas.DataFrame

The correlation matrix, with one row and one column per variable.

See also

correlation

Test whether there is a correlation between two numeric variables.

multiple_linear_regression

Fit an equation that predicts a numeric variable from other variables.

Examples

>>> import biostats as bs
>>> data = bs.dataset("correlation_matrix.csv")
>>> data
    Longnose  Acerage   DO2  Maxdepth   NO3    SO4  Temp
0         13     2528   9.6        80  2.28  16.75  15.3
1         12     3333   8.5        83  5.34   7.74  19.4
2         54    19611   8.3        96  0.99  10.92  19.5
3         19     3570   9.2        56  5.44  16.53  17.0
4         37     1722   8.1        43  5.66   5.91  19.3
..       ...      ...   ...       ...   ...    ...   ...
63         2     6311   7.6        46  0.64  21.16  18.5
64        26     1450   7.9        60  2.96   8.84  18.6
65        20     4106  10.0        96  2.62   5.45  15.4
66        38    10274   9.3        90  5.45  24.76  15.0
67        19      510   6.7        82  5.25  14.19  26.5

We want to compute the correlation coefficient between every pair of variables in the data.

>>> summary = bs.correlation_matrix(data=data, variable=["Longnose","Acerage","DO2","Maxdepth","NO3","SO4","Temp"])
>>> summary
          Longnose   Acerage      Temp  Maxdepth       DO2       SO4       NO3
Longnose  1.000000  0.346506  0.139865  0.304980  0.136157 -0.017380  0.309233
Acerage   0.346506  1.000000  0.003541  0.258624 -0.022433  0.048776 -0.099528
Temp      0.139865  0.003541  1.000000 -0.004895 -0.318865  0.079792 -0.001596
Maxdepth  0.304980  0.258624 -0.004895  1.000000 -0.057570 -0.049872  0.036269
DO2       0.136157 -0.022433 -0.318865 -0.057570  1.000000 -0.072411  0.273426
SO4      -0.017380  0.048776  0.079792 -0.049872 -0.072411  1.000000 -0.087130
NO3       0.309233 -0.099528 -0.001596  0.036269  0.273426 -0.087130  1.000000

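The result is a symmetric matrix with ones on the diagonal. Each off-diagonal entry is the correlation coefficient between the row variable and the column variable. In this output, the rows and columns do not appear in the order given in variable.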
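To make the structure of such a matrix concrete, here is a minimal pure-Python sketch. This page does not state which correlation coefficient biostats uses, so the sketch assumes the Pearson coefficient; the helper names pearson and correlation_matrix_sketch are hypothetical and are not part of biostats. The input is only the first five rows of three columns shown above, so the values will not match the full-data output.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences.

    Assumption: Pearson is used here for illustration only; the page
    does not say which coefficient biostats.correlation_matrix computes.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations from the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations for each variable.
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix_sketch(columns):
    """Hypothetical helper: nested dict of pairwise coefficients."""
    names = list(columns)
    return {
        row: {col: pearson(columns[row], columns[col]) for col in names}
        for row in names
    }


# First five rows of three columns from the dataset printed above.
columns = {
    "Longnose": [13, 12, 54, 19, 37],
    "Acerage": [2528, 3333, 19611, 3570, 1722],
    "Temp": [15.3, 19.4, 19.5, 17.0, 19.3],
}

matrix = correlation_matrix_sketch(columns)

# Print one row of coefficients per variable.
for row_name, row in matrix.items():
    print(row_name, [round(value, 4) for value in row.values()])
```

Every variable correlates perfectly with itself, and the coefficient for (A, B) equals the one for (B, A), which is why the matrix has ones on the diagonal and is symmetric.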