biostats.correlation_matrix

biostats.correlation_matrix(data, variable)

Compute the correlation coefficient between every pair of variables.

Parameters:

data (pandas.DataFrame): The input data. Must contain at least two numeric columns.
variable (list): The names of the numeric variables (columns of data) to include.

Returns:

summary (pandas.DataFrame): The correlation matrix.

See also

correlation: Test whether there is a correlation between two numeric variables.
multiple_linear_regression: Fit an equation that predicts a numeric variable from other variables.

Examples

>>> import biostats as bs
>>> data = bs.dataset("correlation_matrix.csv")
>>> data
    Longnose  Acerage   DO2  Maxdepth   NO3    SO4  Temp
0         13     2528   9.6        80  2.28  16.75  15.3
1         12     3333   8.5        83  5.34   7.74  19.4
2         54    19611   8.3        96  0.99  10.92  19.5
3         19     3570   9.2        56  5.44  16.53  17.0
4         37     1722   8.1        43  5.66   5.91  19.3
..       ...      ...   ...       ...   ...    ...   ...
63         2     6311   7.6        46  0.64  21.16  18.5
64        26     1450   7.9        60  2.96   8.84  18.6
65        20     4106  10.0        96  2.62   5.45  15.4
66        38    10274   9.3        90  5.45  24.76  15.0
67        19      510   6.7        82  5.25  14.19  26.5

We want to compute the correlation coefficient between every pair of variables in the data.

>>> summary = bs.correlation_matrix(data=data, variable=["Longnose","Acerage","DO2","Maxdepth","NO3","SO4","Temp"])
>>> summary
          Longnose   Acerage      Temp  Maxdepth       DO2       SO4       NO3
Longnose  1.000000  0.346506  0.139865  0.304980  0.136157 -0.017380  0.309233
Acerage   0.346506  1.000000  0.003541  0.258624 -0.022433  0.048776 -0.099528
Temp      0.139865  0.003541  1.000000 -0.004895 -0.318865  0.079792 -0.001596
Maxdepth  0.304980  0.258624 -0.004895  1.000000 -0.057570 -0.049872  0.036269
DO2       0.136157 -0.022433 -0.318865 -0.057570  1.000000 -0.072411  0.273426
SO4      -0.017380  0.048776  0.079792 -0.049872 -0.072411  1.000000 -0.087130
NO3       0.309233 -0.099528 -0.001596  0.036269  0.273426 -0.087130  1.000000

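The correlation matrix is computed. It is symmetric with ones on the diagonal, and each off-diagonal entry is the correlation coefficient between the row variable and the column variable. Note that in this output the rows and columns do not appear in the order given in variable.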
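To make the structure of such a matrix concrete, the following is a minimal sketch in plain Python. It assumes Pearson coefficients; this page does not state which coefficient biostats computes, so treat that as an assumption. The helper names `pearson` and `correlation_matrix_sketch` are hypothetical and not part of biostats. The sketch uses only the ten rows visible in the printed data above (rows 0 to 4 and 63 to 67) and three of the columns, so its values will not match the full-data output.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations, and sums of squared deviations.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix_sketch(columns):
    """Hypothetical helper: nested dict of pairwise Pearson coefficients."""
    names = list(columns)
    return {r: {c: pearson(columns[r], columns[c]) for c in names} for r in names}


# Only the ten rows shown in the printed data (not the full 68-row dataset).
columns = {
    "Longnose": [13, 12, 54, 19, 37, 2, 26, 20, 38, 19],
    "Acerage": [2528, 3333, 19611, 3570, 1722, 6311, 1450, 4106, 10274, 510],
    "Temp": [15.3, 19.4, 19.5, 17.0, 19.3, 18.5, 18.6, 15.4, 15.0, 26.5],
}

matrix = correlation_matrix_sketch(columns)
for name, row in matrix.items():
    print(name, [round(value, 3) for value in row.values()])
```

Each row of the printed result corresponds to one variable, the diagonal entries are 1, and the entry for (row, column) equals the entry for (column, row).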