biostats.principal_component_analysis
- biostats.principal_component_analysis(data, x, transform=None)
Find the linear combinations of a set of variables that capture the most variation in the data.
- Parameters:
- data
pandas.DataFrame
The input data. Must contain at least one numeric column.
- x
list
The list of numeric variables to be analyzed.
- transform
dict
The data to be transformed, given as a dictionary that maps each variable in x to a value. Optional.
- Returns:
- summary
pandas.DataFrame
The counts, mean values, standard deviations, and variances of each variable.
- result
pandas.DataFrame
The coefficients and intercepts of the linear combinations, as well as the proportions of variation explained by each dimension.
- transformation
pandas.DataFrame
The new coordinates of the data to be transformed.
See also
factor_analysis
Find the underlying factors of a set of variables.
linear_discriminant_analysis
Find the linear combination of a set of variables to distinguish between groups.
Examples
>>> import biostats as bs
>>> data = bs.dataset("principal_component_analysis.csv")
>>> data
    Murder  Assault  UrbanPop  Rape
0     13.2      236        58  21.2
1     10.0      263        48  44.5
2      8.1      294        80  31.0
3      8.8      190        50  19.5
4      9.0      276        91  40.6
5      7.9      204        78  38.7
6      3.3      110        77  11.1
7      5.9      238        72  15.8
8     15.4      335        80  31.9
9     17.4      211        60  25.8
10     5.3       46        83  20.2
11     2.6      120        54  14.2
12    10.4      249        83  24.0
13     7.2      113        65  21.0
14     2.2       56        57  11.3
15     6.0      115        66  18.0
16     9.7      109        52  16.3
17    15.4      249        66  22.2
18     2.1       83        51   7.8
19    11.3      300        67  27.8
20     4.4      149        85  16.3
21    12.1      255        74  35.1
22     2.7       72        66  14.9
23    16.1      259        44  17.1
24     9.0      178        70  28.2
25     6.0      109        53  16.4
26     4.3      102        62  16.5
27    12.2      252        81  46.0
28     2.1       57        56   9.5
29     7.4      159        89  18.8
30    11.4      285        70  32.1
31    11.1      254        86  26.1
32    13.0      337        45  16.1
33     0.8       45        44   7.3
34     7.3      120        75  21.4
35     6.6      151        68  20.0
36     4.9      159        67  29.3
37     6.3      106        72  14.9
38     3.4      174        87   8.3
39    14.4      279        48  22.5
40     3.8       86        45  12.8
41    13.2      188        59  26.9
42    12.7      201        80  25.5
43     3.2      120        80  22.9
44     2.2       48        32  11.2
45     8.5      156        63  20.7
46     4.0      145        73  26.2
47     5.7       81        39   9.3
48     2.6       53        66  10.8
49     6.8      161        60  15.6
We want to find the linear combinations of the four variables that best capture the variation in the data.
>>> summary, result, transformation = bs.principal_component_analysis(data=data, x=["Murder", "Assault", "UrbanPop", "Rape"],
...     transform={"Murder":10.2, "Assault":211, "UrbanPop":67, "Rape":32.3})
>>> summary
          Count     Mean  Std. Deviation     Variance
Murder       50    7.788        4.355510    18.970465
Assault      50  170.760       83.337661  6945.165714
UrbanPop     50   65.540       14.474763   209.518776
Rape         50   21.232        9.366385    87.729159
Basic descriptive statistics of the four variables are calculated.
>>> result
               Murder    Assault   UrbanPop       Rape    Intercept  Proportion
Dimension 1  0.041704   0.995221   0.046336   0.075156  -174.901326    0.965534
Dimension 2  0.044822   0.058760  -0.976857  -0.200718    57.901952    0.027817
Dimension 3  0.079891  -0.067570  -0.200546   0.974081     3.378144    0.005800
Dimension 4  0.994922  -0.038938   0.058169  -0.072325    -3.376148    0.000849
The coefficients and intercepts to form the new dimensions are given. The proportions of variation explained by each dimension are also given.
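As a rough illustration of where such coefficients, intercepts, and proportions can come from, the sketch below runs a conventional covariance-based PCA with NumPy. This is an assumption about the method, not a confirmed description of how biostats computes its result; the function name `pca_sketch` is hypothetical. It uses only the first five rows of the example dataset, so its numbers will differ from the table above, and the sign of each dimension is arbitrary.

```python
import numpy as np

def pca_sketch(X):
    """Covariance-based PCA (illustrative, not the biostats implementation).

    Returns (coefficients, intercepts, proportions), one entry per dimension.
    """
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    cov = np.cov(X, rowvar=False)            # sample covariance (ddof=1)
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]        # reorder to descending variance
    eigvals = eigvals[order]
    coef = eigvecs[:, order].T               # one row of coefficients per dimension
    intercept = -coef @ mean                 # so that score = coef @ x + intercept
    proportion = eigvals / eigvals.sum()     # share of total variance per dimension
    return coef, intercept, proportion

# First five rows of the example dataset: Murder, Assault, UrbanPop, Rape
X = np.array([
    [13.2, 236, 58, 21.2],
    [10.0, 263, 48, 44.5],
    [8.1, 294, 80, 31.0],
    [8.8, 190, 50, 19.5],
    [9.0, 276, 91, 40.6],
])

coef, intercept, proportion = pca_sketch(X)
scores = X @ coef.T + intercept              # coordinates of each row in the new dimensions
print(proportion)
```

With the intercept defined this way, the scores of the input rows are centered at zero in every dimension, and the proportions sum to one.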
>>> transformation
                Dimension 1  Dimension 2  Dimension 3  Dimension 4
Transformation    41.047766    -1.175146     7.962017     0.117308
The new coordinates of the data to be transformed are given.
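The transformed coordinates can be checked by hand: each one is the sum of the input values weighted by that dimension's coefficients, plus the intercept. The sketch below copies the rounded coefficients and intercepts from the `result` table and the values passed as `transform`; because the table is rounded to six decimals, the outcome agrees with the `transformation` row only to about four decimal places.

```python
import numpy as np

# Coefficients (Murder, Assault, UrbanPop, Rape) copied from the `result` table
coef = np.array([
    [0.041704, 0.995221, 0.046336, 0.075156],     # Dimension 1
    [0.044822, 0.058760, -0.976857, -0.200718],   # Dimension 2
    [0.079891, -0.067570, -0.200546, 0.974081],   # Dimension 3
    [0.994922, -0.038938, 0.058169, -0.072325],   # Dimension 4
])
intercept = np.array([-174.901326, 57.901952, 3.378144, -3.376148])

# The observation passed as `transform` in the example
point = np.array([10.2, 211, 67, 32.3])

# New coordinate in each dimension = coefficients . values + intercept
new_coords = coef @ point + intercept
print(np.round(new_coords, 3))
```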