biostats.factor_analysis
- biostats.factor_analysis(data, x, factors, analyze=None)
Find the underlying factors of a set of variables.
- Parameters:
- data
pandas.DataFrame
The input data. Must contain at least two numeric columns.
- x
list
The list of numeric variables to be analyzed.
- factors
int
The number of factors.
- analyze
dict
A new observation whose factor scores are to be computed, given as a dictionary mapping each variable in x to its value. Optional.
- Returns:
- summary
pandas.DataFrame
The uniqueness of each variable.
- result
pandas.DataFrame
The loadings of each variable, sum of squared loadings, proportion of variance, and cumulative proportion of variance of each factor.
- analysis
pandas.DataFrame
The factor scores of the data to be analyzed.
See also
principal_component_analysis
Find the linear combination of a set of variables to manifest the variation of data.
linear_discriminant_analysis
Find the linear combination of a set of variables to distinguish between groups.
Examples
>>> import biostats as bs
>>> data = bs.dataset("factor_analysis.csv")
>>> data
     Oil  Density  Crispy  Fracture  Hardness
0   16.5     2955      10        23        97
1   17.7     2660      14         9       139
2   16.2     2870      12        17       143
3   16.7     2920      10        31        95
4   16.3     2975      11        26       143
5   19.1     2790      13        16       189
6   18.4     2750      13        17       114
7   17.5     2770      10        26        63
8   15.7     2955      11        23       123
9   16.4     2945      11        24       132
10  18.0     2830      12        15       121
11  17.4     2835      12        18       172
12  18.4     2860      14        11       170
13  13.9     2965      12        19       169
14  15.8     2930       9        26        65
15  16.4     2770      15        16       183
16  18.9     2650      14        20       114
17  17.3     2890      12        17       142
18  16.7     2695      13        13       111
19  19.1     2755      14        10       140
20  13.7     3000      10        27       177
21  14.7     2980      10        20       133
22  18.1     2780      13        14       150
23  17.2     2705       8        27       113
24  18.7     2825      13        20       166
25  18.1     2875      12        15       150
26  16.6     2945      10        25       100
27  17.1     2920      10        25       123
28  17.4     2845      13        19       129
29  19.4     2645      12        18        68
30  15.9     3080      10        23       106
31  17.1     2825      10        28       131
32  15.5     3125       7        33        92
33  17.7     2780      13        22       141
34  15.9     2900      12        21       192
35  21.2     2570      14        13       105
36  19.5     2635      13        22       101
37  20.5     2725      14        16       145
38  17.0     2865      11        22       100
39  16.7     2975      10        26       105
40  16.8     2980      10        24       144
41  16.8     2870      12        20       123
42  16.3     2920      11        22       136
43  16.2     3100       8        27       140
44  18.1     2910      12        21       120
45  16.6     2865      11        25       120
46  16.4     2995      12        20       165
47  15.1     2925      10        29       118
48  21.1     2700      13        16       116
49  16.3     2845      10        26        75
We want to find the underlying factors of the five variables.
>>> summary, result, analysis = bs.factor_analysis(data=data, x=["Oil", "Density", "Crispy", "Fracture", "Hardness"], factors=2,
...     analyze={"Oil":17.2, "Density":2830, "Crispy":12, "Fracture":19, "Hardness":121})
>>> summary
                 Oil   Density   Crispy  Fracture  Hardness
Uniqueness  0.322983  0.169086  0.04781  0.251765  0.398991
The uniqueness of each variable (the proportion of its variability that cannot be explained by the factors) is given.
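As a rough cross-check, the uniqueness values can be reproduced from the loadings reported in the result table of this example. The sketch below assumes the usual convention for standardized variables, namely that uniqueness equals one minus the communality (the sum of a variable's squared loadings over all factors). It is written in plain Python with the loadings copied by hand, and it is not necessarily how biostats computes the values internally.

```python
# Loadings copied from the `result` table of this example: (Factor 1, Factor 2).
loadings = {
    "Oil":      (-0.822497, -0.022736),
    "Density":  ( 0.911124,  0.027689),
    "Crispy":   (-0.747793,  0.626893),
    "Fracture": ( 0.653877, -0.566286),
    "Hardness": ( 0.095274,  0.769371),
}

# Assumption: uniqueness = 1 - communality, where the communality of a
# variable is the sum of its squared loadings over all factors.
uniqueness = {
    name: 1.0 - sum(value ** 2 for value in row)
    for name, row in loadings.items()
}

for name, value in uniqueness.items():
    print(f"{name:<9}{value:.6f}")
```

The printed values should agree with the summary table up to rounding of the copied loadings. A variable with low uniqueness, such as Crispy, is almost entirely described by the two factors.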
>>> result
                  Factor 1   Factor 2
Oil              -0.822497  -0.022736
Density           0.911124   0.027689
Crispy           -0.747793   0.626893
Fracture          0.653877  -0.566286
Hardness          0.095274   0.769371
                       NaN        NaN
SS Loadings       2.502476   1.306891
Proportion Var.   0.500495   0.261378
Cumulative Var.   0.500495   0.761873
The loadings (contribution of each original variable to each factor), SS Loadings (sum of squared loadings), Proportion Var. (proportion of variance explained by each factor), and Cumulative Var. (cumulative proportion of variance) are calculated.
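The three summary rows follow from the loadings by simple arithmetic, which the sketch below illustrates. It assumes the variables are standardized, so that the total variance equals the number of variables (five here). The loadings are copied by hand from the table above, and the variable names are illustrative rather than part of the biostats API.

```python
from itertools import accumulate

# Loadings copied from the `result` table: one (Factor 1, Factor 2) row per variable,
# in the order Oil, Density, Crispy, Fracture, Hardness.
loadings = [
    (-0.822497, -0.022736),
    ( 0.911124,  0.027689),
    (-0.747793,  0.626893),
    ( 0.653877, -0.566286),
    ( 0.095274,  0.769371),
]

n_variables = len(loadings)

# SS Loadings: sum of squared loadings, taken column-wise (one value per factor).
ss_loadings = [sum(value ** 2 for value in column) for column in zip(*loadings)]

# Proportion Var.: assumes standardized variables, so total variance = n_variables.
proportion_var = [ss / n_variables for ss in ss_loadings]

# Cumulative Var.: running total of the proportions.
cumulative_var = list(accumulate(proportion_var))

print("SS Loadings    ", [round(v, 4) for v in ss_loadings])
print("Proportion Var.", [round(v, 4) for v in proportion_var])
print("Cumulative Var.", [round(v, 4) for v in cumulative_var])
```

Read this way, the two factors together account for roughly 76% of the total variance of the five variables.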
>>> analysis
           Factor 1  Factor 2
Analysis  -0.251185  0.090308
The factor scores of the data to be analyzed are calculated.