biostats.linear_discriminant_analysis
- biostats.linear_discriminant_analysis(data, x, y, predict=None)
Find the linear combinations of a set of variables that best distinguish between groups.
- Parameters:
- data
pandas.DataFrame
The input data. Must contain at least one numeric column.
- x
list
The list of numeric variables to be analyzed.
- y
str
The categorical variable that specifies the groups to be distinguished. Maximum 20 groups.
- predict
dict
The data to be classified, given as a dict that maps each variable in x to its value. Optional.
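The parameter descriptions impose a few constraints: x lists numeric columns, y defines at most 20 groups, and (judging from the example) predict supplies a value for each variable in x. The sketch below is a hypothetical user-side pre-check that mirrors those constraints before calling the function. The helper name check_lda_inputs and the exceptions it raises are assumptions for illustration; they do not describe how biostats itself reports errors.

```python
from typing import Optional

import pandas as pd
from pandas.api.types import is_numeric_dtype

MAX_GROUPS = 20  # group limit stated for the y parameter


def check_lda_inputs(data: pd.DataFrame, x: list, y: str, predict: Optional[dict] = None) -> None:
    """Hypothetical pre-check mirroring the documented parameter constraints."""
    # Every requested column must exist in the DataFrame
    missing = [c for c in [*x, y] if c not in data.columns]
    if missing:
        raise KeyError(f"columns not found: {missing}")
    # x must list numeric columns only
    non_numeric = [c for c in x if not is_numeric_dtype(data[c])]
    if non_numeric:
        raise TypeError(f"non-numeric columns in x: {non_numeric}")
    # y may define at most MAX_GROUPS distinct groups
    n_groups = data[y].nunique()
    if n_groups > MAX_GROUPS:
        raise ValueError(f"{y!r} has {n_groups} groups; at most {MAX_GROUPS} are allowed")
    # Assumption (from the example): predict gives a value for every variable in x
    if predict is not None:
        absent = [c for c in x if c not in predict]
        if absent:
            raise KeyError(f"predict has no value for: {absent}")


# Usage with made-up data: this call passes silently
df = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [0.5, 0.1, 0.9], "g": ["p", "q", "p"]})
check_lda_inputs(df, ["a", "b"], "g", predict={"a": 1.5, "b": 0.2})
```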
- Returns:
- summary
pandas.DataFrame
The mean values of each variable in each group.
- result
pandas.DataFrame
The coefficients and intercepts of the linear combinations, as well as the proportions of separation achieved by each dimension.
- prediction
pandas.DataFrame
The predicted probability of each group and the resulting group assignment for the data in predict.
See also
factor_analysis
Find the underlying factors of a set of variables.
principal_component_analysis
Find the linear combinations of a set of variables that capture the variation in the data.
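For reference, the standard Fisher formulation of the problem described above is sketched below. The source does not state which formulation biostats uses, so treat this as an assumption. With k groups of sizes n_g, group means \bar{x}_g, overall mean \bar{x}, and p variables, the discriminant directions are the generalized eigenvectors of the between-group scatter S_B relative to the within-group scatter S_W, and each dimension's proportion is its eigenvalue divided by the sum of the retained eigenvalues.

```latex
\begin{aligned}
S_W &= \sum_{g=1}^{k} \sum_{i \in g} (\mathbf{x}_i - \bar{\mathbf{x}}_g)(\mathbf{x}_i - \bar{\mathbf{x}}_g)^\top,
\qquad
S_B = \sum_{g=1}^{k} n_g (\bar{\mathbf{x}}_g - \bar{\mathbf{x}})(\bar{\mathbf{x}}_g - \bar{\mathbf{x}})^\top \\
\mathbf{v}^\ast &= \arg\max_{\mathbf{v}} \frac{\mathbf{v}^\top S_B \mathbf{v}}{\mathbf{v}^\top S_W \mathbf{v}}
\;\Longrightarrow\; S_B \mathbf{v} = \lambda \, S_W \mathbf{v} \\
\text{Proportion}_j &= \frac{\lambda_j}{\sum_{m=1}^{d} \lambda_m}, \qquad d = \min(p,\, k-1)
\end{aligned}
```

With p = 4 variables and k = 3 species, as in the example, d = 2, which matches the two dimensions reported in result.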
Examples
>>> import biostats as bs
>>> data = bs.dataset("linear_discriminant_analysis.csv")
>>> data
     sepal_length  sepal_width  petal_length  petal_width    species
0             5.1          3.5           1.4          0.2     setosa
1             4.9          3.0           1.4          0.2     setosa
2             4.7          3.2           1.3          0.2     setosa
3             4.6          3.1           1.5          0.2     setosa
4             5.0          3.6           1.4          0.2     setosa
..            ...          ...           ...          ...        ...
145           6.7          3.0           5.2          2.3  virginica
146           6.3          2.5           5.0          1.9  virginica
147           6.5          3.0           5.2          2.0  virginica
148           6.2          3.4           5.4          2.3  virginica
149           5.9          3.0           5.1          1.8  virginica
We want to find the linear combination of the four variables to distinguish between the three species.
>>> summary, result, prediction = bs.linear_discriminant_analysis(data=data, x=["sepal_length", "sepal_width", "petal_length", "petal_width"], y="species",
...     predict={"sepal_length": 5.7, "sepal_width": 2.7, "petal_length": 4.0, "petal_width": 1.4})
>>> summary
            sepal_length  sepal_width  petal_length  petal_width
setosa             5.006        3.428         1.462        0.246
versicolor         5.936        2.770         4.260        1.326
virginica          6.588        2.974         5.552        2.026
The mean values of each variable in each group are calculated.
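The summary table holds one row per group and one column per variable, each cell a group mean. As a rough cross-check, a table of the same shape can be produced with pandas DataFrame.groupby. The small DataFrame below is made-up illustration data, not the dataset from the example, and this is not necessarily how biostats computes summary.

```python
import pandas as pd

# Made-up illustration data (not the dataset used in the example)
df = pd.DataFrame({
    "sepal_length": [5.0, 5.2, 6.0, 6.2],
    "petal_length": [1.4, 1.6, 4.2, 4.4],
    "species": ["setosa", "setosa", "versicolor", "versicolor"],
})
x = ["sepal_length", "petal_length"]

# One row per group, one column per variable, each cell the group mean
group_means = df.groupby("species")[x].mean()
group_means.index.name = None  # drop the index label to match the summary layout
print(group_means)
```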
>>> result
             sepal_length  sepal_width  petal_length  petal_width  Intercept  Proportion
Dimension 1      0.829378     1.534473     -2.201212    -2.810460   2.105106    0.991213
Dimension 2      0.024102     2.164521     -0.931921     2.839188  -6.661473    0.008787
The coefficients and intercept that define each new dimension are given, together with the proportion of the total separation between groups achieved by each dimension. Here Dimension 1 alone accounts for about 99.1% of the separation.
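The computation behind a table like result can be sketched in NumPy under the standard Fisher formulation. This is an illustrative sketch, not the biostats implementation. The helper name lda_dimensions, the scaling of the coefficients (each dimension gets unit pooled within-group variance), the choice of intercept (scores centred at the overall mean), and the demo data are all assumptions. Eigenvectors are defined only up to sign, so another implementation may report coefficients with flipped signs.

```python
import numpy as np
import pandas as pd


def lda_dimensions(df: pd.DataFrame, x: list, y: str) -> pd.DataFrame:
    """Sketch of Fisher's LDA: coefficients, intercept and proportion per dimension."""
    X = df[x].to_numpy(dtype=float)
    groups = df[y].to_numpy()
    labels = np.unique(groups)
    n, p = X.shape
    k = len(labels)
    grand_mean = X.mean(axis=0)

    # Within-group (Sw) and between-group (Sb) scatter matrices
    Sw = np.zeros((p, p))
    Sb = np.zeros((p, p))
    for g in labels:
        Xg = X[groups == g]
        centered = Xg - Xg.mean(axis=0)
        Sw += centered.T @ centered
        d = (Xg.mean(axis=0) - grand_mean)[:, None]
        Sb += len(Xg) * (d @ d.T)

    # Whiten with the pooled within-group covariance W = L L^T, then solve a
    # symmetric eigenproblem; this is equivalent to solving Sb v = lambda W v
    W = Sw / (n - k)
    L_inv = np.linalg.inv(np.linalg.cholesky(W))
    eigvals, eigvecs = np.linalg.eigh(L_inv @ Sb @ L_inv.T)

    # Keep the min(p, k - 1) largest eigenvalues, in decreasing order
    order = np.argsort(eigvals)[::-1][: min(p, k - 1)]
    eigvals = eigvals[order]
    coefs = L_inv.T @ eigvecs[:, order]  # one column per dimension

    # Assumed convention: intercepts make the scores average zero overall
    intercepts = -grand_mean @ coefs
    proportions = eigvals / eigvals.sum()

    out = pd.DataFrame(coefs.T, columns=x,
                       index=[f"Dimension {i + 1}" for i in range(len(order))])
    out["Intercept"] = intercepts
    out["Proportion"] = proportions
    return out


# Usage with made-up data: three groups of 40 points in four variables
rng = np.random.default_rng(0)
cols = ["v1", "v2", "v3", "v4"]
centres = {"a": [0, 0, 0, 0], "b": [3, 1, 0, 0], "c": [0, 3, 2, 0]}
demo = pd.concat(
    [pd.DataFrame(rng.normal(mu, 1.0, size=(40, 4)), columns=cols).assign(group=g)
     for g, mu in centres.items()],
    ignore_index=True,
)
res = lda_dimensions(demo, cols, "group")
print(res)
```

Whitening with a Cholesky factor keeps the eigenproblem symmetric, so np.linalg.eigh can be used instead of a general (possibly complex) eigensolver.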
>>> prediction
               P(setosa)  P(versicolor)  P(virginica)      Result
Prediction  7.206674e-20       0.999792      0.000208  versicolor
The new observation is predicted to belong to versicolor, with a probability of about 0.9998.
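The probabilities in prediction can be illustrated with the textbook linear discriminant rule: score each group with the pooled within-group covariance, add the log prior, and normalise with a softmax. This is a minimal sketch, not the biostats implementation. It assumes priors equal to the group proportions in data (the source does not say which priors are used), and the helper name lda_predict and the demo data are hypothetical.

```python
import numpy as np
import pandas as pd


def lda_predict(df: pd.DataFrame, x: list, y: str, new_obs: dict) -> pd.DataFrame:
    """Sketch of LDA classification with a pooled covariance and sample-proportion priors."""
    X = df[x].to_numpy(dtype=float)
    groups = df[y].to_numpy()
    labels = np.unique(groups)
    n, k = len(X), len(labels)

    means = np.array([X[groups == g].mean(axis=0) for g in labels])  # k x p
    priors = np.array([np.mean(groups == g) for g in labels])  # assumed priors

    # Pooled within-group covariance
    Sw = np.zeros((len(x), len(x)))
    for g, mu in zip(labels, means):
        centered = X[groups == g] - mu
        Sw += centered.T @ centered
    W = Sw / (n - k)

    # Linear discriminant score: x' W^-1 mu_g - mu_g' W^-1 mu_g / 2 + log(prior_g)
    W_inv_means = np.linalg.solve(W, means.T).T  # row g holds W^-1 mu_g
    x0 = np.array([new_obs[v] for v in x], dtype=float)
    scores = W_inv_means @ x0 - 0.5 * np.sum(W_inv_means * means, axis=1) + np.log(priors)

    # Subtract the maximum score before exponentiating to avoid overflow
    scores -= scores.max()
    probs = np.exp(scores) / np.exp(scores).sum()

    row = {f"P({g})": prob for g, prob in zip(labels, probs)}
    row["Result"] = labels[np.argmax(probs)]
    return pd.DataFrame([row], index=["Prediction"])


# Usage with made-up data: classify a point placed at the centre of group "b"
rng = np.random.default_rng(1)
cols = ["v1", "v2", "v3", "v4"]
centres = {"a": [0, 0, 0, 0], "b": [3, 1, 0, 0], "c": [0, 3, 2, 0]}
demo = pd.concat(
    [pd.DataFrame(rng.normal(mu, 1.0, size=(40, 4)), columns=cols).assign(group=g)
     for g, mu in centres.items()],
    ignore_index=True,
)
pred = lda_predict(demo, cols, "group", {"v1": 3.0, "v2": 1.0, "v3": 0.0, "v4": 0.0})
print(pred)
```

Working with log-scale scores and subtracting the maximum keeps very small probabilities, such as the 7.2e-20 for setosa in the example, representable instead of producing overflow.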