biostats.chi_square_test_fit

biostats.chi_square_test_fit(data, variable, expect)

Test whether the observed proportions of the groups in a categorical variable differ from the expected proportions.

Parameters:

data (pandas.DataFrame): The input data. Must contain at least one categorical column.
variable (str): The categorical variable whose group proportions are tested. Maximum 20 groups.
expect (dict): The expected proportion of each group, keyed by group name. The proportions are automatically normalized to sum to 1.
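
The normalization of expect can be pictured as dividing each value by the total. The sketch below illustrates that idea in plain Python. It is not the library's internal code, and the helper name normalize_expect is hypothetical.

```python
def normalize_expect(expect):
    """Scale the values of `expect` so they sum to 1 (hypothetical helper,
    assuming normalization means dividing each value by the total)."""
    total = sum(expect.values())
    if total <= 0:
        raise ValueError("expected proportions must sum to a positive number")
    return {group: value / total for group, value in expect.items()}


# Under that assumption, ratios such as 54:40:5:1 describe the same
# expectation as the proportions 0.54, 0.40, 0.05 and 0.01.
ratios = {"Douglas": 54, "Ponderosa": 40, "Grand": 5, "Western": 1}
proportions = normalize_expect(ratios)
print(proportions)
```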

Returns:

summary (pandas.DataFrame): The observed and expected counts and proportions of each group.
result (pandas.DataFrame): The degrees of freedom, chi-square statistic, and p-value of the test.

See also

binomial_test: The exact version of the chi-square test (fit).
chi_square_test: Test the association between two categorical variables.

Examples

>>> import biostats as bs
>>> data = bs.dataset("chi_square_test_fit.csv")
>>> data
        Canopy
0      Douglas
1    Ponderosa
2    Ponderosa
3    Ponderosa
4      Douglas
..         ...
151    Douglas
152    Douglas
153  Ponderosa
154    Douglas
155    Douglas

We want to test whether the proportions of the Canopy groups differ from the expected proportions.

>>> summary, result = bs.chi_square_test_fit(
...     data=data,
...     variable="Canopy",
...     expect={"Douglas": 0.54, "Ponderosa": 0.40, "Grand": 0.05, "Western": 0.01},
... )
>>> summary
           Observe  Prop.(Obs.)  Expect  Prop.(Exp.)
Douglas         70     0.448718   84.24         0.54
Ponderosa       79     0.506410   62.40         0.40
Western          4     0.025641    1.56         0.01
Grand            3     0.019231    7.80         0.05

The observed and expected counts and proportions of each group are given.
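
The columns of the summary table can be reproduced by hand: each observed proportion is the count divided by the total, and each expected count is the expected proportion multiplied by the total. The sketch below redoes that arithmetic in plain Python with the counts from the table above. It is an independent check, not the library's implementation, and the variable names are chosen only for illustration.

```python
# Observed counts copied from the summary table above.
observed = {"Douglas": 70, "Ponderosa": 79, "Western": 4, "Grand": 3}
# Expected proportions passed to the function in the example.
expect = {"Douglas": 0.54, "Ponderosa": 0.40, "Grand": 0.05, "Western": 0.01}

n = sum(observed.values())  # total number of observations

rows = {}
for group, count in observed.items():
    rows[group] = {
        "Observe": count,
        "Prop.(Obs.)": count / n,      # observed proportion
        "Expect": expect[group] * n,   # expected count
        "Prop.(Exp.)": expect[group],
    }

for group, row in rows.items():
    print(f"{group:<10} {row['Observe']:>4} {row['Prop.(Obs.)']:.6f} "
          f"{row['Expect']:>6.2f} {row['Prop.(Exp.)']:.2f}")
```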

>>> result
        D.F.  Chi Square   p-value    
Normal     3   13.593424  0.003514  **

The p-value is less than 0.01, so the observed proportions differ significantly from the expected proportions.
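
As a cross-check, the statistic can be recomputed from the summary table as the sum of (observed - expected)^2 / expected over the groups, with degrees of freedom equal to the number of groups minus one. The sketch below uses only the standard library. The helper chi2_sf_df3 is a hypothetical name, and it relies on a closed form for the upper-tail probability that holds only for 3 degrees of freedom; for the general case, a routine such as scipy.stats.chi2.sf would normally be used instead.

```python
import math

# Observed counts and expected proportions, in the order of the summary table.
observed = [70, 79, 4, 3]
expected_prop = [0.54, 0.40, 0.01, 0.05]

n = sum(observed)
expected = [p * n for p in expected_prop]

# Pearson chi-square statistic: sum of (O - E)^2 / E over all groups.
chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1


def chi2_sf_df3(x):
    """Upper-tail probability of a chi-square variable with exactly
    3 degrees of freedom (hypothetical helper, not valid for other df)."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


p_value = chi2_sf_df3(chi_square)
print(f"D.F. = {df}, Chi Square = {chi_square:.6f}, p-value = {p_value:.6f}")
```

The values agree with the result table above: 3 degrees of freedom, a statistic of about 13.5934, and a p-value of about 0.0035.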