biostats.chi_square_test_fit

biostats.chi_square_test_fit(data, variable, expect)

Test whether the observed proportions of the groups in a categorical variable differ from the expected proportions (chi-square goodness-of-fit test).

Parameters:
data : pandas.DataFrame

The input data. Must contain at least one categorical column.

variable : str

The name of the categorical column whose group proportions are tested. Maximum 20 groups.

expect : dict

The expected proportion of each group, keyed by group name. The proportions are automatically normalized so that they sum to 1.

Returns:
summary : pandas.DataFrame

The observed and expected counts and proportions of each group.

result : pandas.DataFrame

The degrees of freedom, chi-square statistic, and p-value of the test.

See also

binomial_test

The exact version of the chi-square test (fit).

chi_square_test

Test the association between two categorical variables.

Examples

>>> import biostats as bs
>>> data = bs.dataset("chi_square_test_fit.csv")
>>> data
        Canopy
0      Douglas
1    Ponderosa
2    Ponderosa
3    Ponderosa
4      Douglas
..         ...
151    Douglas
152    Douglas
153  Ponderosa
154    Douglas
155    Douglas

We want to test whether the observed proportions of the Canopy groups differ from the expected proportions.

>>> summary, result = bs.chi_square_test_fit(data=data, variable="Canopy", expect={"Douglas":0.54, "Ponderosa":0.40, "Grand":0.05, "Western":0.01})
>>> summary
           Observe  Prop.(Obs.)  Expect  Prop.(Exp.)
Douglas         70     0.448718   84.24         0.54
Ponderosa       79     0.506410   62.40         0.40
Western          4     0.025641    1.56         0.01
Grand            3     0.019231    7.80         0.05

The observed and expected counts and proportions of each group are given.
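The columns of the summary can be checked by hand. The following is a minimal sketch in plain Python, not the library's implementation: the observed counts are copied from the table above, and the variable names (`observed`, `total`, `rows`) are illustrative assumptions.

```python
# Observed counts copied from the summary table above.
observed = {"Douglas": 70, "Ponderosa": 79, "Western": 4, "Grand": 3}
# Expected proportions, as passed to the `expect` argument.
expect = {"Douglas": 0.54, "Ponderosa": 0.40, "Grand": 0.05, "Western": 0.01}

n = sum(observed.values())    # total number of observations
total = sum(expect.values())  # used to normalize the proportions to sum to 1

rows = {}
for group, count in observed.items():
    prop_exp = expect[group] / total
    rows[group] = {
        "Observe": count,
        "Prop.(Obs.)": count / n,
        "Expect": n * prop_exp,  # expected count = n * expected proportion
        "Prop.(Exp.)": prop_exp,
    }

for group, row in rows.items():
    print(group, row)
```

Because the proportions are divided by their sum, weights such as `{"A": 3, "B": 1}` would be treated the same way as `{"A": 0.75, "B": 0.25}` in this sketch.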

>>> result
        D.F.  Chi Square   p-value    
Normal     3   13.593424  0.003514  **

The p-value is less than 0.01, so the observed proportions differ significantly from the expected proportions.
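The reported statistic can be reproduced from the summary table. The sketch below is a hand calculation in plain Python, not the library's implementation; it assumes the statistic is the usual Pearson sum of (observed - expected)^2 / expected with (number of groups - 1) degrees of freedom. The closed-form p-value used here is valid only for 3 degrees of freedom.

```python
import math

# Observed and expected counts copied from the summary table
# (order: Douglas, Ponderosa, Western, Grand).
observed = [70, 79, 4, 3]
expected = [84.24, 62.40, 1.56, 7.80]

# Pearson chi-square statistic: sum of (O - E)^2 / E over all groups.
chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Degrees of freedom: number of groups minus one.
df = len(observed) - 1

# Upper-tail probability of the chi-square distribution.
# This closed form holds ONLY for df = 3.
p_value = (
    math.erfc(math.sqrt(chi_square / 2))
    + math.sqrt(2 * chi_square / math.pi) * math.exp(-chi_square / 2)
)

print(df, round(chi_square, 4), round(p_value, 4))  # 3 13.5934 0.0035
```

For other degrees of freedom, a general routine such as `scipy.stats.chi2.sf(chi_square, df)` could be used instead of the closed form.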