Chi-Square Goodness of Fit Test

The chi-square goodness of fit test is a variation of the more general chi-square test. The setting for this test is a single categorical variable that can have many levels. Often in this situation, we will have a theoretical model in mind for the categorical variable. Under this model we expect certain proportions of the population to fall into each of these levels. A goodness of fit test determines how well the expected proportions in our theoretical model match reality.

Null and Alternative Hypotheses

The null and alternative hypotheses for a goodness of fit test look different from those of some of our other hypothesis tests. One reason for this is that a chi-square goodness of fit test is a nonparametric method. This means that our test does not concern a single population parameter. Thus the null hypothesis does not state that a single parameter takes on a certain value. Instead, it makes a statement about every level's proportion at once.

We start with a categorical variable with n levels and let pi be the proportion of the population at level i. Our theoretical model specifies a value qi for each of these proportions. The null and alternative hypotheses are stated as follows:

  • H0: p1 = q1, p2 = q2, . . . pn = qn
  • Ha: For at least one i, pi is not equal to qi.

Actual and Expected Counts

The calculation of a chi-square statistic involves a comparison between the actual (observed) counts at each level of the variable in our simple random sample and the counts that we would expect at each level under the model.

The actual counts come directly from our sample. The way that the expected counts are calculated depends upon the particular chi-square test that we are using.

For a goodness of fit test, we have a theoretical model for how our data should be proportioned. We multiply each of these model proportions by the total sample size to obtain the expected count for the corresponding level.
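As a minimal sketch of this calculation, the Python snippet below multiplies each model proportion by the sample size. The function name `expected_counts`, the three-level model and the sample size of 200 are hypothetical, chosen only for illustration.

```python
import math


def expected_counts(model_proportions, sample_size):
    """Return the expected count for each level under the model."""
    # The model proportions describe a whole population, so they must sum to 1.
    if not math.isclose(sum(model_proportions), 1.0):
        raise ValueError("model proportions must sum to 1")
    return [q * sample_size for q in model_proportions]


# Hypothetical model with three levels and a hypothetical sample of 200.
model = [0.5, 0.3, 0.2]
expected = expected_counts(model, 200)
print(expected)
```

Note that expected counts do not have to be whole numbers, even though the actual counts always are.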

Chi-square Statistic for Goodness of Fit

The chi-square statistic for a goodness of fit test is determined by comparing the actual and expected counts for each level of our categorical variable. The steps for computing the chi-square statistic for a goodness of fit test are as follows:

  1. For each level, subtract the expected count from the observed count.
  2. Square each of these differences.
  3. Divide each of these squared differences by the corresponding expected count.
  4. Add all of the numbers from the previous step together. This is our chi-square statistic.
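The four steps above can be sketched in Python as follows. The function name `chi_square_statistic` and the counts are hypothetical and serve only to illustrate the arithmetic.

```python
def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all levels."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    total = 0.0
    for o, e in zip(observed, expected):
        difference = o - e          # step 1 (the sign vanishes after squaring)
        squared = difference ** 2   # step 2
        total += squared / e        # steps 3 and 4
    return total


# Hypothetical counts for a variable with three levels (sample size 200).
observed = [90, 70, 40]
expected = [100.0, 60.0, 40.0]
statistic = chi_square_statistic(observed, expected)
print(statistic)
```

Here the contributions of the three levels are 100/100, 100/60 and 0/40, so the statistic is 8/3, or roughly 2.67.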

If our theoretical model matches the observed data perfectly, then the expected counts will show no deviation whatsoever from the observed counts of our variable. This will mean that we will have a chi-square statistic of zero. In any other situation, the chi-square statistic will be a positive number.

Degrees of Freedom

The number of degrees of freedom requires no difficult calculations. All that we need to do is subtract one from the number of levels of our categorical variable. This number tells us which of the infinitely many chi-square distributions we should use.

Chi-square Table and P-Value

The chi-square statistic that we calculated corresponds to a particular location on a chi-square distribution with the appropriate number of degrees of freedom.

The p-value is the probability of obtaining a test statistic at least as extreme as the one we calculated, assuming that the null hypothesis is true. For a goodness of fit test this is the area under the chi-square distribution to the right of our statistic. We can use a table of values for a chi-square distribution to bracket the p-value of our hypothesis test. If we have statistical software available, then this can be used to obtain a more precise p-value.
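To illustrate what software does in place of a table, here is a minimal sketch that computes the right-tail area using only the Python standard library. The function name `chi2_sf` and the example statistic are hypothetical. The sketch assumes a whole number of degrees of freedom and relies on the closed-form tail areas for 1 and 2 degrees of freedom plus a standard recurrence that adds two degrees of freedom at a time.

```python
import math


def chi2_sf(x, df):
    """Right-tail area P(X >= x) for a chi-square distribution, integer df >= 1."""
    if not isinstance(df, int) or df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    # Closed-form starting points for 1 and 2 degrees of freedom.
    if df % 2 == 1:
        tail, nu = math.erfc(math.sqrt(half)), 1
    else:
        tail, nu = math.exp(-half), 2
    # Recurrence: Q(nu + 2) = Q(nu) + (x/2)^(nu/2) * exp(-x/2) / Gamma(nu/2 + 1)
    while nu < df:
        tail += half ** (nu / 2.0) * math.exp(-half) / math.gamma(nu / 2.0 + 1.0)
        nu += 2
    return tail


# Hypothetical statistic from a variable with three levels.
statistic = 2.667
degrees_of_freedom = 3 - 1  # number of levels minus one
p_value = chi2_sf(statistic, degrees_of_freedom)
print(p_value)
```

In practice a library routine such as `scipy.stats.chi2.sf` would normally be used instead of a hand-written function like this one.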

Decision Rule

We make our decision on whether to reject the null hypothesis based upon a predetermined level of significance. If our p-value is less than or equal to this level of significance, then we reject the null hypothesis. Otherwise, we fail to reject the null hypothesis.
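The decision rule can be written as a single comparison. In the sketch below, the function name `decide` and the default significance level of 0.05 are assumptions made for illustration; the level of significance should be whatever was fixed before the data were examined.

```python
def decide(p_value, alpha=0.05):
    """Apply the decision rule at significance level alpha."""
    # Reject when the p-value is less than or equal to the significance level.
    if p_value <= alpha:
        return "reject H0"
    return "fail to reject H0"


# Hypothetical p-values.
print(decide(0.03))
print(decide(0.26))
```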