The chi-square goodness of fit test is useful for comparing a theoretical model to observed data. This test is one type of the more general chi-square test. As with any topic in mathematics or statistics, it can be helpful to work through an example in order to understand what is happening, so the rest of this article walks through an example of the chi-square goodness of fit test.

Consider a standard package of milk chocolate M&Ms. There are six different colors: red, orange, yellow, green, blue and brown. Suppose that we are curious about the distribution of these colors and ask, do all six colors occur in equal proportion? This is the type of question that can be answered with a goodness of fit test.

### Setting

We begin by noting the setting and why the goodness of fit test is appropriate. Our variable of color is categorical. There are six levels of this variable, corresponding to the six colors that are possible. We will assume that the M&Ms we count will be a simple random sample from the population of all M&Ms.

### Null and Alternative Hypotheses

The null and alternative hypotheses for our goodness of fit test reflect the assumption that we are making about the population. Since we are testing whether the colors occur in equal proportions, our null hypothesis will be that all colors occur in the same proportion. More formally, if $p_1$ is the population proportion of red candies, $p_2$ is the population proportion of orange candies, and so on, then the null hypothesis is that $p_1 = p_2 = \dots = p_6 = 1/6$.

The alternative hypothesis is that at least one of the population proportions is not equal to 1/6.

### Actual and Expected Counts

The actual counts are the number of candies for each of the six colors. The expected count refers to what we would expect if the null hypothesis were true. We will let $n$ be the size of our sample. The expected number of red candies is $p_1 n$, or $n/6$. In fact, for this example, the expected number of candies for each of the six colors is simply $n$ times $p_i$, or $n/6$.
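The expected-count rule can be sketched in a few lines of Python. The names `colors`, `expected_counts`, and `null_proportions` are illustrative choices rather than part of the original example, and the sample size of 600 is the one used in the worked example later in this article.

```python
# Expected count for each category under the null hypothesis: n * p_i.
colors = ["red", "orange", "yellow", "green", "blue", "brown"]


def expected_counts(n, proportions):
    """Return the expected count n * p_i for each hypothesized proportion p_i."""
    return [n * p for p in proportions]


# Null hypothesis: all six colors are equally likely, so every p_i is 1/6.
null_proportions = [1 / len(colors)] * len(colors)

n = 600  # sample size used in the worked example
expected = expected_counts(n, null_proportions)

for color, count in zip(colors, expected):
    print(f"{color}: {count:.1f}")
```

Passing a different list of proportions (one that still sums to 1) gives the expected counts for a null hypothesis other than equal proportions.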

### Chi-square Statistic for Goodness of Fit

We will now calculate a chi-square statistic for a specific example. Suppose that we have a simple random sample of 600 M&M candies with the following distribution:

- 212 of the candies are blue.
- 147 of the candies are orange.
- 103 of the candies are green.
- 50 of the candies are red.
- 46 of the candies are yellow.
- 42 of the candies are brown.

If the null hypothesis were true, then the expected count for each of these colors would be (1/6) × 600 = 100. We now use this in our calculation of the chi-square statistic.

We calculate the contribution to our statistic from each of the colors. Each is of the form $(\text{Actual} - \text{Expected})^2 / \text{Expected}$:

- For blue we have $(212 - 100)^2/100 = 125.44$
- For orange we have $(147 - 100)^2/100 = 22.09$
- For green we have $(103 - 100)^2/100 = 0.09$
- For red we have $(50 - 100)^2/100 = 25$
- For yellow we have $(46 - 100)^2/100 = 29.16$
- For brown we have $(42 - 100)^2/100 = 33.64$

We then total all of these contributions and determine that our chi-square statistic is 125.44 + 22.09 + 0.09 + 25 + 29.16 + 33.64 = 235.42.
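The same arithmetic can be reproduced with a short Python sketch. The observed counts are the ones from the sample above; the variable names are illustrative, and the code assumes the equal-proportions null hypothesis of this example.

```python
# Observed counts from the sample of 600 candies in the example.
observed = {
    "blue": 212,
    "orange": 147,
    "green": 103,
    "red": 50,
    "yellow": 46,
    "brown": 42,
}

n = sum(observed.values())    # total sample size
expected = n / len(observed)  # equal proportions under the null hypothesis

# Contribution of each color: (actual - expected)^2 / expected.
contributions = {
    color: (count - expected) ** 2 / expected
    for color, count in observed.items()
}

# The chi-square statistic is the sum of the contributions.
chi_square = sum(contributions.values())

for color, value in contributions.items():
    print(f"{color}: {value:.2f}")
print(f"chi-square statistic: {chi_square:.2f}")
```

Keeping the per-color contributions in a dictionary makes it easy to see which colors drive the statistic; here blue accounts for more than half of the total.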

### Degrees of Freedom

The number of degrees of freedom for a goodness of fit test is simply one less than the number of levels of our variable. Since there were six colors, we have 6 – 1 = 5 degrees of freedom.

### Chi-square Table and P-Value

The chi-square statistic of 235.42 that we calculated corresponds to a particular location on a chi-square distribution with five degrees of freedom. We now need a p-value, which is the probability of obtaining a test statistic at least as extreme as 235.42, assuming that the null hypothesis is true.

Microsoft Excel can be used for this calculation, for example with its CHISQ.DIST.RT function. We find that our test statistic with five degrees of freedom has a p-value of $7.29 \times 10^{-49}$. This is an extremely small p-value.
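For readers without a spreadsheet at hand, the upper-tail probability can also be sketched with the Python standard library alone. This is not how Excel computes it; it is a minimal sketch that relies on the finite-sum closed forms of the chi-square tail, which exist only for whole-number degrees of freedom. The helper name `chi_square_sf` is hypothetical.

```python
import math


def chi_square_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable.

    Assumes df is a positive integer and uses the closed-form finite sums
    available in that case.
    """
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{j=0}^{df/2 - 1} (x/2)^j / j!
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= half / j
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2))
    #         + sqrt(2x/pi) * exp(-x/2) * sum_j x^j / (1 * 3 * ... * (2j + 1))
    term, total = 1.0, 0.0
    for j in range((df - 1) // 2):
        if j > 0:
            term *= x / (2 * j + 1)
        total += term
    tail = math.sqrt(2 * x / math.pi) * math.exp(-half) * total
    return math.erfc(math.sqrt(half)) + tail


chi_square = 235.42  # statistic from the example
df = 6 - 1           # one less than the number of colors
p_value = chi_square_sf(chi_square, df)
print(f"p-value: {p_value:.3e}")
```

The result is on the order of $10^{-49}$, in line with the magnitude reported above. In practice a statistics library would normally be used instead of a hand-written tail function.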

### Decision Rule

We make our decision on whether to reject the null hypothesis based on the size of the p-value. Since our p-value is minuscule, far smaller than any commonly used significance level, we reject the null hypothesis. We conclude that M&Ms are not evenly distributed among the six different colors. A follow-up analysis could be used to determine a confidence interval for the population proportion of one particular color.
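As one possible shape for that follow-up, the sketch below computes a large-sample (Wald) confidence interval for the proportion of blue candies. The choice of the Wald interval, the 95% confidence level, and the helper name `wald_interval` are all assumptions made for illustration; the example above does not prescribe a particular method.

```python
import math
from statistics import NormalDist


def wald_interval(successes, n, confidence=0.95):
    """Large-sample (Wald) confidence interval for a population proportion."""
    p_hat = successes / n
    # Two-sided critical value from the standard normal distribution.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


# 212 blue candies out of 600, taken from the sample in the example.
lower, upper = wald_interval(212, 600)
print(f"95% interval for the proportion of blue candies: ({lower:.3f}, {upper:.3f})")
```

For this sample the whole interval lies above 1/6, which agrees with the conclusion that the colors are not equally common.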