Example of a Chi-Square Goodness of Fit Test

By Courtney Taylor
Updated January 25, 2018

Courtney K. Taylor, Ph.D., is a professor of mathematics at Anderson University and the author of "An Introduction to Abstract Algebra."

The chi-square goodness of fit test is a useful way to compare a theoretical model to observed data. This test is one type of the more general chi-square test. As with any topic in mathematics or statistics, it can be helpful to work through an example in order to understand what is happening, so what follows is a worked example of the chi-square goodness of fit test.

Consider a standard package of milk chocolate M&Ms. There are six different colors: red, orange, yellow, green, blue and brown. Suppose that we are curious about the distribution of these colors and ask: do all six colors occur in equal proportion? This is the type of question that can be answered with a goodness of fit test.

Setting

We begin by noting the setting and why the goodness of fit test is appropriate. Our variable, color, is categorical. There are six levels of this variable, corresponding to the six colors that are possible. We will assume that the M&Ms we count are a simple random sample from the population of all M&Ms.

Null and Alternative Hypotheses

The null and alternative hypotheses for our goodness of fit test reflect the assumption that we are making about the population.
Since we are testing whether the colors occur in equal proportions, our null hypothesis will be that all colors occur in the same proportion. More formally, if p₁ is the population proportion of red candies, p₂ is the population proportion of orange candies, and so on, then the null hypothesis is that p₁ = p₂ = . . . = p₆ = 1/6. The alternative hypothesis is that at least one of the population proportions is not equal to 1/6.

Actual and Expected Counts

The actual counts are the number of candies observed for each of the six colors. The expected count is what we would expect to see if the null hypothesis were true. We will let n be the size of our sample. The expected number of red candies is p₁n, or n/6. In fact, for this example, the expected number of candies for each of the six colors is simply n times pᵢ, or n/6.

Chi-square Statistic for Goodness of Fit

We will now calculate a chi-square statistic for a specific example. Suppose that we have a simple random sample of 600 M&M candies with the following distribution:

212 of the candies are blue.
147 of the candies are orange.
103 of the candies are green.
50 of the candies are red.
46 of the candies are yellow.
42 of the candies are brown.

If the null hypothesis were true, then the expected count for each of these colors would be (1/6) x 600 = 100. We now use this in our calculation of the chi-square statistic.

We calculate the contribution to our statistic from each of the colors. Each is of the form (Actual – Expected)²/Expected:

For blue we have (212 – 100)²/100 = 125.44
For orange we have (147 – 100)²/100 = 22.09
For green we have (103 – 100)²/100 = 0.09
For red we have (50 – 100)²/100 = 25
For yellow we have (46 – 100)²/100 = 29.16
For brown we have (42 – 100)²/100 = 33.64

We then total all of these contributions and determine that our chi-square statistic is 125.44 + 22.09 + 0.09 + 25 + 29.16 + 33.64 = 235.42.
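
The arithmetic above can be checked with a few lines of Python. This is only a minimal sketch: the variable names (`observed`, `expected`, `contributions`, `chi_square`) are chosen here for illustration and do not come from the article. It assumes the null hypothesis of equal proportions, so every color has the same expected count.

```python
# Observed counts from the sample of 600 candies described above.
observed = {
    "blue": 212,
    "orange": 147,
    "green": 103,
    "red": 50,
    "yellow": 46,
    "brown": 42,
}

n = sum(observed.values())       # sample size
expected = n / len(observed)     # n/6 under the null hypothesis of equal proportions

# Contribution of each color: (Actual - Expected)^2 / Expected
contributions = {
    color: (count - expected) ** 2 / expected
    for color, count in observed.items()
}

chi_square = sum(contributions.values())

for color, value in contributions.items():
    print(f"{color:>6}: {value:.2f}")
print(f"chi-square statistic: {chi_square:.2f}")  # prints 235.42
```

Keeping the per-color contributions in a dictionary makes it easy to see which colors drive the statistic; here blue alone accounts for more than half of the total.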

Degrees of Freedom

The number of degrees of freedom for a goodness of fit test is simply one less than the number of levels of our variable. Since there were six colors, we have 6 – 1 = 5 degrees of freedom.

Chi-square Table and P-Value

The chi-square statistic of 235.42 that we calculated corresponds to a particular location on a chi-square distribution with five degrees of freedom. We now need a p-value, which is the probability of obtaining a test statistic at least as extreme as 235.42, assuming that the null hypothesis is true.

Microsoft Excel can be used for this calculation. We find that our test statistic with five degrees of freedom has a p-value of 7.29 x 10⁻⁴⁹. This is an extremely small p-value.

Decision Rule

We decide whether to reject the null hypothesis based on the size of the p-value. Since our p-value is minuscule, far below any commonly used significance level such as 0.05 or 0.01, we reject the null hypothesis. We conclude that M&Ms are not evenly distributed among the six different colors. A follow-up analysis could be used to determine a confidence interval for the population proportion of one particular color.
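
For readers without a spreadsheet at hand, the upper-tail probability used in the p-value step can also be approximated in plain Python. The sketch below is not the method Excel uses; it relies on a standard closed-form expression for the chi-square upper tail that holds when the degrees of freedom are odd, and the function name `chi2_sf_odd_df` is hypothetical. For a statistic of 235.42 with five degrees of freedom it should give a value on the order of 10⁻⁴⁹, in line with the p-value quoted above; the leading digits can differ slightly depending on how the statistic is rounded.

```python
import math


def chi2_sf_odd_df(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable.

    Illustrative helper (not from the article). It only handles odd
    degrees of freedom, using the closed form
        erfc(sqrt(x/2)) + sqrt(2/pi) * exp(-x/2) * sum_j x^(j - 1/2) / (1*3*...*(2j-1))
    with j running from 1 to (df - 1) / 2.
    """
    if df < 1 or df % 2 == 0:
        raise ValueError("this sketch only handles odd degrees of freedom")
    if x <= 0:
        return 1.0

    tail = math.erfc(math.sqrt(x / 2))
    term = math.sqrt(2 * x / math.pi) * math.exp(-x / 2)  # the j = 1 term
    for j in range(1, (df - 1) // 2 + 1):
        tail += term
        term *= x / (2 * j + 1)  # step from the j-th term to the (j + 1)-th
    return tail


statistic = 235.42          # chi-square statistic from the worked example
degrees_of_freedom = 6 - 1  # one less than the number of colors

p_value = chi2_sf_odd_df(statistic, degrees_of_freedom)
print(f"p-value: {p_value:.3e}")

# Decision at a conventional 5% significance level (an assumed threshold).
alpha = 0.05
print("reject the null hypothesis" if p_value < alpha else "fail to reject the null hypothesis")
```

As a sanity check on the helper, the familiar 5% critical value for five degrees of freedom, about 11.07, should return an upper-tail probability close to 0.05.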