# The Normal Approximation to the Binomial Distribution

Random variables with a binomial distribution are known to be discrete. This means that there are a countable number of outcomes that can occur in a binomial distribution, with separation between these outcomes. For instance, a binomial variable can take a value of three or four, but not a number in between three and four.

With the discrete character of a binomial distribution, it is somewhat surprising that a continuous random variable can be used to approximate a binomial distribution. For many binomial distributions, we can use a normal distribution to approximate our binomial probabilities.

This can be seen when looking at n coin tosses and letting X be the number of heads. In this situation, we have a binomial distribution with probability of success as p = 0.5. As we increase the number of tosses, we see that the probability histogram bears greater and greater resemblance to a normal distribution.
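The growing resemblance can be checked numerically. The following is a minimal sketch, not a definitive implementation: it compares each bar of the probability histogram with the height of a normal curve at the same point, using the mean and standard deviation stated in the next section. The helper names `binomial_pmf`, `normal_pdf` and `max_gap`, and the sample sizes 10, 50 and 200, are illustrative choices rather than anything from the text.

```python
import math

def binomial_pmf(n, k, p=0.5):
    """Exact probability of k successes (heads) in n trials."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def normal_pdf(x, mean, sd):
    """Height of the normal curve with the given mean and sd at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))

def max_gap(n, p=0.5):
    """Largest distance between a histogram bar and the normal curve."""
    mean = n * p
    sd = math.sqrt(n * p * (1 - p))
    return max(
        abs(binomial_pmf(n, k, p) - normal_pdf(k, mean, sd))
        for k in range(n + 1)
    )

# Illustrative numbers of tosses; the gap should shrink as n grows.
for n in (10, 50, 200):
    print(n, max_gap(n))
```

The largest gap between the histogram and the curve gets smaller as the number of tosses increases, which is the numerical counterpart of the visual resemblance described above.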

## Statement of the Normal Approximation

Every normal distribution is completely defined by two real numbers. These numbers are the mean, which measures the center of the distribution, and the standard deviation, which measures the spread of the distribution. For a given binomial situation we need to be able to determine which normal distribution to use.

The correct normal distribution is determined by the number of trials n in the binomial setting and the constant probability of success p for each of these trials. The normal distribution that approximates our binomial variable has a mean of np and a standard deviation of $\sqrt{np(1 - p)}$.

For example, suppose that we guessed on each of the 100 questions of a multiple-choice test, where each question had one correct answer out of four choices. The number of correct answers X is a binomial random variable with n = 100 and p = 0.25. Thus this random variable has a mean of 100(0.25) = 25 and a standard deviation of $\sqrt{100(0.25)(0.75)} \approx 4.33$. A normal distribution with mean 25 and standard deviation 4.33 will work to approximate this binomial distribution.
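The arithmetic in this example can be reproduced with a few lines of Python. This is a small sketch; the function name `normal_parameters` is a hypothetical helper introduced only for illustration.

```python
import math

def normal_parameters(n, p):
    """Mean and standard deviation of the approximating normal distribution."""
    mean = n * p
    sd = math.sqrt(n * p * (1 - p))
    return mean, sd

# The multiple-choice test: 100 questions, one correct answer out of four.
mean, sd = normal_parameters(100, 0.25)
print(mean, round(sd, 2))
```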

## When Is the Approximation Appropriate?

It can be shown mathematically that the normal approximation to the binomial distribution works well only under certain conditions. The number of trials n must be large enough, and p must not be too close to 0 or 1, so that both np and n(1 - p) are greater than or equal to 10. This is a rule of thumb, guided by statistical practice. The normal approximation can always be computed, but if these conditions are not met, it may not be a very good approximation.

For example, if n = 100 and p = 0.25 then we are justified in using the normal approximation. This is because np = 25 and n(1 - p) = 75. Since both of these numbers are greater than 10, the appropriate normal distribution will do a fairly good job of estimating binomial probabilities.
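The rule of thumb translates directly into a check. The sketch below assumes the threshold of 10 given above; the function name `rule_of_thumb_ok` and the second test case (n = 20, p = 0.1) are illustrative additions, not taken from the text.

```python
def rule_of_thumb_ok(n, p, threshold=10):
    """True when both np and n(1 - p) reach the rule-of-thumb threshold."""
    return n * p >= threshold and n * (1 - p) >= threshold

# The example from the text: np = 25 and n(1 - p) = 75.
print(rule_of_thumb_ok(100, 0.25))

# A hypothetical case that fails the check: np = 2.
print(rule_of_thumb_ok(20, 0.1))
```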

## Why Use the Approximation?

Binomial probabilities are calculated with a straightforward formula that involves the binomial coefficient. Unfortunately, because of the factorials in that coefficient, it is easy to run into computational difficulties with the binomial formula. The normal approximation allows us to bypass these problems by working with a familiar friend, a table of values of a standard normal distribution.
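One concrete form these difficulties can take is overflow. The text does not say which difficulties it has in mind, so the following sketch rests on an assumption: the factorials are stored as double-precision floating-point numbers, as most calculators and many programs do. The helper name `factorial_fits_in_float` is hypothetical.

```python
import math

def factorial_fits_in_float(n):
    """True if n! can be converted to a double-precision float."""
    try:
        float(math.factorial(n))  # Python's int is exact; the float is not
        return True
    except OverflowError:
        return False

# Under the floating-point assumption, 170! still fits but 171! does not.
print(factorial_fits_in_float(170))
print(factorial_fits_in_float(171))
```

Exact integer arithmetic, such as Python's `math.comb`, avoids this particular failure, but the numbers involved still grow very quickly with n.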

Finding the probability that a binomial random variable falls within a range of values is often tedious. To find the probability that a binomial variable X is greater than 3 and less than 10, we would need to find the probability that X equals 4, 5, 6, 7, 8 and 9, and then add all of these probabilities together. If the normal approximation can be used, we instead determine the z-scores corresponding to 3 and 10, and then use a z-score table of probabilities for the standard normal distribution.
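Both routes can be compared side by side. The sketch below makes several assumptions that are not in the text: the values n = 40 and p = 0.25 are chosen only so that np = 10 and n(1 - p) = 30 satisfy the rule of thumb, and `math.erf` stands in for the printed z-score table. The helper names are hypothetical.

```python
import math

def binomial_pmf(n, k, p):
    """Exact probability of k successes in n trials."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def standard_normal_cdf(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# Assumed example values (not from the text).
n, p = 40, 0.25
lower, upper = 3, 10

# Exact route: add P(X = 4) through P(X = 9).
exact = sum(binomial_pmf(n, k, p) for k in range(lower + 1, upper))

# Normal route: z-scores for 3 and 10, then a difference of two areas.
mean = n * p
sd = math.sqrt(n * p * (1 - p))
z_lower = (lower - mean) / sd
z_upper = (upper - mean) / sd
approx = standard_normal_cdf(z_upper) - standard_normal_cdf(z_lower)

print(round(exact, 4), round(approx, 4))
```

The two results agree only roughly for these values; how close they are depends on n, p and the interval. Moving each bound half a unit toward the integers being counted (a continuity correction, which the text above does not cover) usually brings the normal result closer to the exact one.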