How to Use the Normal Approximation to a Binomial Distribution

A histogram of a binomial distribution. C.K.Taylor

The binomial distribution involves a discrete random variable. Probabilities in a binomial setting can be calculated in a straightforward way by using the formula for a binomial coefficient. While in theory this is an easy calculation, in practice it can become quite tedious or even computationally impossible to calculate binomial probabilities. These issues can be sidestepped by instead using a normal distribution to approximate a binomial distribution.

We will see how to do this by going through the steps of a calculation.

Steps to Using the Normal Approximation

First we must determine whether it is appropriate to use the normal approximation. Not every binomial distribution is the same. Some exhibit enough skewness that we cannot use a normal approximation. To check whether the normal approximation should be used, we need to look at the value of p, which is the probability of a success, and n, which is the number of trials (observations) of our binomial variable.

In order to use the normal approximation we consider both np and n(1 - p). If both of these numbers are greater than or equal to 10, then we are justified in using the normal approximation. This is a general rule of thumb, and typically the larger the values of np and n(1 - p), the better the approximation.
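The rule of thumb above can be sketched as a small check. This is only an illustration: the function name normal_approximation_ok and the threshold parameter are assumptions made for this example, not part of any standard library.

```python
def normal_approximation_ok(n: int, p: float, threshold: float = 10.0) -> bool:
    """Return True when both n*p and n*(1 - p) reach the rule-of-thumb threshold.

    The default threshold of 10 follows the rule of thumb stated in the text;
    the function name and signature are hypothetical.
    """
    return n * p >= threshold and n * (1 - p) >= threshold


# 20 fair coin tosses: n*p = 10 and n*(1 - p) = 10, so the check passes.
print(normal_approximation_ok(20, 0.5))

# 20 trials with p = 0.1: n*p = 2, so the distribution is too skewed.
print(normal_approximation_ok(20, 0.1))
```

The check treats the threshold as inclusive, matching the "greater than or equal to 10" wording.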

Comparison between Binomial and Normal

We will compare an exact binomial probability with that obtained by a normal approximation.

We consider the tossing of 20 fair coins and want to know the probability that five or fewer coins land heads. If X is the number of heads, then we want to find the value:

P(X = 0) + P(X = 1) + P(X = 2) + P(X = 3) + P(X = 4) + P(X = 5).

The use of the binomial formula for each of these six probabilities shows us that the probability is 2.0695%.
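As a minimal sketch of that exact calculation, the six terms can be summed with the binomial formula using only Python's standard library. The helper names binomial_pmf and binomial_cdf are chosen for this example.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for a binomial variable with n trials and success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binomial_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k), found by summing the individual probabilities P(X = 0..k)."""
    return sum(binomial_pmf(i, n, p) for i in range(k + 1))


# Probability of five or fewer heads in 20 tosses of a fair coin.
exact = binomial_cdf(5, 20, 0.5)
print(f"{exact:.4%}")  # 2.0695%
```

For a fair coin the sum reduces to 21700 / 2^20, since the binomial coefficients C(20, 0) through C(20, 5) add up to 21700.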

We will now see how close our normal approximation will be to this value.

Checking the conditions, we see that both np and n(1 - p) are equal to 10. This shows that we can use the normal approximation in this case. We will utilize a normal distribution with a mean of np = 20(0.5) = 10 and a standard deviation of (20(0.5)(0.5))^0.5 = 2.236.

To determine the probability that X is less than or equal to 5 we need to find the z-score for 5 in the normal distribution that we are using. Thus z = (5 – 10)/2.236 = -2.236. By consulting a table of z-scores we see that the probability that z is less than or equal to -2.236 is 1.267%. This differs from the exact probability of 2.0695% by about 0.8 percentage points.
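The table lookup above can be reproduced approximately in code. The sketch below assumes Python 3.8 or later, where statistics.NormalDist is available; the function name normal_approx_cdf is made up for this example, and no continuity correction is applied at this stage.

```python
from math import sqrt
from statistics import NormalDist


def normal_approx_cdf(x: float, n: int, p: float) -> float:
    """Approximate P(X <= x) for a binomial variable with a normal distribution.

    Uses mean n*p and standard deviation sqrt(n*p*(1 - p)), with no
    continuity correction.
    """
    mu = n * p
    sigma = sqrt(n * p * (1 - p))
    z = (x - mu) / sigma           # z-score of x
    return NormalDist().cdf(z)     # standard normal probability that Z <= z


approx = normal_approx_cdf(5, 20, 0.5)
print(f"{approx:.3%}")  # roughly 1.27%, in line with the z-table value
```

Small differences from a printed table in the last digit are expected, because tables round the z-score before the lookup.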

Continuity Correction Factor

To improve our estimate, it is appropriate to introduce a continuity correction factor. This is used because a normal distribution is continuous whereas the binomial distribution is discrete. For a binomial random variable, a probability histogram for X = 5 will include a bar that goes from 4.5 to 5.5 and is centered at 5.

This means that for the above example, the probability that X is less than or equal to 5 for a binomial variable should be estimated by the probability that X is less than or equal to 5.5 for a continuous normal variable.
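A minimal sketch of the correction, again assuming Python 3.8 or later for statistics.NormalDist, is shown here. The function name and the correction flag are hypothetical and used only to compare the two estimates side by side.

```python
from math import sqrt
from statistics import NormalDist


def normal_approx_cdf(k: int, n: int, p: float, correction: bool = True) -> float:
    """Approximate P(X <= k) for a binomial variable using a normal distribution.

    With correction=True the cutoff is moved from k to k + 0.5, so that the
    whole histogram bar centered at k is included.
    """
    dist = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p)))
    cutoff = k + 0.5 if correction else k
    return dist.cdf(cutoff)


uncorrected = normal_approx_cdf(5, 20, 0.5, correction=False)
corrected = normal_approx_cdf(5, 20, 0.5)

# The corrected estimate is larger, because it includes the half bar from 5 to 5.5.
print(f"without correction: {uncorrected:.3%}")
print(f"with correction:    {corrected:.3%}")
```

For an upper-tail probability such as P(X >= k) the adjustment goes the other way, with the cutoff moved to k - 0.5.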

Thus z = (5.5 – 10)/2.236 = -2.013. The probability that z