Different Methods for Inference About the Mean

The normal distribution with two tails shaded.
The blue region illustrates a two-tailed hypothesis test. C.K.Taylor

One of the goals of inferential statistics is to start with a statistical sample and use it to state a range of values for a corresponding population parameter. When the parameter of interest is the population mean, we call this process inference about the mean. There are several procedures and probability distributions that we can use for inference about the mean: z-scores with the normal distribution, t-scores with Student’s t distribution, and bootstrapping.

This brings up the natural and obvious questions, "How do we know when to use a t test? Under what conditions can we use the normal distribution?" Although definitive answers to these questions are somewhat elusive, there are some rules of thumb that help us to decide which procedure to use for inference about a mean.

Normal Distribution

The normal distribution, or bell curve, can be used in a number of settings for statistical inference. Unfortunately, most of these settings are rarely encountered in real life. To use the normal distribution and z-scores for inference about the mean, we need to know the value of the population standard deviation, and this value is rarely known in practice.

Assuming that we know the value of the population standard deviation, we use the normal distribution in either of the following two situations:

  • Our population is normally distributed.
  • Our sample size is larger than 30.

It is important to reiterate that the above is a rule of thumb. If our data is close enough to being normally distributed, then sample sizes as small as 15 can be used to obtain accurate results. On the other hand, we will need to use a sample size that is much larger than 30 for data that is highly skewed.
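As a minimal sketch of the procedure described in this section, the following Python computes a confidence interval for the mean from z-scores. The function name `z_interval`, the sample values, and the value `sigma=0.25` are illustrative assumptions, not data from this article. With only ten observations, the example also assumes that the population is normally distributed.

```python
import math
from statistics import NormalDist, mean

def z_interval(sample, sigma, confidence=0.95):
    """Confidence interval for the mean when the population SD (sigma) is known."""
    n = len(sample)
    # Critical value z* from the standard normal distribution (about 1.96 for 95%).
    z_star = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z_star * sigma / math.sqrt(n)
    center = mean(sample)
    return center - margin, center + margin

# Hypothetical measurements; sigma is assumed to be known in advance.
sample = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3, 12.2, 11.7, 12.5, 12.1]
low, high = z_interval(sample, sigma=0.25)
print(f"95% z-interval for the mean: ({low:.3f}, {high:.3f})")
```

The margin of error shrinks with the square root of the sample size, so quadrupling the sample size halves the width of the interval.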

Student’s t Distribution

The assumption that we know the value of the population standard deviation is unrealistic in practice. Fortunately, we can use another procedure for inference about the mean. Student’s t test is used when we do not know the value of the population standard deviation and must estimate it with the sample standard deviation.

The t distribution has a bell-curve shape similar to that of the normal distribution. What makes it different is that it has heavier tails: more of the probability lies far from the mean than with the normal distribution. These heavier tails account for the extra uncertainty that comes from estimating the standard deviation from the sample.
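To make the idea of heavier tails concrete, the sketch below compares the density of the standard normal distribution with the density of a t distribution at a few points away from the mean. The helper `t_pdf` and the choice of 5 degrees of freedom are assumptions made for illustration. The density is written out from its standard formula so that the example needs only the Python standard library.

```python
import math
from statistics import NormalDist

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

standard_normal = NormalDist()
for x in (2.0, 3.0, 4.0):
    # Far from the mean, the t density stays larger than the normal density.
    print(f"x = {x}: normal = {standard_normal.pdf(x):.5f}, "
          f"t (df = 5) = {t_pdf(x, df=5):.5f}")
```

As the degrees of freedom increase, the two densities move closer together, which is why the distinction matters most for small samples.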

Assuming that we do not know the value of the population standard deviation, we use Student’s t distribution in either of the following two situations:

  • Our population is normally distributed.
  • Our sample size is larger than 30.

As with the considerations for the normal distribution, these should be thought of as general guidelines of what to do, not as firm absolutes.
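The t procedure can be sketched in the same way as the z procedure, with the sample standard deviation standing in for the unknown population value. The function names and the sample values below are illustrative assumptions. To stay within the standard library, the critical value is found numerically here (Simpson's rule plus bisection); in practice a statistics library routine such as `scipy.stats.t.ppf` would normally supply it.

```python
import math
from statistics import mean, stdev

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(x, df, steps=2000):
    """P(T <= x) for x >= 0, using Simpson's rule on [0, x] plus symmetry."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_critical(confidence, df):
    """Two-sided critical value t*, found by bisection on the CDF."""
    target = 0.5 + confidence / 2
    lo, hi = 0.0, 1000.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def t_interval(sample, confidence=0.95):
    """Confidence interval for the mean when the population SD is unknown."""
    n = len(sample)
    s = stdev(sample)  # sample standard deviation (n - 1 in the denominator)
    margin = t_critical(confidence, n - 1) * s / math.sqrt(n)
    center = mean(sample)
    return center - margin, center + margin

# Hypothetical measurements, assumed to come from a normal population.
sample = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3, 12.2, 11.7, 12.5, 12.1]
low, high = t_interval(sample)
print(f"95% t-interval for the mean: ({low:.3f}, {high:.3f})")
```

The degrees of freedom are one less than the sample size, so the critical value is larger than the corresponding z value and the interval is wider for small samples.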

Bootstrapping

The above techniques for inference about the mean both require that our sample size is larger than 30 or that our population is normally distributed. In the event that neither of these conditions is satisfied, we must turn to other techniques. Bootstrapping is one example of these.

In other words, bootstrapping can be used when the sample size is not larger than 30 and the population is far from normal, for example because it is strongly skewed.
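The article does not say which bootstrap variant it has in mind, so the sketch below assumes the simplest one, the percentile bootstrap: resample the data with replacement many times, compute the mean of each resample, and read the interval off the sorted means. The function name, the skewed sample, the number of resamples, and the fixed seed are all illustrative assumptions.

```python
import random

def bootstrap_mean_interval(sample, confidence=0.95, resamples=10_000, seed=0):
    """Percentile bootstrap interval for the mean (one of several bootstrap variants)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(sample)
    # Each resample draws n values from the sample with replacement.
    means = sorted(sum(rng.choices(sample, k=n)) / n for _ in range(resamples))
    alpha = (1 - confidence) / 2
    lo_idx = int(alpha * resamples)
    hi_idx = int((1 - alpha) * resamples) - 1
    return means[lo_idx], means[hi_idx]

# Hypothetical small, strongly right-skewed sample.
sample = [1, 1, 2, 2, 3, 4, 5, 8, 13, 40]
low, high = bootstrap_mean_interval(sample)
print(f"95% bootstrap interval for the mean: ({low:.2f}, {high:.2f})")
```

Unlike the z and t intervals, this interval does not have to be symmetric about the sample mean, which is what we would expect for skewed data.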

Checklist

We should think of the above guidelines in terms of a checklist.

1. Do we know the value of the population standard deviation?
  • If yes, go to 2.
  • If no, go to 3.
2. Ask the following:
  • Do we have a normal population? If so, then use the normal distribution.
  • Is our sample size larger than 30? If so, then use the normal distribution.
  • Did we answer no to both of the above? If so, then use bootstrapping.
3. Ask the following:
  • Do we have a normal population? If so, then use the t distribution.
  • Is our sample size larger than 30? If so, then use the t distribution.
  • Did we answer no to both of the above? If so, then use bootstrapping.
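The checklist above can be written as a small decision function. This is only a sketch of the rules of thumb: the function name, its parameters, and the returned labels are hypothetical, and the cutoff of 30 is a guideline rather than a firm boundary.

```python
def choose_method(sigma_known, population_normal, n):
    """Pick a procedure for inference about the mean, following the checklist."""
    if population_normal or n > 30:
        # Steps 2 and 3 differ only in whether the population SD is known.
        return "normal distribution" if sigma_known else "t distribution"
    # Neither condition holds, whether or not the population SD is known.
    return "bootstrapping"

print(choose_method(sigma_known=False, population_normal=True, n=12))
print(choose_method(sigma_known=True, population_normal=False, n=50))
print(choose_method(sigma_known=False, population_normal=False, n=20))
```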