Inferential statistics concerns the process of beginning with a statistical sample and then arriving at the value of a population parameter that is unknown. The unknown value is not determined directly. Rather, we end up with an estimate that falls into a range of values. This range is known in mathematical terms as an interval of real numbers, and is specifically referred to as a confidence interval.
Confidence intervals are all similar to one another in a few ways. Two-sided confidence intervals all have the same form:
Estimate ± Margin of Error
Similarities in confidence intervals also extend to the steps used to calculate them. We will examine how to determine a two-sided confidence interval for a population mean when the population standard deviation is unknown. An underlying assumption is that we are sampling from a normally distributed population.
Process for Confidence Interval for Mean – Unknown Sigma
We will work through a list of steps required to find our desired confidence interval. Although all of the steps are important, the first one is particularly so:
- Check Conditions: Begin by making sure that the conditions for our confidence interval have been met. We assume that the value of the population standard deviation, denoted by the Greek letter sigma (σ), is unknown and that we are working with a normal distribution. We can relax the assumption that we have a normal distribution as long as our sample is large enough and has no outliers or extreme skewness.
- Calculate Estimate: We estimate our population parameter, in this case the population mean, by use of a statistic, in this case the sample mean. This involves forming a simple random sample from our population. Sometimes we can suppose that our sample is a simple random sample, even if it does not meet the strict definition.
- Critical Value: We obtain the critical value t^{*} that corresponds with our confidence level. For a two-sided interval at confidence level C, t^{*} is the value that leaves an area of (1 - C)/2 in each tail of the t-distribution. These values are found by consulting a table of t-scores or by using software. In either case, we will need to know the number of degrees of freedom, which is one less than the number of individuals in our sample.
- Margin of Error: Calculate the margin of error t^{*}s/√n, where n is the size of the simple random sample that we formed and s is the sample standard deviation, which we obtain from our statistical sample.
- Conclude: Finish by putting together the estimate and margin of error. This can be expressed as either Estimate ± Margin of Error or as Estimate - Margin of Error to Estimate + Margin of Error. In the statement of our confidence interval it is important to indicate the level of confidence. This is just as much a part of our confidence interval as the numbers for the estimate and margin of error.
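The steps above can be sketched in a few lines of Python. This is a minimal illustration rather than a definitive implementation. The function name `t_interval_from_sample` and the sample data are hypothetical. The critical value `t_star` is assumed to have been looked up beforehand in a t-table or with software for the chosen confidence level and n - 1 degrees of freedom. Checking the conditions (a simple random sample from a roughly normal population) remains the reader's responsibility.

```python
import math
import statistics


def t_interval_from_sample(sample, t_star):
    """Return (lower, upper) for a two-sided confidence interval for the mean.

    sample : list of numeric observations (assumed to be a simple random sample)
    t_star : critical value for the chosen confidence level with
             len(sample) - 1 degrees of freedom, looked up separately
    """
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations to compute s")

    estimate = statistics.mean(sample)      # sample mean estimates the population mean
    s = statistics.stdev(sample)            # sample standard deviation (divides by n - 1)
    margin = t_star * s / math.sqrt(n)      # margin of error: t* s / sqrt(n)

    return estimate - margin, estimate + margin


# Hypothetical data: three observations, so 2 degrees of freedom.
# A t-table gives t* = 2.920 for 90% confidence with 2 degrees of freedom.
lower, upper = t_interval_from_sample([10.0, 12.0, 14.0], t_star=2.920)
print(f"90% confidence interval: {lower:.2f} to {upper:.2f}")
```

Passing the critical value in as an argument mirrors the table-lookup step in the list. If SciPy happens to be available, `scipy.stats.t.ppf` can supply the same value from software instead.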
Example
To see how we can construct a confidence interval, we will work through an example. Suppose we know that the heights of a specific species of pea plant are normally distributed. A simple random sample of 30 pea plants has a mean height of 12 inches with a sample standard deviation of 2 inches.
What is a 90% confidence interval for the mean height for the entire population of pea plants?
We will work through the steps that were outlined above:
- Check Conditions: The conditions have been met as the population standard deviation is unknown and we are dealing with a normal distribution.
- Calculate Estimate: We have been told that we have a simple random sample of 30 pea plants. The mean height for this sample is 12 inches, so this is our estimate.
- Critical Value: Our sample has a size of 30, and so there are 29 degrees of freedom. The critical value for a confidence level of 90% is given by t^{*} = 1.699.
- Margin of Error: Now we use the margin of error formula and obtain a margin of error of t^{*}s /√n = (1.699)(2) /√(30) = 0.620.
- Conclude: We conclude by putting everything together. A 90% confidence interval for the population’s mean height is 12 ± 0.62 inches. Alternatively, we could state this confidence interval as 11.38 inches to 12.62 inches.
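The arithmetic in this example can be double-checked with a short script. It starts from the summary statistics given in the problem rather than from raw data. The variable names are illustrative, and the critical value 1.699 is the table value quoted above.

```python
import math

# Summary statistics from the pea plant example
n = 30            # sample size
x_bar = 12.0      # sample mean height, in inches
s = 2.0           # sample standard deviation, in inches
t_star = 1.699    # critical value for 90% confidence, 29 degrees of freedom

margin = t_star * s / math.sqrt(n)           # margin of error
lower, upper = x_bar - margin, x_bar + margin

print(f"Margin of error: {margin:.3f}")                     # Margin of error: 0.620
print(f"90% interval: {lower:.2f} to {upper:.2f} inches")   # 90% interval: 11.38 to 12.62 inches
```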
Practical Considerations
Confidence intervals of the above type are more realistic than other types that can be encountered in a statistics course. It is very rare to know the population standard deviation but not know the population mean. Here we assume that we do not know either of these population parameters.