The interquartile range rule is useful for detecting outliers. Outliers are individual values that fall outside the overall pattern of the rest of the data. This definition is somewhat vague and subjective, so it helps to have a rule for deciding whether a data point should be treated as an outlier.

## The Interquartile Range

Any set of data can be described by its five number summary. These five numbers, in ascending order, consist of:

- The minimum, or lowest value of the data set
- The first quartile *Q*_{1}, which represents a quarter of the way through the list of all the data
- The median of the data set, which represents the midpoint of the list of all the data
- The third quartile *Q*_{3}, which represents three quarters of the way through the list of all the data
- The maximum, or highest value of the data set

These five numbers can tell us quite a bit about our data. For example, the range, which is just the minimum subtracted from the maximum, is one indicator of how spread out the data set is.

Similar to the range, but less sensitive to outliers, is the interquartile range. The interquartile range is calculated in much the same way as the range. All that we do is subtract the first quartile from the third quartile:

IQR = *Q*_{3} – *Q*_{1}.

The interquartile range shows how the data is spread about the median. Because it depends only on the middle half of the data, it is less susceptible than the range to outliers.
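The five-number summary and the interquartile range can be sketched in a few lines of Python. This is a minimal illustration, not a definitive implementation: the function names `five_number_summary` and `interquartile_range` are hypothetical, the sample values are made up, and the quartiles are assumed to be the medians of the lower and upper halves of the sorted data (leaving the overall median out of both halves when the count is odd). Other quartile conventions exist and may give slightly different values.

```python
from statistics import median


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers.

    Assumption: Q1 and Q3 are the medians of the lower and upper halves
    of the sorted data; with an odd count, the overall median is left
    out of both halves.
    """
    data = sorted(values)
    n = len(data)
    if n < 2:
        raise ValueError("need at least two values")
    half = n // 2
    lower_half = data[:half]        # values below the midpoint
    upper_half = data[n - half:]    # values above the midpoint
    return data[0], median(lower_half), median(data), median(upper_half), data[-1]


def interquartile_range(values):
    """Return IQR = Q3 - Q1."""
    _, q1, _, q3, _ = five_number_summary(values)
    return q3 - q1


# Made-up sample values, used only for illustration.
sample = [2, 4, 4, 5, 7, 9, 10, 12]
print(five_number_summary(sample))   # (2, 4.0, 6.0, 9.5, 12)
print(interquartile_range(sample))   # 5.5
```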

## Interquartile Rule for Outliers

The interquartile range can be used to help detect outliers. All that we need to do is the following:

- Calculate the interquartile range for our data
- Multiply the interquartile range (IQR) by the number 1.5
- Add 1.5 x (IQR) to the third quartile. Any number greater than this is a suspected outlier.
- Subtract 1.5 x (IQR) from the first quartile. Any number less than this is a suspected outlier.

It is important to remember that this is a rule of thumb: it generally works well, but it is not a definitive test. Any potential outlier obtained by this method should be followed up and examined in the context of the entire set of data.
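The steps of the rule translate directly into code. The sketch below assumes the quartiles have already been computed; the helper names `outlier_fences` and `suspected_outliers`, the parameter `k`, and the sample numbers are all hypothetical and chosen only for illustration.

```python
def outlier_fences(q1, q3, k=1.5):
    """Return the (lower, upper) cutoffs of the interquartile range rule.

    k is the multiplier applied to the IQR; the rule described here uses 1.5.
    """
    iqr = q3 - q1
    step = k * iqr
    return q1 - step, q3 + step


def suspected_outliers(values, q1, q3, k=1.5):
    """Return the values strictly below the lower or above the upper cutoff."""
    lower, upper = outlier_fences(q1, q3, k)
    return [v for v in values if v < lower or v > upper]


# Hypothetical quartiles and data, used only for illustration.
print(outlier_fences(q1=20, q3=30))                           # (5.0, 45.0)
print(suspected_outliers([4, 18, 25, 31, 46], q1=20, q3=30))  # [4, 46]
```

The function only flags candidates. As with the rule itself, anything it returns is a suspected outlier that still needs to be examined in context.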

## Example

We will see this interquartile range rule at work with a numerical example. Suppose we have the following set of data: 1, 3, 4, 6, 7, 7, 8, 8, 10, 12, 17. The five number summary for this data set is minimum = 1, first quartile = 4, median = 7, third quartile = 10 and maximum = 17. We may look at the data and say that 17 is an outlier. But what does our interquartile range rule say?

We calculate the interquartile range to be

*Q*_{3} – *Q*_{1} = 10 – 4 = 6

We now multiply by 1.5 and have 1.5 x 6 = 9. Nine less than the first quartile is 4 – 9 = -5. No data value is less than this. Nine more than the third quartile is 10 + 9 = 19. No data value is greater than this. Although the maximum value is five more than the nearest data point, the interquartile range rule shows that it should probably not be considered an outlier for this data set.
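The arithmetic in this example can be checked with Python's standard library. This is a sketch under one stated assumption: `statistics.quantiles` is called with `method="exclusive"` because that convention reproduces the quartiles quoted above for this data set. Other quartile conventions can give different quartiles, and therefore different cutoffs.

```python
from statistics import quantiles

data = [1, 3, 4, 6, 7, 7, 8, 8, 10, 12, 17]

# Assumption: the "exclusive" method is chosen because it matches the
# quartiles used in the example (Q1 = 4, median = 7, Q3 = 10).
q1, med, q3 = quantiles(data, n=4, method="exclusive")

iqr = q3 - q1
lower_cutoff = q1 - 1.5 * iqr
upper_cutoff = q3 + 1.5 * iqr

# Values strictly outside the cutoffs are suspected outliers.
flagged = [x for x in data if x < lower_cutoff or x > upper_cutoff]

print(q1, med, q3)                 # 4.0 7.0 10.0
print(lower_cutoff, upper_cutoff)  # -5.0 19.0
print(flagged)                     # []
```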