Example of Bootstrapping

Bootstrapping is a powerful statistical technique. It is especially useful when the sample we are working with is small. Under usual circumstances, with fewer than about 40 observations we cannot count on the sampling distribution being close enough to a normal distribution or a t distribution to justify those methods, unless the population itself is known to be roughly normal. Bootstrap techniques can work well with samples of fewer than 40 elements because bootstrapping involves resampling: it builds an approximate sampling distribution from the data themselves instead of assuming one.

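These kinds of techniques make no assumption about the shape of the population distribution, although they do assume that the sample is representative of the population (for example, that its values were drawn independently).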

Bootstrapping has become more popular as computing resources have become more readily available. This is because bootstrapping is only practical with a computer: it requires drawing and summarizing a large number of resamples. We will see how this works in the following example of bootstrapping.

Example

We begin with a statistical sample from a population that we know nothing about. Our goal will be a 90% confidence interval for the mean of the population. Other techniques for constructing confidence intervals assume that we know the population standard deviation or that the population is approximately normal, but bootstrapping requires nothing other than the sample.

For purposes of our example, we will assume that the sample is 1, 2, 4, 4, 10.

Bootstrap Sample

We now resample with replacement from our sample to form what are known as bootstrap samples. Each bootstrap sample will have a size of five, just like our original sample.

Since we randomly select each value and then replace it before the next draw, the bootstrap samples may differ from the original sample and from each other.

For problems that we would run into in the real world, we would do this resampling hundreds if not thousands of times. Here we show 20 bootstrap samples:

  • 2, 1, 10, 4, 2
  • 4, 10, 10, 2, 4
  • 1, 4, 1, 4, 4
  • 4, 1, 1, 4, 10
  • 4, 4, 1, 4, 2
  • 4, 10, 10, 10, 4
  • 2, 4, 4, 2, 1
  • 2, 4, 1, 10, 4
  • 1, 10, 2, 10, 10
  • 4, 1, 10, 1, 10
  • 4, 4, 4, 4, 1
  • 1, 2, 4, 4, 2
  • 4, 4, 10, 10, 2
  • 4, 2, 1, 4, 4
  • 4, 4, 4, 4, 4
  • 4, 2, 4, 1, 1
  • 4, 4, 4, 2, 4
  • 10, 4, 1, 4, 4
  • 4, 2, 1, 1, 2
  • 10, 2, 2, 1, 1
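The resampling step can be sketched in a few lines of Python. This is a minimal illustration, assuming Python's standard `random` module; the helper name `bootstrap_sample` and the seed value are our own choices. Because the draws are random, the samples it prints will not reproduce the list of 20 shown here, but every one of them is built the same way: five draws, with replacement, from 1, 2, 4, 4, 10.

```python
import random

def bootstrap_sample(data, rng):
    """Hypothetical helper: draw len(data) values from data WITH replacement."""
    # choices() picks each element independently, so a value can repeat
    # (e.g. 10 appearing three times) or be missing entirely.
    return rng.choices(data, k=len(data))

original = [1, 2, 4, 4, 10]
rng = random.Random(42)  # fixed seed only so the run is repeatable

# Draw 20 bootstrap samples, each the same size as the original sample.
samples = [bootstrap_sample(original, rng) for _ in range(20)]
for s in samples:
    print(s)
```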

Mean

Since we are using bootstrapping to calculate a confidence interval for the population mean, we now calculate the mean of each of our bootstrap samples. These means, arranged in ascending order, are: 2, 2.4, 2.6, 2.6, 2.8, 3, 3, 3.2, 3.4, 3.6, 3.8, 4, 4, 4.2, 4.6, 5.2, 6, 6, 6.6, 7.6.
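The arithmetic for this step can be checked with a short Python sketch. It simply re-enters the 20 bootstrap samples listed in the previous step, averages each one, and sorts the results; the variable names are illustrative.

```python
# The 20 bootstrap samples from the Bootstrap Sample step, entered by hand.
bootstrap_samples = [
    [2, 1, 10, 4, 2], [4, 10, 10, 2, 4], [1, 4, 1, 4, 4], [4, 1, 1, 4, 10],
    [4, 4, 1, 4, 2], [4, 10, 10, 10, 4], [2, 4, 4, 2, 1], [2, 4, 1, 10, 4],
    [1, 10, 2, 10, 10], [4, 1, 10, 1, 10], [4, 4, 4, 4, 1], [1, 2, 4, 4, 2],
    [4, 4, 10, 10, 2], [4, 2, 1, 4, 4], [4, 4, 4, 4, 4], [4, 2, 4, 1, 1],
    [4, 4, 4, 2, 4], [10, 4, 1, 4, 4], [4, 2, 1, 1, 2], [10, 2, 2, 1, 1],
]

# Mean of each bootstrap sample, then sort into ascending order.
means = sorted(sum(s) / len(s) for s in bootstrap_samples)
print(means)
```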

Confidence Interval

We now obtain a confidence interval from our list of bootstrap sample means. Since we want a 90% confidence interval, we use the 5th and 95th percentiles as the endpoints of the interval. The reason for this is that we split 100% - 90% = 10% in half, leaving 5% in each tail, so that the interval contains the middle 90% of all of the bootstrap sample means.
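The percentile step can be expressed as a small Python function. This is a minimal sketch, assuming the simple counting rule used in this example: sort the bootstrap means and cut 5% of them from each end. The name `percentile_interval` is hypothetical, not taken from any library. Other percentile definitions, such as interpolating between neighbouring values, can give slightly different endpoints.

```python
def percentile_interval(boot_means, confidence=0.90):
    """Hypothetical helper: (lower, upper) bounding the middle `confidence` share.

    Drops round(n * (1 - confidence) / 2) of the sorted values from each end
    and returns the smallest and largest of the values that remain.
    """
    ordered = sorted(boot_means)
    n = len(ordered)
    k = round(n * (1 - confidence) / 2)  # number of values cut from each tail
    return ordered[k], ordered[n - 1 - k]

# The 20 sorted bootstrap means from the Mean step.
means = [2, 2.4, 2.6, 2.6, 2.8, 3, 3, 3.2, 3.4, 3.6,
         3.8, 4, 4, 4.2, 4.6, 5.2, 6, 6, 6.6, 7.6]
lower, upper = percentile_interval(means)
print(lower, upper)  # → 2.4 6.6
```

With 20 means and a 90% level, one value is cut from each tail, which is why the endpoints are the second-smallest and second-largest means.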

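For our example above, 5% of 20 means is one value, so we discard the smallest mean (2) and the largest mean (7.6). The remaining means run from 2.4 to 6.6, so we have a 90% confidence interval of 2.4 to 6.6.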
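A real analysis would use far more than 20 resamples. The whole procedure (resample, average, take percentiles) fits in one short Python function. This is a minimal sketch under stated assumptions: the name `bootstrap_ci_mean`, the default of 10,000 resamples, and the seed are illustrative choices, and the tail cut uses the same counting rule as the worked example. Because the resamples are random, the endpoints will not exactly match the 2.4 to 6.6 obtained from only 20 resamples, and they shift slightly from seed to seed.

```python
import random

def bootstrap_ci_mean(data, n_boot=10_000, confidence=0.90, seed=None):
    """Hypothetical helper: percentile bootstrap CI for the population mean."""
    rng = random.Random(seed)
    n = len(data)
    # Each bootstrap mean comes from n draws with replacement from the data.
    boot_means = sorted(sum(rng.choices(data, k=n)) / n for _ in range(n_boot))
    # Cut the same share of sorted means from each tail (5% each for 90%).
    k = round(n_boot * (1 - confidence) / 2)
    return boot_means[k], boot_means[n_boot - 1 - k]

sample = [1, 2, 4, 4, 10]
low, high = bootstrap_ci_mean(sample, seed=1)
print(f"90% bootstrap CI for the mean: {low:.2f} to {high:.2f}")
```

Sorting all 10,000 means is the simplest way to read off percentiles; for this size it is fast enough that nothing cleverer is needed.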