What Is Bootstrapping in Statistics?

By Courtney Taylor, Professor of Mathematics
Courtney K. Taylor, Ph.D., is a professor of mathematics at Anderson University and the author of "An Introduction to Abstract Algebra."
Updated January 13, 2019

Bootstrapping is a statistical technique that falls under the broader heading of resampling. The procedure itself is relatively simple, but it is repeated so many times that it depends heavily on computer calculations. Bootstrapping provides a method other than traditional formula-based confidence intervals to estimate a population parameter. Bootstrapping can seem to work like magic. Read on to see how it obtains its interesting name.

An Explanation of Bootstrapping

One goal of inferential statistics is to determine the value of a parameter of a population. It is typically too expensive or even impossible to measure this directly, so we use statistical sampling. We sample a population, measure a statistic of this sample, and then use this statistic to say something about the corresponding parameter of the population.

For example, in a chocolate factory, we might want to guarantee that candy bars have a particular mean weight. It is not feasible to weigh every candy bar that is produced, so we use sampling techniques to randomly choose 100 candy bars. We calculate the mean weight of these 100 candy bars and say that the population mean falls within a margin of error of our sample mean.
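As a rough illustration of the candy bar example, the following sketch computes a sample mean and an approximate 95% margin of error. The weights are simulated, not real factory data. The 50 gram target weight, the 1.5 gram spread, the function name `mean_with_margin`, and the normal-approximation multiplier of 1.96 are all assumptions made for illustration.

```python
import math
import random
import statistics


def mean_with_margin(weights, z=1.96):
    """Return (sample mean, approximate margin of error).

    Uses the normal approximation: margin = z * s / sqrt(n),
    where s is the sample standard deviation.
    """
    n = len(weights)
    mean = statistics.mean(weights)
    std_err = statistics.stdev(weights) / math.sqrt(n)
    return mean, z * std_err


# Hypothetical data: 100 simulated candy bar weights in grams.
rng = random.Random(0)
weights = [rng.gauss(50.0, 1.5) for _ in range(100)]

mean, margin = mean_with_margin(weights)
print(f"sample mean: {mean:.2f} g, margin of error: +/- {margin:.2f} g")
```

The margin shrinks as the sample size grows, which is why a larger sample would normally be the way to get a tighter estimate.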
Suppose that a few months later we want to know with greater accuracy, that is, with a smaller margin of error, what the mean candy bar weight was on the day that we sampled the production line. We cannot use today's candy bars, as too many variables have entered the picture (different batches of milk, sugar and cocoa beans, different atmospheric conditions, different employees on the line, etc.). All that we have from the day that we are curious about are the 100 weights. Without a time machine back to that day, it would seem that the initial margin of error is the best that we can hope for.

Fortunately, we can use the technique of bootstrapping. In this situation, we randomly sample with replacement from the 100 known weights, drawing as many values as there are in the original sample. Each such resample is called a bootstrap sample. Since we allow for replacement, a bootstrap sample is most likely not identical to our initial sample. Some data points may be duplicated, and other data points from the initial 100 may be omitted in a bootstrap sample. With the help of a computer, thousands of bootstrap samples can be constructed in a relatively short time.

An Example

As mentioned, to truly use bootstrap techniques we need to use a computer. The following numerical example will help to demonstrate how the process works. If we begin with the sample 2, 4, 5, 6, 6, then all of the following are possible bootstrap samples:

2, 5, 5, 6, 6
4, 5, 6, 6, 6
2, 2, 4, 5, 5
2, 2, 2, 4, 6
2, 2, 2, 2, 2
4, 6, 6, 6, 6

History of the Technique

Bootstrap techniques are relatively new to the field of statistics. The first use was published in a 1979 paper by Bradley Efron. As computing power has increased and become less expensive, bootstrap techniques have become more widespread.

Why the Name Bootstrapping?

The name "bootstrapping" comes from the phrase "to lift himself up by his bootstraps." This refers to something that is preposterous and impossible. Try as hard as you can, you cannot lift yourself into the air by tugging at pieces of leather on your boots.
There is some mathematical theory that justifies bootstrapping techniques. Even so, using bootstrapping can feel like doing the impossible. Although it does not seem like you would be able to improve upon the estimate of a population parameter by reusing the same sample over and over again, bootstrapping can, in fact, do this.
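The resampling procedure described earlier can be sketched in a few lines of Python, using the small sample 2, 4, 5, 6, 6 from the example. This is a minimal sketch, not a definitive implementation. The function names `bootstrap_means` and `percentile_interval`, the 10,000 resamples, the fixed seed, and the simple percentile interval are all assumptions for illustration. The article does not say how the bootstrap samples should be summarized.

```python
import random
import statistics


def bootstrap_means(sample, n_resamples=10_000, seed=0):
    """Return the mean of each bootstrap sample drawn from `sample`."""
    rng = random.Random(seed)
    n = len(sample)
    means = []
    for _ in range(n_resamples):
        # Sample with replacement, same size as the original sample.
        resample = rng.choices(sample, k=n)
        means.append(statistics.mean(resample))
    return means


def percentile_interval(values, confidence=0.95):
    """Return a simple percentile interval from a list of values."""
    ordered = sorted(values)
    tail = (1.0 - confidence) / 2.0
    lo_idx = int(tail * len(ordered))
    hi_idx = int((1.0 - tail) * len(ordered)) - 1
    return ordered[lo_idx], ordered[hi_idx]


sample = [2, 4, 5, 6, 6]  # the small sample from the example
means = bootstrap_means(sample)
low, high = percentile_interval(means)
print(f"original sample mean: {statistics.mean(sample)}")
print(f"approximate 95% percentile interval for the mean: ({low}, {high})")
```

Every bootstrap mean necessarily falls between the smallest and largest values in the original sample, so the spread of these means reflects only the information already present in the data.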