Sampling With or Without Replacement


Statistical sampling can be done in a number of different ways. In addition to choosing a sampling method, we must decide what happens to an individual once we have randomly selected it. The question is: "After we select an individual and record the measurement of the attribute we're studying, what do we do with the individual?"

There are two options:

  • We can replace the individual back into the pool that we are sampling from.
  • We can choose to not replace the individual. 

These two options lead to different situations. With replacement, the individual could be randomly chosen a second time. Without replacement, it is impossible to pick the same individual twice. We will see that this difference affects the calculation of probabilities related to these samples.
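A minimal Python sketch of the two options, using the standard library's `random` module: `choices` draws with replacement and `sample` draws without replacement. The ten-member pool and the fixed seed are illustrative assumptions, not part of the original discussion.

```python
import random

# A hypothetical pool of ten individuals, labeled 0 through 9.
pool = list(range(10))

rng = random.Random(42)  # fixed seed so the run is reproducible

# With replacement: every draw is made from the full pool,
# so the same individual can appear more than once.
with_replacement = rng.choices(pool, k=5)

# Without replacement: a selected individual leaves the pool,
# so each individual appears at most once.
without_replacement = rng.sample(pool, k=5)

print("with replacement:   ", with_replacement)
print("without replacement:", without_replacement)

# Drawing more individuals than the pool holds is only possible with replacement.
long_draw = rng.choices(pool, k=20)
try:
    rng.sample(pool, k=20)
except ValueError:
    print("cannot draw 20 distinct individuals from a pool of 10")
```

Because `long_draw` has 20 entries taken from only 10 individuals, some individual must repeat, which is exactly the possibility that replacement leaves open.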

Effect on Probabilities

To see how the handling of replacement affects the calculation of probabilities, consider the following example question: What is the probability of drawing two aces from a standard deck of cards?

This question is ambiguous.  What happens once we draw the first card?  Do we put it back into the deck, or do we leave it out? 

We start by calculating the probability with replacement.

There are four aces and 52 cards total, so the probability of drawing one ace is 4/52. If we replace this card and draw again, then the probability is again 4/52. These events are independent, so we multiply the probabilities: (4/52) x (4/52) = 1/169, or approximately 0.592%.
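The with-replacement arithmetic can be checked exactly with Python's `fractions` module. This is a small sketch; the variable names are illustrative.

```python
from fractions import Fraction

aces, deck = 4, 52

# With replacement the deck is restored before the second draw,
# so both draws have the same probability and are independent.
p_first = Fraction(aces, deck)
p_second = Fraction(aces, deck)
p_two_aces_with = p_first * p_second

print(p_two_aces_with)                  # 1/169
print(f"{float(p_two_aces_with):.3%}")  # 0.592%
```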

Now we compare this to the same situation, except that we do not replace the first card.

The probability of drawing an ace on the first draw is still 4/52. For the second card, we assume that an ace has already been drawn, so we must calculate a conditional probability. In other words, we need to know the probability of drawing a second ace, given that the first card is also an ace.

Three aces now remain out of a total of 51 cards, so the conditional probability of a second ace after drawing an ace is 3/51. The probability of drawing two aces without replacement is (4/52) x (3/51) = 1/221, or about 0.452%.
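The same exact check works without replacement. As a cross-check, the sketch below also counts outcomes directly: the number of ways to choose 2 of the 4 aces divided by the number of ways to choose any 2 of the 52 cards, using `math.comb`.

```python
from fractions import Fraction
from math import comb

aces, deck = 4, 52

# Without replacement: P(first ace) times P(second ace | first ace).
p_first = Fraction(aces, deck)
p_second_given_first = Fraction(aces - 1, deck - 1)
p_two_aces_without = p_first * p_second_given_first

# Cross-check by counting two-card hands: C(4, 2) / C(52, 2).
p_by_counting = Fraction(comb(aces, 2), comb(deck, 2))

print(p_two_aces_without)                  # 1/221
print(f"{float(p_two_aces_without):.3%}")  # 0.452%
assert p_two_aces_without == p_by_counting
```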

The example above shows directly that whether or not we replace the first card has a bearing on the values of probabilities, and it can change them considerably: here, not replacing the card lowers the probability from about 0.592% to about 0.452%.
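A rough Monte Carlo sketch can make the difference visible empirically. The function `estimate_two_aces`, the trial count, and the seed are illustrative assumptions; the estimates should land near the exact values 1/169 and 1/221 but will not match them exactly.

```python
import random

def estimate_two_aces(replace: bool, trials: int = 200_000, seed: int = 1) -> float:
    """Estimate P(two aces in two draws) by repeated simulated draws."""
    rng = random.Random(seed)
    # Represent the deck as 4 aces (True) and 48 other cards (False).
    deck = [True] * 4 + [False] * 48
    hits = 0
    for _ in range(trials):
        if replace:
            draw = rng.choices(deck, k=2)  # first card goes back before the second draw
        else:
            draw = rng.sample(deck, k=2)   # two distinct cards from the deck
        hits += all(draw)                  # counts 1 only when both cards are aces
    return hits / trials

est_with = estimate_two_aces(replace=True)
est_without = estimate_two_aces(replace=False)

print(f"with replacement:    {est_with:.4f}  (exact 1/169, about 0.0059)")
print(f"without replacement: {est_without:.4f}  (exact 1/221, about 0.0045)")
```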

Population Sizes

There are some situations where sampling with or without replacement does not substantially change any probabilities. Suppose that we randomly choose two people from a city with a population of 50,000, of whom 30,000 are female.

If we sample with replacement, then the probability of choosing a female on the first selection is given by 30000/50000 = 60%.  The probability of a female on the second selection is still 60%.  The probability of both people being female is 0.6 x 0.6 = 0.36.

If we sample without replacement, then the first probability is unaffected. The second probability is now 29999/49999 = 0.5999919998..., which is extremely close to 60%. The probability that both are female is 0.6 x 0.5999919998 ≈ 0.359995.

The probabilities are technically different; however, they are close enough to be nearly indistinguishable. For this reason, even when we sample without replacement, we often treat the selection of each individual as if it were independent of the other individuals in the sample.
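The two probabilities in the city example can be computed exactly to see how small the gap is. This is a minimal sketch using Python's `fractions` module; the variable names are illustrative.

```python
from fractions import Fraction

population, females = 50_000, 30_000

# With replacement: both selections use the full population.
p_with = Fraction(females, population) ** 2

# Without replacement: the second selection has one fewer female
# and one fewer person overall.
p_without = Fraction(females, population) * Fraction(females - 1, population - 1)

print(float(p_with))               # 0.36
print(round(float(p_without), 6))  # 0.359995
print(float(p_with - p_without))   # about 4.8e-06
```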

Other Applications

There are other instances where we need to consider whether to sample with or without replacement. One example is bootstrapping, a statistical technique that falls under the heading of resampling techniques.

In bootstrapping we start with a statistical sample of a population.

We then use computer software to compute bootstrap samples. In other words, the computer resamples with replacement from the initial sample.
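A minimal sketch of the resampling step, assuming the statistic of interest is the sample mean. The function name `bootstrap_means`, the data values, and the choice of 1,000 resamples are hypothetical. Each bootstrap sample is drawn with replacement and has the same size as the initial sample; the standard deviation of the bootstrap means is one common estimate of the standard error of the mean.

```python
import random
import statistics

def bootstrap_means(sample, n_resamples=1000, seed=0):
    """Return the mean of each bootstrap resample of `sample`.

    Each resample has the same size as the original sample and is drawn
    from it with replacement, so some values repeat and others are left out.
    """
    rng = random.Random(seed)
    n = len(sample)
    return [statistics.mean(rng.choices(sample, k=n)) for _ in range(n_resamples)]

# Hypothetical initial sample (illustrative values, not real data).
sample = [12.1, 9.8, 11.4, 10.2, 13.0, 9.5, 10.9, 12.6, 11.1, 10.4]

means = bootstrap_means(sample)
print("original sample mean:", round(statistics.mean(sample), 3))
print("std. dev. of bootstrap means:", round(statistics.stdev(means), 3))
```

Sampling with replacement is what makes the resamples differ from one another; resampling the full sample without replacement would just reproduce the original values in a different order.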