# What Is Statistical Sampling?

Researchers often want answers to questions that are very large in scope. For example:

• What did everyone in a particular country watch on television last night?
• Who does an electorate intend to vote for in an upcoming election?
• How many birds return from migration at a certain location?
• What percentage of the workforce is unemployed?

Questions like these are huge in the sense that answering them directly would require us to keep track of millions of individuals.

Statistics simplifies these problems with a technique called sampling. By taking a statistical sample, we can cut our workload down immensely. Rather than tracking the behavior of millions or billions of individuals, we need only examine that of hundreds or thousands. As we will see, this simplification comes at a price.

## Populations and Censuses

The population of a statistical study is the group we want to learn something about. It consists of all of the individuals being examined. A population can be almost anything: Californians, caribou, computers, cars, or counties could all be considered populations, depending on the statistical question. Although most populations being researched are large, they do not have to be.

One way to study a population is to conduct a census. In a census, we examine every member of the population in our study. A prime example is the U.S. Census. Every ten years the Census Bureau sends a questionnaire to every household in the country, and households that do not return the form are visited by census workers.

Censuses are fraught with difficulties. They are typically expensive in terms of time and resources, and it is hard to guarantee that everyone in the population has been reached. For some populations a census is harder still: if we wanted to study the habits of stray dogs in the state of New York, good luck rounding up all of those transient canines.

## Samples

Since it is usually impossible or impractical to track down every member of a population, the next option is to sample the population. A sample is any subset of a population, so its size can be small or large. We want a sample small enough to be manageable with the time and resources available, yet large enough to produce reliable estimates.

If a polling firm is trying to determine voter satisfaction with Congress and its sample size is one, the results will be meaningless (but easy to obtain). On the other hand, asking millions of people would consume far too many resources. To strike a balance, polls of this type typically have sample sizes of around 1,000.
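One way to see why a sample of around 1,000 is a reasonable compromise is to look at the margin of error. The sketch below assumes a simple random sample from a large population and uses the standard normal-approximation formula for a proportion, z * sqrt(p(1 - p) / n), with z = 1.96 for 95% confidence and the worst case p = 0.5. The function name `margin_of_error` is our own, chosen for illustration.

```python
import math


def margin_of_error(n: int, p: float = 0.5, z: float = 1.96) -> float:
    """Approximate 95% margin of error for a sample proportion.

    Assumes a simple random sample from a large population and the
    normal approximation; p = 0.5 gives the largest (worst-case) margin.
    """
    if n < 1:
        raise ValueError("sample size must be at least 1")
    return z * math.sqrt(p * (1 - p) / n)


# Compare margins of error across a range of poll sizes.
for n in (10, 100, 1000, 10000):
    print(f"n = {n:>5}: about ±{margin_of_error(n):.1%}")
```

At n = 1,000 this gives roughly ±3 percentage points. Because the margin shrinks only with the square root of n, going from 1,000 to 10,000 respondents costs about ten times as much but narrows the margin only to about ±1 point.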

## Random Samples

But having the right sample size is not enough to ensure good results. We also want a sample that is representative of the population. Suppose we want to find out how many books the average American reads annually. We ask 2,000 college students to keep track of what they read and check back with them after a year has gone by. We find that the mean number of books read is 12 and conclude that the average American reads 12 books a year.

The problem with this scenario lies in the sample. Most college students are between 18 and 25 years old and are required by their instructors to read textbooks and novels, so they are a poor representation of the average American. A good sample would contain people of different ages, from all walks of life, and from different regions of the country. To obtain such a sample, we would need to select it randomly, so that every American has an equal probability of being included.
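The college-student example can be made concrete with a small simulation. The population below is entirely hypothetical, invented for illustration: 10,000 "students" who each read 12 books a year and 90,000 other people who each read 4, so the true mean is 4.8. A sample drawn only from the students lands far from that value, while a random sample of the same size, drawn with Python's `random.sample` so that every member has the same chance of selection, comes close. The helper names are our own.

```python
import random
from statistics import mean

# Hypothetical population, invented purely for illustration:
# 10,000 students reading 12 books a year, 90,000 others reading 4.
population = [12] * 10_000 + [4] * 90_000


def convenience_sample(k: int) -> list[int]:
    # Takes only members of the student subgroup, like surveying
    # college students and nobody else.
    return population[:k]


def simple_random_sample(k: int, seed: int = 0) -> list[int]:
    # random.sample draws k distinct members without replacement,
    # giving every member of the population the same chance of selection.
    return random.Random(seed).sample(population, k)


print(f"population mean:    {mean(population):.2f}")
print(f"convenience sample: {mean(convenience_sample(2000)):.2f}")
print(f"random sample:      {mean(simple_random_sample(2000)):.2f}")
```

The convenience sample reports 12 books no matter how large it is, because its error comes from who is selected, not how many. The random sample's estimate varies with the seed but stays near 4.8.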

## Types of Samples

The gold standard of statistical sampling is the simple random sample. In a simple random sample of n individuals, every member of the population has the same likelihood of being selected, and every group of n individuals has the same likelihood of being selected. There are a variety of ways to sample a population. Some of the most common are: