Pros and Cons of Secondary Data Analysis

A Review of the Advantages and Disadvantages in Social Science Research


In social science research, the terms primary data and secondary data are common parlance. Primary data is collected by a researcher or team of researchers for the specific purpose or analysis under consideration. Here, a research team conceives of and develops a research project, collects data designed to address specific questions, and performs their own analyses of the data they collected. In this case, the people involved in the data analysis are familiar with the research design and data collection process.

Secondary data analysis, on the other hand, is the use of data that someone else collected for some other purpose. Here, the researcher poses questions and answers them by analyzing a data set they were not involved in collecting, and that data set was not designed with their specific research questions in mind. The same data set can therefore be a primary data set to one researcher and a secondary data set to another.

Using Secondary Data

Before using secondary data in an analysis, a researcher has some important groundwork to do. Since they did not collect the data, they need to become familiar with the data set: how the data was collected, what the response categories are for each question, whether weights need to be applied during the analysis, whether clusters or stratification need to be accounted for, who the study population was, and more.
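Weighting is easy to overlook and can change results noticeably. The minimal sketch below compares an unweighted mean with a weighted mean on a tiny sample. The respondent values and weights are hypothetical. They stand in for the weight variable that many surveys supply, and they do not come from any real data set. The sketch covers weights only. It does not account for clusters or strata.

```python
# A minimal sketch comparing unweighted and weighted means.
# The values and weights below are hypothetical, not from any real survey.

def unweighted_mean(values):
    """Treat every respondent as representing the same number of people."""
    return sum(values) / len(values)


def weighted_mean(values, weights):
    """Scale each respondent's answer by their sampling weight."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


# Hypothetical weekly hours worked for four respondents.
hours = [40.0, 20.0, 50.0, 10.0]
# Hypothetical weights: the second and fourth respondents each stand in
# for three times as many people as the first and third.
weights = [1.0, 3.0, 1.0, 3.0]

print(unweighted_mean(hours))          # 30.0
print(weighted_mean(hours, weights))   # 22.5
```

In this made-up example, ignoring the weights overstates average hours by 7.5. The respondents who report fewer hours represent a larger share of the population. Accounting for clusters and strata mainly affects standard errors rather than the point estimate, and it usually calls for dedicated survey-analysis software.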

A great many secondary data sources and data sets are available for sociological research, many of them public and easily accessible. The United States Census, the General Social Survey, and the American Community Survey are among the most commonly used.

Advantages of Secondary Data Analysis

The biggest advantage of using secondary data is cost. Someone else has already collected the data, so the researcher does not have to devote money, time, energy, and resources to this phase of research. Sometimes the secondary data set must be purchased, but the price is almost always lower than the expense of collecting a similar data set from scratch, which usually entails salaries, travel and transportation, office space, equipment, and other overhead costs. In addition, since the data is already collected and is usually cleaned and stored in electronic format, the researcher can spend most of their time analyzing the data instead of preparing it for analysis.

A second major advantage of using secondary data is the breadth of data available. The federal government conducts numerous large, national studies that individual researchers would have difficulty carrying out on their own. Many of these data sets are also longitudinal, meaning that the same information has been collected from the same population at several different points in time. This allows researchers to examine trends and changes in phenomena over time.

A third important advantage of using secondary data is that the data collection process often reflects a level of expertise and professionalism that individual researchers or small research projects may lack. For example, data collection for many federal data sets is performed by staff members who specialize in particular tasks and have many years of experience in that area and with that particular survey. Many smaller research projects cannot match that level of expertise, since much of their data is collected by students working part-time.

Disadvantages of Secondary Data Analysis

A major disadvantage of using secondary data is that it may not answer the researcher’s specific research questions or contain the specific information the researcher would like to have. It also may not have been collected in the desired geographic region or years, or from the specific population the researcher wants to study. Since the researcher did not collect the data, they have no control over what the data set contains. Often this limits the analysis or forces the researcher to alter the original questions they sought to answer.

A related problem is that the variables may have been defined or categorized differently than the researcher would have chosen. For example, age may have been collected in categories rather than as a continuous variable, or race may be coded only as “White” and “Other” instead of including a category for each major racial group.
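This loss of detail runs in one direction only. A researcher with exact ages can always group them into categories. A researcher who receives only categories cannot recover the exact ages. The short sketch below shows this asymmetry using hypothetical age bands. The cut points are invented for illustration and are not taken from any particular survey.

```python
# Hypothetical age bands; the cut points are illustrative only.
AGE_BANDS = [
    (18, 29, "18-29"),
    (30, 44, "30-44"),
    (45, 64, "45-64"),
    (65, 120, "65+"),
]


def age_to_band(age):
    """Collapse an exact age into a band: always possible if we know the age."""
    for low, high, label in AGE_BANDS:
        if low <= age <= high:
            return label
    raise ValueError(f"age {age} is outside the defined bands")


def band_to_age_range(label):
    """Going backward only recovers a range, never the exact age."""
    for low, high, band in AGE_BANDS:
        if band == label:
            return (low, high)
    raise ValueError(f"unknown band {label!r}")


# Two different respondents become indistinguishable once banded.
print(age_to_band(23), age_to_band(27))   # 18-29 18-29
print(band_to_age_range("18-29"))         # (18, 29)
```

The same logic applies to race. Detailed categories can be collapsed into “White” and “Other,” but a data set coded only that way cannot be split back out.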

Another significant disadvantage of using secondary data is that the researcher doesn't know exactly how the data was collected or how well the collection was carried out. The researcher is not usually privy to information about how seriously the data is affected by problems such as a low response rate or respondents' misunderstanding of specific survey questions. Sometimes this information is readily available, as is the case with many federal data sets. However, many other secondary data sets are not accompanied by this type of information, and the analyst must learn to read between the lines and consider what problems might have colored the data collection process.