Pros and Cons of Secondary Data Analysis
A Review of the Advantages and Disadvantages in Social Science Research

By Ashley Crossman
Updated June 13, 2019

Secondary data analysis is the analysis of data that was collected by someone else. Below, we’ll review the definition of secondary data, how it can be used by researchers, and the pros and cons of this type of research.

Key Takeaways: Secondary Data Analysis

Primary data refers to data that researchers have collected themselves, while secondary data refers to data that was collected by someone else.
Secondary data is available from a variety of sources, such as governments and research institutions.
While using secondary data can be more economical, existing data sets may not answer all of a researcher’s questions.

Comparison of Primary and Secondary Data

In social science research, the terms primary data and secondary data are common parlance. Primary data is collected by a researcher or team of researchers for the specific purpose or analysis under consideration. Here, a research team conceives of and develops a research project, decides on a sampling technique, collects data designed to address specific questions, and performs their own analyses of the data they collected. In this case, the people involved in the data analysis are familiar with the research design and data collection process.

Secondary data analysis, on the other hand, is the use of data that was collected by someone else for some other purpose. In this case, the researcher poses questions that are addressed through the analysis of a data set that they were not involved in collecting.
The data was not collected to answer the researcher’s specific research questions and was instead collected for another purpose. This means that the same data set can actually be a primary data set to one researcher and a secondary data set to a different one.

Using Secondary Data

Some important steps must be taken before using secondary data in an analysis. Since the researcher did not collect the data, it's important for them to become familiar with the data set: how the data was collected, what the response categories are for each question, whether weights need to be applied during the analysis, whether clusters or stratification need to be accounted for, who the population of study was, and more.

Many secondary data resources and data sets are available for sociological research, and many of them are public and easily accessible. The United States Census, the General Social Survey, and the American Community Survey are some of the most commonly used secondary data sets available.

Advantages of Secondary Data Analysis

The biggest advantage of using secondary data is that it can be more economical. Someone else has already collected the data, so the researcher does not have to devote money, time, energy, and resources to this phase of research. Sometimes the secondary data set must be purchased, but the cost is almost always lower than the expense of collecting a similar data set from scratch, which usually entails salaries, travel and transportation, office space, equipment, and other overhead costs. In addition, since the data is already collected and usually cleaned and stored in electronic format, the researcher can spend most of their time analyzing the data instead of getting the data ready for analysis.

A second major advantage of using secondary data is the breadth of data available.
The federal government conducts numerous studies on a large, national scale, producing data that individual researchers would have a difficult time collecting. Many of these data sets are also longitudinal, meaning that the same data has been collected from the same population over several different time periods. This allows researchers to look at trends and changes in phenomena over time.

A third important advantage of using secondary data is that the data collection process often maintains a level of expertise and professionalism that may not be present with individual researchers or small research projects. For example, data collection for many federal data sets is performed by staff members who specialize in certain tasks and have many years of experience in that particular area and with that particular survey. Many smaller research projects do not have that level of expertise, as a lot of data is collected by students working part-time.

Disadvantages of Secondary Data Analysis

A major disadvantage of using secondary data is that it may not answer the researcher’s specific research questions or contain specific information that the researcher would like to have. It also may not have been collected in the geographic region or during the years desired, or with the specific population that the researcher is interested in studying. For example, a researcher who is interested in studying adolescents may find that the secondary data set only includes young adults.

Additionally, since the researcher did not collect the data, they have no control over what is contained in the data set. Often this can limit the analysis or alter the original questions the researcher sought to answer. For example, a researcher who is studying happiness and optimism might find that a secondary data set only includes one of these variables, but not both.

A related problem is that the variables may have been defined or categorized differently than the researcher would have chosen.
For example, age may have been collected in categories rather than as a continuous variable, or race may be defined as “white” and “other” instead of containing separate categories for each major racial group.

Another significant disadvantage of using secondary data is that the researcher may not know exactly how the data was collected or how well the collection process was carried out. The researcher is not usually privy to information about how seriously the data is affected by problems such as low response rate or respondent misunderstanding of specific survey questions. Sometimes this information is readily available, as is the case with many federal data sets. However, many other secondary data sets are not accompanied by this type of information, and the analyst must learn to read between the lines in order to uncover any potential limitations of the data.
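
To make the familiarization steps described under Using Secondary Data more concrete, here is a minimal Python sketch of two of them: listing the response categories for a variable and comparing an unweighted estimate with a weighted one. Everything in it is hypothetical and for illustration only. The records, the field names (age_group, employed, weight), and the helper functions are assumptions, not taken from any real data set. It also does not account for clusters or stratification, which generally require dedicated survey software.

```python
from collections import Counter

# Hypothetical records standing in for a secondary data set. Each respondent
# has a pre-coded age category, a 0/1 employment indicator, and a survey
# weight supplied by whoever collected the data.
records = [
    {"age_group": "18-29", "employed": 1, "weight": 2.0},
    {"age_group": "18-29", "employed": 0, "weight": 1.0},
    {"age_group": "30-44", "employed": 1, "weight": 0.5},
    {"age_group": "45+", "employed": 0, "weight": 0.5},
]


def response_categories(rows, field):
    """Count how many respondents fall into each category of `field`."""
    return Counter(row[field] for row in rows)


def unweighted_share(rows, field):
    """Share of respondents with `field` equal to 1, ignoring weights."""
    return sum(row[field] for row in rows) / len(rows)


def weighted_share(rows, field, weight_field="weight"):
    """Share of the weighted total with `field` equal to 1."""
    total_weight = sum(row[weight_field] for row in rows)
    if total_weight == 0:
        raise ValueError("weights sum to zero")
    weighted_sum = sum(row[field] * row[weight_field] for row in rows)
    return weighted_sum / total_weight


print(response_categories(records, "age_group"))
print(unweighted_share(records, "employed"))  # 0.5
print(weighted_share(records, "employed"))    # 0.625
```

In this toy example the two estimates differ (0.5 versus 0.625) because the weights give some respondents more influence than others, which is why it matters to check the documentation for weighting instructions before analyzing a data set collected by someone else. The category counts also show that age is only available in pre-defined groups, the kind of limitation discussed above.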