# Principal Components and Factor Analysis

Principal components analysis (PCA) and factor analysis (FA) are statistical techniques used for data reduction or structure detection. These two methods are applied to a single set of variables when the researcher is interested in discovering which variables in the set form coherent subsets that are relatively independent of one another. Variables that are correlated with one another but are largely independent of other sets of variables are combined into factors. These factors allow you to condense the number of variables in your analysis by combining several variables into one factor.

The specific goals of PCA or FA are to summarize patterns of correlations among observed variables, to reduce a large number of observed variables to a smaller number of factors, to provide a regression equation for an underlying process by using observed variables, or to test a theory about the nature of underlying processes.

## Example

Say, for example, a researcher is interested in studying the characteristics of graduate students. The researcher surveys a large sample of graduate students on characteristics such as personality, motivation, intellectual ability, scholastic history, family history, health, and physical characteristics. Each of these areas is measured with several variables. The variables are then entered into the analysis individually and the correlations among them are studied. The analysis reveals patterns of correlation among the variables that are thought to reflect the underlying processes affecting the behaviors of the graduate students. For example, several variables from the intellectual ability measures combine with some variables from the scholastic history measures to form a factor measuring intelligence. Similarly, variables from the personality measures may combine with some variables from the motivation and scholastic history measures to form an independence factor, which measures the degree to which a student prefers to work independently.

## Steps of Principal Components Analysis and Factor Analysis

Steps in principal components analysis and factor analysis include:

• Select and measure a set of variables.
• Prepare the correlation matrix to perform either PCA or FA.
• Extract a set of factors from the correlation matrix.
• Determine the number of factors.
• If necessary, rotate the factors to increase interpretability.
• Interpret the results.
• Verify the factor structure by establishing the construct validity of the factors.
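
The extraction steps above can be sketched with a principal components solution computed from the correlation matrix. This is a minimal illustration, not a definitive procedure: the simulated data, the variable names, and the eigenvalue-greater-than-one cutoff for choosing the number of components are all assumptions introduced here, not part of the original text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Step 1 (assumed data): simulate 500 respondents on 6 observed variables
# that are driven by 2 hypothetical underlying processes plus noise.
n_obs = 500
latent = rng.normal(size=(n_obs, 2))
true_loadings = np.array([
    [0.9, 0.0],
    [0.8, 0.0],
    [0.7, 0.0],
    [0.0, 0.9],
    [0.0, 0.8],
    [0.0, 0.7],
])
X = latent @ true_loadings.T + 0.5 * rng.normal(size=(n_obs, 6))

# Step 2: prepare the correlation matrix of the observed variables.
R = np.corrcoef(X, rowvar=False)

# Step 3: extract components via the eigendecomposition of R.
# eigh returns eigenvalues in ascending order, so sort them descending.
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Step 4: choose how many components to keep. Retaining eigenvalues
# above 1 is one common heuristic, used here only as an assumption.
k = int(np.sum(eigvals > 1.0))

# Loadings: correlations between each variable and each retained component.
loadings = eigvecs[:, :k] * np.sqrt(eigvals[:k])

# Share of total variance accounted for by the retained components.
explained = eigvals[:k].sum() / eigvals.sum()

print("components kept:", k)
print("proportion of variance explained:", round(float(explained), 3))
print(np.round(loadings, 2))
```

The sketch stops after extraction. Rotation, interpretation, and validation of the solution still depend on the researcher's judgment and on evidence from outside the analysis.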

## Difference Between Principal Components Analysis and Factor Analysis

PCA and FA are similar in that both procedures are used to simplify the structure of a set of variables. However, the analyses differ in several important ways:

• In PCA, the components are calculated as linear combinations of the original variables. In FA, the original variables are defined as linear combinations of the factors.
• In PCA, the goal is to account for as much of the total variance in the variables as possible. The objective in FA is to explain the covariances or correlations among the variables.
• PCA is used to reduce the data into a smaller number of components. FA is used to understand what constructs underlie the data.
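
The first two differences can be written out in symbols. The notation is an assumption introduced for illustration: $x_1, \dots, x_p$ are the observed variables, $c_j$ is a component, $f_1, \dots, f_m$ are the factors, $\varepsilon_i$ is the part of variable $x_i$ not shared with the other variables, and the factors are taken to be uncorrelated with one another and with the $\varepsilon_i$.

```latex
% PCA: each component is a weighted sum of the observed variables.
c_j = w_{j1} x_1 + w_{j2} x_2 + \dots + w_{jp} x_p

% FA: each observed variable is a weighted sum of the factors plus a unique term.
x_i = \lambda_{i1} f_1 + \lambda_{i2} f_2 + \dots + \lambda_{im} f_m + \varepsilon_i

% Covariance structure implied by the FA model under the assumptions above,
% with \Lambda the matrix of loadings and \Psi a diagonal matrix of unique variances.
\Sigma = \Lambda \Lambda^{\top} + \Psi
```

Under these assumptions, the off-diagonal entries of $\Sigma$ (the covariances among the variables) come entirely from $\Lambda \Lambda^{\top}$, which is why FA is described as explaining covariances. PCA has no separate unique term, so its components account for total variance, both shared and unique.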

## Problems with Principal Components Analysis and Factor Analysis

One problem with PCA and FA is that there is no criterion variable against which to test the solution. In other statistical techniques such as discriminant function analysis, logistic regression, profile analysis, and multivariate analysis of variance, the solution is judged by how well it predicts group membership. PCA and FA have no such external criterion.

A second problem with PCA and FA is that, after extraction, there is an infinite number of rotations available, all accounting for the same amount of variance in the original data but with the factors defined slightly differently. The final choice is left to the researcher, based on an assessment of each solution's interpretability and scientific utility. Researchers often differ in opinion on which choice is best.
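
The rotation problem can be checked numerically. The sketch below uses a made-up loading matrix for four variables on two factors (the values are hypothetical, chosen only for illustration) and applies an orthogonal rotation. The variance explained for each variable and the correlations reproduced by the solution stay the same, even though the individual loadings change.

```python
import numpy as np

# Hypothetical unrotated loadings: 4 variables (rows) on 2 factors (columns).
unrotated = np.array([
    [0.7,  0.5],
    [0.6,  0.6],
    [0.7, -0.5],
    [0.6, -0.6],
])

def rotate(loadings, degrees):
    """Apply an orthogonal rotation of the two factor axes by `degrees`."""
    theta = np.deg2rad(degrees)
    T = np.array([
        [np.cos(theta), -np.sin(theta)],
        [np.sin(theta),  np.cos(theta)],
    ])
    return loadings @ T

rotated = rotate(unrotated, 45)

# Variance in each variable accounted for by the factors (communalities).
communalities_before = (unrotated ** 2).sum(axis=1)
communalities_after = (rotated ** 2).sum(axis=1)

# Correlations among the variables reproduced by each solution.
reproduced_before = unrotated @ unrotated.T
reproduced_after = rotated @ rotated.T

print("same communalities:", np.allclose(communalities_before, communalities_after))
print("same reproduced correlations:", np.allclose(reproduced_before, reproduced_after))
print("same loadings:", np.allclose(unrotated, rotated))
print(np.round(rotated, 2))
```

Any angle passed to `rotate` gives an equally good fit by these measures, so the data alone cannot pick one rotation over another.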

A third problem is that FA is frequently used to “save” poorly conceived research. If no other statistical procedure is appropriate or applicable, the data can at least be factor analyzed. This leads many to believe that the various forms of FA are associated with sloppy research.