Paired Data in Statistics

Measuring Two Variables Simultaneously in Individuals of a Given Population

A scatterplot and least squares regression line. C.K.Taylor

Paired data in statistics, often referred to as ordered pairs, refers to two variables measured on each individual of a population and linked together so that the correlation between them can be determined. For a data set to be considered paired data, both data values must be attached to the same individual and analyzed together rather than separately.

Paired data contrasts with other quantitative data sets, in which a single number is associated with each data point. Here each individual data point is associated with two numbers, which can be graphed so that statisticians can observe the relationship between the two variables in a population.

Paired data is used when a study aims to compare two variables in individuals of a population and draw a conclusion about the observed correlation. The order within each pair is important because the first number is a measure of one thing while the second is a measure of something entirely different.

An Example of Paired Data

To see an example of paired data, suppose a teacher counts the number of homework assignments each student turned in for a particular unit and then pairs this number with each student’s percentage on the unit test. The pairs are as follows:

  • An individual who completed 10 assignments earned a 95% on his or her test. (10, 95%)
  • An individual who completed 5 assignments earned an 80% on his or her test. (5, 80%)
  • An individual who completed 9 assignments earned an 85% on his or her test. (9, 85%)
  • An individual who completed 2 assignments earned a 50% on his or her test. (2, 50%)
  • An individual who completed 5 assignments earned a 60% on his or her test. (5, 60%)
  • An individual who completed 3 assignments earned a 70% on his or her test. (3, 70%)

In each of these ordered pairs, the number of assignments always comes first while the percentage earned on the test comes second, as seen in the first instance of (10, 95%).
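One way to keep this ordering explicit is to store each student as a tuple. The sketch below is only an illustration: the variable names are assumptions, and the percentages are stored as plain numbers.

```python
# Each tuple is (assignments completed, test percentage), in that order.
pairs = [(10, 95), (5, 80), (9, 85), (2, 50), (5, 60), (3, 70)]

# Unzipping keeps position i in both sequences tied to the same student.
assignments, scores = zip(*pairs)

for n_done, pct in pairs:
    print(f"{n_done} assignments -> {pct}%")
```

Swapping the two positions in a tuple would silently turn a test score into an assignment count, which is why the order has to stay fixed across the whole data set.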

While a statistical analysis of this data could also be used to calculate the average number of homework assignments completed or the average test score, there may be other questions to ask about the data. In this instance, the teacher wants to know if there is any connection between the number of homework assignments turned in and performance on the test, and the teacher would need to keep the data paired in order to answer this question.
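The difference between the two kinds of question can be sketched in a few lines. This is a minimal illustration with assumed variable names: the averages survive when the scores are rearranged on their own, but the link between each student's two numbers does not.

```python
from statistics import mean

pairs = [(10, 95), (5, 80), (9, 85), (2, 50), (5, 60), (3, 70)]
assignments = [x for x, _ in pairs]
scores = [y for _, y in pairs]

# Averages look at one variable at a time, so pairing is irrelevant to them.
mean_assignments = mean(assignments)
mean_scores = mean(scores)

# Sorting the scores on their own leaves the average score unchanged...
rearranged_scores = sorted(scores)
same_mean = mean(rearranged_scores) == mean_scores

# ...but the re-zipped pairs no longer describe real students.
broken_pairs = list(zip(assignments, rearranged_scores))
pairing_lost = broken_pairs != pairs
```

Any question about a connection between homework and test performance depends on the original pairs, not on the two lists taken separately.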

Analyzing Paired Data

The statistical techniques of correlation and regression are used to analyze paired data. The correlation coefficient quantifies how closely the data lie along a straight line, and so measures the strength of the linear relationship.
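As a rough illustration, the correlation coefficient for the homework example can be computed directly from its definition. The function name below is an assumption made for this sketch, and the formula is the standard Pearson correlation coefficient.

```python
from math import sqrt

pairs = [(10, 95), (5, 80), (9, 85), (2, 50), (5, 60), (3, 70)]

def pearson_r(points):
    """Pearson correlation coefficient for a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Sums of products and squares of deviations from the means.
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    s_xx = sum((x - mean_x) ** 2 for x, _ in points)
    s_yy = sum((y - mean_y) ** 2 for _, y in points)
    return s_xy / sqrt(s_xx * s_yy)

r = pearson_r(pairs)
print(f"r = {r:.2f}")
```

For these six students r comes out at about 0.87, a fairly strong positive linear relationship. With so few data points, it should be read as a description of this class rather than a general rule.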

Regression, on the other hand, is used for several applications, including determining which line best fits our set of data. This line can then, in turn, be used to estimate or predict y values for values of x that were not part of our original data set.
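A minimal sketch of this idea, assuming the usual least squares formulas for slope and intercept, is shown below. The function name and the choice of 7 assignments as the new x value are illustrative assumptions.

```python
pairs = [(10, 95), (5, 80), (9, 85), (2, 50), (5, 60), (3, 70)]

def least_squares_line(points):
    """Return (slope, intercept) of the least squares line y = intercept + slope * x."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    s_xx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = s_xy / s_xx
    # The least squares line always passes through (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return slope, intercept

slope, intercept = least_squares_line(pairs)

# No student in the data completed exactly 7 assignments.
predicted = intercept + slope * 7
print(f"y = {intercept:.2f} + {slope:.2f}x, predicted score for 7: {predicted:.1f}")
```

Here the slope is roughly 4.5 percentage points per assignment and the predicted score for 7 assignments is roughly 79%. Predictions are most trustworthy for x values inside the range of the original data (2 to 10 assignments in this example).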

A type of graph called a scatterplot is especially well suited to paired data. In this type of graph, one coordinate axis represents one quantity of the paired data while the other coordinate axis represents the other quantity.

A scatterplot for the above data would have the x-axis denote the number of assignments turned in while the y-axis would denote the scores on the unit test.
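To make the layout concrete, here is a rough character-grid version of that scatterplot. It is only a sketch: the function name is an assumption, and scores are rounded down to 10-point bands so that each band fits on one text row. A plotting library would normally draw the points at their exact positions.

```python
pairs = [(10, 95), (5, 80), (9, 85), (2, 50), (5, 60), (3, 70)]

def text_scatter(points, y_step=10):
    """Return the rows of a character-grid scatterplot, highest y band first."""
    max_x = max(x for x, _ in points)
    # Floor each y to the start of its band so nearby scores share a row.
    marked = {(x, y // y_step * y_step) for x, y in points}
    bands = [band for _, band in marked]
    rows = []
    for band in range(max(bands), min(bands) - 1, -y_step):
        # Columns run along the x-axis, from 0 assignments up to max_x.
        cells = "".join("*" if (x, band) in marked else "." for x in range(max_x + 1))
        rows.append(f"{band:>3} | {cells}")
    return rows

for row in text_scatter(pairs):
    print(row)
```

Each star sits in the column for a student's number of assignments and the row for that student's score band, so the upward drift from lower left to upper right is visible even in this crude form.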