The Differences Between Explanatory and Response Variables

The number of hours a student studied is an explanatory variable and the score they receive on the test is a response variable. Hero Images / Getty Images

One of the many ways that variables in statistics can be classified is to consider the differences between explanatory and response variables. Although these variables are related, there are important distinctions between them. After defining these types of variables, we will see that the correct identification of these variables has a direct influence on other aspects of statistics, such as the construction of a scatterplot and the slope of a regression line.

Definitions of Explanatory and Response

We begin by looking at the definitions of these types of variables. A response variable is the particular quantity that we ask a question about in our study. An explanatory variable is any factor that can influence the response variable. While there can be many explanatory variables, we will primarily concern ourselves with a single explanatory variable.

A study does not always have a response variable. Whether a variable is called a response depends on the questions a researcher is asking. For example, a purely descriptive study, one that simply summarizes the data without asking whether one variable influences another, has no response variable. An experiment, by contrast, will have a response variable: a carefully designed experiment tries to establish that changes in the response variable are directly caused by changes in the explanatory variables.

Example One

To explore these concepts we will examine a few examples.

For the first example, suppose that a researcher is interested in studying the mood and attitudes of a group of first-year college students. All first-year students are given a series of questions. These questions are designed to assess the degree of homesickness of a student. Students also indicate on the survey how far their college is from home.

One researcher who examines these data may simply be interested in the types of responses students give, perhaps to get an overall sense of the composition of the new freshman class. In this case, there is no response variable, because no one is asking whether the value of one variable influences the value of another.

Another researcher could use the same data to ask whether students who came from farther away experienced a greater degree of homesickness. In this case, the answers to the homesickness questions are the values of a response variable, and the distance from home is the explanatory variable.

Example Two

For the second example, we might be curious whether the number of hours spent studying has an effect on the score a student earns on an exam. Because we are asking whether the value of one variable influences the value of another, there are both an explanatory and a response variable. The number of hours studied is the explanatory variable, and the score on the test is the response variable.
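The introduction noted that identifying these variables correctly affects the slope of a regression line. The minimal Python sketch below illustrates why, using made-up hours and scores (hypothetical data, not from any study) and a hand-written least-squares helper, `least_squares_slope`, whose name is chosen only for illustration. The slope for predicting score from hours is not simply the reciprocal of the slope for predicting hours from score.

```python
def least_squares_slope(x, y):
    """Slope of the least-squares line that predicts y from x (Sxy / Sxx)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations and sum of squared x deviations
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


# Hypothetical data: hours studied (explanatory) and test score (response)
hours = [1, 2, 3, 4, 5]
scores = [60, 65, 70, 80, 85]

# Correct roles: score is predicted from hours
slope_score_on_hours = least_squares_slope(hours, scores)

# Roles swapped: hours is predicted from score
slope_hours_on_score = least_squares_slope(scores, hours)

print(slope_score_on_hours, 1 / slope_score_on_hours, slope_hours_on_score)
```

With these numbers the first slope is 6.5 points per hour, while the swapped slope is about 0.151 hours per point rather than 1/6.5 (about 0.154). In general the product of the two slopes equals the square of the correlation coefficient, so the two lines agree only when the points lie exactly on a straight line.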

Scatterplots and Variables

When we are working with paired quantitative data, it is appropriate to use a scatterplot. The purpose of this kind of graph is to demonstrate relationships and trends within the paired data.

We do not need to have both an explanatory and a response variable. If we do not, then either variable can be plotted along either axis. However, when there are a response and an explanatory variable, the explanatory variable is always plotted along the x, or horizontal, axis of a Cartesian coordinate system, and the response variable is plotted along the y, or vertical, axis.
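The plotting convention can be sketched in Python as follows. This is a minimal example that assumes the third-party matplotlib library is installed and reuses hypothetical hours-and-scores data; the helper name `plot_explanatory_vs_response` is illustrative, not a standard function.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt


def plot_explanatory_vs_response(explanatory, response, x_label, y_label):
    """Scatterplot with the explanatory variable on x and the response on y."""
    if len(explanatory) != len(response):
        raise ValueError("paired data must have the same length")
    fig, ax = plt.subplots()
    ax.scatter(explanatory, response)  # first argument goes on the horizontal axis
    ax.set_xlabel(x_label)
    ax.set_ylabel(y_label)
    return fig, ax


# Hypothetical data: hours studied (explanatory) and test score (response)
hours = [1, 2, 3, 4, 5]
scores = [60, 65, 70, 80, 85]

fig, ax = plot_explanatory_vs_response(hours, scores, "Hours studied", "Test score")
# fig.savefig("hours_vs_score.png")  # uncomment to write the plot to a file
```

If neither variable were designated as the response, swapping the two arguments would produce an equally valid plot; once hours studied is the explanatory variable, it belongs on the horizontal axis.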

Independent and Dependent

The distinction between explanatory and response variables is similar to another classification: sometimes we refer to variables as independent or dependent. The value of a dependent variable relies upon that of an independent variable, so a response variable corresponds to a dependent variable and an explanatory variable corresponds to an independent variable. Statisticians typically avoid this terminology, however, because the explanatory variable is not truly independent. Instead, it takes on only the values that are observed, and we may have no control over the values of an explanatory variable.