The Meaning of Reliability in Sociology

Four Procedures for Assessing Reliability

Reliability is the degree to which a measurement instrument gives the same results each time that it is used, assuming that the underlying thing being measured does not change.

Key Takeaways: Reliability

  • If a measurement instrument provides similar results each time it is used (assuming that whatever is being measured stays the same over time), it is said to have high reliability.
  • Good measurement instruments should have both high reliability and high accuracy.
  • Four methods sociologists can use to assess reliability are the test-retest procedure, the alternate forms procedure, the split-halves procedure, and the internal consistency procedure.

An Example

Imagine that you’re trying to assess the reliability of a thermometer in your home. If the temperature in a room stays the same, a reliable thermometer will always give the same reading. An unreliable thermometer's reading would change even when the temperature does not. Note, however, that the thermometer does not have to be accurate in order to be reliable. It might always register three degrees too high, for example. Its degree of reliability has to do instead with the consistency of its readings, that is, with the predictability of its relationship with whatever is being measured.
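The difference between reliability and accuracy in the thermometer example can be sketched with a few numbers. The readings below are invented for illustration, and the room temperature of 70 degrees is an assumption. Spread (standard deviation) stands in for reliability and the average error stands in for accuracy; this is one simple way to look at it, not the only one.

```python
from statistics import mean, pstdev

TRUE_TEMP = 70.0  # assumed constant room temperature (hypothetical)

# Hypothetical readings taken while the room stays at TRUE_TEMP.
biased_but_reliable = [73.0, 73.0, 73.0, 73.0, 73.0]      # always three degrees high
unbiased_but_unreliable = [66.0, 74.0, 69.0, 72.0, 69.0]  # right on average, but scattered


def summarize(readings, true_value):
    """Return (spread, bias) for a list of repeated readings.

    spread: population standard deviation; 0 means the same reading every time.
    bias:   mean reading minus the true value; 0 means correct on average.
    """
    spread = pstdev(readings)
    bias = mean(readings) - true_value
    return spread, bias


for name, readings in [
    ("biased but reliable", biased_but_reliable),
    ("unbiased but unreliable", unbiased_but_unreliable),
]:
    spread, bias = summarize(readings, TRUE_TEMP)
    print(f"{name}: spread={spread:.2f}, bias={bias:+.2f}")
```

The first thermometer has zero spread and a constant error of three degrees, so it is reliable without being accurate. The second is correct on average but gives a different reading each time, so it is unreliable.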

Methods to Assess Reliability

In order to assess reliability, the thing being measured must be measured more than once. For example, if you wanted to measure the length of a sofa to make sure it would fit through a door, you might measure it twice. If you get an identical measurement twice, you can be confident you measured reliably.

There are four procedures for assessing the reliability of a test. (Here, the term "test" refers to a group of statements on a questionnaire, an observer's quantitative or qualitative evaluation, or a combination of the two.)

The Test-Retest Procedure

Here, the same test is given two or more times. For example, you might create a questionnaire with a set of ten statements to assess confidence. The same ten statements are then given to a subject at two different times. If the respondent gives similar answers both times, you can assume the questions assessed the subject's confidence reliably.

One advantage of this procedure is that only one test needs to be developed. However, it has a few downsides. Events might occur between testing times that affect the respondents' answers (history); answers might change simply because people change and grow over time (maturation); and the subject might adjust to the test the second time around, think more deeply about the questions, and reevaluate their answers (cueing). For instance, in the example above, some respondents might have become more confident between the first and second testing session, which would make it more difficult to interpret the results of the test-retest procedure.
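One common way to put a number on "similar answers both times" is to correlate respondents' scores from the first session with their scores from the second. The sketch below assumes that approach; the scores are made up, and `pearson_r` is a small helper written for this example rather than a function from any particular statistics package.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


# Hypothetical total confidence scores for six respondents,
# each taking the same ten-statement questionnaire twice.
time_1 = [32, 41, 27, 45, 38, 30]
time_2 = [34, 40, 29, 44, 36, 31]

test_retest_r = pearson_r(time_1, time_2)
print(f"test-retest correlation: {test_retest_r:.3f}")
```

A correlation close to 1 suggests respondents kept roughly the same relative standing across sessions. A low value could reflect an unreliable test, but it could also reflect real change between sessions, which is the interpretive problem described above.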

The Alternate Forms Procedure

In the alternate forms procedure (also called parallel forms reliability), two tests are given. For example, you might create two sets of five statements measuring confidence. Subjects would be asked to take each of the five-statement questionnaires. If the person gives similar answers for both tests, you can assume you measured the concept reliably. One advantage is that cueing will be less of a factor because the two tests are different. However, it's important to ensure that both alternate versions of the test are indeed measuring the same thing.

The Split-Halves Procedure

In this procedure, a single test is given once. The test is then divided into two halves, each half is scored separately, and the two scores are compared. For example, you might have one set of ten statements on a questionnaire to assess confidence. Respondents take the test and the questions are then split into two sub-tests of five items each. If the score on the first half mirrors the score on the second half, you can presume that the test measured the concept reliably. On the plus side, history, maturation, and cueing aren't at play. However, scores can vary greatly depending on the way in which the test is divided into halves.
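The dependence on how the test is divided can be seen in a small sketch. It assumes half scores are compared with a Pearson correlation and uses invented answers from five respondents on a 1-to-5 scale; the two ways of splitting (first five versus last five statements, and odd versus even statements) are only two of many possible splits.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


# Hypothetical answers: one row per respondent, ten statements each.
responses = [
    [4, 5, 4, 4, 5, 3, 4, 4, 5, 4],
    [2, 1, 2, 2, 1, 3, 2, 1, 2, 2],
    [3, 3, 4, 3, 3, 2, 3, 3, 3, 4],
    [5, 4, 5, 5, 4, 5, 5, 4, 4, 5],
    [1, 2, 1, 2, 2, 2, 1, 3, 2, 1],
]


def split_half_r(rows, first_idx, second_idx):
    """Correlate respondents' scores on two halves given by item indices."""
    first = [sum(row[i] for i in first_idx) for row in rows]
    second = [sum(row[i] for i in second_idx) for row in rows]
    return pearson_r(first, second)


# Split 1: statements 1-5 versus statements 6-10.
half_r = split_half_r(responses, range(0, 5), range(5, 10))
# Split 2: odd-numbered versus even-numbered statements.
odd_even_r = split_half_r(responses, range(0, 10, 2), range(1, 10, 2))

print(f"first half vs. second half: {half_r:.3f}")
print(f"odd vs. even statements:    {odd_even_r:.3f}")
```

Even with the same answers, the two splits do not produce the same coefficient. In this made-up data the gap is small; with real questionnaires it can be much larger.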

The Internal Consistency Procedure

Here, the same test is administered once, and the score is based upon average similarity of responses. For example, in a ten-statement questionnaire to measure confidence, each response can be seen as a one-statement sub-test. The similarity in responses to each of the ten statements is used to assess reliability. If the respondent doesn't answer all ten statements in a similar way, then one can assume that the test is not reliable. One way that researchers can assess internal consistency is by using statistical software to calculate Cronbach’s alpha.

With the internal consistency procedure, history, maturation, and cueing aren't a consideration. However, the number of statements in the test can affect the resulting estimate of reliability.
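For readers curious what statistical software computes, the following is a minimal sketch of Cronbach's alpha using its standard formula: k / (k - 1) multiplied by (1 minus the sum of the item variances divided by the variance of the total scores), where k is the number of statements. The answers are invented, and `cronbach_alpha` is a helper written for this example, not a function from a named package.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents' item scores.

    rows: one list per respondent, each with the same number of items.
    """
    k = len(rows[0])                      # number of statements
    items = list(zip(*rows))              # regroup scores by statement
    item_var_sum = sum(variance(col) for col in items)
    total_var = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - item_var_sum / total_var)


# Hypothetical answers: one row per respondent, ten statements each.
responses = [
    [4, 5, 4, 4, 5, 3, 4, 4, 5, 4],
    [2, 1, 2, 2, 1, 3, 2, 1, 2, 2],
    [3, 3, 4, 3, 3, 2, 3, 3, 3, 4],
    [5, 4, 5, 5, 4, 5, 5, 4, 4, 5],
    [1, 2, 1, 2, 2, 2, 1, 3, 2, 1],
]

alpha_all = cronbach_alpha(responses)
# Same respondents, but keeping only the first two statements.
alpha_two = cronbach_alpha([row[:2] for row in responses])

print(f"alpha, ten statements: {alpha_all:.3f}")
print(f"alpha, two statements: {alpha_two:.3f}")
```

In this made-up data, alpha is lower when only two statements are kept, which illustrates how the number of statements can influence the estimate even when respondents answer in the same way.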