The Meaning of Reliability in Sociology

The four procedures for assessing reliability

Reliability is the degree to which a measurement instrument gives the same results each time it is used, assuming that the underlying thing being measured does not change. For example, if the temperature in a room stays the same, a reliable thermometer will always give the same reading. A thermometer that lacks reliability would give readings that change even when the temperature does not. Note, however, that the thermometer does not have to be accurate in order to be reliable. It might always register three degrees too high, for example. Its reliability depends instead on the consistency of its readings, that is, on the predictability of its relationship with whatever is being measured.
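The difference between accuracy and reliability can be made concrete with a small sketch. The readings and the room temperature below are invented for illustration; the spread of repeated readings stands in for (un)reliability and the average error stands in for (in)accuracy.

```python
from statistics import mean, pstdev

TRUE_TEMP = 20.0  # hypothetical room temperature, assumed constant

# Hypothetical repeated readings from two thermometers.
biased_but_reliable = [23.0, 23.0, 23.0, 23.0, 23.0]  # always three degrees too high
unreliable = [18.5, 21.0, 19.2, 22.4, 20.1]           # jumps around the true value


def summarize(readings, true_value):
    """Return (bias, spread): average error and standard deviation of the readings."""
    bias = mean(readings) - true_value
    spread = pstdev(readings)
    return bias, spread


reliable_bias, reliable_spread = summarize(biased_but_reliable, TRUE_TEMP)
unreliable_bias, unreliable_spread = summarize(unreliable, TRUE_TEMP)

print("biased but reliable:", reliable_bias, reliable_spread)
print("unreliable:", unreliable_bias, unreliable_spread)
```

The first thermometer is inaccurate (a constant bias) but perfectly consistent, while the second is close to the true value on average yet gives a different reading each time.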

Methods to Assess Reliability

In order to assess reliability, the thing being measured must be measured more than once. For example, if you wanted to measure the length of a sofa to make sure it would fit through a door, you might measure it twice. If you get an identical measurement twice, you can be confident you measured reliably.

There are four procedures for assessing reliability. The term "test" refers to a group of statements on a questionnaire, an observer's quantitative or qualitative evaluation, or a combination of the two.

1 - The Test-Retest Procedure

Here, the same test is given two or more times. For example, you might create a questionnaire with a set of ten statements to assess confidence. The same ten statements are then given to a respondent at two different times. If the respondent gives similar answers both times, you can assume the questionnaire measured the respondent's confidence reliably. On the plus side, only one test needs to be developed for this procedure. However, there are a few downsides: events might occur between testing times that change the respondent's answers (history); answers might change simply because people change and grow over time (maturation); and the respondent might adjust to the test the second time around, thinking more deeply about the questions and reevaluating the answers (cueing).
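One common way to put a number on "similar answers both times" is to correlate scores from the two administrations across a group of respondents. The sketch below assumes that approach, which the description above does not prescribe; the scores are invented, and `pearson_r` is a helper written here for illustration.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical total confidence scores for six respondents
# (ten statements, each rated 1 to 5), collected at two different times.
time_1 = [42, 35, 28, 47, 31, 39]
time_2 = [40, 36, 30, 45, 29, 41]

test_retest_r = pearson_r(time_1, time_2)
print("test-retest correlation:", round(test_retest_r, 3))
```

A correlation close to 1 suggests that respondents kept roughly the same relative standing across the two administrations; how high is "high enough" is a judgment call that depends on the field and the purpose of the test.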

2 - The Alternative Forms Procedure

In this case, two different tests that measure the same concept are given two or more times. For example, you might create two sets of five statements for two different questionnaires measuring confidence. If the person gives similar answers on both tests each time, you can assume you measured the concept reliably. One advantage is that cueing will be less of a factor because the two tests are different. However, it's also possible the respondent will grow and mature between the two tests, and that will account for differences in answers.

3 - The Split-Halves Procedure

In this procedure, a single test is given once. Each half of the test is scored separately, and the two scores are compared. For example, you might have one set of ten statements on a questionnaire to assess confidence. Respondents take the test and the statements are then split into two sub-tests of five items each. If the score on the first half mirrors the score on the second half, you can presume that the test measured the concept reliably. On the plus side, history, maturation and cueing aren't at play. However, scores can vary greatly depending on the way in which the test is divided into halves.
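The sensitivity to how the test is divided can be seen in a short sketch. It assumes that the two half-scores are compared by correlating them across respondents and then adjusted with the Spearman-Brown formula, a standard correction for the halves being shorter than the full test; neither step is spelled out above. The response data and helper names are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def spearman_brown(r):
    """Estimate full-length reliability from the correlation between two halves."""
    return 2 * r / (1 + r)


def split_half_r(responses, first_items, second_items):
    """Correlate respondents' scores on two sub-tests defined by item indices."""
    first_scores = [sum(row[i] for i in first_items) for row in responses]
    second_scores = [sum(row[i] for i in second_items) for row in responses]
    return pearson_r(first_scores, second_scores)


# Hypothetical answers: one row per respondent, ten statements rated 1 to 5.
responses = [
    [4, 5, 4, 4, 5, 4, 5, 4, 4, 5],
    [2, 1, 2, 2, 1, 2, 2, 1, 2, 2],
    [3, 3, 4, 3, 3, 3, 2, 3, 4, 3],
    [5, 4, 5, 5, 4, 5, 5, 5, 4, 4],
    [1, 2, 1, 2, 2, 1, 1, 2, 1, 2],
]

# Two different ways of dividing the same ten statements into halves.
first_second_r = split_half_r(responses, range(0, 5), range(5, 10))
odd_even_r = split_half_r(responses, range(0, 10, 2), range(1, 10, 2))

print("first half vs. second half:", round(spearman_brown(first_second_r), 3))
print("odd vs. even statements:", round(spearman_brown(odd_even_r), 3))
```

With this made-up data both splits give high estimates, but they are not identical, which illustrates why the choice of split matters.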

4 - The Internal Consistency Procedure

Here, the same test is administered once, and the score is based upon the average similarity of responses. For example, in a ten-statement questionnaire to measure confidence, each response is treated as a one-statement sub-test. The similarity in responses to each of the ten statements is used to assess reliability. If the respondent doesn't answer all ten statements in a similar way, then one can assume that the test is not reliable. Again, history, maturation and cueing aren't a consideration with this method. However, the number of statements in the test can affect the assessment of reliability.
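A widely used statistic for internal consistency is Cronbach's alpha. The text above does not name a specific statistic, so treating alpha as the measure of "average similarity" is an assumption of this sketch; the response data is hypothetical.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of rows (one row per respondent, one column per item)."""
    k = len(responses[0])  # number of statements
    item_variances = [variance([row[i] for row in responses]) for i in range(k)]
    total_variance = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical answers: one row per respondent, ten statements rated 1 to 5.
responses = [
    [4, 5, 4, 4, 5, 4, 5, 4, 4, 5],
    [2, 1, 2, 2, 1, 2, 2, 1, 2, 2],
    [3, 3, 4, 3, 3, 3, 2, 3, 4, 3],
    [5, 4, 5, 5, 4, 5, 5, 5, 4, 4],
    [1, 2, 1, 2, 2, 1, 1, 2, 1, 2],
]

alpha = cronbach_alpha(responses)
print("Cronbach's alpha:", round(alpha, 3))
```

Because the formula depends on the number of statements `k`, alpha tends to rise as similar statements are added, which is one way the length of the test can affect the assessment.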