Given a sequence of data, one question we may ask is whether the sequence arose by chance or whether the data is not random. Randomness is hard to identify, as it is very difficult to simply look at data and determine whether or not it was produced by chance alone. One method that can help determine whether a sequence truly occurred by chance is called the runs test.

The runs test is a test of significance, also known as a hypothesis test. The procedure for this test is based upon a run, that is, an unbroken sequence of data values that share a particular trait. To understand how the runs test works, we must first examine the concept of a run.

## Sequences of Data

We will begin by looking at an example of runs. Consider the following sequence of random digits:

6 2 7 0 0 1 7 3 0 5 0 8 4 6 8 7 0 6 5 5

One way to classify these digits is to split them into two categories, either even (including the digits 0, 2, 4, 6 and 8) or odd (including the digits 1, 3, 5, 7 and 9). We will look at the sequence of random digits and denote the even numbers as E and odd numbers as O:

E E O E E O O O E O E E E E E O E E O O

The runs are easier to see if we rewrite this so that consecutive Os are grouped together and consecutive Es are grouped together:

EE O EE OOO E O EEEEE O EE OO

We count the number of blocks of even or odd numbers and see that there are a total of ten runs for the data. Four runs have length one, four have length two, one has length three and one has length five.
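The counting above can be checked mechanically. The following is a minimal sketch, not part of the original procedure, that labels each digit as even or odd and then groups consecutive equal labels into runs. The variable names are illustrative only.

```python
from itertools import groupby

digits = [6, 2, 7, 0, 0, 1, 7, 3, 0, 5, 0, 8, 4, 6, 8, 7, 0, 6, 5, 5]

# Label each digit E (even) or O (odd).
labels = ["E" if d % 2 == 0 else "O" for d in digits]

# groupby collects consecutive equal labels, so each group is one run.
runs = ["".join(group) for _, group in groupby(labels)]

print(" ".join(runs))                          # the runs, separated by spaces
print("number of runs:", len(runs))
print("run lengths:", [len(r) for r in runs])
```

Because `groupby` only merges neighbors that are equal, the number of groups it yields is exactly the number of runs, whatever the two categories are.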

## Conditions

With any test of significance, it is important to know what conditions are necessary to conduct the test. For the runs test, we must be able to classify each data value from the sample into one of two categories. We will count the total number of runs relative to the number of data values that fall into each category.

The test will be a two-sided test. The reason for this is that too few runs mean that there is likely not enough variation compared with the number of runs that would occur from a random process. Too many runs will result when a process alternates between the categories too frequently to be described by chance.

## Hypotheses and P-Values

Every test of significance has a null and an alternative hypothesis. For the runs test, the null hypothesis is that the sequence is a random sequence. The alternative hypothesis is that the sequence of sample data is not random.

Statistical software can calculate the p-value that corresponds to a particular test statistic. There are also tables that give critical numbers at a certain level of significance for the total number of runs.

## Runs Test Example

We will work through the following example to see how the runs test works. Suppose that for an assignment a student is asked to flip a coin 16 times and note the order of heads and tails that showed up. Suppose the student hands in this data set:

H T H H H T T H T T H T H T H H

We may ask whether the student actually did the homework or cheated by writing down a series of H and T that merely looks random. The runs test can help us. The assumptions are met for the runs test as the data can be classified into two groups, as either a head or a tail. We continue by counting the number of runs. Regrouping, we see the following:

H T HHH TT H TT H T H T HH

There are eleven runs for our data, with seven tails and nine heads.

The null hypothesis is that the data is random. The alternative is that it is not random. For a level of significance of alpha equal to 0.05, we see by consulting the proper table that, with seven tails and nine heads, we reject the null hypothesis when the number of runs is 4 or fewer or 14 or more. Since there are eleven runs in our data, we fail to reject the null hypothesis.
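For readers without a table at hand, the decision can also be checked from the exact distribution of the number of runs under randomness. The sketch below assumes that every ordering of the observed heads and tails is equally likely under the null hypothesis, and it compares each tail probability with alpha divided by two, which is one common two-sided convention and may differ slightly from the rule behind a given printed table. The helper names `count_runs` and `run_pmf` are illustrative.

```python
from itertools import groupby
from math import comb


def count_runs(seq):
    """Number of maximal blocks of identical consecutive values."""
    return sum(1 for _ in groupby(seq))


def run_pmf(n1, n2):
    """Exact P(R = r) when n1 items of one kind and n2 of the other are randomly ordered."""
    total = comb(n1 + n2, n1)  # number of equally likely orderings
    pmf = {}
    for r in range(2, n1 + n2 + 1):
        k = r // 2
        if r % 2 == 0:
            # Even r: k runs of each kind, starting with either kind.
            ways = 2 * comb(n1 - 1, k - 1) * comb(n2 - 1, k - 1)
        else:
            # Odd r: one kind has k + 1 runs, the other has k.
            ways = (comb(n1 - 1, k - 1) * comb(n2 - 1, k)
                    + comb(n1 - 1, k) * comb(n2 - 1, k - 1))
        if ways:
            pmf[r] = ways / total
    return pmf


flips = "H T H H H T T H T T H T H T H H".split()
n_heads = flips.count("H")
n_tails = flips.count("T")
observed = count_runs(flips)

pmf = run_pmf(n_heads, n_tails)
lower_tail = sum(p for r, p in pmf.items() if r <= observed)  # P(R <= observed)
upper_tail = sum(p for r, p in pmf.items() if r >= observed)  # P(R >= observed)

alpha = 0.05
reject = min(lower_tail, upper_tail) <= alpha / 2

print("heads:", n_heads, "tails:", n_tails, "runs:", observed)
print("lower tail:", round(lower_tail, 4), "upper tail:", round(upper_tail, 4))
print("reject randomness at alpha = 0.05:", reject)
```

Enumerating the distribution is only practical for small samples like this one, which is why tables or software are normally used instead.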

## Normal Approximation

The runs test is a useful tool to determine if a sequence is likely to be random or not. For a large data set, it is sometimes possible to use a normal approximation. This normal approximation requires us to use the number of elements in each category and then calculate the mean and standard deviation of the appropriate normal distribution.
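As a rough sketch of that calculation, suppose there are n1 values in one category and n2 in the other. Under randomness, the number of runs has mean 2·n1·n2/(n1 + n2) + 1 and variance 2·n1·n2·(2·n1·n2 − n1 − n2) / ((n1 + n2)²·(n1 + n2 − 1)). The function name `runs_z_test` and the sample counts below are made up for illustration and do not come from the examples above.

```python
from math import erfc, sqrt


def runs_z_test(n1, n2, runs):
    """Normal approximation to the runs test; returns (mean, sd, z, two-sided p-value)."""
    n = n1 + n2
    mean = 2 * n1 * n2 / n + 1
    variance = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n ** 2 * (n - 1))
    sd = sqrt(variance)
    z = (runs - mean) / sd
    # For a standard normal Z, P(|Z| >= |z|) equals erfc(|z| / sqrt(2)).
    p_value = erfc(abs(z) / sqrt(2))
    return mean, sd, z, p_value


# Hypothetical data: 40 values in one category, 60 in the other, 35 observed runs.
mean, sd, z, p_value = runs_z_test(40, 60, 35)
print("expected runs:", mean)
print("standard deviation:", round(sd, 3))
print("z:", round(z, 3), "two-sided p-value:", round(p_value, 4))
```

A small p-value here would indicate that the observed number of runs is unusually far from what a random ordering would produce, in either direction.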