One question that is always important to ask in statistics is, “Is the observed result due to chance alone, or is it statistically significant?” One class of hypothesis tests, called permutation tests, allows us to test this question. The overview and steps of such a test are:

- Randomly split the subjects into a control group and an experimental group. The null hypothesis is that there is no difference between these two groups.
- Apply a treatment to the experimental group.
- Measure the response to the treatment.
- Consider every possible way the subjects could have been split into an experimental group and a control group, and compute the difference in responses that each split would produce.
- Calculate a p-value: the proportion of all possible splits that produce a difference at least as extreme as the observed one.
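The outline above can be sketched as a small Python function using only the standard library. This is a minimal sketch, not a library API: the name `permutation_test` is hypothetical, and it assumes a one-sided test in which smaller responses (such as faster times) indicate a treatment effect, so the statistic is the control mean minus the treatment mean.

```python
from fractions import Fraction
from itertools import combinations


def permutation_test(treatment, control):
    """Exact one-sided permutation test (hypothetical helper).

    Assumes smaller responses indicate a treatment effect, so the statistic
    is mean(control) - mean(treatment). Returns the share of all possible
    group assignments whose statistic is at least as large as the observed one.
    """
    pooled = list(treatment) + list(control)
    k = len(treatment)

    def statistic(t_idx):
        # Exact means via Fraction, so tied differences compare without rounding error
        t = [pooled[i] for i in t_idx]
        c = [pooled[i] for i in range(len(pooled)) if i not in t_idx]
        return Fraction(sum(c)) / len(c) - Fraction(sum(t)) / len(t)

    # The assignment that actually happened: the first k pooled values
    observed = statistic(set(range(k)))
    # Consider every possible experimental group of size k
    stats = [statistic(set(idx)) for idx in combinations(range(len(pooled)), k)]
    # p-value: share of assignments at least as extreme as the observed one
    return Fraction(sum(s >= observed for s in stats), len(stats))


print(permutation_test([10, 9, 11], [12, 11, 13]))  # 1/10
```

Enumerating every assignment is exact but grows quickly with the number of subjects; for large studies a random sample of assignments is commonly used instead.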

This is an outline of a permutation test. To flesh out this outline, we will look at a worked example of such a permutation test in detail.

## Example

Suppose we are studying mice. In particular, we are interested in how quickly the mice finish a maze that they have never encountered before. We wish to provide evidence in favor of an experimental treatment. The goal is to demonstrate that mice in the treatment group will solve the maze more quickly than untreated mice.

We begin with our subjects: six mice. For convenience, the mice will be referred to by the letters A, B, C, D, E, F. Three of these mice are to be randomly selected for the experimental treatment, and the other three are put into a control group in which the subjects receive a placebo.

We will next randomly choose the order in which the mice run the maze. The time each mouse takes to finish the maze will be noted, and the mean of each group will be computed.
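The two randomization steps, choosing the treatment group and choosing the run order, can be sketched with Python's `random` module. The seed is an arbitrary assumption added only so the sketch is reproducible; it will not necessarily produce the A, C, E assignment used in this example.

```python
import random

mice = ["A", "B", "C", "D", "E", "F"]
rng = random.Random(1)  # hypothetical seed, fixed only for reproducibility

# Randomly select three mice for the treatment; the rest receive the placebo
experimental = sorted(rng.sample(mice, 3))
control = [m for m in mice if m not in experimental]

# Randomly choose the order in which the mice run the maze
run_order = mice[:]
rng.shuffle(run_order)

print("Experimental:", experimental)
print("Control:", control)
print("Run order:", run_order)
```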

Suppose that our random selection has mice A, C, and E in the experimental group, with the other mice in the placebo control group. After the treatment has been implemented, we randomly choose the order for the mice to run through the maze.

The run times for each of the mice are:

- Mouse A runs the maze in 10 seconds
- Mouse B runs the maze in 12 seconds
- Mouse C runs the maze in 9 seconds
- Mouse D runs the maze in 11 seconds
- Mouse E runs the maze in 11 seconds
- Mouse F runs the maze in 13 seconds

The average time to complete the maze for the mice in the experimental group is 10 seconds. The average time to complete the maze for those in the control group is 12 seconds.
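The two group means can be checked with a few lines of Python; the dictionary below simply restates the run times listed above.

```python
from statistics import mean

# Maze completion times in seconds, restated from the list above
times = {"A": 10, "B": 12, "C": 9, "D": 11, "E": 11, "F": 13}
experimental = ["A", "C", "E"]
control = ["B", "D", "F"]

exp_mean = mean(times[m] for m in experimental)  # (10 + 9 + 11) / 3
ctrl_mean = mean(times[m] for m in control)      # (12 + 11 + 13) / 3
print(exp_mean, ctrl_mean)  # experimental 10 s, control 12 s
```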

We could ask a couple of questions. Is the treatment really the reason for the faster average time? Or were we just lucky in our selection of control and experimental groups? Perhaps the treatment had no effect, and we happened to randomly assign the slower mice to the placebo and the faster mice to the treatment. A permutation test will help to answer these questions.

## Hypotheses

The hypotheses for our permutation test are:

- The null hypothesis is the statement of no effect. For this specific test, we have $H_0$: There is no difference between treatment groups. The mean time to run the maze for all mice with no treatment is the same as the mean time for all mice with the treatment.
- The alternative hypothesis is what we are trying to establish evidence in favor of. In this case, we have $H_a$: The mean time for all mice with the treatment is less than the mean time for all mice without the treatment.
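In symbols, writing $\mu_T$ and $\mu_C$ for the mean maze times of all treated and all untreated mice (notation introduced here for illustration), the hypotheses are:

$$
H_0:\ \mu_T = \mu_C \qquad\qquad H_a:\ \mu_T < \mu_C
$$

The statistic computed for each configuration, placebo mean minus treatment mean, is $\bar{x}_C - \bar{x}_T$; large positive values of it favor $H_a$.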

## Permutations

There are six mice, and there are three places in the experimental group. This means that the number of possible experimental groups is given by the number of combinations C(6,3) = 6!/(3!3!) = 20. The remaining individuals form the control group, so there are 20 different ways to assign the individuals to our two groups.

The assignment of A, C, and E to the experimental group was done randomly. Since all 20 configurations are equally likely under random assignment, the specific one with A, C, and E in the experimental group has a probability of 1/20 = 5% of occurring.
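The count and the probability can be verified with `math.comb` (available in Python 3.8 and later) and exact fractions:

```python
from fractions import Fraction
from math import comb

n_mice = 6
group_size = 3

n_configs = comb(n_mice, group_size)    # 6! / (3! * 3!)
p_one_config = Fraction(1, n_configs)   # each configuration is equally likely

print(n_configs)            # 20
print(p_one_config)         # 1/20
print(float(p_one_config))  # 0.05
```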

Next, we list all 20 possible configurations of the experimental and control groups for the individuals in our study.

- Experimental group: A B C and Control group: D E F
- Experimental group: A B D and Control group: C E F
- Experimental group: A B E and Control group: C D F
- Experimental group: A B F and Control group: C D E
- Experimental group: A C D and Control group: B E F
- Experimental group: A C E and Control group: B D F
- Experimental group: A C F and Control group: B D E
- Experimental group: A D E and Control group: B C F
- Experimental group: A D F and Control group: B C E
- Experimental group: A E F and Control group: B C D
- Experimental group: B C D and Control group: A E F
- Experimental group: B C E and Control group: A D F
- Experimental group: B C F and Control group: A D E
- Experimental group: B D E and Control group: A C F
- Experimental group: B D F and Control group: A C E
- Experimental group: B E F and Control group: A C D
- Experimental group: C D E and Control group: A B F
- Experimental group: C D F and Control group: A B E
- Experimental group: C E F and Control group: A B D
- Experimental group: D E F and Control group: A B C
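The listing above can be generated with `itertools.combinations`, which yields the three-mouse groups in lexicographic order, the same order used in the listing:

```python
from itertools import combinations

mice = "ABCDEF"

configs = []
for group in combinations(mice, 3):
    experimental = "".join(group)
    # The control group is every mouse not chosen for the treatment
    control = "".join(m for m in mice if m not in group)
    configs.append((experimental, control))

for experimental, control in configs:
    print(f"Experimental group: {' '.join(experimental)} and Control group: {' '.join(control)}")
```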

We then look at each configuration of experimental and control groups and calculate the mean of each group for all 20 configurations listed above. For example, in the first configuration, A, B and C have times of 10, 12 and 9, respectively. The mean of these three numbers is 10.3333. Also in this first configuration, D, E and F have times of 11, 11 and 13, respectively, which have a mean of 11.6667.
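Keeping the means as exact fractions shows where the repeating decimals come from: each mean is a sum of three whole-second times divided by 3. The helper name `group_mean` is hypothetical.

```python
from fractions import Fraction

times = {"A": 10, "B": 12, "C": 9, "D": 11, "E": 11, "F": 13}


def group_mean(group):
    """Exact mean of the given mice's times (hypothetical helper)."""
    return Fraction(sum(times[m] for m in group), len(group))


exp_mean = group_mean("ABC")   # (10 + 12 + 9) / 3
ctrl_mean = group_mean("DEF")  # (11 + 11 + 13) / 3
print(exp_mean, round(float(exp_mean), 4))    # 31/3 10.3333
print(ctrl_mean, round(float(ctrl_mean), 4))  # 35/3 11.6667
```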

After calculating the mean of each group, we calculate the difference between these means. Each line below gives the placebo (control) mean minus the treatment (experimental) mean, in the same order as the configurations listed above.

- Placebo - Treatment = 1.333333333 seconds
- Placebo - Treatment = 0 seconds
- Placebo - Treatment = 0 seconds
- Placebo - Treatment = -1.333333333 seconds
- Placebo - Treatment = 2 seconds
- Placebo - Treatment = 2 seconds
- Placebo - Treatment = 0.666666667 seconds
- Placebo - Treatment = 0.666666667 seconds
- Placebo - Treatment = -0.666666667 seconds
- Placebo - Treatment = -0.666666667 seconds
- Placebo - Treatment = 0.666666667 seconds
- Placebo - Treatment = 0.666666667 seconds
- Placebo - Treatment = -0.666666667 seconds
- Placebo - Treatment = -0.666666667 seconds
- Placebo - Treatment = -2 seconds
- Placebo - Treatment = -2 seconds
- Placebo - Treatment = 1.333333333 seconds
- Placebo - Treatment = 0 seconds
- Placebo - Treatment = 0 seconds
- Placebo - Treatment = -1.333333333 seconds
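All 20 differences can be produced in one loop. Exact fractions are used so that equal differences compare as equal; they are printed to nine decimal places to match the listing above.

```python
from fractions import Fraction
from itertools import combinations

times = {"A": 10, "B": 12, "C": 9, "D": 11, "E": 11, "F": 13}
mice = sorted(times)


def mean_time(group):
    return Fraction(sum(times[m] for m in group), len(group))


# Placebo - Treatment for each configuration, in the same order as the listings
differences = []
for treatment in combinations(mice, 3):
    placebo = [m for m in mice if m not in treatment]
    differences.append(mean_time(placebo) - mean_time(treatment))

for d in differences:
    print(f"Placebo - Treatment = {float(d):.9f} seconds")
```

Each configuration's complement appears elsewhere in the list with the opposite sign, so the 20 differences sum to zero.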

## P-Value

Now we rank the differences between the means from each group that we noted above. We also tabulate the percentage of our 20 different configurations that are represented by each difference in means. For example, four of the 20 had no difference between the means of the control and treatment groups. This accounts for 20% of the 20 configurations noted above.

- -2 for 10%
- -1.33 for 10%
- -0.667 for 20%
- 0 for 20%
- 0.667 for 20%
- 1.33 for 10%
- 2 for 10%
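The tabulation above is a frequency count of the 20 differences, which `collections.Counter` handles directly:

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

times = {"A": 10, "B": 12, "C": 9, "D": 11, "E": 11, "F": 13}
mice = sorted(times)

differences = []
for treatment in combinations(mice, 3):
    placebo = [m for m in mice if m not in treatment]
    placebo_mean = Fraction(sum(times[m] for m in placebo), 3)
    treatment_mean = Fraction(sum(times[m] for m in treatment), 3)
    differences.append(placebo_mean - treatment_mean)

# Count how often each difference occurs, then report it as a share of all 20
distribution = Counter(differences)
for diff in sorted(distribution):
    share = distribution[diff] / len(differences)
    print(f"{float(diff):.3g} for {share:.0%}")
```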

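Here we compare this listing to our observed result. Our random selection of mice for the treatment and control groups resulted in a difference in means of 2 seconds. The p-value is the proportion of configurations that produce a difference at least as large as the one observed. Since 2 seconds is the largest difference in the listing, only the 10% of configurations with a difference of exactly 2 seconds qualify, so for this study we have a p-value of 10%.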
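The p-value calculation can be sketched as follows. It counts configurations whose difference is at least the observed 2 seconds, the one-sided direction set by $H_a$. The helper name `diff_for` is hypothetical.

```python
from fractions import Fraction
from itertools import combinations

times = {"A": 10, "B": 12, "C": 9, "D": 11, "E": 11, "F": 13}
mice = sorted(times)


def diff_for(treatment):
    """Placebo mean minus treatment mean for one assignment (hypothetical helper)."""
    placebo = [m for m in mice if m not in treatment]
    return (Fraction(sum(times[m] for m in placebo), len(placebo))
            - Fraction(sum(times[m] for m in treatment), len(treatment)))


observed = diff_for(("A", "C", "E"))  # the assignment actually used

all_diffs = [diff_for(t) for t in combinations(mice, 3)]
# One-sided: share of configurations at least as extreme as the observed one
p_value = Fraction(sum(d >= observed for d in all_diffs), len(all_diffs))
print(observed, p_value, float(p_value))  # 2 1/10 0.1
```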