Permutation Test

A permutation test is a way of obtaining a p-value by randomization, without assuming a particular distribution for the data.

The basic idea is simple. Suppose that we want to see whether y = ax + b + error holds with a nonzero a, where x is 0 or 1. In other words, we’re interested in whether the mean of y differs depending on x. In this case, the null model is y = b + error, i.e., the mean of y does not depend on x. Our test statistic is the difference between the mean of y where x is 1 and the mean of y where x is 0.

Then the difference in means that we actually observe in the data is this:

> mean(y[x==1]) - mean(y[x==0])  # (1)

Now we want to know how significant this is. To do that, assume that the null model is true (just as we do when computing any p-value). If the null model is true, x carries no information about y, so shuffling x while keeping y as it is should give data that are just as plausible as the data we observed.

> xs = sample(x); mean(y[xs==1]) - mean(y[xs==0])  # (2)

If we repeat this many times, we get a list of mean differences that we could observe purely by chance, assuming that the null model is true. Then “the number of differences in the list whose absolute value is at least as large as the absolute value of the observed difference from (1)” divided by “the length of the list” is the p-value. We need absolute values because this is a two-sided test.

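mean() treats TRUE as 1 and FALSE as 0, so the mean of a logical vector is already the number of TRUE values divided by its length, and no further division by length(r) is needed. Assuming that r is a vector containing the repeated outputs of (2) and that obs holds the difference computed at (1), this is the p-value:

> mean(abs(r) >= abs(obs))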
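
Putting the steps together, the whole procedure can be sketched as one small R function. This is only an illustration under stated assumptions: the function name perm_test, the argument n_perm, the seed, and the simulated data (a true shift of 0.5 with standard normal noise) are made up for this example and are not part of any package.

```r
# Two-sided permutation test for a difference in group means.
# perm_test, n_perm and the simulated data below are illustrative assumptions.
perm_test <- function(x, y, n_perm = 10000) {
  # Test statistic: mean of y where g is 1 minus mean of y where g is 0.
  diff_means <- function(g) mean(y[g == 1]) - mean(y[g == 0])

  obs <- diff_means(x)                            # step (1): observed difference
  r <- replicate(n_perm, diff_means(sample(x)))   # step (2), repeated n_perm times

  # Proportion of shuffled differences at least as extreme as the observed one.
  list(observed = obs, p_value = mean(abs(r) >= abs(obs)))
}

set.seed(1)                        # make the shuffles reproducible
x <- rep(c(0, 1), each = 20)       # 20 observations in each group
y <- 0.5 * x + rnorm(length(x))    # simulated data, assumed true shift of 0.5
res <- perm_test(x, y)
res$observed
res$p_value
```

The shuffled labels are never written back to x, so the observed statistic and the permuted statistics are always computed from the same original data. A larger n_perm gives a finer-grained p-value at the cost of run time.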

For details, see:
Permutation tests: http://faculty.washington.edu/kenrice/sisg/SISG-08-06.pdf
Permutation test implementations in R:
http://stats.stackexchange.com/questions/6127/which-permutation-test-implementation-in-r-to-use-instead-of-t-tests-paired-and