Runs test for randomness

The one-sample runs test examines whether a sequence of observations is random. Here, a run is a maximal sequence of consecutive observations that fall in the same one of two categories, and the test statistic is the number of runs.

For example,
MMMFFF: three males, then three females. Number of runs = 2.
MFMF: man, woman, man, then woman. Number of runs = 4.
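Counting runs is easy in base R. rle() splits a vector into runs of equal values, so the number of runs is the number of lengths it returns. A minimal sketch (count_runs is a hypothetical helper name, not part of any package):

```r
# Hypothetical helper: the number of runs is the number of maximal blocks
# of equal consecutive values, which is what rle() encodes.
count_runs <- function(x) {
  length(rle(as.character(x))$lengths)
}

count_runs(strsplit("MMMFFF", "")[[1]])  # 2
count_runs(strsplit("MFMF", "")[[1]])    # 4
```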

This idea can be used, for example, to check whether a time series is random (or whether it has periodicity).
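A numeric time series first has to be turned into two categories. One common choice, assumed here because the post does not specify one, is to label each value as above or below the median. A minimal base-R sketch with a toy periodic series (above_median is a hypothetical helper):

```r
# Assumption: dichotomize around the median; values equal to the median
# carry no above/below information, so they are dropped.
above_median <- function(x) {
  med <- median(x)
  x <- x[x != med]
  factor(ifelse(x > med, "above", "below"))
}

periodic <- rep(c(rep(1, 5), rep(9, 5)), 4)   # toy series with period 10
f <- above_median(periodic)
length(rle(as.character(f))$lengths)          # 8 runs among 40 values
```

With 20 values in each class, a random ordering would average 21 runs, so only 8 runs points to the periodicity. The resulting factor f is already in the form runs.test() expects.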

As usual, H_0: the sample is random. If the sample is random, the number of runs should be neither too large nor too small. The exact distribution of the number of runs can be computed combinatorially, but for a large sample (more than about 20 observations) it can be approximated by a normal distribution.
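For reference, the large-sample approximation standardizes the observed number of runs R using its mean and variance under H_0, where n_1 and n_2 are the counts of the two categories and n = n_1 + n_2. These are the standard textbook formulas; they reproduce the Standard Normal values that runs.test reports in the examples below:

$$
E[R] = 1 + \frac{2 n_1 n_2}{n}, \qquad
\operatorname{Var}[R] = \frac{2 n_1 n_2 \, (2 n_1 n_2 - n)}{n^2 (n - 1)}, \qquad
Z = \frac{R - E[R]}{\sqrt{\operatorname{Var}[R]}}
$$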

The tseries package provides runs.test() for this. Its argument must be a factor with exactly two levels:

> runs.test(factor(sign(rnorm(100))))

	Runs Test

data:  factor(sign(rnorm(100))) 
Standard Normal = 0.8041, p-value = 0.4214
alternative hypothesis: two.sided 

Since sign(rnorm(100)) is a sequence of independent random signs, it should be random. As the p-value > 0.05, we cannot reject H_0: the data give no evidence against randomness.

In contrast, consider the alternating sequence -1, 1, -1, 1, -1, 1, …:

> runs.test(factor(rep(c(-1, 1), 50)))

	Runs Test

data:  factor(rep(c(-1, 1), 50)) 
Standard Normal = 9.8499, p-value < 2.2e-16
alternative hypothesis: two.sided

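The p-value is essentially zero, so we reject H_0: the sequence is not random. It has far too many runs (100 runs in 100 observations).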
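To see where the Standard Normal value comes from, here is a minimal base-R sketch that counts the runs and applies the normal approximation by hand. The function name runs_z is hypothetical and this is not the tseries source, although it should give the same statistic:

```r
# Hypothetical helper: large-sample runs test computed by hand.
# Assumes x takes exactly two distinct values.
runs_z <- function(x) {
  x <- as.character(x)
  stopifnot(length(unique(x)) == 2)
  n1 <- sum(x == x[1])                # count of the first category seen
  n2 <- length(x) - n1                # count of the other category
  n  <- n1 + n2
  R  <- length(rle(x)$lengths)        # observed number of runs
  mu <- 1 + 2 * n1 * n2 / n           # E[R] under H_0
  v  <- 2 * n1 * n2 * (2 * n1 * n2 - n) / (n^2 * (n - 1))  # Var[R] under H_0
  z  <- (R - mu) / sqrt(v)
  c(runs = R, z = z, p.value = 2 * pnorm(-abs(z)))  # two-sided p-value
}

round(runs_z(rep(c(-1, 1), 50)), 4)   # runs = 100, z = 9.8499
```

For the alternating sequence it finds 100 runs against an expected 51, giving z = 9.8499, in line with the runs.test output above.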

The Wald-Wolfowitz two-sample runs test examines whether two samples come from the same distribution, as we did earlier with the rank sum test. The basic idea is to pool the two samples, sort the pooled values, label each value by the sample it came from, and then check whether the sequence of labels is random.

So, given:

> x <- c(1, 3, 2, 3, 5, 6)
> y <- c(2, 3, 4, 2, 3, 7)

Combine and sort:

> z <- rbind(c(x, y), c(rep(0, 6), rep(1, 6)))
> z
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
[1,]    1    3    2    3    5    6    2    3    4     2     3     7
[2,]    0    0    0    0    0    0    1    1    1     1     1     1
> z[2, order(z[1,])]
 [1] 0 0 1 1 0 0 1 1 1 0 0 1
> factor(z[2, order(z[1,])], labels=c("x", "y"))
 [1] x x y y x x y y y x x y
Levels: x y

Now the sample labels are in sorted order and stored as a factor. Feeding it to runs.test:

> runs.test(factor(z[2, order(z[1,])], labels=c("x", "y")))

	Runs Test

data:  factor(z[2, order(z[1, ])], labels = c("x", "y")) 
Standard Normal = -0.6055, p-value = 0.5448
alternative hypothesis: two.sided 

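As the p-value > 0.05, we cannot reject H_0 that the label sequence is random, i.e. that x and y come from the same distribution. Note that the classical Wald-Wolfowitz test rejects only when there are too few runs, so runs.test(..., alternative = "less") matches it more closely; its p-value here is half the two-sided one, about 0.27, so the conclusion does not change.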
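One caveat: x and y share the tied values 2 and 3, and order() breaks ties by position, so here every tied x value lands before the tied y values. A minimal sketch showing that breaking ties the other way changes the run count (n_runs is a hypothetical helper):

```r
x <- c(1, 3, 2, 3, 5, 6)
y <- c(2, 3, 4, 2, 3, 7)
val <- c(x, y)
lab <- rep(c(0, 1), each = 6)          # 0 = from x, 1 = from y

# Hypothetical helper: number of runs in a label sequence
n_runs <- function(labels) length(rle(labels)$lengths)

x_first <- lab[order(val, lab)]        # ties: x before y (same as order(z[1,]) here)
y_first <- lab[order(val, -lab)]       # ties: y before x

n_runs(x_first)  # 6
n_runs(y_first)  # 8
```

With y first there are 8 runs instead of 6 (z = +0.6055 instead of -0.6055). In this example the two-sided p-value happens to stay the same, but in general ties across the samples make the result depend on an arbitrary ordering choice.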

